I've taken a further look at this, and it seems like Yahoo! Business isn't really a fan of the semantic web, but sure does like its tables. The hardest part of this will be navigating through all of the tables Yahoo! has on its pages.
Do you know much about HTML? If not, it's best to find some tutorial online to give you a brief crash-course in it.
Meanwhile, I'll go over a bit of Python and Beautiful Soup to get you going. Once you've installed both of these, run the Python interactive prompt. Python can be run in two ways: as a fixed script saved in a file, or interactively. The interactive method is generally used for experimentation.
When you run the interactive prompt, you'll be presented with something like this:
Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
The second line will probably say something about Microsoft Windows on your machine.
Anyway, this is the prompt. You type in code, press enter, and Python evaluates it and returns an answer. For instance:
>>> 10 + 5
15
>>> print "Hello World"
Hello World
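For comparison, here's roughly what the same two steps look like as a fixed script rather than at the prompt. This is only a sketch: the filename hello.py and the variable name total are made up for illustration. Note that a script doesn't echo results the way the prompt does, so everything has to be printed explicitly. I've written print with parentheses because that form runs on both old and new versions of Python.

```python
# hello.py (hypothetical filename): the prompt examples as a script.

# At the prompt, typing 10 + 5 echoes 15 automatically.
# In a script, the result has to be stored and printed explicitly.
total = 10 + 5
print(total)

# With a single argument, print(...) behaves the same on Python 2 and 3.
print("Hello World")
```

You'd run it from a command line with something like python hello.py.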
In order to access Yahoo! Business, we need to import functions from two libraries. To get the HTML, we need urlopen from the urllib2 library. Not surprisingly, urlopen does exactly what it says; it opens URLs. For example:
>>> from urllib2 import urlopen
>>> urlopen("http://www.google.com").read()
The above code prints out the HTML from Google. That's good, but isn't too useful on its own. We need to make sense of it, or to parse it. That's where BeautifulSoup comes in.
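One small thing before we get to parsing: you'll usually want to keep the HTML in a variable instead of just printing it. Here's a rough sketch of how that might look. The helper name fetch_html is my own invention, not something from urllib2. The try/except around the import is an assumption on my part, there to cover newer Python versions where urlopen lives in urllib.request instead.

```python
# urlopen lives in urllib2 on Python 2 and in urllib.request on Python 3.
# Trying both is an assumption made so the sketch runs on either.
try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen


def fetch_html(url):
    """Return the raw contents served at url (hypothetical helper)."""
    response = urlopen(url)
    try:
        # read() returns the whole body in one go.
        return response.read()
    finally:
        # Always close the connection, even if read() fails.
        response.close()
```

At the prompt you'd then type something like html = fetch_html("http://www.google.com"), and html holds the page, ready to be handed to a parser.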
And my dinner's just about finished cooking, I think, so I'll post the rest up later.