May 19th, 2006, 1:20 PM   #54
zem52887
Okay, this is what I have so far. It's pretty much just a compilation of what you said in your threads, plus my attempt to define the function. I used the same format you used for getting the links even though I know it's wrong; I just wanted to see if I could get something back (even if it was more than just the company_index links). I ran into some errors, so I think there are mistakes in the code beyond just the function definition.

from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

industry_page = "http://biz.yahoo.com/ic/ind_index.html"

def get_industry_urls(industry_page):
    soup  = BeautifulSoup(urlopen(industry_page))
    links = soup.fetch("table")[7].fetch("a")
    return [a['href'] for a in links]

industry_url = get_industry_urls(industry_page)

def get_company_index(industry_url):
    soup  = BeautifulSoup(urlopen(industry_url))
    company_links = soup.fetch("table")[9].fetch("a")[3]
    return [a['href'] for a in company_links]

company_url = get_company_index(industry_url)

for industry_url in get_industry_urls(industry_page):
    company_index = get_company_index(industry_url)

    for company_url in get_company_index(company_index):
        print get_company_data(company_url)

Is this roughly what it's supposed to look like?
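
For comparison, here is a rough sketch of how the two loops might fit together. It is not a tested fix for the script above. Two things in the code above look likely to cause errors: get_industry_urls returns a list, so each URL in it has to be opened one at a time rather than passing the whole list to urlopen, and fetch("a")[3] picks out a single link, whereas a slice such as [3:] keeps a list that can be looped over. The sketch assumes Python 3 and swaps in the standard-library html.parser for BeautifulSoup so that it runs without any network access. TableLinkParser, links_in_table, crawl, fetch, and the PAGES dictionary are all made up for illustration, and the table positions and the number of links skipped are placeholders, not the real layout of the Yahoo pages.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect <a href> values, grouped by the <table> they appear in."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one list of hrefs per <table>, in document order
        self._open = []   # indexes of the tables that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
            self._open.append(len(self.tables) - 1)
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # A link inside nested tables counts for every enclosing table.
                for index in self._open:
                    self.tables[index].append(href)

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()


def links_in_table(html, table_index, skip=0):
    """Return the hrefs found in one table, dropping the first `skip` links."""
    parser = TableLinkParser()
    parser.feed(html)
    # Slicing keeps a list; indexing with [skip] would return a single link.
    return parser.tables[table_index][skip:]


def crawl(fetch, industry_page):
    """Yield (industry_url, company_url) pairs, one industry page at a time."""
    for industry_url in links_in_table(fetch(industry_page), 0):
        # Each industry URL is fetched on its own, never the whole list.
        for company_url in links_in_table(fetch(industry_url), 0, skip=1):
            yield industry_url, company_url


# Hypothetical stand-in pages (assumption: the real pages are laid out differently).
PAGES = {
    "index.html": "<table><tr><td><a href='ind1.html'>One</a>"
                  "<a href='ind2.html'>Two</a></td></tr></table>",
    "ind1.html": "<table><a href='top.html'>Top</a>"
                 "<a href='co_a.html'>A</a><a href='co_b.html'>B</a></table>",
    "ind2.html": "<table><a href='top.html'>Top</a>"
                 "<a href='co_c.html'>C</a></table>",
}

# `fetch` is any function that maps a URL to HTML text; here it is a dict lookup.
pairs = list(crawl(PAGES.__getitem__, "index.html"))
for industry_url, company_url in pairs:
    print(industry_url, company_url)
```

Passing fetch in as a parameter keeps the looping logic separate from the downloading, so the same crawl function could be pointed at a real downloader once the correct table positions are known.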

Last edited by zem52887; May 19th, 2006 at 1:33 PM.