Okay, this is what I have so far. It's pretty much a compilation of what you said in your threads, plus my attempt to define the function. I used the same format you used for getting the links even though I know it's wrong; I just wanted to see if I could get something back (even if it was more than just the company_index links). I ran into some errors, so I think there are mistakes in the code beyond just the function definition.
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

industry_page = "http://biz.yahoo.com/ic/ind_index.html"

def get_industry_urls(industry_page):
    # Collect the href of every link in the table that lists the industries.
    soup = BeautifulSoup(urlopen(industry_page))
    links = soup.fetch("table")[7].fetch("a")
    return [a['href'] for a in links]

def get_company_index(industry_url):
    # The table and link positions are guesses copied from the pattern above.
    # [3] picks out a single <a> tag, so return its href instead of looping over it.
    soup = BeautifulSoup(urlopen(industry_url))
    company_link = soup.fetch("table")[9].fetch("a")[3]
    return company_link['href']

# get_company_urls (my placeholder name) and get_company_data are not written yet.
# Each function is called with one URL at a time, never with a whole list.
for industry_url in get_industry_urls(industry_page):
    company_index = get_company_index(industry_url)
    for company_url in get_company_urls(company_index):
        print get_company_data(company_url)
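
To check that the loops chain together the way I think they should, here is a minimal sketch that swaps the real pages for a made-up dictionary, so it runs without touching the network. FAKE_SITE, get_company_urls, crawl, and the stub get_company_data are all hypothetical stand-ins I made up for illustration; only the overall shape (industry page, then industry URLs, then one company index per industry, then company URLs) comes from the code above.

```python
# Hypothetical stand-in for the real site: each "page" maps to the links found on it.
FAKE_SITE = {
    "industry_page": ["industry_1.html", "industry_2.html"],
    "industry_1.html": ["companies_1.html"],
    "industry_2.html": ["companies_2.html"],
    "companies_1.html": ["company_a.html", "company_b.html"],
    "companies_2.html": ["company_c.html"],
}


def get_industry_urls(industry_page):
    # Returns a list of industry URLs.
    return FAKE_SITE[industry_page]


def get_company_index(industry_url):
    # Takes ONE industry URL and returns ONE company-index URL.
    return FAKE_SITE[industry_url][0]


def get_company_urls(company_index_url):
    # Takes ONE company-index URL and returns a list of company URLs.
    return FAKE_SITE[company_index_url]


def get_company_data(company_url):
    # Stub: the real version would download and parse the company page.
    return {"url": company_url}


def crawl(industry_page):
    results = []
    for industry_url in get_industry_urls(industry_page):
        company_index_url = get_company_index(industry_url)
        for company_url in get_company_urls(company_index_url):
            results.append(get_company_data(company_url))
    return results


for record in crawl("industry_page"):
    print(record["url"])
```

The point is that each function gets a single URL and the loops do the fanning out, which is where I think my version went wrong when it passed a whole list into get_company_index.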
Is this roughly what it's supposed to look like?
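
One piece I can test offline is the "grab every link in the Nth table" step, i.e. what soup.fetch("table")[7].fetch("a") is meant to do. This is only a sketch under assumptions: it uses Python 3's standard html.parser instead of BeautifulSoup (so it is not the same code as above), and SAMPLE_HTML, TableLinkParser, and links_in_table are names and data I invented for the example, not anything from the real Yahoo page.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect the href of every <a> tag, grouped by the table that contains it."""

    def __init__(self):
        super().__init__()
        self.tables = []       # one list of hrefs per <table>, in document order
        self.open_tables = []  # indexes of the tables we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
            self.open_tables.append(len(self.tables) - 1)
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                # A link inside nested tables counts for every enclosing table.
                for index in self.open_tables:
                    self.tables[index].append(href)

    def handle_endtag(self, tag):
        if tag == "table" and self.open_tables:
            self.open_tables.pop()


def links_in_table(html, table_index):
    # Return the hrefs found inside the table at position table_index.
    parser = TableLinkParser()
    parser.feed(html)
    return parser.tables[table_index]


# Made-up page: a navigation table followed by a table of industry links.
SAMPLE_HTML = """
<table><tr><td><a href="/nav/home.html">Home</a></td></tr></table>
<table>
  <tr><td><a href="/ic/110.html">Chemicals</a></td></tr>
  <tr><td><a href="/ic/120.html">Energy</a></td></tr>
</table>
"""

print(links_in_table(SAMPLE_HTML, 1))
```

Printing the link list for each table index in turn seems like a reasonable way to find out which index actually holds the links I want on the real page, instead of guessing at 7 or 9.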