One step at a time. There are a couple of lines in your code that suggest that you don't understand the meaning behind the code.
I think the most important concept you're missing is scope. In Python, some variables are local. This means that they don't exist outside of where they are created. They exist only in a small part of the program.
Take the function below:
def get_industry_urls(industry_page):
    soup = BeautifulSoup(urlopen(industry_page))
    links = soup.fetch("table")[7].fetch("a")
    return [a['href'] for a in links]

The variable called "soup" is declared inside the function. In Python, you declare a variable by assigning it a value with the "=" operator. Because "soup" is defined inside the function, it is local to that function. It does not exist outside of the boundaries in which it was created.
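Here is a minimal sketch of what "local" means in practice. The names make_greeting and message are made up for illustration and are not part of your program.

```python
def make_greeting(name):
    # "message" is created by assignment inside the function,
    # so it is local to make_greeting
    message = "Hello, " + name
    return message

result = make_greeting("world")

# Outside the function, "message" does not exist, so looking it up fails
try:
    message
    leaked = True
except NameError:
    leaked = False

print(result)
print(leaked)
```

The only thing that comes out of the function is the value it returns, which is why result is usable here while message is not.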
The same thing happens with "industry_page". When this function is called, there is not one, but two variables called "industry_page". One exists outside the function, and is global. One exists inside the function, and is local. If this sounds confusing, that's because it is. That's why it makes sense to give the local industry_page a different name:
def get_industry_urls(url):
    soup = BeautifulSoup(urlopen(url))
    links = soup.fetch("table")[7].fetch("a")
    return [a['href'] for a in links]

Try to think of functions as self-contained pieces of code. Functions should avoid affecting variables outside, except through the "return" statement.
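To see the two same-named variables side by side, here is a small sketch. The function show_page and the example.com URLs are placeholders I invented for the demonstration.

```python
industry_page = "http://example.com/global"  # global variable (placeholder URL)

def show_page(industry_page):
    # This parameter is a separate, local variable that happens to share
    # the global's name. Reassigning it only changes the local one.
    industry_page = industry_page + "?local=1"
    return industry_page

returned = show_page("http://example.com/argument")

print(returned)        # the value handed back by "return"
print(industry_page)   # the global, untouched by the function
```

The global keeps its original value. The function only communicates with the rest of the program through what it returns.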
Because of this, the line below does nothing in your program. You can remove it without altering your program's flow:
industry_url = get_industry_urls(industry_page)
Going back to functions. If you want to test out the function you have made, create a new text file with .py as the extension, and create some test code, like so:
# Assumes Python 2 with BeautifulSoup 3, which the fetch() calls suggest
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

def get_company_index(industry_url):
    soup = BeautifulSoup(urlopen(industry_url))
    company_links = soup.fetch("table")[11].fetch("a")[3]
    return [a['href'] for a in company_links]

print get_company_index("http://biz.yahoo.com/ic/112.html")
raw_input("Press enter to continue...")

When this piece of code works, then you know the function works, and you can put it back into your main program.
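If you want a test that does not depend on the network, you can also feed a function a fixed piece of HTML and check the result. The sketch below is only an illustration of that idea: it swaps BeautifulSoup for Python 3's standard-library html.parser, and LinkCollector, extract_hrefs and SAMPLE_HTML are hypothetical names with made-up data, not part of your program.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for key, value in attrs:
                if key == "href":
                    self.hrefs.append(value)

def extract_hrefs(html):
    # Self-contained: takes a string in, hands a list back via "return"
    collector = LinkCollector()
    collector.feed(html)
    return collector.hrefs

# Made-up sample input, so the expected answer is known in advance
SAMPLE_HTML = '<table><tr><td><a href="/a.html">A</a> <a href="/b.html">B</a></td></tr></table>'

links = extract_hrefs(SAMPLE_HTML)
assert links == ["/a.html", "/b.html"]
print(links)
```

Because the input is fixed, a failing assert tells you the problem is in the function itself and not in the page you downloaded.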