Well, I tried to do a little problem solving, and this is where I'm currently at:
def get_company_data(company_url):
    soup = BeautifulSoup(urlopen(company_url))
    # Company Profile
    profile = soup.fetchText(re.compile("Company Profile"))[2]
    companyprofile = profile.findNext("table")
    return [companyprofile]
    # Contact Information
    contact = soup.firstText(re.compile("Contact Information"))
    contacttable = contact.findParent("table")
    return [contacttable]
    # Financial Highlights
    highlights = soup.firstText(re.compile("Highlights"))
    fhighlights = highlights.findParent("table")
    z = len(highlights)
    if z == 0:
        return ["N/A"]
    else:
        return [fhighlights]
    # Key People
    key = soup.firstText(re.compile("Key People"))
    keypeople = key.findParent("table")
    return [keypeople]

for industry_url in get_industry_urls(industry_page):
    company_index = get_company_index(industry_url)
    for company_urls in get_company_urls(company_index):
        company_data = get_company_data(company_url)
        print company_data
        sleep(1)
I was trying to figure out how to format the return statements along the lines of get_company_urls etc., but I couldn't come up with anything, because here we're using regex to locate the data rather than following a link where we can access the "a" tags and their 'href' attributes. So I wasn't sure how to format them. I also tried to write the for-statement so the program knows to loop over the functions, but again I think I'm off by a bit. If someone wants to take a look and point me in the right direction, I'd be very grateful, as I'd like to get this exported to Excel, maybe later today.
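For the loop and the export step, here is a minimal sketch rather than the actual scraper. The three `get_*` functions are hypothetical stand-ins that return hard-coded values instead of fetching pages, the field names are assumptions, and the code is written for Python 3, unlike the code above. It shows the name bound by the for-statement being the same name that is passed on, and the collected rows written out as CSV, which Excel can open.

```python
import csv
import io

# Hypothetical stand-ins for the real scraping functions; they return
# fixed values so the loop structure can be run without network access.
def get_industry_urls(industry_page):
    return [industry_page + "/tech", industry_page + "/retail"]

def get_company_urls(industry_url):
    return [industry_url + "/a", industry_url + "/b"]

def get_company_data(company_url):
    # One dict per company, returned once at the end of the function.
    return {"url": company_url, "profile": "N/A", "contact": "N/A"}

def collect_rows(industry_page):
    rows = []
    for industry_url in get_industry_urls(industry_page):
        # The name bound by the for-statement is the one passed on.
        for company_url in get_company_urls(industry_url):
            rows.append(get_company_data(company_url))
    return rows

def rows_to_csv(rows, fieldnames):
    # Write to an in-memory buffer; a real script would open a .csv file.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = collect_rows("http://example.com")
csv_text = rows_to_csv(rows, ["url", "profile", "contact"])
```

Returning one dict per company keeps each row self-describing, so the CSV columns can be chosen by name instead of by position.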
I was going through the Python tutorial, and it reads:
Quote:
The return statement returns with a value from a function.
With this in mind, I think I might be right with my return statements?
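One thing worth checking against that tutorial sentence: a return statement also ends the function right there, so when a function has several returns in a row, only the first one ever runs. The sketch below illustrates this with two hypothetical functions (the names and placeholder strings are made up for the example and are not part of the scraper), the second of which collects every value first and returns once.

```python
def first_return_wins():
    # Three returns in a row: the function exits at the first one.
    return ["profile"]
    return ["contact"]      # never reached
    return ["key people"]   # never reached


def single_return():
    # Collect every section first, then return once at the end.
    data = {}
    data["profile"] = "profile table"
    data["contact"] = "contact table"
    data["key_people"] = "key people table"
    return data


only_profile = first_return_wins()
everything = single_return()
```

Here `only_profile` holds just the first list, while `everything` holds all three entries under their own keys.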