#91 |
|
Professional Programmer
Join Date: Apr 2005
Location: London, England
Posts: 459
Rep Power: 4
![]() |
Yeah. Gathering bunches of information from websites for the good of companies is fairly commonplace. I work part-time for a travel industry software provider and had to create a few spiders to scrape a lot of flight/hotel info to place in their database. It's a royal pain when the markup is ever-so-slightly non-uniform. It seems to be okay with the Yahoo page, though. Keep up the good work Zem ;-)
|
|
|
|
|
|
#92 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
I'm back...
So I think I have get_company_urls done, and now I'm attempting get_company_data, but I've run into a few problems (or rather, I have some questions).
The data is all contained within one table, and since the items aren't links but plain data, I'm not sure how to select them or how to write the fetch line. In the past, when I had to pick a particular link from a table with multiple links, Arevos taught:
index_link = soup.fetch("table")[11].fetch("a")[2]
The bracketed numbers are zero-based indexes, so this picks the third link within the twelfth table. If I need multiple pieces of data from a single table, will I still be able to do this?
Additionally, some of the companies have financial highlights and some don't. Moreover, some companies have no financial highlights data posted but still show an empty box labelled "Financial Highlights" (not a real table, but it looks like one on the website, and it's still contained within the aforementioned table), while other companies have a blank space where the financial highlights would be. For example:
http://biz.yahoo.com/ic/92/92296.html (Has Financial Highlight Data)
http://biz.yahoo.com/ic/133/133036.html (No Financial Highlight Box)
http://biz.yahoo.com/ic/101/101127.html (Financial Highlights but no Data)
Thanks for everyone's help thus far, we're on the home stretch.
|
|
|
|
|
#93 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
Here is an example of the table from which I need to extract data...
<table cellpadding="0" cellspacing="0" border="0" width="100%"><tr><td><table cellpadding="2" cellspacing="0" border="0" width="100%"><tr><td align="center"><font face="verdana" size="-2"> Monday, May 22 2006 9:10am ET - U.S. Markets open in 20 minutes. </font></td></tr></table><div id="yfnav"><ul><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/home/SIG=10r204g7h/*http://finance.yahoo.com/">Home</a></li><li class="selected"><a href="http://us.rd.yahoo.com/finance/gnav/inv/top/SIG=10v1nlrio/*http://finance.yahoo.com/mt?u">Investing</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/news/top/SIG=10vple585/*http://biz.yahoo.com/top.html">News & Commentary</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/ret/top/SIG=115cjjpke/*http://finance.yahoo.com/retirement">Retirement & Planning</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/bnk/top/SIG=112ghsh4d/*http://finance.yahoo.com/banking">Banking & Credit</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/ln/top/SIG=10v8f92ss/*http://finance.yahoo.com/loan">Loans</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/tx/SIG=110c5f2kh/*http://finance.yahoo.com/taxes">Taxes</a></li></ul></div><div id="yfsubnav"><ul><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/mover/SIG=10vd7l2sa/*http://finance.yahoo.com/mo?u">Market Overview</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/mstats/SIG=1163g9uf1/*http://finance.yahoo.com/actives?e=o">Market Stats</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/sres/SIG=10pk6ocd9/*http://biz.yahoo.com/r/">Stocks</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/mf/SIG=110t9jh84/*http://finance.yahoo.com/funds">Mutual Funds</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/etf/SIG=10uh90m76/*http://finance.yahoo.com/etf">ETFs</a></li><li class=""><a 
href="http://us.rd.yahoo.com/finance/gnav/inv/bnd/SIG=110qucpir/*http://finance.yahoo.com/bonds">Bonds</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/opt/SIG=10qkompup/*http://biz.yahoo.com/opt">Options</a></li><li class="selected"><a href="http://us.rd.yahoo.com/finance/gnav/inv/ind/SIG=10q9f3p5j/*http://biz.yahoo.com/ic/">Industries</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/fx/SIG=113apl3kn/*http://finance.yahoo.com/currency">Currency</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/edu/SIG=114u8naui/*http://finance.yahoo.com/education">Education</a></li></ul></div><div id="yfsearch"><form id="searchQuotes" action="http://finance.yahoo.com/q"><ul><li><label>Get Quotes</label></li><li><input class="text" id="txtQuotes" name="s" /></li><li><input class="button" id="q" type="submit" value=" GO " /></li><li class="first"><a href="http://finance.yahoo.com/lookup">Symbol Lookup</a></li><li><a href="http://finance.yahoo.com/search">Finance Search</a></li></ul></form></div><table style="clear:both; text-transform:uppercase; margin-top:5px;" border="0" width="100%" cellpadding="4" cellspacing="0"><tr bgcolor="EEEEEE"><td><font face="arial" size="+1"> <b>Industry Center - Chemicals - Major Diversified</b> </font></td><!-- SpaceID=0 robot --> </tr></table><table cellpadding="0" cellspacing="0" border="0"><tr><td height="5"></td></tr></table><table cellpadding="0" cellspacing="0" border="0" width="100%"><tr><td><font face="arial" size="-1"> <b> <a href="http://biz.yahoo.com/ic/index.html">Industry Center</a> > <a href="http://biz.yahoo.com/ic/110.html">Chemicals - Major Diversified</a> > Schenectady International, Inc. 
Company Profile </b> </font></td></tr></table><table cellpadding="0" cellspacing="0" border="0"><tr><td height="5"></td></tr></table><table cellpadding="0" cellspacing="0" border="0" width="100%"><tr><td valign="top" width="200" bgcolor="eeeeee"><table cellpadding="0" cellspacing="0" border="0" width="100%"><tr><td align="center"><table border="0" width="100%" cellpadding="4" cellspacing="0"><tr bgcolor="556F93"><td valign="top" nowrap="nowrap"><font face="verdana" size="-2" color="ffffff"> <b>More On This Industry</b> </font></td></tr></table><table border="0" cellpadding="2" cellspacing="0" width="100%"><tr><td width="1%">·</td><td nowrap="nowrap"><font face="arial" size="-1"> <a href="http://biz.yahoo.com/ic/110.html">Summary</a> </font></td></tr><tr><td width="1%">·</td><td nowrap="nowrap"><font face="arial" size="-1"> <a href="http://biz.yahoo.com/ic/news/110.html">News</a> </font></td></tr><tr><td width="1%">·</td><td nowrap="nowrap"><font face="arial" size="-1"> <a href="http://biz.yahoo.com/ic/ll/110mkt.html">Leaders & Laggards</a> </font></td></tr><tr><td width="1%">·</td><td nowrap="nowrap"><font face="arial" size="-1"> <a href="http://biz.yahoo.com/ic/110_cl_all.html">Company Index</a> </font></td></tr><tr><td width="1%">·</td><td nowrap="nowrap"><a href="/p/"><font face="arial" size="-1">Industry Browser</font></a></td></tr></table><table border="0" width="100%" cellpadding="0" cellspacing="0"><tr><td height="10"></td></tr></table> <table border="0" width="100%" cellpadding="4" cellspacing="0"><tr bgcolor="556F93"><td valign="top" nowrap="nowrap"><font face="verdana" size="-2" color="ffffff"><b>Related Industries</b></font></td></tr></table><table cellpadding="2" cellspacing="1" width="100%" bgcolor="eeeeee" border="0"><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1"> <a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/112/*http://biz.yahoo.com/ic/112.html">Agricultural Chemicals</a> 
</font></td></tr><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1"> <a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/326/*http://biz.yahoo.com/ic/326.html">Cleaning Products</a> </font></td></tr><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1"> <a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/623/*http://biz.yahoo.com/ic/623.html">Pollution & Treatment Controls</a> </font></td></tr><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1"> <a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/322/*http://biz.yahoo.com/ic/322.html">Rubber & Plastics</a> </font></td></tr><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1"> <a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/113/*http://biz.yahoo.com/ic/113.html">Specialty Chemicals</a> </font></td></tr><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1"> <a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/111/*http://biz.yahoo.com/ic/111.html">Synthetics</a> </font></td></tr></table><table><tr><td height="10"></td></tr></table><table border="0" width="100%" cellpadding="4" cellspacing="0"><tr bgcolor="556F93"><td valign="top" nowrap="nowrap"><font face="verdana" size="-2" color="ffffff"><b>Top Industries</b></font></td></tr></table><table cellpadding="2" cellspacing="1" width="100%" bgcolor="eeeeee" border="0"><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/330/*http://biz.yahoo.com/ic/330.html"><font face="arial" size="-1">Auto Manufacturers - Major </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/515/*http://biz.yahoo.com/ic/515.html"><font face="arial" size="-1">Biotechnology 
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/826/*http://biz.yahoo.com/ic/826.html"><font face="arial" size="-1">Business Software & Services </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/110/*http://biz.yahoo.com/ic/110.html"><font face="arial" size="-1">Chemicals - Major Diversified </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/841/*http://biz.yahoo.com/ic/841.html"><font face="arial" size="-1">Communication Equipment </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/210/*http://biz.yahoo.com/ic/210.html"><font face="arial" size="-1">Conglomerates </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/810/*http://biz.yahoo.com/ic/810.html"><font face="arial" size="-1">Diversified Computer Systems </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/423/*http://biz.yahoo.com/ic/423.html"><font face="arial" size="-1">Diversified Investments </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/913/*http://biz.yahoo.com/ic/913.html"><font face="arial" size="-1">Diversified Utilities </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/510/*http://biz.yahoo.com/ic/510.html"><font face="arial" size="-1">Drug Manufacturers - Major </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a 
href="http://us.rd.yahoo.com/finance/industry/front/industrynav/911/*http://biz.yahoo.com/ic/911.html"><font face="arial" size="-1">Electric Utilities </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/340/*http://biz.yahoo.com/ic/340.html"><font face="arial" size="-1">Food - Major Diversified </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/133/*http://biz.yahoo.com/ic/133.html"><font face="arial" size="-1">Industrial Metals & Minerals </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/770/*http://biz.yahoo.com/ic/770.html"><font face="arial" size="-1">Major Airlines </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/120/*http://biz.yahoo.com/ic/120.html"><font face="arial" size="-1">Major Integrated Oil & Gas </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/410/*http://biz.yahoo.com/ic/410.html"><font face="arial" size="-1">Money Center Banks </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/432/*http://biz.yahoo.com/ic/432.html"><font face="arial" size="-1">Property & Casualty Insurance </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/830/*http://biz.yahoo.com/ic/830.html"><font face="arial" size="-1">Semiconductor - Broad Line </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a 
href="http://us.rd.yahoo.com/finance/industry/front/industrynav/844/*http://biz.yahoo.com/ic/844.html"><font face="arial" size="-1">Telecom Services - Domestic </font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/351/*http://biz.yahoo.com/ic/351.html"><font face="arial" size="-1">Tobacco Products, Other </font></a></td></tr></table><table cellpadding="0" cellspacing="0"><tr><td height="2"></td></tr></table><table border="0" width="100%" cellpadding="0" cellspacing="0"><tr><td align="right" nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/more/*http://biz.yahoo.com/ic/ind_index.html"><font face="verdana" size="-2"><b>Complete Industry List...</b></font></a></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="20"></td></tr></table><table cellspacing="0" width="95%" bgcolor="bfcede"><tr><td><table cellpadding="6" cellspacing="0" bgcolor="ffffff" width="100%"><tr><td align="left"><a href="http://us.rd.yahoo.com/finance/industry/hoovers/SIG=111gu94sa/*http://www.hoovers.com/yahoofin"><img src="http://us.news2.yimg.com/us.yimg.com/p/fi/pr/55559.gif" width="82" height="35" border="0" /></a></td></tr><tr><td><font face="verdana" size="-2">Need more? Get unbiased, in-depth information on public and private companies worldwide.</font></td></tr></table></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="20"></td></tr></table></td></tr></table></td><td width="10"> </td><td valign="top"><table border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td colspan="3" width="100%"><table border="0" cellpadding="3" cellspacing="0" width="100%"><tr><td bgcolor="bfcede"><font face="verdana" size="-2"> <b>Schenectady International, Inc. 
Company Profile</b> </font></td></tr></table></td></tr><tr><td height="5"></td></tr><tr valign="top"><td width="49%"><table cellpadding="0" cellspacing="0" width="100%"><tr><td><font face="arial" size="-1"> Schenectady International came to life by bringing good varnishes to life for General Electric. Founded by Howard Wright in 1906, the company was established to develop insulating varnishes for GE's early electrical devices. Schenectady International has sold its electrical insulating business to ALTANA, but it still makes friction material resins for use in making brake linings and clutch facings. The company's other products include alkylphenols, phenolic resins, and electronic chemicals used in the production of semiconductors, imaging products, packaging, rubber compounds, agrochemicals, dyes, fuel additives, and flavoring agents. Descendants of Wright still own the company. </font></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="20"></td></tr></table> <table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="10"></td></tr></table></td><td width="5"><spacer type="block" width="5" height="1" /></td><td width="49%" align="right"><table border="0" cellpadding="0" cellspacing="0" bgcolor="dcdcdc" width="100%"><tr><td><table border="0" cellpadding="2" cellspacing="1" width="100%"><tr><td colspan="2"><font face="verdana" size="-2"><b>Contact Information</b></font></td></tr><tr valign="top"><td bgcolor="eeeeee"><font face="arial" size="-1"> Address: </font></td><td bgcolor="white"><font face="arial" size="-1"> 2750 Balltown Rd.<br />Schenectady, NY 12304 </font></td></tr><tr valign="top"><td bgcolor="eeeeee"><font face="arial" size="-1">Phone:</font></td><td bgcolor="white"><font face="arial" size="-1">518-347-4200</font></td></tr><tr valign="top"><td bgcolor="eeeeee"><font face="arial" size="-1">Fax:</font></td><td bgcolor="white"><font face="arial" 
size="-1">518-346-3111</font></td></tr></table></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="10"></td></tr></table><table border="0" cellpadding="0" cellspacing="0" bgcolor="dcdcdc" width="100%"><tr><td><table border="0" cellpadding="2" cellspacing="1" width="100%"><tr><td colspan="2"><font face="verdana" size="-2"><b>Financial Highlights</b></font></td></tr><tr valign="top"><td bgcolor="eeeeee"><font face="arial" size="-1">Fiscal Year End:</font></td><td bgcolor="white"><font face="arial" size="-1">December</font></td></tr></table></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="10"></td></tr></table><table border="0" cellspacing="0" width="100%" bgcolor="dcdcdc"><tr><td><table border="0" cellpadding="3" cellspacing="0" width="100%"><tr><td colspan="2"><font face="verdana" size="-2"> <b>Key People</b> </font></td></tr><tr valign="top"><td width="1%" bgcolor="white"><font face="arial" size="-1"></font></td><td bgcolor="white"><font face="arial" size="-1"> Chairman and CEO: Wallace A. Graham </font></td></tr><tr valign="top"><td width="1%" bgcolor="white"><font face="arial" size="-1"></font></td><td bgcolor="white"><font face="arial" size="-1"> President and COO: Charles G. Griswold </font></td></tr><tr valign="top"><td width="1%" bgcolor="white"><font face="arial" size="-1"></font></td><td bgcolor="white"><font face="arial" size="-1"> SVP, Financial and CFO: John C. 
Obst </font></td></tr></table></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="10"></td></tr></table><table border="0" cellspacing="0" width="100%" bgcolor="dcdcdc"><tr><td><table border="0" cellpadding="3" cellspacing="0" width="100%"><tr><td colspan="2"><font face="verdana" size="-2"> <b>Industry Information</b> </font></td></tr><tr><td bgcolor="white"><font face="arial" size="-1"> Sector: <a href="/p/1conameu.html">Basic Materials</a></font></td></tr><tr><td bgcolor="white"><font face="arial" size="-1"> Industry: <a href="/p/">Chemicals - Major Diversified</a></font></td></tr></table></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="10"></td></tr></table><table border="0" cellspacing="0" width="100%" bgcolor="dcdcdc"><tr><td><table border="0" cellpadding="3" cellspacing="0" width="100%"><tr><td colspan="2"><font face="verdana" size="-2"> <b>Top Competitors</b> </font></td></tr><tr><td width="1%" bgcolor="white"><font face="arial" size="-1"> </font></td><td bgcolor="white"><font face="arial" size="-1"> <a href="http://biz.yahoo.com/ic/10/10471.html">The Dow Chemical Company</a> (<a href="http://finance.yahoo.com/q?s=DOW&d=t">DOW</a>) </font></td></tr><tr><td width="1%" bgcolor="white"><font face="arial" size="-1"> </font></td><td bgcolor="white"><font face="arial" size="-1"> <a href="http://biz.yahoo.com/ic/10/10558.html">Ferro Corporation</a> (<a href="http://finance.yahoo.com/q?s=FOE&d=t">FOE</a>) </font></td></tr><tr><td width="1%" bgcolor="white"><font face="arial" size="-1"> </font></td><td bgcolor="white"><font face="arial" size="-1"> <a href="http://biz.yahoo.com/ic/101/101812.html">Mitsui Chemicals, Inc.</a> </font></td></tr></table></td></tr></table></td></tr><tr height="20"><td></td></tr><tr><td colspan="3"><font face="arial" size="-1"> <i>Need more? 
Get additional in-depth company and industry information from <a href="http://us.rd.yahoo.com/finance/industry/hoovers/SIG=111gu94sa/*http://www.hoovers.com/yahoofin">Hoover's Online</a>.</i> </font></td></tr></table></td></tr></table></td></tr></table> |
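One way to pull several pieces of data out of a page like the one above is to collect every table row as a list of cell texts and then read the rows that follow a section heading such as "Financial Highlights". The sketch below is only an illustration under stated assumptions: it is written for Python 3 and uses the standard library's html.parser instead of the BeautifulSoup calls used elsewhere in this thread, and the names RowCollector, section_fields and SAMPLE are made up for the example. It returns None when the heading is missing and an empty dict when the heading is there but has no data, which roughly matches the three kinds of company page mentioned earlier.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect each <tr> as a list holding the text of its own <td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, in the order their </tr> is seen
        self._open_rows = []    # stack of rows still being read (nested tables)
        self._open_cells = []   # stack of text buffers for cells still open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._open_rows.append([])
        elif tag == "td":
            self._open_cells.append([])
        elif tag == "br" and self._open_cells:
            # Treat a line break inside a cell as a space.
            self._open_cells[-1].append(" ")

    def handle_endtag(self, tag):
        if tag == "td" and self._open_cells:
            # Join the cell's text and collapse runs of whitespace.
            text = " ".join("".join(self._open_cells.pop()).split())
            if self._open_rows:
                self._open_rows[-1].append(text)
        elif tag == "tr" and self._open_rows:
            self.rows.append(self._open_rows.pop())

    def handle_data(self, data):
        # Text belongs to the innermost open cell only.
        if self._open_cells:
            self._open_cells[-1].append(data)


def section_fields(rows, title):
    """Return {label: value} for the two-cell rows right after the title row.

    Returns None if the title row never appears, {} if it has no data rows.
    """
    fields = None
    for row in rows:
        if row == [title]:
            fields = {}
        elif fields is not None and len(row) == 2:
            fields[row[0].rstrip(":")] = row[1]
        elif fields is not None:
            break  # first row that is not label/value ends the section
    return fields


# SAMPLE is a cut-down, hypothetical copy of the box in the page above.
SAMPLE = """<table><tr><td><table>
<tr><td colspan="2"><font><b>Financial Highlights</b></font></td></tr>
<tr valign="top"><td><font>Fiscal Year End:</font></td><td><font>December</font></td></tr>
</table></td></tr></table>"""

collector = RowCollector()
collector.feed(SAMPLE)
highlights = section_fields(collector.rows, "Financial Highlights")
```

Looking rows up by their heading text rather than by a fixed table number should also keep working when a box is missing and the table positions shift, though that has not been checked against the live pages.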
|
|
|
|
|
#94 |
|
Hobbyist Programmer
|
Wow, this thread is awesome. Arevos, awesome job leading through Python. Makes me want to take another look at it now. And kudos to Zem for taking the time to learn the steps instead of just asking others to do it for you (as it seems most people do).
__________________
#programmingforums relay - http://thegupstudio.com/cgi-bin/pforelay.cgi freelance scripts - http://ryanguthrie.com/index.html |
|
|
|
|
|
#95 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
Lastly, I'm trying to test my get_company_urls to see if I'm ready for get_company_data, and I'm encountering some errors when I run the module. If anyone could help me troubleshoot, I'd really appreciate it, because I've been staring at this code for a while and can't figure out what I did wrong.
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup
from time import sleep

industry_page = "http://biz.yahoo.com/ic/ind_index.html"

def get_industry_urls(industry_page):
    soup = BeautifulSoup(urlopen(industry_page))
    links = soup.fetch("table")[7].fetch("a")
    return [a['href'] for a in links if a.string != "Alphabetical"]

def get_company_index(industry_url):
    soup = BeautifulSoup(urlopen(industry_url))
    index_link = soup.fetch("table")[11].fetch("a")[2]
    return index_link['href']

def get_company_urls(company_index):
    soup = BeautifulSoup(urlopen(company_index))
    urls = soup.fetch("table")[1].fetch("a")
    return [a['href'] for a in urls]

for industry_url in get_industry_urls(industry_page):
    company_index = get_company_index(industry_url)
    for company_index in get_company_index(industry_url):
        print get_company_urls(company_index)
|
|
|
|
|
#96 | |
|
Programming Guru
![]() Join Date: Jun 2005
Location: elemental plane
Posts: 1,429
Rep Power: 5
![]() |
Quote:
:p
__________________
"Employ your time in improving yourself by other men's writings, so that you shall gain easily what others have labored hard for." -- Socrates |
|
|
|
|
|
|
#97 |
|
Resident Grouch
![]() ![]() ![]() ![]() ![]() ![]() Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
![]() |
Indeed I was. Not the first time, either. Great thread.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers |
|
|
|
|
|
#98 | |
|
Programming Guru
![]() Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
![]() |
Quote:
Also, when you're testing, test each function individually and in isolation. This takes less time than testing all the functions together, and doesn't hammer Yahoo! so hard.
However, I believe your problem lies in this piece of code:
for industry_url in get_industry_urls(industry_page):
    company_index = get_company_index(industry_url)
    for company_index in get_company_index(industry_url):
        print get_company_urls(company_index)
For-loops, like list comprehensions, are used for lists and other sequences of values only. They are a way of applying code to each item in a sequence. get_company_index returns a single URL rather than a list of them, so attempting to loop over its result only causes errors.
Remove the unnecessary for-loop and add a call to sleep, so that your program waits for a certain amount of time between each industry. This is to ensure you don't overload the Yahoo! servers. If you query Yahoo! too fast, too often, then Yahoo! may consider you a malicious entity and prevent your IP address from accessing the site. Therefore, be careful and be polite.
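As a rough sketch of that advice, the loop might look something like the following once the inner for-loop is gone and a pause is added. This is an illustration only, written in Python 3 syntax: collect_company_urls, delay_seconds and the sleep parameter are names invented here, and the two lookup functions are passed in as arguments so the loop can be tried with stand-ins before it touches the real site. It is also worth knowing that looping over a single string walks through it one character at a time, so each "URL" the inner loop hands on would be a single letter.

```python
import time


def collect_company_urls(industry_urls, get_company_index, get_company_urls,
                         delay_seconds=2.0, sleep=time.sleep):
    """Gather company URLs for every industry, pausing after each industry."""
    all_urls = []
    for industry_url in industry_urls:
        # get_company_index returns ONE URL, so call it once and use the
        # result directly; there is nothing to loop over here.
        company_index = get_company_index(industry_url)
        all_urls.extend(get_company_urls(company_index))
        # Wait before moving on to the next industry, to be polite.
        sleep(delay_seconds)
    return all_urls


# Looping over a string yields its characters, not the string itself.
characters = list("ab")
```

With the real functions from this thread it would be called as collect_company_urls(get_industry_urls(industry_page), get_company_index, get_company_urls). The two-second default is a guess, not a figure Yahoo! publishes.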
|
|
|
|
|
|
#99 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
I apologize for not posting the error; in hindsight that makes sense. Regarding testing each function alone, I was under the impression that with my particular code the functions need to run in conjunction with one another. Or do you mean I should pick a link myself and test each function with it, rather than starting from the general Yahoo! index link and passing it through all the functions?
Yeah, I forgot about for-loops and what they're "for"... and you were right. The script is working now. However, it is retrieving not only the company_urls: in some cases it's also retrieving the link for a company quote page, and in some cases non-existent links (it seems).
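One possible way to drop the unwanted links is to keep only the hrefs that look like company profile pages. The sketch below assumes that every profile URL follows the shape of the three examples posted earlier in the thread (http://biz.yahoo.com/ic/<number>/<number>.html); that pattern is a guess from those examples, not something Yahoo! documents. The names COMPANY_PROFILE and keep_company_profiles are made up here, and "non-existent links" is taken to mean missing or empty hrefs, which is also an assumption. Python 3 syntax.

```python
import re

# Assumed shape of a company profile URL, based on the examples in this thread.
COMPANY_PROFILE = re.compile(r"^http://biz\.yahoo\.com/ic/\d+/\d+\.html$")


def keep_company_profiles(hrefs):
    """Return the hrefs that match the profile pattern, without duplicates."""
    seen = set()
    kept = []
    for href in hrefs:
        # Skip missing or empty hrefs, anything off-pattern, and repeats.
        if href and COMPANY_PROFILE.match(href) and href not in seen:
            seen.add(href)
            kept.append(href)
    return kept
```

Quote-page links such as http://finance.yahoo.com/q?s=DOW&d=t do not match the pattern, so they are filtered out along with industry pages like http://biz.yahoo.com/ic/110.html.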
|
|
|
|
|
#100 | |
|
Programming Guru
![]() Join Date: Jun 2005
Location: elemental plane
Posts: 1,429
Rep Power: 5
![]() |
Quote:
Since you don't want to hammer the server, testing offline might be an idea. I don't think Yahoo will care all that much, though; just don't send requests too fast, or it may cut you off as a suspected denial-of-service attack, as already mentioned.
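A simple way to test offline is to save each page the first time it is downloaded and read it back from disk afterwards. This is only a sketch under assumptions: cached_fetch and its parameters are hypothetical names, the downloader is passed in as fetch so any function that takes a URL and returns the page text will do, and the syntax is Python 3.

```python
import hashlib
import os


def cached_fetch(url, fetch, cache_dir):
    """Return the page text for url, calling fetch(url) only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL so it can be used safely as a file name.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    text = fetch(url)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text
```

After one run has filled the cache directory, the parsing functions can be tested over and over without sending Yahoo! any further requests.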
__________________
"Employ your time in improving yourself by other men's writings, so that you shall gain easily what others have labored hard for." -- Socrates |
|
|
|
|