Old May 20th, 2006, 6:10 PM   #91
Cerulean
Professional Programmer
 
Join Date: Apr 2005
Location: London, England
Posts: 459
Rep Power: 4 Cerulean is on a distinguished road
Yeah. Getting bunches of information from websites for the good of companies is fairly commonplace. I work part-time for a travel industry software provider and had to create a few spiders to scrape a lot of flight/hotel info to place in their database. It's a royal pain when the markup is ever-so-slightly non-uniform. It seems to be okay with the Yahoo page, though. Keep up the good work, Zem ;-)
Old May 22nd, 2006, 7:58 AM   #92
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
I'm back...
So I think I have get_company_urls done and now I'm attempting to do get_company_data, and I've run into a few problems (or rather, I have some questions). The data is all contained within one table, and since it's plain data rather than links, I'm not sure how to sort it or how to format that fetch line of code. By this I mean that in the past, when I had to pick a particular link from a certain table with multiple links, Arevos taught:

index_link = soup.fetch("table")[11].fetch("a")[2]

With the bracketed numbers picking out one link within one table: index 2 within index 11, which, since Python counts from zero, is the third link in the twelfth table. If I need multiple pieces of data from a single table, will I still be able to do this?
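
For what it's worth, here is a minimal sketch of how that indexing behaves, assuming fetch returns an ordinary list-like sequence. Plain strings stand in for the fetched tags and the names are made up for illustration. It is written for Python 3, unlike the Python 2 code elsewhere in this thread.

```python
# Plain strings stand in for the tags that soup.fetch("a") would return.
# (Hypothetical data; a real result would hold tag objects instead.)
links = ["link0", "link1", "link2", "link3"]

third = links[2]                       # zero-based: index 2 is the third item
first_two = links[:2]                  # a slice takes several items at once
picked = [links[i] for i in (0, 3)]    # or pick arbitrary positions

print(third, first_two, picked)
```

So taking several pieces from one table should just be a matter of keeping the list and indexing or slicing it more than once.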


Additionally, some of the companies have financial highlights and some don't. Moreover, some companies have no financial highlights data posted but still show an empty box labelled "financial highlights" (not a real table, but it looks like one on the website, and it's still contained within the aforementioned table), while other companies just have a blank space where the financial highlights would be.

For example:
http://biz.yahoo.com/ic/92/92296.html (Has Financial Highlight Data)
http://biz.yahoo.com/ic/133/133036.html (No Financial Highlight Box)
http://biz.yahoo.com/ic/101/101127.html (Financial Highlights but no Data)
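
One possible way to keep those three cases apart, sketched in Python 3 under a big assumption: that some earlier parsing step has already produced a dict mapping each box title to a dict of label/value pairs. The `sections` dict and the function name are hypothetical, not something from the thread's code.

```python
def describe_highlights(sections):
    """Classify a parsed page by the state of its financial highlights.

    `sections` is a hypothetical dict mapping a box title to a dict of
    label -> value pairs, as some earlier parsing step might produce.
    """
    highlights = sections.get("Financial Highlights")
    if highlights is None:
        return "no box"               # the box is missing altogether
    if not highlights:
        return "box without data"     # the box exists but holds no rows
    return "box with data"

print(describe_highlights({"Financial Highlights": {"Fiscal Year End": "December"}}))
print(describe_highlights({"Contact Information": {"Phone": "518-347-4200"}}))
print(describe_highlights({"Financial Highlights": {}}))
```

Using `.get` means a missing box yields None instead of raising a KeyError, so the same code can run over every company page.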

Thanks for everyone's help thus far, we're on the home stretch.
Old May 22nd, 2006, 8:29 AM   #93
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
Here is an example of the table from which I need to extract data...

<table cellpadding="0" cellspacing="0" border="0" width="100%"><tr><td><table cellpadding="2" cellspacing="0" border="0" width="100%"><tr><td align="center"><font face="verdana" size="-2">
Monday, May 22 2006  9:10am ET - U.S. Markets open in 20 minutes.

 </font></td></tr></table><div id="yfnav"><ul><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/home/SIG=10r204g7h/*http://finance.yahoo.com/">Home</a></li><li class="selected"><a href="http://us.rd.yahoo.com/finance/gnav/inv/top/SIG=10v1nlrio/*http://finance.yahoo.com/mt?u">Investing</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/news/top/SIG=10vple585/*http://biz.yahoo.com/top.html">News
&amp; Commentary</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/ret/top/SIG=115cjjpke/*http://finance.yahoo.com/retirement">Retirement
&amp; Planning</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/bnk/top/SIG=112ghsh4d/*http://finance.yahoo.com/banking">Banking
&amp; Credit</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/ln/top/SIG=10v8f92ss/*http://finance.yahoo.com/loan">Loans</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/tx/SIG=110c5f2kh/*http://finance.yahoo.com/taxes">Taxes</a></li></ul></div><div id="yfsubnav"><ul><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/mover/SIG=10vd7l2sa/*http://finance.yahoo.com/mo?u">Market
Overview</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/mstats/SIG=1163g9uf1/*http://finance.yahoo.com/actives?e=o">Market
Stats</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/sres/SIG=10pk6ocd9/*http://biz.yahoo.com/r/">Stocks</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/mf/SIG=110t9jh84/*http://finance.yahoo.com/funds">Mutual
Funds</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/etf/SIG=10uh90m76/*http://finance.yahoo.com/etf">ETFs</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/bnd/SIG=110qucpir/*http://finance.yahoo.com/bonds">Bonds</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/opt/SIG=10qkompup/*http://biz.yahoo.com/opt">Options</a></li><li class="selected"><a href="http://us.rd.yahoo.com/finance/gnav/inv/ind/SIG=10q9f3p5j/*http://biz.yahoo.com/ic/">Industries</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/fx/SIG=113apl3kn/*http://finance.yahoo.com/currency">Currency</a></li><li class=""><a href="http://us.rd.yahoo.com/finance/gnav/inv/edu/SIG=114u8naui/*http://finance.yahoo.com/education">Education</a></li></ul></div><div id="yfsearch"><form id="searchQuotes" action="http://finance.yahoo.com/q"><ul><li><label>Get
Quotes</label></li><li><input class="text" id="txtQuotes" name="s" /></li><li><input class="button" id="q" type="submit" value=" GO " /></li><li class="first"><a href="http://finance.yahoo.com/lookup">Symbol
Lookup</a></li><li><a href="http://finance.yahoo.com/search">Finance
Search</a></li></ul></form></div><table style="clear:both; text-transform:uppercase; margin-top:5px;" border="0" width="100%" cellpadding="4" cellspacing="0"><tr bgcolor="EEEEEE"><td><font face="arial" size="+1">
<b>Industry
Center -
Chemicals - Major Diversified</b>
</font></td><!-- SpaceID=0 robot -->
</tr></table><table cellpadding="0" cellspacing="0" border="0"><tr><td height="5"></td></tr></table><table cellpadding="0" cellspacing="0" border="0" width="100%"><tr><td><font face="arial" size="-1">
<b>
<a href="http://biz.yahoo.com/ic/index.html">Industry Center</a>
 &gt; <a href="http://biz.yahoo.com/ic/110.html">Chemicals - Major Diversified</a>
 &gt;
Schenectady International, Inc. Company Profile
</b>
</font></td></tr></table><table cellpadding="0" cellspacing="0" border="0"><tr><td height="5"></td></tr></table><table cellpadding="0" cellspacing="0" border="0" width="100%"><tr><td valign="top" width="200" bgcolor="eeeeee"><table cellpadding="0" cellspacing="0" border="0" width="100%"><tr><td align="center"><table border="0" width="100%" cellpadding="4" cellspacing="0"><tr bgcolor="556F93"><td valign="top" nowrap="nowrap"><font face="verdana" size="-2" color="ffffff">
<b>More On This Industry</b>
</font></td></tr></table><table border="0" cellpadding="2" cellspacing="0" width="100%"><tr><td width="1%">·</td><td nowrap="nowrap"><font face="arial" size="-1">
<a href="http://biz.yahoo.com/ic/110.html">Summary</a>
</font></td></tr><tr><td width="1%">·</td><td nowrap="nowrap"><font face="arial" size="-1">
<a href="http://biz.yahoo.com/ic/news/110.html">News</a>
</font></td></tr><tr><td width="1%">·</td><td nowrap="nowrap"><font face="arial" size="-1">
<a href="http://biz.yahoo.com/ic/ll/110mkt.html">Leaders &amp; Laggards</a>
</font></td></tr><tr><td width="1%">·</td><td nowrap="nowrap"><font face="arial" size="-1">
<a href="http://biz.yahoo.com/ic/110_cl_all.html">Company Index</a>
</font></td></tr><tr><td width="1%">·</td><td nowrap="nowrap"><a href="/p/"><font face="arial" size="-1">Industry Browser</font></a></td></tr></table><table border="0" width="100%" cellpadding="0" cellspacing="0"><tr><td height="10"></td></tr></table>
<table border="0" width="100%" cellpadding="4" cellspacing="0"><tr bgcolor="556F93"><td valign="top" nowrap="nowrap"><font face="verdana" size="-2" color="ffffff"><b>Related
Industries</b></font></td></tr></table><table cellpadding="2" cellspacing="1" width="100%" bgcolor="eeeeee" border="0"><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1">
<a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/112/*http://biz.yahoo.com/ic/112.html">Agricultural Chemicals</a>
</font></td></tr><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1">
<a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/326/*http://biz.yahoo.com/ic/326.html">Cleaning Products</a>
</font></td></tr><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1">
<a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/623/*http://biz.yahoo.com/ic/623.html">Pollution & Treatment Controls</a>
</font></td></tr><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1">
<a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/322/*http://biz.yahoo.com/ic/322.html">Rubber & Plastics</a>
</font></td></tr><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1">
<a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/113/*http://biz.yahoo.com/ic/113.html">Specialty Chemicals</a>
</font></td></tr><tr><td width="1%" align="top">·</td><td nowrap="nowrap"><font face="arial" size="-1">
<a href="http://us.rd.yahoo.com/finance/industry/front/industryrel/111/*http://biz.yahoo.com/ic/111.html">Synthetics</a>
</font></td></tr></table><table><tr><td height="10"></td></tr></table><table border="0" width="100%" cellpadding="4" cellspacing="0"><tr bgcolor="556F93"><td valign="top" nowrap="nowrap"><font face="verdana" size="-2" color="ffffff"><b>Top
Industries</b></font></td></tr></table><table cellpadding="2" cellspacing="1" width="100%" bgcolor="eeeeee" border="0"><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/330/*http://biz.yahoo.com/ic/330.html"><font face="arial" size="-1">Auto Manufacturers - Major
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/515/*http://biz.yahoo.com/ic/515.html"><font face="arial" size="-1">Biotechnology
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/826/*http://biz.yahoo.com/ic/826.html"><font face="arial" size="-1">Business Software &amp; Services
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/110/*http://biz.yahoo.com/ic/110.html"><font face="arial" size="-1">Chemicals - Major Diversified
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/841/*http://biz.yahoo.com/ic/841.html"><font face="arial" size="-1">Communication Equipment
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/210/*http://biz.yahoo.com/ic/210.html"><font face="arial" size="-1">Conglomerates
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/810/*http://biz.yahoo.com/ic/810.html"><font face="arial" size="-1">Diversified Computer Systems
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/423/*http://biz.yahoo.com/ic/423.html"><font face="arial" size="-1">Diversified Investments
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/913/*http://biz.yahoo.com/ic/913.html"><font face="arial" size="-1">Diversified Utilities
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/510/*http://biz.yahoo.com/ic/510.html"><font face="arial" size="-1">Drug Manufacturers - Major
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/911/*http://biz.yahoo.com/ic/911.html"><font face="arial" size="-1">Electric Utilities
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/340/*http://biz.yahoo.com/ic/340.html"><font face="arial" size="-1">Food - Major Diversified
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/133/*http://biz.yahoo.com/ic/133.html"><font face="arial" size="-1">Industrial Metals &amp; Minerals
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/770/*http://biz.yahoo.com/ic/770.html"><font face="arial" size="-1">Major Airlines
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/120/*http://biz.yahoo.com/ic/120.html"><font face="arial" size="-1">Major Integrated Oil &amp; Gas
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/410/*http://biz.yahoo.com/ic/410.html"><font face="arial" size="-1">Money Center Banks
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/432/*http://biz.yahoo.com/ic/432.html"><font face="arial" size="-1">Property &amp; Casualty Insurance
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/830/*http://biz.yahoo.com/ic/830.html"><font face="arial" size="-1">Semiconductor - Broad Line
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/844/*http://biz.yahoo.com/ic/844.html"><font face="arial" size="-1">Telecom Services - Domestic
</font></a></td></tr><tr><td width="1%" valign="top">·</td><td nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/351/*http://biz.yahoo.com/ic/351.html"><font face="arial" size="-1">Tobacco Products, Other
</font></a></td></tr></table><table cellpadding="0" cellspacing="0"><tr><td height="2"></td></tr></table><table border="0" width="100%" cellpadding="0" cellspacing="0"><tr><td align="right" nowrap="nowrap"><a href="http://us.rd.yahoo.com/finance/industry/front/industrynav/more/*http://biz.yahoo.com/ic/ind_index.html"><font face="verdana" size="-2"><b>Complete Industry List...</b></font></a></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="20"></td></tr></table><table cellspacing="0" width="95%" bgcolor="bfcede"><tr><td><table cellpadding="6" cellspacing="0" bgcolor="ffffff" width="100%"><tr><td align="left"><a href="http://us.rd.yahoo.com/finance/industry/hoovers/SIG=111gu94sa/*http://www.hoovers.com/yahoofin"><img src="http://us.news2.yimg.com/us.yimg.com/p/fi/pr/55559.gif" width="82" height="35" border="0" /></a></td></tr><tr><td><font face="verdana" size="-2">Need more? Get unbiased, in-depth information on public and private companies worldwide.</font></td></tr></table></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="20"></td></tr></table></td></tr></table></td><td width="10">&nbsp;&nbsp;</td><td valign="top"><table border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td colspan="3" width="100%"><table border="0" cellpadding="3" cellspacing="0" width="100%"><tr><td bgcolor="bfcede"><font face="verdana" size="-2">
<b>Schenectady International, Inc. Company Profile</b>
</font></td></tr></table></td></tr><tr><td height="5"></td></tr><tr valign="top"><td width="49%"><table cellpadding="0" cellspacing="0" width="100%"><tr><td><font face="arial" size="-1">
Schenectady International came to life by bringing good varnishes to life for General Electric. Founded by Howard Wright in 1906, the company was established to develop insulating varnishes for GE's early electrical devices. Schenectady International has sold its electrical insulating business to ALTANA, but it still makes friction material resins for use in making brake linings and clutch facings. The company's other products include alkylphenols, phenolic resins, and electronic chemicals used in the production of semiconductors, imaging products, packaging, rubber compounds, agrochemicals, dyes, fuel additives, and flavoring agents. Descendants of Wright still own the company.
 </font></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="20"></td></tr></table>
<table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="10"></td></tr></table></td><td width="5"><spacer type="block" width="5" height="1" /></td><td width="49%" align="right"><table border="0" cellpadding="0" cellspacing="0" bgcolor="dcdcdc" width="100%"><tr><td><table border="0" cellpadding="2" cellspacing="1" width="100%"><tr><td colspan="2"><font face="verdana" size="-2"><b>Contact Information</b></font></td></tr><tr valign="top"><td bgcolor="eeeeee"><font face="arial" size="-1">
Address: </font></td><td bgcolor="white"><font face="arial" size="-1">
2750 Balltown Rd.<br />Schenectady, NY 12304
 </font></td></tr><tr valign="top"><td bgcolor="eeeeee"><font face="arial" size="-1">Phone:</font></td><td bgcolor="white"><font face="arial" size="-1">518-347-4200</font></td></tr><tr valign="top"><td bgcolor="eeeeee"><font face="arial" size="-1">Fax:</font></td><td bgcolor="white"><font face="arial" size="-1">518-346-3111</font></td></tr></table></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="10"></td></tr></table><table border="0" cellpadding="0" cellspacing="0" bgcolor="dcdcdc" width="100%"><tr><td><table border="0" cellpadding="2" cellspacing="1" width="100%"><tr><td colspan="2"><font face="verdana" size="-2"><b>Financial
Highlights</b></font></td></tr><tr valign="top"><td bgcolor="eeeeee"><font face="arial" size="-1">Fiscal Year End:</font></td><td bgcolor="white"><font face="arial" size="-1">December</font></td></tr></table></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="10"></td></tr></table><table border="0" cellspacing="0" width="100%" bgcolor="dcdcdc"><tr><td><table border="0" cellpadding="3" cellspacing="0" width="100%"><tr><td colspan="2"><font face="verdana" size="-2">
<b>Key People</b>
</font></td></tr><tr valign="top"><td width="1%" bgcolor="white"><font face="arial" size="-1">•</font></td><td bgcolor="white"><font face="arial" size="-1">
Chairman and CEO:
Wallace A. Graham
	</font></td></tr><tr valign="top"><td width="1%" bgcolor="white"><font face="arial" size="-1">•</font></td><td bgcolor="white"><font face="arial" size="-1">
President and COO:
Charles G. Griswold
	</font></td></tr><tr valign="top"><td width="1%" bgcolor="white"><font face="arial" size="-1">•</font></td><td bgcolor="white"><font face="arial" size="-1">
SVP, Financial and CFO:
John C. Obst
	</font></td></tr></table></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="10"></td></tr></table><table border="0" cellspacing="0" width="100%" bgcolor="dcdcdc"><tr><td><table border="0" cellpadding="3" cellspacing="0" width="100%"><tr><td colspan="2"><font face="verdana" size="-2">
<b>Industry Information</b>
</font></td></tr><tr><td bgcolor="white"><font face="arial" size="-1">
Sector: <a href="/p/1conameu.html">Basic Materials</a></font></td></tr><tr><td bgcolor="white"><font face="arial" size="-1">
Industry: <a href="/p/">Chemicals - Major Diversified</a></font></td></tr></table></td></tr></table><table border="0" cellpadding="0" cellspacing="0" height="10"><tr><td height="10"></td></tr></table><table border="0" cellspacing="0" width="100%" bgcolor="dcdcdc"><tr><td><table border="0" cellpadding="3" cellspacing="0" width="100%"><tr><td colspan="2"><font face="verdana" size="-2">
<b>Top Competitors</b>
</font></td></tr><tr><td width="1%" bgcolor="white"><font face="arial" size="-1">
•
</font></td><td bgcolor="white"><font face="arial" size="-1">
<a href="http://biz.yahoo.com/ic/10/10471.html">The Dow Chemical Company</a>
 (<a href="http://finance.yahoo.com/q?s=DOW&d=t">DOW</a>)
 </font></td></tr><tr><td width="1%" bgcolor="white"><font face="arial" size="-1">
•
</font></td><td bgcolor="white"><font face="arial" size="-1">
<a href="http://biz.yahoo.com/ic/10/10558.html">Ferro Corporation</a>
 (<a href="http://finance.yahoo.com/q?s=FOE&d=t">FOE</a>)
 </font></td></tr><tr><td width="1%" bgcolor="white"><font face="arial" size="-1">
•
</font></td><td bgcolor="white"><font face="arial" size="-1">
<a href="http://biz.yahoo.com/ic/101/101812.html">Mitsui Chemicals, Inc.</a>
</font></td></tr></table></td></tr></table></td></tr><tr height="20"><td></td></tr><tr><td colspan="3"><font face="arial" size="-1">
<i>Need
more? Get additional in-depth company and industry
information
from <a href="http://us.rd.yahoo.com/finance/industry/hoovers/SIG=111gu94sa/*http://www.hoovers.com/yahoofin">Hoover's
Online</a>.</i>
</font></td></tr></table></td></tr></table></td></tr></table>
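
A rough sketch of how the label/value rows in markup like this could be collected, using only Python 3's standard html.parser rather than the BeautifulSoup calls used in this thread. The sample is a trimmed, flattened piece of the page above; the class and function names are made up, and nested tables are deliberately not handled.

```python
from html.parser import HTMLParser

class RowCollector(HTMLParser):
    """Collect the text of each <td>, grouped by <tr> (no nested tables)."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # cells of the row being read, or None
        self._cell = None     # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse runs of whitespace, including line breaks in the markup.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def label_value_pairs(html):
    """Return {label: value} for every two-cell row; other rows are skipped."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return {row[0].rstrip(":"): row[1] for row in parser.rows if len(row) == 2}

# Trimmed sample based on the page above (layout attributes removed).
sample = """<table><tr><td colspan="2"><b>Contact Information</b></td></tr>
<tr><td>Phone:</td><td>518-347-4200</td></tr>
<tr><td>Fax:</td><td>518-346-3111</td></tr>
<tr><td colspan="2"><b>Financial
Highlights</b></td></tr>
<tr><td>Fiscal Year End:</td><td>December</td></tr></table>"""

print(label_value_pairs(sample))
```

Heading rows have a single cell, so they drop out of the result; a page with an empty highlights box would simply contribute no pairs.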
Old May 22nd, 2006, 8:37 AM   #94
Mocker
Hobbyist Programmer
 
Join Date: Oct 2005
Location: Indiana
Posts: 202
Rep Power: 0 Mocker is an unknown quantity at this point
Wow, this thread is awesome. Arevos, awesome job leading through Python. Makes me want to take another look at it now. And kudos to zem for taking the time to learn the steps instead of just asking others to do it for you (as it seems most people do).
__________________
#programmingforums relay - http://thegupstudio.com/cgi-bin/pforelay.cgi
freelance scripts - http://ryanguthrie.com/index.html
Old May 22nd, 2006, 9:23 AM   #95
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
Lastly, I'm trying to test my get_company_urls to see if I'm ready for get_company_data, and I'm encountering some errors while trying to run the module. If anyone could help me troubleshoot, I'd really appreciate it, because I've been staring at this code for a while and can't figure out what I did wrong.

from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup
from time import sleep

industry_page = "http://biz.yahoo.com/ic/ind_index.html"

def get_industry_urls(industry_page):
    soup = BeautifulSoup(urlopen(industry_page))
    links = soup.fetch("table")[7].fetch("a")
    return [a['href'] for a in links if a.string != "Alphabetical"]

def get_company_index(industry_url):
    soup = BeautifulSoup(urlopen(industry_url))
    index_link = soup.fetch("table")[11].fetch("a")[2]
    return index_link['href']

def get_company_urls(company_index):
    soup = BeautifulSoup(urlopen(company_index))
    urls = soup.fetch("table")[1].fetch("a")
    return [a['href'] for a in urls]

for industry_url in get_industry_urls(industry_page):
    company_index = get_company_index(industry_url)

    for company_index in get_company_index(industry_url):
        print get_company_urls(company_index)
thanks
Old May 22nd, 2006, 9:26 AM   #96
nnxion
Programming Guru
 
Join Date: Jun 2005
Location: elemental plane
Posts: 1,429
Rep Power: 5 nnxion is on a distinguished road
Quote:
Originally Posted by DaWei
I'm not being unkind, as I sympathize with your plight, but you are probably not going to get a volunteer. I hope I'm wrong.
Hehe David you were wrong. :p
__________________
"Employ your time in improving yourself by other men's writings, so that you shall gain easily what others have labored hard for."
-- Socrates
Old May 22nd, 2006, 9:43 AM   #97
DaWei
Resident Grouch
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
Indeed I was. Not the first time, either. Great thread.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
Old May 22nd, 2006, 9:46 AM   #98
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4 Arevos is on a distinguished road
Quote:
Originally Posted by zem52887
Lastly, I'm trying to test my get_company_urls to see if I'm ready for get_company_data, and I'm encountering some errors while trying to run the module. If anyone could help me troubleshoot, I'd really appreciate it, because I've been staring at this code for a while and can't figure out what I did wrong.
What do the error messages say? If you don't post the error messages that were displayed, it becomes much more difficult for people to solve the problem.

Also, when you're testing, test each function individually and in isolation. This takes less time than testing all the functions together, and doesn't hammer Yahoo! so hard.

However, I believe your problem lies in this piece of code:
for industry_url in get_industry_urls(industry_page):
    company_index = get_company_index(industry_url)

    for company_index in get_company_index(industry_url):
        print get_company_urls(company_index)
You have a loop that you don't need. Remember that get_company_index returns a single value. This is because each industry page has only one company index.

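For-loops, like list comprehensions, are used for lists and sequences of values only. They are a way of applying code to each item in a list. Applying a for-loop to a single value either raises an error or, when that value is a string such as a URL, loops over its individual characters, which is not what you want here.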
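
A small illustration of that point, written for Python 3 and using a URL that appears earlier in this thread. The variable names are only for the example.

```python
# A URL is a single string, but a string is itself a sequence of characters,
# so a for-loop over it visits one character at a time.
company_index = "http://biz.yahoo.com/ic/110_cl_all.html"

pieces = [piece for piece in company_index]
print(pieces[:4])          # the first four "items" are single characters

# Looping over a list of URLs is what a for-loop is meant for.
for url in [company_index]:
    print(url)
```

Passing one of those single characters to a function that expects a whole URL is the kind of thing that produces confusing errors.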

Remove the unnecessary for-loop and add in a sleep function, so that your program waits for a certain amount of time between each industry. This is to ensure you don't overload the Yahoo! servers. If you query Yahoo! too fast, too often, then Yahoo! may consider you a malicious entity and prevent your IP address from accessing the site. Therefore, be careful and be polite.
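
Roughly, the shape of the fix could look like the sketch below. It is written for Python 3, and the three functions are hypothetical offline stand-ins with the same names as the ones in this thread, so it can be run without touching Yahoo! at all. The `crawl` function and its `delay_seconds` parameter are made up for the example.

```python
from time import sleep

# Hypothetical stand-ins for the thread's network functions, so the
# sketch runs offline. The real versions fetch and parse Yahoo! pages.
def get_industry_urls(industry_page):
    return ["industry-a", "industry-b"]

def get_company_index(industry_url):
    return industry_url + "/index"          # one index per industry

def get_company_urls(company_index):
    return [company_index + "/company-1", company_index + "/company-2"]

def crawl(industry_page, delay_seconds):
    """Collect company URLs, pausing between industries."""
    all_urls = []
    for industry_url in get_industry_urls(industry_page):
        company_index = get_company_index(industry_url)   # single value, no loop
        all_urls.extend(get_company_urls(company_index))
        sleep(delay_seconds)                              # be polite to the server
    return all_urls

# A tiny delay keeps the demo quick; a real crawl would wait much longer.
print(crawl("start-page", delay_seconds=0.01))
```

How long to wait is a judgment call; the point is only that the pause sits inside the loop over industries.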
Old May 22nd, 2006, 9:52 AM   #99
zem52887
Hobbyist Programmer
 
Join Date: May 2006
Posts: 127
Rep Power: 3 zem52887 is on a distinguished road
I apologize for not posting the error; in hindsight that makes sense. With regard to testing each function alone, I was under the impression that with my particular code the functions need to run in conjunction with one another. Or do you mean I should merely select a link myself to test with, rather than starting from the general Yahoo! index link and passing it through all the functions?

Yeah I forgot about the for-loops and what they're "for"... and you were right. The script is working now, however, it is retrieving not only the company_urls but in some cases it's retrieving the link for a company quote page, and in some cases non-existent links (it seems).
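
One way the unwanted links might be filtered out, sketched in Python 3. It assumes that every company profile URL follows the pattern of the examples posted in this thread (http://biz.yahoo.com/ic/ followed by two numbers), which may not hold for every page; the names are made up for the example.

```python
import re

# Assumed pattern: company profile pages look like the examples in this
# thread, e.g. http://biz.yahoo.com/ic/92/92296.html
PROFILE_URL = re.compile(r"^http://biz\.yahoo\.com/ic/\d+/\d+\.html$")

def keep_profile_urls(hrefs):
    """Drop empty values and anything that is not a company profile URL."""
    return [href for href in hrefs if href and PROFILE_URL.match(href)]

hrefs = [
    "http://biz.yahoo.com/ic/10/10471.html",
    "http://finance.yahoo.com/q?s=DOW&d=t",   # quote page, filtered out
    None,                                      # anchor without an href
    "http://biz.yahoo.com/ic/101/101812.html",
]
print(keep_profile_urls(hrefs))
```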
Old May 22nd, 2006, 9:59 AM   #100
nnxion
Programming Guru
 
Join Date: Jun 2005
Location: elemental plane
Posts: 1,429
Rep Power: 5 nnxion is on a distinguished road
Quote:
Originally Posted by zem52887
With regard to testing each function alone, I was under the impression that with my particular code the functions need to run in conjunction with one another. Or do you mean I should merely select a link myself to test with, rather than starting from the general Yahoo! index link and passing it through all the functions?
Programmers tend to make some test data. We simulate the connection to a page, but give the code only one test set so we can get it working with that. If that test set works, we take some other test sets and try every possible case, so that nothing unexpected comes up when the client and/or boss comes to take a look. Try a kazillion test sets in the end, or just let it connect to the site; then, if some test set fails, check out why and fix it.

Since you don't want to hammer the server, testing offline might be an idea. I don't think Yahoo will care all that much though; just don't do it too fast, or it may cut you off (taking you for a DoS attack), as already mentioned.
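
As a sketch of what an offline test could look like: save a piece of a page as a string and run the parsing code against that instead of the live site. This uses Python 3's standard html.parser rather than the BeautifulSoup calls in this thread, and the saved snippet, class and function names are made up for the example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)

def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs

# A small fixed test set, used in place of a live request.
SAVED_PAGE = """<table><tr><td>
<a href="http://biz.yahoo.com/ic/10/10471.html">The Dow Chemical Company</a>
(<a href="http://finance.yahoo.com/q?s=DOW">DOW</a>)
</td></tr></table>"""

links = extract_links(SAVED_PAGE)
assert links[0] == "http://biz.yahoo.com/ic/10/10471.html"
assert len(links) == 2
print("offline test passed")
```

Once one saved page passes, more saved pages (with and without the awkward cases) can be added without sending a single request.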