The code below is the page source that contains the "Company Index" link. I'm trying to write the get_company_urls function, but I can't figure out how to isolate the company link. In the first example we were able to isolate the links with the following:
Quote:
This line gets all "a" tags in table 7, and gives this list of tags the name "links".
Code:
return [a['href'] for a in links]
This code takes the "href" attribute from each link, and constructs a new list. This list is returned from the function.
However, that table didn't contain multiple links (if I remember correctly). If we use the same script here, won't it return not only the "Company Index" link but also "Industry Browser", etc.?
<table border=0 cellpadding=2 cellspacing=0 width=100%>
<tr><td width=1%><img src=http://us.i1.yimg.com/us.yimg.com/i/us/fi/03rd/selectorgray.gif></td><td nowrap><font face=arial size=-1>Summary</font></td></tr>
<tr><td width=1%>·</td><td nowrap><a href="http://us.rd.yahoo.com/finance/industry/morenews/moremod/*http://biz.yahoo.com/ic/news/112.html"><font face=arial size=-1>News</font></a></td></tr>
<tr><td width=1%>·</td><td nowrap><a href="http://us.rd.yahoo.com/finance/industry/morell/moremod/*http://biz.yahoo.com/ic/ll/112pip.html"><font face=arial size=-1>Leaders & Laggards</font></a></td></tr>
<tr><td width=1%>·</td><td nowrap><font face=arial size=-1><a href="http://us.rd.yahoo.com/finance/industry/morecoindex/moremod/*http://biz.yahoo.com/ic/112_cl_all.html">Company Index</a></font></td></tr>
<tr><td width=1%>·</td><td nowrap><a href="http://us.rd.yahoo.com/finance/industry/morecolist/moremod/*http://biz.yahoo.com/p/112conameu.html"><font face=arial size=-1>Industry Browser</font></a></td></tr>
</table>
<table><tr><td height=10></td></tr></table>
<table border=0 width=100% cellpadding=4 cellspacing=0>
<tr bgcolor=556F93><td valign=top nowrap><font face=verdana size=-2 color=ffffff><b>Related Industries</b></font></td></tr>
</table>
So I can't figure out how to isolate that one link so that I can write a function which fetches it. I'm going to play around with it, but if anyone has any suggestions, they're welcome.
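One idea I might try is to match on the link text instead of the table position: walk every "a" tag and keep only the one whose visible text is "Company Index". Below is a rough sketch of that using only Python's standard html.parser module rather than the parser from the earlier example. The names LinkTextFinder and get_company_index_url are made up for illustration, and the sample string is just a trimmed copy of the table above, so treat this as an assumption-laden starting point, not a finished solution.

```python
from html.parser import HTMLParser


class LinkTextFinder(HTMLParser):
    """Collect the href of every <a> whose visible text equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.current_href = None  # href of the <a> we are currently inside
        self.text_parts = []      # text seen since that <a> opened
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current_href = dict(attrs).get("href")
            self.text_parts = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            # Collapse newlines and repeated spaces before comparing.
            text = " ".join("".join(self.text_parts).split())
            if text == self.wanted:
                self.matches.append(self.current_href)
            self.current_href = None


def get_company_index_url(html):
    """Return the href of the "Company Index" link, or None if absent."""
    finder = LinkTextFinder("Company Index")
    finder.feed(html)
    finder.close()
    return finder.matches[0] if finder.matches else None


# Trimmed copy of the table from the page source above.
sample = """
<table><tr><td nowrap><font face=arial size=-1>
<a href="http://us.rd.yahoo.com/finance/industry/morecoindex/moremod/*http://biz.yahoo.com/ic/112_cl_all.html">Company Index</a>
</font></td></tr><tr><td nowrap><a
href="http://us.rd.yahoo.com/finance/industry/morecolist/moremod/*http://biz.yahoo.com/p/112conameu.html"><font
face=arial size=-1>Industry
Browser</font></a></td></tr></table>
"""

print(get_company_index_url(sample))
```

If the list of "a" tags from the earlier example is already available, the same filter could presumably be applied to it directly by comparing each link's text before taking its "href". I haven't verified that against the real page, though.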