#101 | ||
|
Programming Guru
![]() Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
![]() |
Quote:
Quote:
|
||
|
|
|
|
|
#102 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
ah gotcha will do.
|
|
|
|
|
|
#103 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
Arevos, you truly are a miracle worker. I was grabbing the wrong table, and now get_company_urls is working flawlessly, just as you had suggested.
On to get_company_data. Again, I'm not sure how to pull the data, because there's more in a given table than I need. Is it possible to fetch certain parts of the text, as we did with links by fetching a followed by href (in the list comprehensions, naturally)? Posts 92 and 93 explain this problem more thoroughly in case anyone missed them; I got a little ahead of myself and things got out of order.
|
|
|
|
|
#104 |
|
Programming Guru
![]() Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
![]() |
Yep; in theory you could do something like:
soup.fetch("table")[5].fetch("tr")[2].fetch("td")[1].string
(The .string at the end gets the contents of the tag, just as ['href'] gets the href attribute of the tag.)
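For anyone following along on a current Python, here is a minimal sketch of the same chain of lookups. It does not use BeautifulSoup: it assumes a small, made-up, well-formed snippet and the standard library's xml.etree.ElementTree, and the `fetch` helper is a hypothetical stand-in for the `fetch` method used above. Real pages are rarely well-formed, so treat this as an illustration of the indexing idea only.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet standing in for a real page (an assumption).
PAGE = """<html><body>
<table>
  <tr><td>Name</td><td>Acme Corp</td></tr>
  <tr><td>Site</td><td><a href="http://example.com/acme">Acme</a></td></tr>
</table>
</body></html>"""


def fetch(element, tag):
    """Hypothetical helper: all descendants of `element` with this tag, in document order."""
    return [el for el in element.iter(tag) if el is not element]


root = ET.fromstring(PAGE)

# Table 0 -> row 0 -> cell 1, then read the cell's text (the role .string plays above).
rows = fetch(fetch(root, "table")[0], "tr")
company_name = fetch(rows[0], "td")[1].text

# Attribute lookup on a tag (the role ['href'] plays above).
href = fetch(rows[1], "a")[0].get("href")

print(company_name)
print(href)
```

The indices are only meaningful for this sample; on a real page they have to be found by inspecting the HTML, as discussed in the thread.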
|
|
|
|
|
#105 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
excellent, I'll try that thank you once again
|
|
|
|
|
|
#106 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
Okay, this is going to take a little bit of time and effort, boo. Is there a way to use BeautifulSoup to search for a given tag within a particular table, as opposed to the whole site? For our purposes, the company data (along with unnecessary data) is contained within the 3rd table. I want to isolate this table so I can search its [td] tags for the bits of data that are useful to me. Is there a way to do this? I apologize for asking so many questions, but I haven't been able to find a thorough tutorial that explains how to do the stuff we're doing here (i.e. incorporating html and beautifulsoup into python coding).
edit: I think I may have gotten it. Would the following work to isolate table 3?
html = urlopen("http://biz.yahoo.com/ic/135/135359.html").read("table")[3]
Hm, not that, but something along those lines... this perhaps:
>>> html = urlopen("http://biz.yahoo.com/ic/135/135359.html").read()
>>> soup = BeautifulSoup(html)
>>> soup.fetch("table")[3]
My only question is: when I begin to search for td tags, will it refer to the link or just to that table? I could search using this:
soup.fetch("table")[3].fetch("td")[5]
Last edited by zem52887; May 22nd, 2006 at 10:57 AM.
|
|
|
|
|
#107 |
|
Programming Guru
![]() Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
![]() |
I'm not quite sure what you're asking. This code results in all the tables on the page:
soup.fetch("table")

soup.fetch("table")[2].fetch("a")

soup.fetch("table")[2].fetch("td")

chair.colour
chair.chopup()

Let's go back to BeautifulSoup:
links = soup.fetch("table")[3].fetch("a")

all_tables = soup.fetch("table")
table3 = all_tables[3]
links_in_table3 = table3.fetch("a")
|
|
|
|
|
#108 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
Right, so to clarify: if I fetch table 3 and then search for a [td] tag, does it only search within table 3 because that's what I called it on... or does it search for the [td] tag in the entire page that urlopen opened?
|
|
|
|
|
|
#109 |
|
Programming Guru
![]() Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
![]() |
It only searches within table 3. If you want to search the whole site, use:
soup.fetch("td")
whereas this searches only within table 3:
soup.fetch("table")[3].fetch("td")
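The difference between the two searches can be sketched on a tiny example. This is not BeautifulSoup itself: it assumes a made-up, well-formed snippet parsed with the standard library's xml.etree.ElementTree, and `fetch` is a hypothetical helper that mimics the recursive search described above.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample with two tables (an assumption for illustration).
PAGE = """<html><body>
<table><tr><td>nav 1</td><td>nav 2</td></tr></table>
<table>
  <tr><td>Revenue</td><td>10</td></tr>
  <tr><td>Employees</td><td>25</td></tr>
</table>
</body></html>"""


def fetch(element, tag):
    """Hypothetical helper: all descendants of `element` with this tag, in document order."""
    return [el for el in element.iter(tag) if el is not element]


root = ET.fromstring(PAGE)

# Searching from the root looks at every cell on the page.
all_cells = fetch(root, "td")

# Searching from one table only looks inside that table.
second_table = fetch(root, "table")[1]
table_cells = [td.text for td in fetch(second_table, "td")]

print(len(all_cells))
print(table_cells)
```

The search starts from whatever object it is called on, so anything outside that table is never visited.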
|
|
|
|
|
#110 |
|
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
![]() |
*heart pounding* Please, someone mollify my fears. I don't think the tables are constant from company to company. By this I mean that since some companies have more links or additional sectors, the Description/Financial Information/Contact Info/Key People sections aren't in exactly the same place for every company. Please assure me that there is another way to get this information from the website other than searching by [td] tags, because I'm so close... it would be such a shame if this is the only way to locate the pertinent information and this script went to waste after Arevos's (and my) hard work.
Hm, I'm looking at the html again and there seem to be many tables within tables... I'm really struggling to decipher this data page. But maybe there is some hope after all... perhaps parsing using keywords? (Is this possible?) For instance, instead of searching for a [td] or [table] etc., could we just search for the phrase "Financial Highlights"?
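Searching by phrase rather than by position is possible in principle. A minimal sketch of the idea, assuming a made-up, well-formed snippet and the standard library's xml.etree.ElementTree instead of BeautifulSoup; `tables_containing` and `innermost_table` are hypothetical helper names. With nested tables every enclosing table also contains the phrase, so the sketch picks the match with the least text.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample with tables nested inside a layout table (an assumption).
PAGE = """<html><body>
<table><tr><td>
  <table><tr><td>Contact Info</td></tr><tr><td>1 Main St</td></tr></table>
  <table><tr><td>Financial Highlights</td></tr><tr><td>Revenue</td><td>10</td></tr></table>
</td></tr></table>
</body></html>"""


def table_text(table):
    """All text inside a table, including text in nested tags."""
    return "".join(table.itertext())


def tables_containing(root, phrase):
    """Hypothetical helper: every <table> whose text includes `phrase`."""
    return [t for t in root.iter("table") if phrase in table_text(t)]


def innermost_table(root, phrase):
    """Hypothetical helper: the smallest matching table, or None if there is no match."""
    matches = tables_containing(root, phrase)
    if not matches:
        return None
    return min(matches, key=lambda t: len(table_text(t)))


root = ET.fromstring(PAGE)
table = innermost_table(root, "Financial Highlights")
cells = [td.text for td in table.iter("td")]
print(cells)
```

This no longer depends on the table's index, only on the heading text being present, which may hold up better when pages differ from company to company.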
|
|
|