#111
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
Your time has not been wasted, it just may not pay off to the extent you were (unrealistically) expecting. It is unlikely that diverse people are going to build their site to your expectations. They aren't in the business of doing your work for you. Just for shits and giggles, let me quote one of my previous posts:
Quote:
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
#112
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Quote:
Quote:
So please refrain from "I told you so" replies on this thread. They are neither productive nor necessary. Thank you. Instead, try offering a suggestion or two; maybe this last bit of data parsing could be doable with another language... anything but "I told you so" and a lecture on a thread where people have dedicated a lot of time and effort, not only me but more seasoned/respected (*cough* Arevos *cough*) members of this forum.
On a side note, when I read your replies to various threads, they always seem to be "read the manual," lrn2post, lecturing the OPs. While I can agree there is a time and a place for that, it should be clear that a thread with 100 replies and 1000+ views is not it. IMHO, pick your spots: a spammer/troll who's never going to post here again is one thing, but this is an entirely different case. So why not try coming up with a solution instead of reiterating the likely scenario that I'm effed?
Last edited by zem52887; May 22nd, 2006 at 12:57 PM.
#113
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Python is a flexible language, and BeautifulSoup a fairly flexible parser. If the tables differ from company to company, then the task becomes more difficult, but not impossible.
I'll take a look around and see if I can come up with anything to do with searching.
#114
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Your confidence is as always refreshing, welcomed, and appreciated.
#115
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
If you took the post as offensive, then I'll lay it at the door of frustration, since you obviously aren't that stupid. You said:
Quote:
The quote of my previous post was not an "I told you so, nananananaaaaaaaana." It was a "you were forewarned to expect difficulties." Now, if you care to persist in your recriminations, you go right ahead. You can't deny that someone IS doing a lot of your work for you. That's okay, for you were learning, but now you're coming along and implying that it may be "toss in the towel" time, and that Arevos' work will go to waste. If you don't care for my posts, that's just tough titty. I don't plan to stop saying what I feel like saying. If you want to denigrate my efforts on this forum, you go right ahead. You can also kick your dam' cat when you get home, and slug the wife. I can't stop you.
#116
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
BeautifulSoup is nicer and more powerful than I thought. It supports text searching through regular expressions and bidirectional navigation of search results.
The way I suggest solving this would be to do something like this:
1. Find a piece of text that is predictably near to the results you want
2. Using this piece of text as a starting point, navigate to the correct results
I've got some dinner cooking, so I have to go. However, here's something quick to start you off:
import re
# "soup" is the parsed page (a BeautifulSoup object)
# find the "Address:" label
address = soup.firstText(re.compile("Address:"))
# find the first "tr" element above the address label
tr = address.findParent("tr")
# print the second "td" that belongs to that "tr" element
print tr.fetch("td")[1]
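For anyone reading along without that BeautifulSoup release installed, the two steps above (find a predictable label, then navigate from it to the value) can be sketched with only the Python standard library. This is a rough illustration, not the code used in this thread: it swaps BeautifulSoup for xml.etree.ElementTree, which only copes with well-formed markup, and the SAMPLE table and the value_next_to helper are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; real pages will differ.
SAMPLE = """
<table>
  <tr><td>Phone:</td><td>555-0100</td></tr>
  <tr><td>Address:</td><td>1 Example Street</td></tr>
</table>
"""

def value_next_to(markup, label_pattern):
    """Return the text of the second <td> in the row whose label matches."""
    root = ET.fromstring(markup)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    pattern = re.compile(label_pattern)
    for node in root.iter():
        # Step 1: find an element whose own text matches the label.
        if node.text and pattern.search(node.text):
            # Step 2: climb to the nearest enclosing <tr>.
            row = node
            while row is not None and row.tag != "tr":
                row = parents.get(row)
            if row is None:
                continue
            cells = row.findall("td")
            if len(cells) > 1:
                return (cells[1].text or "").strip()
    return None

print(value_next_to(SAMPLE, r"Address:"))
```

The climb to the enclosing row plays the role of findParent("tr") in the snippet above, and row.findall("td")[1] plays the role of fetch("td")[1]. Messy real-world HTML would most likely make ET.fromstring raise a parse error, which is the gap a tolerant parser such as BeautifulSoup fills.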
#117
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
will do thanks for the tip
edit: I spoke to my friend, who's pretty good with this stuff. He thinks that if we could import everything into an Excel document, one could use "if" statement macros to get the information we need from it. Any truth to this, and is it a viable option? Or would we be better off using regex?
#118
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Well, yes... One could use Excel's if statements for a task like this, but it seems akin to using a pickaxe to fix a broken down car. Sure, it could be done, but it isn't really the best tool for the job.
Excel has functions and macros to manipulate data in its spreadsheets, but it wasn't designed to be a full programming language. I'm also unaware of any HTML parsing functionality in Excel that would really be necessary for any project involving the extraction of data from web pages. I don't like to dismiss options out of hand, but I think I'd be fairly safe in saying that a programming language such as Python is more suited to these sorts of tasks than a spreadsheet application.
#119
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
fair enough, back to learning regex and mastering beautifulsoup.
#120
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Alright, I've been reading about regex and going through the documentation, and I'm starting to get it a little bit, but I'm still not sure what some of the stuff means.
Specifically, what does the following code mean:
fetchText(text, recursive, limit)
I've been examining the HTML on Yahoo!'s site, and I think the best way to search for regex would be to avoid using the [td] and [tr] tags and instead use [table] tags. So my goal is to search something like:
contact = soup.firstText(re.compile("Contact Information"))
contacttable = address.findParent("table")
contacttable.fetch("/table")[0]
Ultimately, I want to fetch the table tags that surround the words "Contact Information." However, when I use compile, I get a "list out of range" error, so I think, but might be wrong, that I need to use the re.search function because I'm not dealing with a list? Or do I need to use compile because we're going to perform this function 36,000 times? If that's the case, then I'm formatting it wrong, and do I need to add another argument?
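On the compile-versus-search question, a small standard-library sketch may help. In Python's re module, re.compile(p).search(s) and re.search(p, s) find the same match; compiling simply gives a pattern object that can be reused across many pages. A "list index out of range" error, on the other hand, comes from indexing an empty list, so one plausible cause (an assumption, not something confirmed in this thread) is that fetch("/table") found nothing, since "/table" is the spelling of a closing tag rather than a tag name. Note also that the second line of the snippet uses address rather than contact. The sample text and helper names below are hypothetical.

```python
import re

# Hypothetical sample text; the real input would be text from the page.
TEXT = "Company profile - Contact Information - Address: 1 Example Street"

# Compile once, reuse many times (handy when looping over thousands of pages).
CONTACT = re.compile(r"Contact Information")

def find_compiled(text):
    """Search with a precompiled pattern object."""
    return CONTACT.search(text)

def find_direct(text):
    """Search with the module-level function; it finds the same match."""
    return re.search(r"Contact Information", text)

def first_or_none(results):
    """Return the first item, or None instead of raising on an empty list."""
    return results[0] if results else None

# Both approaches locate the same span in the text.
print(find_compiled(TEXT).span() == find_direct(TEXT).span())

# Indexing an empty result list is what raises the error.
try:
    [][0]
except IndexError as error:
    print("IndexError:", error)

# Guarding the lookup avoids the exception.
print(first_or_none([]))
```

Checking whether a search returned anything before taking element [0] is the safer habit when some company pages may lack the table altogether.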