Programming Forums
Oct 17th, 2006, 3:08 AM #21
Arevos
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Quote:
Originally Posted by nytrokiss
Sure. I want it to download a webpage and grab all the product URLs on the page. If there is a second page, I want it to go to the second page and grab them there as well, and then return a list of URLs. My issue is that it won't return when I want it to!
I'd advise trying Beautiful Soup for this sort of thing. It's an HTML parser library, available as a single .py file.

For instance, if you want to print a list of all links (the href and the text of the link) on the page:
```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

html = urlopen(some_url)
soup = BeautifulSoup(html, "html.parser")  # use Python's built-in parser backend
links = soup.find_all("a")
for link in links:
    if link.has_attr("href"):
        print(link["href"], ":", link.string)
```
(The if-statement checks whether the link tag has an href attribute, since not all link tags do. If you're fairly sure that every link on the page has an href attribute, you can skip the check.)
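If installing Beautiful Soup isn't an option, a rough equivalent of that link listing can be sketched with the standard library's `html.parser.HTMLParser`. This swaps in a different technique rather than showing Beautiful Soup's API; the `LinkCollector` class and `list_links` helper are hypothetical names, and the sketch assumes that any nested tags inside a link should simply be flattened into its text.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; anchors without href give None.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def list_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p><a href="/item/1">Widget</a> <a name="top">anchor</a> <a href="/page/2">Next</a></p>'
for href, text in list_links(page):
    print(href, ":", text)
```

The `<a name="top">` anchor is skipped because it has no href, which is the same job the if-statement does in the Beautiful Soup version.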

Anyway, for what you're doing, you could try something like this:
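```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

def get_links(url):
    soup = BeautifulSoup(urlopen(url), "html.parser")
    links = soup.find_all("a", href=True)  # only <a> tags that have an href
    # Resolve relative hrefs against the current page so URLs from every page are usable.
    urls = [urljoin(url, a["href"]) for a in links if a.string != "Next"]
    next_links = [a for a in links if a.string == "Next"]
    if next_links:
        urls += get_links(urljoin(url, next_links[0]["href"]))
    return urls
```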
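Since the original problem was the function not returning when expected, here is a sketch of the same "follow the Next link" idea written as a loop instead of recursion. It runs offline so the control flow can be checked without a network: `page_links` is a hypothetical callable standing in for "download the page and extract its (href, text) pairs" (for example with Beautiful Soup), and the `site` dict and example.com URLs are made-up test data.

```python
from urllib.parse import urljoin


def get_links(start_url, page_links):
    """Follow "Next" links from start_url and return all other links as absolute URLs.

    page_links(url) is a hypothetical callable returning the (href, text)
    pairs of the <a> tags on that page.
    """
    urls = []
    seen = set()
    url = start_url
    while url is not None and url not in seen:  # stop at the last page, or on a cycle
        seen.add(url)
        next_url = None
        for href, text in page_links(url):
            absolute = urljoin(url, href)  # resolve relative hrefs against this page
            if text == "Next":
                if next_url is None:
                    next_url = absolute  # follow only the first "Next" link
            else:
                urls.append(absolute)
        url = next_url
    return urls


# Fake site data (hypothetical URLs); page 2's "Next" wraps back to page 1.
site = {
    "http://example.com/list?page=1": [
        ("/item/1", "Widget"),
        ("/item/2", "Gadget"),
        ("list?page=2", "Next"),
    ],
    "http://example.com/list?page=2": [
        ("/item/3", "Gizmo"),
        ("list?page=1", "Next"),
    ],
}
print(get_links("http://example.com/list?page=1", site.__getitem__))
```

The `seen` set means the loop always reaches its return statement, even when a site's "Next" links wrap around to an earlier page; the recursive version would keep recursing in that case until Python hits its recursion limit.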
Oct 17th, 2006, 3:14 AM #22
DaWei
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Just an aside: if you search the forum for Beautiful Soup, you will find an outstanding thread that delves into it quite thoroughly, courtesy of Arevos. I'm quite sure he actually got a guy promoted well beyond his Peter Principle limit.
Oct 17th, 2006, 3:17 AM #23
Arevos
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Using Beautiful Soup is my cure-all solution for HTML parsing woes in Python.
DaniWeb IT Discussion Community
All times are GMT -5.

Copyright ©2007 DaniWeb® LLC