#1
Newbie
Join Date: Oct 2006
Posts: 23
Rep Power: 0
I have a function:

    from re import findall
    from urllib import urlopen  # Python 2

    def next_page_finder(kite):
        site = urlopen(kite).read()
        next_site_pages = findall(r'\?Brand=\d+\&pg=\d+', site)  # <-- won't work in the file
        new_pages = []

#2
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
It would help if you gave the URL it is failing on.

#3
Newbie
Join Date: Oct 2006
Posts: 23
Rep Power: 0

#4
Hobbyist Programmer
Join Date: Oct 2005
Posts: 134
Rep Power: 3
The regex is not supposed to match that URL: it only matches when the URL sets "pg". That URL doesn't, but the regex would match this one: http://www.goldwatches.com/watches.asp?Brand=11&pg=0
If you want "pg" to be optional, change line 3 to: next_site_pages = findall(r'\?Brand=\d+(?:\&pg=\d+)?',site) (the (?:...) group is non-capturing, so findall still returns the whole match rather than only the "&pg=..." part).
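Whether the optional part is a capturing group or a non-capturing (?:...) group matters here, because re.findall returns the group's text instead of the whole match whenever the pattern contains a capturing group. A quick Python 3 check on made-up sample text (the hrefs are illustrative, not copied from the real page):

```python
import re

# Hypothetical snippet of page text; not the real goldwatches.com HTML.
sample = 'href="watches.asp?Brand=11" href="watches.asp?Brand=11&pg=2"'

# Capturing group: findall returns only the group's text ('' when it did not match).
capturing = re.findall(r'\?Brand=\d+(\&pg=\d+)?', sample)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r'\?Brand=\d+(?:\&pg=\d+)?', sample)

print(capturing)      # ['', '&pg=2']
print(non_capturing)  # ['?Brand=11', '?Brand=11&pg=2']
```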

#5
Newbie
Join Date: Oct 2006
Posts: 23
Rep Power: 0
I am not trying to match that URL; I am trying to pull a list of the page links out of it. I will post screenshots now.
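Pulling a list out of a page is what re.findall already does: it returns every non-overlapping match in the text as a list. If the same page links appear more than once (for example, pagination at both the top and bottom of the page, which is an assumption here), the repeats can be dropped while keeping their order. A Python 3 sketch on made-up HTML; unique_page_queries is a hypothetical helper name:

```python
import re

# Made-up HTML: the same pagination links at the top and bottom of a page
# (an assumed layout; the real page's markup is not shown in the thread).
html = '''
<a href="watches.asp?Brand=11&pg=1">1</a> <a href="watches.asp?Brand=11&pg=2">2</a>
<p>product listing</p>
<a href="watches.asp?Brand=11&pg=1">1</a> <a href="watches.asp?Brand=11&pg=2">2</a>
'''

def unique_page_queries(page_text):
    """Return each '?Brand=N&pg=M' query string once, in first-seen order."""
    matches = re.findall(r'\?Brand=\d+&pg=\d+', page_text)
    # dict keys keep insertion order (Python 3.7+), so this drops repeats
    return list(dict.fromkeys(matches))

print(unique_page_queries(html))  # ['?Brand=11&pg=1', '?Brand=11&pg=2']
```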

#6
Newbie
Join Date: Oct 2006
Posts: 23
Rep Power: 0
Any answers?

#7
Expert Programmer
It would help if you posted all the code you are running (as text, not as an image). Then I could try it out for you and try to figure out what is wrong.

#8
Newbie
Join Date: Oct 2006
Posts: 23
Rep Power: 0
OK, now that you have seen everything in separate steps, I will show it all at once (please also refer to the screenshot). The code is:

    from re import findall
    from urllib import basejoin, urlopen  # Python 2

    def next_page_finder(site):
        html = urlopen(site).read()
        next_site_pages = []
        # collect every "?Brand=N&pg=M" query string on the page
        next_site_pages.extend(findall(r'\?Brand=\d+\&pg=\d+', html))
        new_pages = []
        for page in next_site_pages:
            # turn each relative query string into an absolute URL
            new_pages.append(basejoin("http://www.goldwatches.com/watches.asp", page))
        return new_pages

The issue is with the regex line, next_site_pages.extend(findall(r'\?Brand=\d+\&pg=\d+', html)). Here are the same steps run in the interpreter:

    >>> from re import findall
    >>> from urllib import basejoin, urlopen
    >>> site = urlopen("http://www.goldwatches.com/watches.asp?Brand=11").read()
    >>> findall(r'\?Brand=\d+\&pg=\d+', site)
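The code in this thread is Python 2. In Python 3, urllib.basejoin no longer exists (urllib.parse.urljoin does the same job), urlopen lives in urllib.request, and read() returns bytes, so a str pattern passed to findall raises a TypeError until the bytes are decoded. A minimal Python 3 sketch, not the poster's code, that also separates parsing from fetching so the regex can be tried on saved page text without a network connection; extract_next_pages is a hypothetical name, and decoding as UTF-8 is an assumption about the page's encoding:

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://www.goldwatches.com/watches.asp"
PAGE_QUERY = re.compile(r'\?Brand=\d+&pg=\d+')

def extract_next_pages(html, base_url=BASE_URL):
    """Turn every '?Brand=N&pg=M' query string in html into an absolute URL."""
    return [urljoin(base_url, query) for query in PAGE_QUERY.findall(html)]

def next_page_finder(url):
    """Fetch url and return the absolute URLs of the paginated pages it links to."""
    raw = urlopen(url).read()  # bytes in Python 3
    # Assumption: the page decodes as UTF-8; undecodable bytes are replaced.
    html = raw.decode("utf-8", errors="replace")
    return extract_next_pages(html)

print(extract_next_pages('<a href="watches.asp?Brand=11&pg=2">2</a>'))
# ['http://www.goldwatches.com/watches.asp?Brand=11&pg=2']
```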

#9
Newbie
Join Date: Oct 2006
Posts: 23
Rep Power: 0
I should add that an example of a page it fails on is http://www.goldwatches.com/watches.asp?Brand=11 (posted above).

#10
Professional Programmer
Join Date: May 2006
Location: Maryland, USA
Posts: 306
Rep Power: 3
I just tried the code, and it works fine.
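Since the same code works for one person and not for another, one thing that might be worth checking (a guess, not something confirmed in this thread) is how the links are written in the page source: HTML often encodes the ampersand in an href as the entity &amp;, as in ?Brand=11&amp;pg=2, and \&pg= will not match that. A Python 3 sketch of a pattern that accepts both spellings; page_queries and the sample HTML are hypothetical:

```python
import re

# Matches "&pg=" written literally or as the HTML entity "&amp;pg=".
PAGE_QUERY = re.compile(r'\?Brand=\d+&(?:amp;)?pg=\d+')

def page_queries(page_text):
    """Return '?Brand=N&pg=M' strings, turning any '&amp;' back into '&'."""
    return [q.replace('&amp;', '&') for q in PAGE_QUERY.findall(page_text)]

# Hypothetical input using both spellings.
sample = ('<a href="watches.asp?Brand=11&amp;pg=2">2</a> '
          '<a href="watches.asp?Brand=11&pg=3">3</a>')
print(page_queries(sample))  # ['?Brand=11&pg=2', '?Brand=11&pg=3']
```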