Programming Forums


nytrokiss Nov 18th, 2006 6:14 PM

RE working in the interactive interpreter but not in a script
 
I have a function:

def next_page_finder(kite):
    site = urlopen(kite).read()
    next_site_pages = findall(r'\?Brand=\d+\&pg=\d+',site) #<--won't work in file
    new_pages = []

When I call the regex in the interactive interpreter it returns a list, but when I call it within my code it gives me an empty list. Why?



Arevos Nov 18th, 2006 8:42 PM

Perhaps if you gave the URL it is failing on?

nytrokiss Nov 18th, 2006 9:11 PM

http://www.goldwatches.com/watches.asp?Brand=11

Kaja Fumei Nov 18th, 2006 9:50 PM

The regex is not supposed to match that URL. The regex only matches if "pg" is set by the URL. That URL doesn't do this but it would match this one: http://www.goldwatches.com/watches.asp?Brand=11&pg=0

If you want "pg" to be optional, change line 3 to:

next_site_pages = findall(r'\?Brand=\d+(\&pg=\d+)?',site)
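One thing to watch with the optional version: when a pattern contains a capturing group, findall returns the contents of the group rather than the whole match. A minimal sketch of the difference, using a made-up HTML snippet (the `html` string is an assumption for illustration, not the real page) and a non-capturing group `(?:...)` to get the full matches back:

```python
import re

# Hypothetical snippet standing in for the page source.
html = '<a href="watches.asp?Brand=11&pg=2">2</a> <a href="watches.asp?Brand=11">all</a>'

# Original pattern: "pg" is required, so only the first link matches.
required = re.findall(r'\?Brand=\d+&pg=\d+', html)

# Optional "pg" with a capturing group: findall returns only the group text,
# and an empty string where the group did not take part in the match.
capturing = re.findall(r'\?Brand=\d+(&pg=\d+)?', html)

# Optional "pg" with a non-capturing group: findall returns the whole match.
non_capturing = re.findall(r'\?Brand=\d+(?:&pg=\d+)?', html)

print(required)
print(capturing)
print(non_capturing)
```

The `&` does not need a backslash in a regex, so it is left unescaped here; `\&` matches the same thing.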

nytrokiss Nov 18th, 2006 10:16 PM

Quote:

Originally Posted by Kaja Fumei (Post 119263)
The regex is not supposed to match that URL. The regex only matches if "pg" is set by the URL. That URL doesn't do this but it would match this one: http://www.goldwatches.com/watches.asp?Brand=11&pg=0

If you want "pg" to be optional, change line 3 to:

next_site_pages = findall(r'\?Brand=\d+(\&pg=\d+)?',site)


I am not trying to match that URL; I am trying to pull a list of links out of the page it points to. I will post screenshots.

nytrokiss Nov 19th, 2006 10:32 PM

Any answers?

titaniumdecoy Nov 19th, 2006 10:45 PM

It would help if you posted all the code you are running (as text, not as an image). Then I could try it out for you and try to figure out what is wrong.

nytrokiss Nov 19th, 2006 11:12 PM

OK, now that you have seen everything in separate steps, I will show it all at once. Please refer to the screenshot. The code is:

def next_page_finder(site):
    site = urlopen(site).read()
    next_site_pages = []
    next_site_pages.extend(findall(r'\?Brand=\d+\&pg=\d+',site))
    new_pages = []
    for _ in next_site_pages:
        new_pages.append(basejoin("http://www.goldwatches.com/watches.asp",_))
    return new_pages


The issue is that when I run the regex:

next_site_pages.extend(findall(r'\?Brand=\d+\&pg=\d+',site))
it returns an empty list. However, in the interactive interpreter I run something akin to it:

>>> from re import findall
>>> from urllib import basejoin,urlopen
>>> site = urlopen("http://www.goldwatches.com/watches.asp?Brand=11").read()
>>> findall(r'\?Brand=\d+\&pg=\d+',site)

As you can see in the screenshot, the interactive interpreter returns a list of values, but when I run the same thing as a regular Python script I get an empty list. Why?

nytrokiss Nov 19th, 2006 11:27 PM

I might add that an example of a page it fails on is http://www.goldwatches.com/watches.asp?Brand=11 (posted above).

Game_Ender Nov 20th, 2006 1:02 AM

I just tried the code; it works fine.
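For anyone who wants to check the regex step without hitting the site, here is a rough sketch that separates the matching from the download, so it can be run against any string. It assumes Python 3, where `urllib.parse.urljoin` stands in for the old `urllib.basejoin`; the name `next_page_links` and the `sample` markup are made up for illustration and are not taken from the real page.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.goldwatches.com/watches.asp"

# Same pattern as in the thread, compiled once; "&" needs no escaping.
PAGE_LINK = re.compile(r'\?Brand=\d+&pg=\d+')


def next_page_links(html, base=BASE_URL):
    """Return absolute URLs for every '?Brand=N&pg=M' query found in html."""
    return [urljoin(base, query) for query in PAGE_LINK.findall(html)]


# Hypothetical page source with two paging links.
sample = ('<a href="watches.asp?Brand=11&pg=2">2</a>'
          '<a href="watches.asp?Brand=11&pg=3">3</a>')

links = next_page_links(sample)
empty = next_page_links("<html>no paging links</html>")

print(links)
print(empty)
```

If the function returns an empty list for a downloaded page, printing the text that was actually fetched is a quick way to see whether it contains any "&pg=" links at all. Note that in Python 3 `urlopen(...).read()` returns bytes, so the result has to be decoded before a str pattern can be applied to it.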


All times are GMT -5.