Programming Forums
Nov 16th, 2006, 11:31 PM   #1
nytrokiss
Newbie
Join Date: Oct 2006
Posts: 23
Retaining the contents of a list during a recursive function

def find_all_items(site):
    site = urllib.urlopen(site).read()
    all_items = re.findall(r'watch\.asp\?\w+\=\w*\&\w*\=\w+',site)
    next_page = re.findall(r'Watches\.asp\?\w+\=[0-9]+\&pg\=\w+',site)
    try:
        find_all_items(urllib.basejoin("<some url>",next_page[0]))
    except IndexError:
        pass
    return remove_dups(all_items)  # <-- this removes all the duplicate items


How do I keep the value of all_items from the first function call and return one list containing the items from the second (recursive) call as well?

Right now the list is being overwritten.

Nov 16th, 2006, 11:34 PM   #2
DaWei
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Is this a tutorial or a question? It doesn't seem to say, anywhere.

Nov 17th, 2006, 1:06 AM   #3
titaniumdecoy
Expert Programmer
Join Date: Nov 2005
Posts: 837
It never fails to amaze me how so many of the newcomers to this forum have such unbelievably bad (written) communication skills.

Nov 17th, 2006, 2:34 AM   #4
Arevos
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
def find_all_items(site):
    site = urllib.urlopen(site).read()
    all_items = re.findall(r'watch\.asp\?\w+\=\w*\&\w*\=\w+',site)
    next_page = re.findall(r'Watches\.asp\?\w+\=[0-9]+\&pg\=\w+',site)
    try:
        all_items.extend(find_all_items(urllib.basejoin("<some url>",next_page[0])))
    except IndexError:
        pass
    return remove_dups(all_items)  # <-- this removes all the duplicate items
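
To try the extend() idea without hitting a real site, here is a minimal sketch that swaps the urllib fetch for a hypothetical in-memory dictionary, PAGES, mapping a page name to its text. The page contents and the simplified patterns are made up for illustration; only the shape of the recursion mirrors the code above. Each call builds its own local all_items, and the items from later pages survive only because the caller merges in the list that the recursive call returns.

```python
import re

# Hypothetical stand-in for the site: page name -> page text.
PAGES = {
    "page1": "watch.asp?id=1 watch.asp?id=2 next:page2",
    "page2": "watch.asp?id=2 watch.asp?id=3 next:page3",
    "page3": "watch.asp?id=4",
}

def find_all_items(page):
    text = PAGES[page]  # replaces urllib.urlopen(...).read()
    # This list is local: every call starts with its own copy.
    all_items = re.findall(r"watch\.asp\?id=\d+", text)
    next_page = re.findall(r"next:(\w+)", text)
    if next_page:
        # Merge what the recursive call returns into this call's list.
        all_items.extend(find_all_items(next_page[0]))
    return all_items

items = find_all_items("page1")
print(items)
```

The result still contains watch.asp?id=2 twice, which is the job remove_dups (or a set) takes care of in the real code.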
Also, you may want to think about using the set() data type, which contains only unique values and doesn't retain order.

Perhaps:
def find_all_items(site):
    site = urllib.urlopen(site).read()
    all_items = set(re.findall(r'watch\.asp\?\w+\=\w*\&\w*\=\w+',site))
    next_page = re.findall(r'Watches\.asp\?\w+\=[0-9]+\&pg\=\w+',site)
    try:
        # update() changes the set in place; union() would return a new set
        all_items.update(find_all_items(urllib.basejoin("<some url>",next_page[0])))
    except IndexError:
        pass
    return all_items
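
One detail worth checking when accumulating into a set: set.union() returns a new set and leaves the original untouched, while set.update() modifies the set in place. So the merged result either has to be assigned back or merged with update(). A small sketch with made-up values:

```python
first_page = {"a", "b"}
second_page = {"b", "c"}

# union() builds and returns a new set; first_page itself is unchanged.
merged = first_page.union(second_page)
unchanged = set(first_page)  # snapshot taken after the union() call

# update() adds the elements to first_page in place and returns None.
result = first_page.update(second_page)

print(sorted(merged))
print(sorted(unchanged))
print(sorted(first_page))
print(result)
```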
And I can't see a Python HTML parsing problem without mentioning Beautiful Soup, though your problem might be simple enough that it needs only regexes.

Nov 17th, 2006, 3:01 AM   #5
nytrokiss
Newbie
Join Date: Oct 2006
Posts: 23
I tried both examples. The issue is that the list all_items is cleared when the function is called again!

Nov 17th, 2006, 3:20 AM   #6
Arevos
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Hm. It might help if you showed all your code, as it's difficult to understand exactly what you're asking for. Do you want some sort of persistence? Then you either need a functor:
class FindAllItemsFunctor(object):
    def __init__(self):
        # The list lives on the instance, so it persists between calls
        self.all_items = []

    def __call__(self, site):
        # Your find_all_items code using self.all_items instead of all_items
        return self.all_items

find_all_items = FindAllItemsFunctor()
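
As a rough sketch of how the functor might be filled in, the following uses a hypothetical PAGES dictionary in place of the real urllib fetch, plus simplified patterns; neither comes from the original code. Because the list is stored on the instance, every call, including the recursive ones, adds to the same list.

```python
import re

# Hypothetical stand-in for the site: page name -> page text.
PAGES = {
    "page1": "watch.asp?id=1 next:page2",
    "page2": "watch.asp?id=2",
}

class FindAllItemsFunctor:
    def __init__(self):
        self.all_items = []  # persists for the life of the instance

    def __call__(self, page):
        text = PAGES[page]
        self.all_items.extend(re.findall(r"watch\.asp\?id=\d+", text))
        next_page = re.findall(r"next:(\w+)", text)
        if next_page:
            self(next_page[0])  # the recursive call shares self.all_items
        return self.all_items

find_all_items = FindAllItemsFunctor()
items = find_all_items("page1")
print(items)
```

Note that calling the same instance again keeps appending to the existing list, so a fresh FindAllItemsFunctor() is needed for a fresh crawl.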
Or you can nest one function inside another (a closure):
def find_all_items_creater():
    all_items = []  # shared by every call to the inner function
    def find_all_items(site):
        # ... add to all_items here (extend/append, not reassignment)
        return all_items
    return find_all_items

find_all_items = find_all_items_creater()
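
A small runnable sketch of the nested-function approach follows. To keep it short, the inner function takes a hypothetical new_items list instead of fetching a site, so the only thing on display is how the enclosed list persists between calls.

```python
def find_all_items_creater():
    all_items = []  # captured by the inner function

    def find_all_items(new_items):
        # Mutating the captured list works. Assigning to the name
        # all_items here would make it a local of the inner function.
        all_items.extend(new_items)
        return all_items

    return find_all_items

find_all_items = find_all_items_creater()
find_all_items(["a", "b"])
items = find_all_items(["c"])
print(items)
```

Each call to find_all_items_creater() produces a function with its own separate list.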
Or you could use a global variable:
all_items = []
def find_all_items(site):
    global all_items
    # ...
    return all_items
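
For comparison, a tiny sketch of the global-variable version, again with a hypothetical new_items argument standing in for the items scraped from a page. The global statement matters here because the function rebinds the name rather than just mutating the list.

```python
all_items = []

def find_all_items(new_items):
    global all_items  # required because the name is reassigned below
    all_items = all_items + new_items
    return all_items

find_all_items(["a"])
find_all_items(["b", "c"])
print(all_items)
```

The list is shared by everything in the module, so it has to be reset by hand before an unrelated crawl.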
Or even do something clever with the function's parameters:
def find_all_items(site, all_items=[]):
    # ...
    return all_items
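
The parameter trick relies on the fact that a default value is evaluated once, when the function is defined, so every call that omits the argument shares the same list. The sketch below uses a hypothetical new_items argument in place of the scraped items. This behaviour is often regarded as a pitfall rather than a feature, so treat it as an illustration, not a recommendation.

```python
def find_all_items(new_items, all_items=[]):
    # The default list is created once, at definition time, and reused.
    all_items.extend(new_items)
    return all_items

first = find_all_items(["a"])
second = find_all_items(["b"])     # same default list as the first call
fresh = find_all_items(["c"], [])  # an explicit list starts empty

print(first is second)
print(second)
print(fresh)
```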