Programming Forums


nytrokiss Nov 16th, 2006 11:31 PM

Retaining the contents of a list during a recursive function
 
:

def find_all_items(site):
    site = urllib.urlopen(site).read()
    all_items = re.findall(r'watch\.asp\?\w+\=\w*\&\w*\=\w+',site)
    next_page = re.findall(r'Watches\.asp\?\w+\=[0-9]+\&pg\=\w+',site)
    try:
        find_all_items(urllib.basejoin("<some url>",next_page[0]))
    except IndexError:
        pass
    return remove_dups(all_items)  # remove_dups removes all the duplicate items



How do I keep the value of all_items from the first function call, so that the function returns one list containing the items from the second (recursive) call as well?

Right now the list is being overwritten.

DaWei Nov 16th, 2006 11:34 PM

Is this a tutorial or a question? It doesn't seem to say, anywhere.

titaniumdecoy Nov 17th, 2006 1:06 AM

It never fails to amaze me how so many of the newcomers to this forum have such unbelievably bad (written) communication skills.

Arevos Nov 17th, 2006 2:34 AM

:

import re
import urllib

def find_all_items(site):
    site = urllib.urlopen(site).read()
    all_items = re.findall(r'watch\.asp\?\w+\=\w*\&\w*\=\w+',site)
    next_page = re.findall(r'Watches\.asp\?\w+\=[0-9]+\&pg\=\w+',site)
    try:
        # Merge the items returned by the recursive call into this call's list
        all_items.extend(find_all_items(urllib.basejoin("<some url>",next_page[0])))
    except IndexError:
        pass  # no next page, so stop recursing
    return remove_dups(all_items)  # remove_dups removes all the duplicate items
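To see why merging in the return value of the recursive call keeps every page's results, here is a minimal offline sketch. The `PAGES` dictionary and the `fetch` helper are hypothetical stand-ins for the real site and for `urllib.urlopen(site).read()`, and the regexes are simplified for illustration.

```python
import re

# Hypothetical stand-in for the real site: each "page" lists some items
# and optionally names the next page.
PAGES = {
    "page1": "item=a item=b next=page2",
    "page2": "item=b item=c next=page3",
    "page3": "item=d",
}

def fetch(url):
    # Assumption: replaces the real network read so the sketch runs offline.
    return PAGES[url]

def find_all_items(url):
    text = fetch(url)
    all_items = re.findall(r"item=(\w+)", text)
    next_page = re.findall(r"next=(\w+)", text)
    if next_page:
        # Every call has its own local all_items. Items from later pages
        # survive only because the returned list is merged in here.
        all_items.extend(find_all_items(next_page[0]))
    return all_items

print(find_all_items("page1"))  # prints ['a', 'b', 'b', 'c', 'd']
```

The duplicate 'b' is still there, which is where a helper like remove_dups (or a set) comes in.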

Also, you may want to think about using the set() data type, which contains only unique values and doesn't retain order.

Perhaps:
:

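import re
import urllib

def find_all_items(site):
    site = urllib.urlopen(site).read()
    all_items = set(re.findall(r'watch\.asp\?\w+\=\w*\&\w*\=\w+',site))
    next_page = re.findall(r'Watches\.asp\?\w+\=[0-9]+\&pg\=\w+',site)
    try:
        # update() changes the set in place; union() would return a new set
        # and leave all_items untouched
        all_items.update(find_all_items(urllib.basejoin("<some url>",next_page[0])))
    except IndexError:
        pass  # no next page, so stop recursing
    return all_items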
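One detail worth knowing when merging sets: `union` returns a new set and leaves the original alone, while `update` (or `|=`) changes the set in place. A small sketch with made-up values:

```python
first_page = {"a", "b"}
second_page = {"b", "c"}

merged = first_page.union(second_page)  # builds and returns a new set
assert first_page == {"a", "b"}         # the original set is unchanged

first_page.update(second_page)          # modifies first_page in place
assert first_page == {"a", "b", "c"}
```

So a bare `all_items.union(...)` whose result is thrown away does nothing to all_items.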

And I can't see a Python HTML parsing problem without mentioning Beautiful Soup, though your problem might be simple enough that it needs only regexes.

nytrokiss Nov 17th, 2006 3:01 AM

I tried both examples. The issue is that the list all_items is cleared when the function is called again!

Arevos Nov 17th, 2006 3:20 AM

Hm. It might help if you showed all your code, as it's difficult to understand exactly what you're asking for. Do you want some sort of persistence? Then you either need a functor:
:

class FindAllItemsFunctor:
    def __init__(self):
        self.all_items = []
    def __call__(self, site):
        # Your find_all_items code using self.all_items instead of all_items

find_all_items = FindAllItemsFunctor()
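As a rough illustration of the functor idea, here is a stripped-down callable class that just accumulates whatever it is given. `ItemCollector` and its `new_items` argument are made up for the example; the real version would fetch and parse the page inside `__call__`.

```python
class ItemCollector:
    """Callable object that remembers every item passed to it."""

    def __init__(self):
        self.all_items = []

    def __call__(self, new_items):
        # The list lives on the instance, so it survives between calls.
        self.all_items.extend(new_items)
        return self.all_items

collect = ItemCollector()
collect(["a", "b"])
print(collect(["c"]))  # prints ['a', 'b', 'c']
```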

Or you need to use some functions within functions:
:

def find_all_items_creater():
    all_items = []
    def find_all_items(site):
        # ...
    return find_all_items

find_all_items = find_all_items_creater()
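A filled-in sketch of the same closure pattern, again with hypothetical names (`make_collector`, `collect`) and with plain lists standing in for scraped items:

```python
def make_collector():
    all_items = []  # kept alive for as long as the returned function exists

    def collect(new_items):
        # The list is mutated, not rebound, so the inner function can use
        # the enclosing variable without any extra declaration.
        all_items.extend(new_items)
        return all_items

    return collect

collect = make_collector()
collect(["a", "b"])
print(collect(["c"]))  # prints ['a', 'b', 'c']

other = make_collector()  # each call to make_collector gets its own list
print(other(["z"]))       # prints ['z']
```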

Or you could use a global variable:
:

all_items = []
def find_all_items(site):
    global all_items
    # ...
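For the global version, a small sketch (hypothetical `collect` function, run as a plain script) showing when the `global` statement actually matters:

```python
all_items = []  # module-level list shared by every call

def collect(new_items):
    # `global` is required here because the name is rebound with `=`.
    # Without it, Python would treat all_items as a local variable and
    # the line below would raise UnboundLocalError.
    global all_items
    all_items = all_items + new_items
    return all_items

collect(["a", "b"])
print(collect(["c"]))  # prints ['a', 'b', 'c']
```

It works, but every caller shares the one list, so it is easy to get stale items from an earlier run.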

Or even do something clever with the function's parameters:
:

def find_all_items(site, all_items = []):
    # ...
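The parameter trick relies on the fact that a default value is evaluated once, when the function is defined, so a mutable default is shared between calls. A sketch with hypothetical `collect` and `collect_fresh` functions, showing both that behaviour and the usual `None` idiom for when you want a new list on each top-level call:

```python
def collect(item, all_items=[]):
    # The default list is created once, at definition time, and reused by
    # every call that does not pass its own list.
    all_items.append(item)
    return all_items

collect("a")
print(collect("b"))  # prints ['a', 'b']

def collect_fresh(item, all_items=None):
    # A None default gives each call a new list unless the caller
    # (for example a recursive call) passes one along explicitly.
    if all_items is None:
        all_items = []
    all_items.append(item)
    return all_items

print(collect_fresh("a"))  # prints ['a']
print(collect_fresh("b"))  # prints ['b']
```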


