Programming Forums
Old Mar 18th, 2007, 2:48 PM   #1
Sane
Programming Guru
Join Date: Apr 2005
Location: Waterloo, Ontario
Posts: 1,835
Hacking Python Memory

This might sound a little over the edge, but bear with me.

I have a Python program that at any given time could be holding a huge amount of data in RAM. However, most of the time only about one eighth of that data is actually useful.

The obvious choice is to keep this in a MySQL database, so that MySQL can decide for itself what's important enough to cache and what can stay on the hard drive.

However, I don't want to go that route, since the data must be accessed as quickly as possible, and very frequently. MySQL is adequately fast in general, but I'd be retrieving large amounts of varied data too often for it to keep up.

Therefore, my idea was to create another layer on top of the memory, and underneath the script execution. When a block of memory hasn't been accessed for a while, the layer will store the block in a new file, and delete it from RAM. If the script attempts to access that memory, the layer will retrieve the file's contents, delete the file, and store the block in memory again.

This could be very easy to do, or very difficult, depending on what Python has to offer in these regards. Does Python support lookups of memory address locations? Are there any existing libraries that can help?

The memory that's of interest is a list of class instances. Each class instance stores several strings, integers and more lists.

My first thought is to solve this using Python's decorators, by adding a decorator to every function that looks at these class instances, so that they are checked each time the function is called. If a variable is set to None or False, then there will be a file, keyed by its id(var), with its contents stored in it. The only problem is that I don't believe that will lighten the load on the RAM, and that's a big problem.

Any help, advice, or food for thought will be very helpful. Thanks in advance.
Old Mar 18th, 2007, 3:43 PM   #2
Arevos
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
If you're using CPython (the official Python interpreter), then this is relatively simple. CPython uses reference counting for memory management: it keeps track of the number of references to each object, and when that count reaches zero, the object is destroyed immediately. This is a simplistic and somewhat inefficient approach to garbage collection, since freeing memory takes time and it's usually cheaper to free a whole batch of objects at once; however, there are advantages to reference counting. For instance, in CPython, you can do something like this:
lines = open("file.txt").readlines()
Because the file object created by "open" is not referenced anywhere once the statement finishes, it's destroyed straight away, and the file is closed. In IronPython, which uses the .NET GC, the object isn't destroyed until sometime later, when the collector decides it is efficient to reclaim the memory, and hence the file could be left open for a long time. So whilst reference counting is not the most efficient approach, it does result in very predictable behaviour.
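
The "destroyed the moment the last reference goes away" behaviour can be observed directly. What follows is only a small sketch, written for Python 3 (which postdates this thread); the Resource class and the events list are made up for illustration, while weakref.finalize is a standard library call that registers a callback to run when an object is collected.

```python
import weakref


class Resource:
    """Hypothetical stand-in for any object that holds something worth releasing."""


events = []

res = Resource()
# Register a callback that runs when `res` is garbage collected.
weakref.finalize(res, events.append, "destroyed")

events.append("before del")
del res  # last reference removed; on CPython the object is freed right here
events.append("after del")

print(events)
```

On CPython this prints ['before del', 'destroyed', 'after del']. On an interpreter without reference counting, "destroyed" may show up later or only at exit, which is exactly the difference described above.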

Essentially, you could create a wrapper class that keeps objects on disk until they are needed, and then expires them after a certain amount of time (perhaps using the "shelve" module as storage). To expire an object, just remove all references to it. You may want to use the weakref module to make sure you don't give out any "real" references that might prevent your objects from being recycled. You could also use the __getattr__ method so that you can access your data like this:
diskcache.commonvar = 10   # stored in the in-memory cache (a dict)
diskcache.rarevar += 10    # read from disk (via shelve), then kept in the memory cache

...  # more stuff

# rarevar hasn't been accessed for some time,
# so it's evicted when diskcache is next accessed:
diskcache.x += 1     # rarevar is removed as x is returned
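
One possible shape for such a wrapper is sketched below, in Python 3. Everything here is an assumption made for illustration rather than an existing library: the DiskCache name, the max_age and clock parameters, and the choice to write values back to the shelf at eviction time. Only shelve.open and the __getattr__/__setattr__ hooks are standard Python.

```python
import os
import shelve
import tempfile
import time


class DiskCache:
    """Attribute-style store: values live in a shelf, recently used ones in a dict."""

    def __init__(self, path, max_age=30.0, clock=time.time):
        # Use object.__setattr__ so internal state bypasses our own __setattr__.
        object.__setattr__(self, "_shelf", shelve.open(path))
        object.__setattr__(self, "_hot", {})        # name -> value held in RAM
        object.__setattr__(self, "_last_used", {})  # name -> last access time
        object.__setattr__(self, "_max_age", max_age)
        object.__setattr__(self, "_clock", clock)

    def _expire(self, keep):
        # Move every stale entry (except `keep`) from RAM back to the shelf.
        now = self._clock()
        for name in list(self._hot):
            if name != keep and now - self._last_used[name] > self._max_age:
                self._shelf[name] = self._hot.pop(name)
                del self._last_used[name]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for cached names.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._hot:
            try:
                self._hot[name] = self._shelf[name]  # unpickled from disk
            except KeyError:
                raise AttributeError(name) from None
        self._last_used[name] = self._clock()
        self._expire(keep=name)
        return self._hot[name]

    def __setattr__(self, name, value):
        self._hot[name] = value
        self._last_used[name] = self._clock()
        self._expire(keep=name)

    def close(self):
        # Flush whatever is still in RAM, then close the underlying database.
        for name, value in self._hot.items():
            self._shelf[name] = value
        self._hot.clear()
        self._shelf.close()


# Usage with a fake clock, so that expiry can be shown without sleeping.
now = [0.0]
path = os.path.join(tempfile.mkdtemp(), "cache")
cache = DiskCache(path, max_age=30.0, clock=lambda: now[0])

cache.rarevar = 1
now[0] = 60.0
cache.x = 5                    # rarevar is stale, so it is moved to disk here
in_memory = sorted(cache._hot)
cache.rarevar += 10            # read back from the shelf, then cached again
```

Expiry is checked on every access instead of on a timer, which keeps the sketch single-threaded; the cost is that nothing is evicted while the cache sits idle.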
Old Mar 18th, 2007, 3:44 PM   #3
DaWei
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
This is what a reasonably decent operating system tries to do for you, with its cache/virtual memory. Have you determined by performance measurements that it's really necessary?
Old Mar 18th, 2007, 4:20 PM   #4
Sane
@Arevos : That all seems pretty straightforward. But how do I get the memory back when something tries to reference it again? Maybe I don't see how this works.

@DaWei : I haven't yet done any measurements, but this is more because I anticipate that the list of class instances could potentially reach several hundred thousand instances. And my RAM can't possibly handle that cleanly.
Old Mar 18th, 2007, 6:21 PM   #5
Arevos
Quote:
Originally Posted by Sane
@Arevos : That all seems pretty straightforward. But how do I get the memory back when something tries to reference it again? Maybe I don't see how this works.
Well, if I'm understanding right, then you have a series of objects containing data. You could pickle those objects to a file, using something like the shelve module, and keep them in a time-limited cache whenever they're accessed. In order to access the objects, one must always go through the cache.
Old Mar 18th, 2007, 6:36 PM   #6
Sane
Edit :
Never mind. You can probably disregard the original post. So, if I understand this correctly, the shelve module does not use more RAM as the number of stored objects grows? It stores them on the hard drive, but my program will treat them as traditional variables?

Then I would add a layer before the shelf level, where a cache keeps their contents in the RAM?
An off-topic question here: Is there a function that automatically dumps an instance's contents to a binary file, and then reads it right back in with all the types and attributes intact? If not, I could just quickly write one.

Original Post :
I'm not sure if I'm missing something, or if you're missing something, but to make sure we're on the same page, I should probably clarify:

Once a block has been removed from the RAM, it could still be needed by the program at any time. When I say that only 1/8th of the data in RAM might be useful, I only mean at the current time. All 8/8ths of it may be looked at at least once in a 24 hour period.

So once it's saved to the file, the program must still know to look at the file and add its contents back to the memory if anything ever tries to access it.

Is that what you have been assuming? Or why don't I see how this works?
Old Mar 18th, 2007, 7:30 PM   #7
Arevos
Quote:
Originally Posted by Sane
Edit: Never mind. You can probably disregard the original post. So, if I understand this correctly, the shelve module does not use more RAM as the number of stored objects grows? It stores them on the hard drive, but my program will treat them as traditional variables?
Yes and no. The shelve module uses a dbm database to store objects serialized by the pickle module. When you request an object, the Shelf class queries the database, extracts the string of data that represents the object, and deserializes it (or "unpickles" it) into a real object. The Shelf class does have the option of an in-memory cache (the writeback option), but this cache is essentially unlimited in size, so it's probably not what you want; fortunately, it's off by default.

To show you what I mean:
import shelve

# Assumes "dog" and "cat" were stored in this shelf earlier.
shelf = shelve.open("somefile.dbm")

# "dog" is read from disk and made into an object, then printed out. Because
# the object has not been assigned a name, it is destroyed the moment the
# print command ends.
print(shelf["dog"])

# This line will read the same data in again, and store it only briefly in
# memory before discarding it again.
print(shelf["dog"])

# "cat" is read from disk, but this time it is assigned a reference. This means
# that the "cat" object will persist in memory.
cat = shelf["cat"]
print(cat)

# This command doesn't reread the data from disk - it uses the in-memory
# version of "cat"
print(cat)

# Now, we can wait for "cat" to fall out of scope, or we can delete it
# manually:
del cat

# No more cat instances exist now
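
As for dumping an instance to a binary file and reading it straight back: the pickle module that shelve builds on does this directly. The following is a rough Python 3 sketch; the Link class and its attributes are invented for the example and are only a guess at what the objects in question look like.

```python
import os
import pickle
import tempfile


class Link:
    """Hypothetical record type, loosely modelled on the objects in this thread."""

    def __init__(self, link_id, url, points, tags):
        self.link_id = link_id
        self.url = url
        self.points = points
        self.tags = tags


original = Link(7, "http://example.com/", 3, ["a", "b"])
path = os.path.join(tempfile.mkdtemp(), "link.pickle")

# Serialize the whole instance, attributes and nested lists included.
with open(path, "wb") as fh:
    pickle.dump(original, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back: the result is a new Link with equal attribute values.
with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(type(restored).__name__, vars(restored) == vars(original))
```

The class itself is not stored in the file, only its module and name, so the code that loads the pickle must be able to import the same class definition.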
Old Mar 18th, 2007, 11:28 PM   #8
Sane
Okay, everything seems to be working fine, except that python.exe's memory usage doesn't seem to drop when something is shelved. When I delete it from the RAM, memory usage drops, but then it goes straight back up once it's shelved.


I ran the following commands sequentially in the Python command line, while watching the memory:

>>> import shelve
>>> class x:
...     def __init__(self):
...         self.mem = "-" * 1024*100
...
>>> shelf = shelve.open("test_shelve.dbm")
>>> shelf["a"] = x()

And the memory went waaay up. Shouldn't it only go up a little bit, since shelve won't keep it in the memory?

By the way, this is my current solution, which works, but does not lower memory usage:

import shelve
import time

class main:

    def __init__(self):

        self.shelf = shelve.open("link_db.dbm")

...

        self.last_db_dump  = self.last_save
        self.db_dump_every = 60
        self.db_dump_age   = 30
        
        self.grab("queue", list)
        self.grab("active", list)
        self.grab("max_link_id", int)

        self.links  = dict()
        self.db_fetch_times = dict()
            
...

    def fetch_link(self, link_id):

        date = int(time.time())
        self.db_fetch_times[link_id] = date

        if link_id not in self.links:
            # load
            print("Loaded : %s" % link_id)
            self.links[link_id] = self.shelf["instance_%s" % link_id]

        # do a routine check for unused RAM
        if date >= self.last_db_dump + self.db_dump_every:
            # iterate over a copy of the keys, since entries are deleted in the loop
            for dump_link_id in list(self.db_fetch_times.keys()):
                if date >= self.db_fetch_times[dump_link_id] + self.db_dump_age:
                    # save
                    self.shelf["instance_%s" % dump_link_id] = self.links[dump_link_id]
                    del self.links[dump_link_id]
                    del self.db_fetch_times[dump_link_id]
                    print("Saved : %s" % dump_link_id)
            self.last_db_dump = date

        return self.links[link_id]
              
    def grab(self, var_name, var_type):
        
        try:
            setattr(self, var_name, self.shelf[var_name])
        except KeyError:
            setattr(self, var_name, var_type())

...

    def add_to_queue(self, link_id):

        link = self.fetch_link(link_id)
        if link.points > 0:
            self.queue.append(link_id)
            link.points -= 1

...

Last edited by Sane; Mar 18th, 2007 at 11:42 PM.
Old Mar 19th, 2007, 7:58 AM   #9
ZenMasterJG
Hobbyist Programmer
Join Date: Nov 2004
Location: Boston, MA
Posts: 148
Sane:
IMHO, you're optimizing prematurely. DaWei is right. Find out how your algorithm does, and *then* optimize. If your RAM can't handle the number of objects you're creating, trying to think of an alternative solution is probably a better idea than trying to rewrite Python's memory management.
Old Mar 19th, 2007, 8:19 AM   #10
Arevos
Quote:
Originally Posted by ZenMasterJG
IMHO, you're optimizing prematurely. DaWei is right. Find out how your algorithm does, and *then* optimize. If your RAM can't handle the number of objects you're creating, trying to think of an alternative solution is probably a better idea than trying to rewrite Python's memory management.
Normally, I'd agree, but if the objects plainly won't fit in the amount of RAM available, then this sort of optimization is a matter of necessity rather than premature optimization. That said, RAM tends to be fairly large these days, so it's worth checking whether it really is needed.

Quote:
Originally Posted by Sane
And the memory went waaay up. Shouldn't it only go up a little bit, since shelve won't keep it in the memory?
Whilst Python frees objects instantly, the memory usage reported for the process usually does not drop straight away. Freed memory is often held for reuse rather than handed back immediately, especially when you have plenty of free RAM around.

In order to properly check that the application works, you need to operate on many different objects. For instance:
x = "-" * 100 * 1024 * 1024   # a 100M string
del x    # x deleted, but the process may not hand the 100M back straight away

# The following code keeps all 100 strings alive (about 10G in total), so it
# will likely result in an out of memory error:
l = []
for i in range(100):
    l.append("-" * 100 * 1024 * 1024)

# The following code will not, since each string is freed as soon as x is
# rebound, and the freed memory is reused on the next iteration:
for i in range(100):
    x = "-" * 100 * 1024 * 1024
Try your class with a large number of objects that would normally result in an out-of-memory error if they were all loaded in at once.
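
If you'd rather measure what the interpreter itself has released, instead of trusting the figure shown for python.exe, something along these lines may help. It is only a sketch and assumes Python 3.4 or later, where the standard tracemalloc module is available (it did not exist when this thread was written); the sizes are arbitrary.

```python
import tracemalloc

tracemalloc.start()

# About 10 MB of distinct strings, all kept alive by the list.
blocks = ["-" * (100 * 1024) + str(i) for i in range(100)]
held, _ = tracemalloc.get_traced_memory()

del blocks  # reference counts drop to zero, so the strings are freed at once
released, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

print("while referenced: %d KiB" % (held // 1024))
print("after del:        %d KiB" % (released // 1024))
```

The second figure should be a small fraction of the first, even if the process size reported by the OS has not moved yet.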