#1
Programming Guru
Hacking Python Memory
This might sound a little over the edge, but bear with me.
I have a Python program that at any given time could be holding huge amounts of data in RAM. However, most of the time only about an eighth of that data is actually useful. The obvious choice is to keep it in a MySQL database, so that MySQL can decide for itself what's important enough to cache and what can be stored on the hard drive. However, I don't want to go that route, since the data must be accessed as quickly as possible, and very frequently. MySQL is adequately fast, but I would be retrieving large amounts of different information too frequently for it to be fast enough.
Therefore, my idea was to create another layer on top of the memory, and underneath the script execution. When a block of memory hasn't been accessed for a while, the layer will store the block in a new file and delete it from RAM. If the script attempts to access that memory, the layer will retrieve the file's contents, delete the file, and store the data in memory again.
This could be very easy to do, or very difficult, depending on what Python has to offer in these regards. Does Python support lookups of memory address locations? Are there any existing libraries that can help?
The memory of interest is a list of class instances. Each instance stores several strings, integers and more lists. My first thought is to solve this using Python's decorators, by adding a decorator to every function, which will look at these class instances each time the function is called. If a variable is set to None or False, then there will be a file, keyed by its id(var), with its contents stored in it. The only problem is that I don't believe that will lighten the load on the RAM, and that's a big problem.
Any help, advice, or food for thought will be very helpful. Thanks in advance.
#2
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
If you're using CPython (the official Python interpreter), then this is relatively simple. CPython uses a reference-counting memory management system, which means that the number of references to each object is tracked, and when this reaches zero, the object is destroyed immediately. This is a very simplistic and somewhat inefficient approach to garbage collection, as it's usually better to free a whole block of memory all at once, since freeing memory takes time; however, there are advantages to reference counting. For instance, in CPython, you can do something like this:
lines = open("file.txt").readlines()
Nothing holds a reference to the file object once readlines() returns, so CPython destroys it, and closes the file, straight away.
Essentially, you could create a wrapper class that keeps objects on disk until they are needed, and then expires them after a certain amount of time (perhaps using the "shelve" module as storage). To expire an object, just remove all references to it. You may want to use the weakref module to make sure you don't give out any "real" references that might prevent your objects from being recycled. You could also use the __getattr__ method so that you can access your data like this:
diskcache.commonvar = 10   # uses the in-memory cache (a dict)
diskcache.rarevar += 10    # gets from disk (via shelve), stores in memory cache
...                        # more stuff
# rarevar hasn't been accessed for some time,
# so it's removed when diskcache is accessed again:
diskcache.x += 1           # rarevar removed as x is returned
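The reference-counting behaviour described above can be observed directly. This is only a minimal sketch, assuming CPython and Python 3; `Record` and `demo` are hypothetical names made up for illustration, and `weakref.ref` is used purely to watch the object without keeping it alive.

```python
import gc
import weakref


class Record:
    """Hypothetical stand-in for one of the data-holding class instances."""

    def __init__(self, payload):
        self.payload = payload


def demo():
    record = Record("x" * 1000)
    # A weak reference observes the object without adding to its count.
    observer = weakref.ref(record)
    alive_before = observer() is not None

    # Removing the only strong reference drops the count to zero, and
    # CPython frees the object right away.
    del record
    gc.collect()  # not needed on CPython; a courtesy for other interpreters
    alive_after = observer() is not None
    return alive_before, alive_after


if __name__ == "__main__":
    print(demo())
```

On CPython the weak reference goes dead as soon as the last strong reference is removed, which is what makes "expire an object by dropping all references to it" workable.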
#3
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
This is what a reasonably decent operating system tries to do for you, with its cache/virtual memory. Have you determined by performance measurements that it's really necessary?
#4
Programming Guru
@Arevos: That all seems pretty straightforward. But how do I get the data back into memory when something tries to reference it again? Maybe I don't see how this works.
@DaWei: I haven't done any measurements yet, but that's because I anticipate that the list of class instances could reach several hundred thousand instances, and my RAM can't possibly handle that cleanly.
#5
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Well, if I'm understanding right, then you have a series of objects containing data. You could pickle those objects to a file, using something like the shelve module, and keep them in a time-limited cache whenever they're accessed. In order to access the objects, one must always go through the cache.
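One way the "always go through the cache" idea might look is sketched below, assuming Python 3. `ShelfCache`, its method names, the `max_age` setting and the injectable `clock` are all hypothetical choices made for illustration; only `shelve.open` and dict-style access to the shelf come from the standard library.

```python
import os
import shelve
import tempfile
import time


class ShelfCache:
    """Everything lives in a shelf on disk; recently used objects stay in a dict."""

    def __init__(self, path, max_age=30.0, clock=time.time):
        self._shelf = shelve.open(path)
        self._max_age = max_age
        self._clock = clock          # injectable so the demo can fake time
        self._live = {}              # key -> (object, last access time)

    def get(self, key):
        now = self._clock()
        self._expire(now)
        if key in self._live:
            obj = self._live[key][0]
        else:
            obj = self._shelf[key]   # unpickled from disk
        self._live[key] = (obj, now)
        return obj

    def put(self, key, obj):
        self._shelf[key] = obj
        self._live[key] = (obj, self._clock())

    def _expire(self, now):
        for key in list(self._live):     # copy: entries are deleted below
            obj, last_used = self._live[key]
            if now - last_used >= self._max_age:
                self._shelf[key] = obj   # write back any in-place changes
                del self._live[key]      # drop the cache's strong reference

    def cached_keys(self):
        return sorted(self._live)

    def close(self):
        for key, (obj, _) in self._live.items():
            self._shelf[key] = obj
        self._live.clear()
        self._shelf.close()


def demo():
    fake_now = [0.0]
    with tempfile.TemporaryDirectory() as tmp:
        cache = ShelfCache(os.path.join(tmp, "objects"), max_age=30.0,
                           clock=lambda: fake_now[0])
        cache.put("common", [1, 2, 3])
        cache.put("rare", {"points": 5})
        fake_now[0] = 20.0
        cache.get("common")              # refreshes "common" only
        fake_now[0] = 40.0
        cache.get("common")              # "rare" is now stale and is evicted
        in_memory = cache.cached_keys()
        rare = cache.get("rare")         # reloaded from disk on demand
        cache.close()
    return in_memory, rare


if __name__ == "__main__":
    print(demo())
```

Callers only ever hold what `get` returns for as long as they need it; if they stash those objects elsewhere, the extra references will keep them in memory regardless of what the cache does.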
#6
Programming Guru
Edit:
Never mind. You can probably disregard the original post. So, if I understand this correctly, the shelve module does not increase RAM usage with the number of objects being stored? It stores them on the hard drive, but my program can treat them like ordinary variables?
An off-topic question here: is there a function that automatically dumps an instance's contents to a binary file, and then reads it right back in with all the types and attributes intact? If not, I could just quickly write one.
Original Post:
I'm not sure if I'm missing something, or if you're missing something, but to make sure we're on the same page, I should probably clarify:
#7
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
To show you what I mean:
import shelve
# Assumes "dog" and "cat" were stored in this shelf earlier.
shelf = shelve.open("somefile.dbm")
# "dog" is read from disk and made into an object, then printed out. Because
# the object has not been assigned a name, it is destroyed the moment the
# print call ends.
print(shelf["dog"])
# This line will read the same data in again, and store it only briefly in
# memory before discarding it again.
print(shelf["dog"])
# "cat" is read from disk, but this time it is assigned a reference. This means
# that the "cat" object will persist in memory.
cat = shelf["cat"]
print(cat)
# This command doesn't reread the data from disk - it uses the in-memory
# version of "cat"
print(cat)
# Now, we can wait for "cat" to fall out of scope, or we can delete it
# manually:
del cat
# No more cat instances exist now
shelf.close()
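On the earlier question about dumping an instance to a binary file and reading it back: shelve stores its values with the pickle module, and pickle can be used directly for a single object. A minimal sketch, assuming Python 3; the `Link` class, the helper names and the file name are hypothetical, made up for illustration.

```python
import os
import pickle
import tempfile


class Link:
    """Hypothetical data class holding strings, integers and nested lists."""

    def __init__(self, url, points, tags):
        self.url = url
        self.points = points
        self.tags = tags


def save_instance(obj, path):
    # Serialise the whole instance, attributes included, to a binary file.
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_instance(path):
    # Rebuild the instance; its class must be importable at load time.
    with open(path, "rb") as handle:
        return pickle.load(handle)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "link.pkl")
        save_instance(Link("http://example.com", 3, ["a", ["b", "c"]]), path)
        restored = load_instance(path)
    return type(restored).__name__, restored.url, restored.points, restored.tags


if __name__ == "__main__":
    print(demo())
```

The restored object is a new instance of the same class with equal attribute values, not the original object, so anything still holding the old instance will not see changes made to the new one.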
#8
Programming Guru
Okay, everything seems to be working fine, except that python.exe's memory usage doesn't seem to drop when something is shelved. When I delete the object from RAM, memory usage drops, but it goes straight back up once the object is shelved.
I ran the following commands sequentially in the Python command line, while watching the memory:
>>> import shelve
>>> class x:
...     def __init__(self):
...         self.mem = "-" * 1024*100
...
>>> shelf = shelve.open("test_shelve.dbm")
>>> shelf["a"] = x()
And the memory went waaay up. Shouldn't it only go up a little bit, since shelve won't keep it in memory? By the way, this is my current solution, which works but does not lower memory usage:
import shelve
import time

class main:
    def __init__(self):
        self.shelf = shelve.open("link_db.dbm")
        ...
        self.last_db_dump = self.last_save
        self.db_dump_every = 60
        self.db_dump_age = 30
        self.grab("queue", list)
        self.grab("active", list)
        self.grab("max_link_id", int)
        self.links = dict()
        self.db_fetch_times = dict()
        ...
    def fetch_link(self, link_id):
        date = int(time.time())
        self.db_fetch_times[link_id] = date
        if link_id not in self.links:
            # load
            print("Loaded :", link_id)
            self.links[link_id] = self.shelf["instance_%s" % link_id]
        # do a routine check for unused RAM
        if date >= self.last_db_dump + self.db_dump_every:
            # iterate over a copy of the keys, since entries are deleted below
            for dump_link_id in list(self.db_fetch_times):
                if date >= self.db_fetch_times[dump_link_id] + self.db_dump_age:
                    # save
                    self.shelf["instance_%s" % dump_link_id] = self.links[dump_link_id]
                    del self.links[dump_link_id]
                    del self.db_fetch_times[dump_link_id]
                    print("Saved :", dump_link_id)
            self.last_db_dump = date
        return self.links[link_id]
    def grab(self, var_name, var_type):
        # restore a saved attribute, or start from an empty value of that type
        try:
            setattr(self, var_name, self.shelf[var_name])
        except KeyError:
            setattr(self, var_name, var_type())
    ...
    def add_to_queue(self, link_id):
        link = self.fetch_link(link_id)
        if link.points > 0:
            self.queue.append(link_id)
            link.points -= 1
    ...
Last edited by Sane; Mar 18th, 2007 at 11:42 PM.
#9
Hobbyist Programmer
Sane:
IMHO, you're optimizing prematurely. DaWei is right. Find out how your algorithm performs, and *then* optimize. If your RAM can't handle the number of objects you're creating, thinking of an alternative solution is probably a better idea than trying to rewrite Python's memory management.
#10
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
Quote:
In order to properly check that the application works, you need to operate on many different objects. For instance:
x = "-" * 100 * 1024 * 1024
del x # x deleted, but the OS doesn't bother to reclaim the 100M straight away
# The following code will likely result in an out-of-memory error, since the
# list keeps all 100 strings alive:
l = []
for i in range(100):
    l.append("-" * 100 * 1024 * 1024)
# The following code will not, since each old string is freed when x is
# rebound, and the OS will reassign the freed bits of memory when it deems
# it necessary:
for i in range(100):
    x = "-" * 100 * 1024 * 1024
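Because the figure shown for python.exe can stay high after an object is freed, it may help to ask Python itself how much it currently has allocated. The sketch below assumes Python 3.4 or later, where the standard library's tracemalloc module is available; the `measure` function and its 10 MB default are hypothetical choices for illustration.

```python
import tracemalloc


def measure(block_size=10 * 1024 * 1024):
    """Return (bytes held while x exists, bytes still held after del x)."""
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()

    x = "-" * block_size
    held, _ = tracemalloc.get_traced_memory()

    del x  # last reference gone, so CPython frees the string here
    released, _ = tracemalloc.get_traced_memory()

    tracemalloc.stop()
    return held - baseline, released - baseline


if __name__ == "__main__":
    grew, left = measure()
    print("while referenced:", grew, "bytes; after del:", left, "bytes")
```

If the second number falls back to roughly zero while the process size in the task manager stays put, the objects really are gone and the remaining footprint is memory the allocator or OS has not handed back yet.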