Hacking Python Memory
This might sound a little over the edge, but bear with me.
I have a Python program that at any given time could be holding a huge amount of data in RAM. However, most of the time only about 1/8 of that data is actually useful. The obvious choice is to keep it in a MySQL database so that MySQL can decide for itself what's important enough to cache and what can stay on the hard drive. However, I don't want to go that route, since the data must be accessed as quickly as possible, and very frequently. MySQL is adequately fast, but I would be retrieving large amounts of different information too frequently for it to be fast enough.

Therefore, my idea was to create another layer on top of the memory and underneath the script execution. When a block of memory hasn't been accessed for a while, the layer stores it in a new file and deletes it from RAM. If the script attempts to access that memory, the layer retrieves the file's contents, deletes the file, and stores the data in memory again.

This could be very easy to do, or very difficult, depending on what Python has to offer in these regards. Does Python support lookups of memory address locations? Are there any existing libraries that can help?

The memory of interest is a list of class instances. Each instance stores several strings, integers, and more lists. My first thought is to solve this using Python's decorators, by adding a decorator to every function that will look at these class instances each time the function is called. If a variable is set to None or False, then its corresponding id(var) will have a file with its contents stored in it. The only problem is that I don't believe that will lighten the load on the RAM, and that's a big problem.

Any help, advice, or food for thought will be very helpful. Thanks in advance. |
If you're using CPython (the official Python interpreter), then this is relatively simple. CPython uses a reference-counting memory management system: it keeps track of the number of references to each object, and when that count reaches zero, the object is destroyed immediately. This is a simplistic and somewhat inefficient approach to garbage collection, since freeing memory takes time and it's usually better to release a whole block of memory at once; however, there are advantages to reference counting. For instance, in CPython, you can do something like this:

    lines = open("file.txt").readlines()

The file object is closed as soon as its last reference disappears, without an explicit close().

Essentially, you could create a wrapper class that keeps objects on disk until they are needed, and then expires them after a certain amount of time (perhaps using the "shelve" module as storage). To expire an object, just remove all references to it. You may want to use the weakref module to make sure you don't give out any "real" references that might prevent your objects from being recycled. You could also use the __getattr__ method so that you can access your data like this:
diskcache.commonvar = 10 # gets from in-memory cache (a dict) |
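To make the wrapper idea above a bit more concrete, here is a minimal sketch, not a finished implementation. The class name DiskCache, the max_age parameter, and the expire() method are all made up for illustration; only shelve, time, and the __getattr__/__setattr__ hooks are real Python. It assumes every value you store can be pickled.

```python
import os
import shelve
import tempfile
import time


class DiskCache:
    """Hypothetical wrapper: every value is written to a shelf on disk,
    and recently used values are also kept in an in-memory dict."""

    def __init__(self, path, max_age=60.0):
        # object.__setattr__ bypasses the custom __setattr__ defined below.
        object.__setattr__(self, "_shelf", shelve.open(path))
        object.__setattr__(self, "_hot", {})  # name -> (value, last access time)
        object.__setattr__(self, "_max_age", max_age)

    def __setattr__(self, name, value):
        self._shelf[name] = value  # pickled and stored on disk
        self._hot[name] = (value, time.monotonic())

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for cached names.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._hot:
            value = self._hot[name][0]  # served from memory
        else:
            try:
                value = self._shelf[name]  # reloaded from disk
            except KeyError:
                raise AttributeError(name) from None
        self._hot[name] = (value, time.monotonic())
        return value

    def expire(self):
        """Drop in-memory references that have not been used for max_age seconds."""
        cutoff = time.monotonic() - self._max_age
        for name in [n for n, (_, t) in self._hot.items() if t <= cutoff]:
            del self._hot[name]

    def close(self):
        self._shelf.close()


path = os.path.join(tempfile.mkdtemp(), "cache")
diskcache = DiskCache(path, max_age=0.0)  # 0.0 so expire() evicts everything
diskcache.commonvar = 10    # written to disk and kept in memory
diskcache.expire()          # the in-memory copy is dropped
print(diskcache.commonvar)  # reloaded from the shelf
diskcache.close()
```

One caveat with this sketch: if you fetch a mutable object and change it in place, the change is not written back to disk unless you assign it to the attribute again. Expiring an entry also only frees memory if nothing else in your program still holds a reference to the value.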
This is what a reasonably decent operating system tries to do for you, with its cache/virtual memory. Have you determined by performance measurements that it's really necessary?
|
@Arevos : That all seems pretty straightforward. But how do I get the memory back when something tries to reference it again? Maybe I don't see how this works.
@DaWei : I haven't done any measurements yet, but that's because I anticipate that the list of class instances could potentially reach several hundred thousand instances, and my RAM can't possibly handle that cleanly. |
Edit:
Never mind. You can probably disregard the original post. So, if I understand this correctly, the shelve module does not increase RAM usage with the number of objects being stored? It stores them on the hard drive, but my program can treat them as ordinary variables?

An off-topic question here: Is there a function that automatically dumps an instance's contents to a binary file and then reads it right back in with all the types and attributes intact? If not, I could just quickly write one.

Original Post:
I'm not sure if I'm missing something, or if you're missing something, but to make sure we're on the same page, I should probably clarify: |
To show you what I mean:
import shelve |
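On the off-topic question about dumping an instance to a binary file and reading it back: the standard pickle module does this, and it is also what shelve uses under the hood. A rough sketch follows; the Record class and the dump_instance/load_instance helpers are made-up names for illustration, and the class has to be importable under the same name when the file is loaded again.

```python
import os
import pickle
import tempfile


class Record:
    """Hypothetical stand-in for one of the class instances in the list."""

    def __init__(self, name, count, tags):
        self.name = name
        self.count = count
        self.tags = tags


def dump_instance(obj, path):
    # Serialize the whole instance, attributes included, to a binary file.
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_instance(path):
    # Rebuild the instance with its original class and attribute types.
    with open(path, "rb") as fh:
        return pickle.load(fh)


path = os.path.join(tempfile.mkdtemp(), "record.pkl")
original = Record("alpha", 3, ["x", "y"])
dump_instance(original, path)
restored = load_instance(path)
print(type(restored).__name__, restored.name, restored.count, restored.tags)
```

Only load pickle files that your own program wrote, since unpickling untrusted data can execute arbitrary code.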
Okay, everything seems to be working fine, except that python.exe's memory usage doesn't seem to drop when something is shelved. When I delete the object from RAM, memory usage drops, but it goes straight back up once the object is shelved.
I ran the following commands sequentially at the Python command line while watching the memory:
    >>> class x:
And the memory went way up. Shouldn't it only go up a little bit, since shelve won't keep it in memory? By the way, this is my current solution, which works but does not lower memory usage:
class main: |
Sane:
IMHO, you're optimizing prematurely. DaWei is right. Find out how your algorithm performs, and *then* optimize. If your RAM can't handle the number of objects you're creating, thinking of an alternative solution is probably a better idea than trying to rewrite Python's memory management. |
In order to properly check that the application works, you need to operate on many different objects. For instance:
x = "-" * 100 * 1024 |
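One possible way to run that kind of check, offered as a sketch only: store many distinct 100 KiB strings in a shelf and use the standard tracemalloc module to see how much memory Python is still holding afterwards. COUNT and the temporary path are arbitrary choices for the example. Note that tracemalloc only counts memory allocated by Python itself, so the figure python.exe shows in the task manager can differ, and the process may not hand freed memory back to the operating system right away.

```python
import os
import shelve
import tempfile
import tracemalloc

BLOCK = 100 * 1024  # 100 KiB per object, matching the line above
COUNT = 200         # arbitrary number of distinct objects for the test

path = os.path.join(tempfile.mkdtemp(), "blocks")

tracemalloc.start()
with shelve.open(path) as db:
    for i in range(COUNT):
        # Each value is a separate string. Nothing keeps a reference to it
        # after the assignment, so CPython can free it straight away.
        db[str(i)] = chr(ord("a") + i % 26) * BLOCK
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

total = BLOCK * COUNT
print(f"stored {total} bytes, still holding {current} bytes in Python objects")
```

If the amount still held stays far below the total stored, the shelf is not keeping the objects in memory; they only come back into RAM when you read them from the shelf again.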
Copyright ©2007 DaniWeb® LLC