Programming Forums

Programming Forums (http://www.programmingforums.org/forumindex.php)
-   Python (http://www.programmingforums.org/forum43.html)
-   -   Python, Performance, and Psyco (http://www.programmingforums.org/showthread.php?t=10707)

DaWei Jul 13th, 2006 11:11 AM

Python, Performance, and Psyco
 
First of all, if you're new to programming and the real world, you almost certainly have the wrong ideas about application development, performance tuning, efficient coding, and the whys and wherefores thereof. That isn't intended to be a mortal insult, but the probability of it being true is very high. I would recommend that you visit this link. The paths that you choose to think-walk might be more productive.

As an example, I am currently messing around with porting some image-analysis code to Python (from C/C++). On my first shot, it took 5-10 seconds to load the file and 10-15 seconds to process it. Those relative times have no bearing on where performance issues need to be addressed. The file loading process is: pick the file menu; pick the open option; pick the directory; pick the file; pick my nose; pick OK. Honing those actions, especially picking my nose, is not the path to success. The file is picked once. The image is processed with a function FOR EACH PIXEL. That process may involve accessing multiple pixels for each pixel. Obviously, it can get out of hand. When you have an idea that some section of code is critical, you may be right. Find out. Profile the code. Here is a sample profile (made AFTER I cut the time in half by switching image libraries).
:

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.691    1.691  11.567  11.567 Filter.py:332(SobelEdge3)
    15876    1.562    0.000    2.268    0.000 Filter.py:38(gradient)
    15876    1.035    0.000    2.978    0.000 Image.py:670(crop)
    16384    0.775    0.000    2.009    0.000 ImageDraw.py:133(_getink)
    16384    0.747    0.000    3.138    0.000 ImageDraw.py:226(point)
    15880    0.732    0.000    0.943    0.000 ImageFile.py:115(load)
    15876    0.701    0.000    1.001    0.000 Image.py:1546(__init__)
    15876    0.546    0.000    0.546    0.000 :0(crop)
    16390    0.524    0.000    0.923    0.000 Image.py:79(isStringType)
    15876    0.488    0.000    1.490    0.000 Image.py:793(getdata)
    15876    0.456    0.000    1.002    0.000 Image.py:1563(load)
    15879    0.407    0.000    0.407    0.000 :0(range)
    32788    0.399    0.000    0.399    0.000 :0(isinstance)
    16384    0.381    0.000    0.381    0.000 :0(draw_points)
    16385    0.312    0.000    0.312    0.000 :0(draw_ink)
    15894    0.301    0.000    0.301    0.000 Image.py:392(__init__)
    15876    0.300    0.000    0.300    0.000 :0(sqrt)
    15892    0.211    0.000    0.211    0.000 Image.py:525(load)

It is important to view these results as relative, particularly if you don't know your profiler intimately, or how your system may behave from time to time. The next profile is for processing the same image with the same code:
:

        1    1.398    1.398    9.612    9.612 Filter.py:363(SobelEdge)
    15876    1.394    0.000    1.961    0.000 Filter.py:56(gradient)
    15876    0.843    0.000    2.419    0.000 Image.py:670(crop)
    16384    0.685    0.000    1.765    0.000 ImageDraw.py:133(_getink)
    16384    0.625    0.000    2.736    0.000 ImageDraw.py:226(point)
    15883    0.594    0.000    0.774    0.000 ImageFile.py:115(load)
    15876    0.571    0.000    0.803    0.000 Image.py:1546(__init__)
    16390    0.468    0.000    0.841    0.000 Image.py:79(isStringType)
    15876    0.409    0.000    0.409    0.000 :0(crop)
    32794    0.373    0.000    0.373    0.000 :0(isinstance)
    15876    0.354    0.000    0.764    0.000 Image.py:1563(load)
    16384    0.346    0.000    0.346    0.000 :0(draw_points)
    15876    0.333    0.000    1.096    0.000 Image.py:793(getdata)
    15879    0.309    0.000    0.309    0.000 :0(range)
    15876    0.258    0.000    0.258    0.000 :0(sqrt)
    16385    0.240    0.000    0.240    0.000 :0(draw_ink)
    15900    0.232    0.000    0.232    0.000 Image.py:392(__init__)
    15904    0.180    0.000    0.180    0.000 Image.py:525(load)

Before I invested any time in optimization, I decided, on Arevos' recommendation, to apply a tool called Psyco. Rather than apply it to the program as a whole (who needs to optimize the nose-picking, remember?), I applied it specifically to the processing function (SobelEdge). This automatically applies it to all functions referenced by SobelEdge, also. The results are as follows:
:

        1    2.329    2.329    3.448    3.448 Filter.py:363(SobelEdge)
    15876    0.666    0.000    0.892    0.000 Image.py:1546(__init__)
    15900    0.227    0.000    0.227    0.000 Image.py:392(__init__)
    15904    0.225    0.000    0.225    0.000 Image.py:525(load)

Note from the lack of detail that it obviously generated an optimized proxy and ran that. The change is highly significant. Can you always expect such results? No. Why? Because your code, if it's nearly optimal, has less room for improvement. I decided to implement my own class for image manipulation. I still refer to the image library to get the data when I instantiate the class, and when I store the resulting, processed image. The profile for this new code is:
:

        1    0.985    0.985    3.332    3.332 Filter.py:363(SobelEdge)
    15876    1.164    0.000    1.692    0.000 Filter.py:56(gradient)
    15876    0.652    0.000    0.652    0.000 Filter.py:40(getMap)

The application of Psyco has the following result:
:

        1    0.863    0.863    2.515    2.515 Filter.py:363(SobelEdge)
    15876    0.896    0.000    0.896    0.000 Filter.py:40(getMap)
    15876    0.754    0.000    0.754    0.000 Filter.py:56(gradient)

While far less staggering, the result is still significant. You may appreciate that if the function were scaled up in order to process many images, in a batch fashion, things would be even more important. One would probably even resort to writing the procedure in a lower-level language and interfacing it with Python.

Arevos Jul 13th, 2006 12:34 PM

One of the most interesting posts I've read on these forums; bravo! I didn't even know Python had a profiler until I read this post and went looking for it.

In case others are as unfamiliar as I am with the profiler, the profile module is the one you want. You can call it directly for instant profiling of a .py file:
:

python -m profile foobar.py
And for those who use Ubuntu, the profiler isn't installed by default when you install Python. Instead, you have to install the python-profiler package separately:
:

sudo apt-get install python-profiler

DaWei Jul 13th, 2006 1:40 PM

Did you notice that Psyco halved the time of "gradient", but made "getMap" worse? I hand-tuned the hell out of "getMap" by reducing the calculations as much as possible (2*A = A+A kind of stuff) and performing the common operations as few times as possible. I used these to make a fixed range that pointed directly at the appropriate pixels (which aren't all contiguous). It isn't Pythonic-looking, but there you have it. I'll unbind SobelEdge and just bind "gradient" (or something similar).

Arevos Jul 13th, 2006 1:58 PM

Yep, you really have to take a targeted approach with Psyco. It can improve the execution time of functions dramatically... or it can slow them down. When I tested Psyco in the "How would you write this in Python?" thread, Psyco improved one function by a factor of ten, but slowed down the other functions by around 40%.

Game_Ender Jul 13th, 2006 2:31 PM

You also might want to check out hotshot, its the the python profiling module converted to c. Hotshot is part of the standard library. It has a less of an effect on the run time performance of your code so you can get some more accurate profiling data.

DaWei Jul 16th, 2006 7:16 PM

Just to reiterate: if you believe you need to wring performance out of your Python, profile. The 'count' method seems like an ideal way to derive a histogram from a set of image pixels. Using it (method 2, below) takes 500 times as long as using method 1 (which takes approximately 30 ms. for a 512x512 image). The 'list' function is necessary because the list returned by PIL doesn't have all the normal Python methods (for instance, count). It is, however, the count method, not the list function, that takes the time.
:

    def grabHistogram (self, image):
        #Method 1
        aList = [0 for i in range (256)]
        for j in image.getdata (): aList [j] += 1
        return aList
        #Method 2
        #return[list (image.getdata ()).count (i) for i in range (256)]

If you want to work with images, here are some observations. More built-in manipulations are available with PIL than with wxWidgets. wxWidgets seems to be more robust, in that image objects passed as arguments are apparently passed as references. PIL seems to pass object copies and also produces new image objects (rather than modifications) from the various manipulations. Some memory issues are implied by this approach. My impression is that PIL also produces memory leaks, which eventually cause a failure. The error messages produced do not state this, but tend to state that some set of data has an erroneous length. This occurs, eventually, for identical operations on identical images. I have yet to decide which particular library is to be preferred. I haven't tried any others.

Polyphemus_ Jul 16th, 2006 7:24 PM

Quote:

Originally Posted by DaWei
(2*A = A+A kind of stuff)

Just curious... wouldn't A = A << 1 be faster? Or is it unsupported by Python (doubt that myself)?

DaWei Jul 16th, 2006 7:41 PM

Just curious...are you stating a fact, putting forth a supposition, asking a question, or just nibbling at the thread? :confused:

Polyphemus_ Jul 17th, 2006 6:19 AM

Quote:

Originally Posted by DaWei
Just curious...are you stating a fact, putting forth a supposition, asking a question, or just nibbling at the thread? :confused:

I guess a combination of the 4.

DaWei Jul 17th, 2006 7:09 AM

Okay. At the most basic level of the machine, dealing directly in machine language, a left-shift would always be a more efficient way to multiply by two than taking an algorithmic approach. Whether or not it would beat addition would depend upon the precise implementation of the hardware elements. In any event, it would be a close race. Just whip up a schematic for a few bits worth of full-adders and a few bits worth of parallel-in, parallel-out, serial-in, serial-out shift register and you'll see what I mean.

If you want to talk about an abstract language, you're talking about something else. It isn't a horse of a different color, it's a mammal of different proportions. If the language is optimal, it will always go for the faster of the possible approaches without your knowledge or consideration. The implementors may or may not decide to interpret 2A as A+A, but they'll never decide to interpret additions as equivalent multiplications (we hope). You may, for instance, multiply 1.37509 by two, or add it to itself. You can't shift it left. I certainly hope you don't try to double your floats with that approach, even if they're round numbers.


All times are GMT -5. The time now is 12:55 AM.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC