Programming Forums
User Name Password Register
 

RSS Feed
FORUM INDEX | TODAY'S POSTS | UNANSWERED THREADS | ADVANCED SEARCH

Reply
 
Thread Tools Display Modes
Old Jul 10th, 2006, 6:28 PM   #11
Arevos
Programming Guru
 
Arevos's Avatar
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Quote:
Originally Posted by titaniumdecoy
By the way, what does the zip function do?
It mixes two sequences together.
>>> zip(a, b, c...)
[(a[0], b[0], c[0] ...), (a[1], b[1], c[1] ...), ...]
For instance:
>>> zip([1, 2, 3], ['a', 'b', 'c'])
[(1, 'a'), (2, 'b'), (3, 'c')]
Incidentally, itertools.izip does the same thing, but as a generator.
Arevos is offline   Reply With Quote
Old Jul 10th, 2006, 8:03 PM   #12
DaWei
Resident Grouch
 
DaWei's Avatar
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
Quote:
a = sum(x * y for x, y in zip(flatten(grid), flatten(map)))
This is nice. The reason I asked about flattening initially is that part of the process will be more easily worked as lists of lists, and part as single lists.

I was working the same approach, but with a generator; not nearly as clean, at least to this point. Thanks for your contributions. The exercise here was not to get the job done, but to develop a Pythonic mental slant on problem solving. Older solutions tend to invade and beg for mere transalation. I'm not sure what the relative efficiencies are, but I intend to apply this transform over thousands to millions of items.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
DaWei is offline   Reply With Quote
Old Jul 10th, 2006, 8:23 PM   #13
Arevos
Programming Guru
 
Arevos's Avatar
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Quote:
Originally Posted by DaWei
I'm not sure what the relative efficiencies are, but I intend to apply this transform over thousands to millions of items.
Might be best to use a generator then:
def xflatten(seq):
    for x in seq:
        if type(x) is list:
            for y in flatten(x):
                yield y
        else:
            yield x
The itertools module provides a generator for zip:
from itertools import izip

a = sum(x * y for x, y in izip(xflatten(grid), xflatten(map)))
In terms of efficiencies, both the original flatten, and the inbuilt function zip, create new lists as output. If grid and map are very large - well, the problems are obvious.

As an aside, range is not a good function to use for long for-loops, because it constructs a new list each time it's called. The xrange function doesn't have this problem, using a generator approach instead.
Arevos is offline   Reply With Quote
Old Jul 10th, 2006, 8:28 PM   #14
titaniumdecoy
Expert Programmer
 
titaniumdecoy's Avatar
 
Join Date: Nov 2005
Posts: 855
Rep Power: 3 titaniumdecoy is on a distinguished road
Send a message via AIM to titaniumdecoy
Quote:
Originally Posted by Arevos
Mm, that thought had crossed my mind, too. However, Numeric arrays are limited to numerical data, and there's no "flatten" capability, so it's not particularly any better than using lists.
A little off-topic, but you can flatten Numeric arrays with the ravel function.
titaniumdecoy is offline   Reply With Quote
Old Jul 10th, 2006, 8:34 PM   #15
Arevos
Programming Guru
 
Arevos's Avatar
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Quote:
Originally Posted by titaniumdecoy
A little off-topic, but you can flatten Numeric arrays with the ravel function.
Hm. That's inetresting. Especially since DaWei appears to be dealing with large amounts of numerical data, which is one of Numeric's specialities.

On the other hand, ravel returns an array, which has obvious disadvantages over a generator-based approach for dealing with large volumes of data. Do you know if there's a generator in the Numeric module that does the equivalent to ravel?
Arevos is offline   Reply With Quote
Old Jul 10th, 2006, 11:38 PM   #16
titaniumdecoy
Expert Programmer
 
titaniumdecoy's Avatar
 
Join Date: Nov 2005
Posts: 855
Rep Power: 3 titaniumdecoy is on a distinguished road
Send a message via AIM to titaniumdecoy
Quote:
a = sum(x * y for x, y in zip(flatten(grid), flatten(map)))
I still don't see what's more "Pythonic" about this approach: It's harder to read, requires more lines of code, involves multiple function calls (sum, zip, flatten, reduce, etc.) as well as the definition of an additional function (flatten), and in all likelihood it's probably slower than the original. So... what advantage does this method offer?

Quote:
Originally Posted by Arevos
Do you know if there's a generator in the Numeric module that does the equivalent to ravel?
I have no idea.
titaniumdecoy is offline   Reply With Quote
Old Jul 11th, 2006, 6:38 AM   #17
Arevos
Programming Guru
 
Arevos's Avatar
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Quote:
Originally Posted by titaniumdecoy
I still don't see what's more "Pythonic" about this approach: It's harder to read, requires more lines of code, involves multiple function calls (sum, zip, flatten, reduce, etc.) as well as the definition of an additional function (flatten), and in all likelihood it's probably slower than the original. So... what advantage does this method offer?
Well, I guess one could always use the original method I suggested:
a = sum(grid[i][j] * map[i][j] for j in range(3) for i in range(3))
Which isn't going to be any slower than a direct translation, and not particularly harder to read. Indeed, the presence of the function "sum" makes it more obvious to me what the code is doing.

Or you could abstract the zips and flattens into a further function, for something more readable:
def combine(seqs):
    return zip(map(flatten, seqs))

a = sum(x * y for x, y in combine(grid, mymap))
There may be a better name than "combine", but you get the idea. As for efficiency, there shouldn't be too much difference if you use generators, though accessing the elements through indicies (either via a generator or a for loop) is likely to go slightly faster.

That said, once you have the flatten and combine functions, you can apply them to other parts of your program. Even factor them off into a library of their own. So saying they take up more lines assumes that they're only going to be used once

Just to satisfy my curiousity, however, I'll run some efficiency tests on the various methods and report back on my findings.
Arevos is offline   Reply With Quote
Old Jul 11th, 2006, 8:06 AM   #18
Arevos
Programming Guru
 
Arevos's Avatar
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Here's the test code I used:
from itertools import izip

def xflatten(seq):
    for x in seq:
        if type(x) is list:
            for y in xflatten(x):
                yield y
        else:
            yield x

def flatten_test((a, b)):
    return sum(x * y for x, y in izip(xflatten(a), xflatten(b)))

def comprehension_test((a, b)):
    return sum(a[i][j] * b[i][j] for j in xrange(len(a))
               for i in xrange(len(a)))

def traditional_test((a, b)):
    sum = 0
    for i in xrange(len(a)):
        for j in xrange(len(b)):
            sum += a[i][j] * b[i][j]
    return sum

And here are the results for the time taken to parse two 1000x1000 lists (in seconds).
traditional_test: 0.463695049286
comprehension_test: 0.620153903961
flatten_test: 1.59923195839
The results rather surprised me. I would have considered using the inbuilt function sum to be faster than adding up all the numbers with +=, and generators shouldn't be any slower than the equivalent for loop, as they're essentially the same thing.

However, the results didn't change when I ran them again, so it appears as if using generators and the sum function is around 50% slower than doing it via for loops. I can't hazard a guess where the extra CPU time is vanishing to. Most curious.

The flatten_test performed worse than I'd have thought, but considering the number of functions involved, a 300% slowdown is not unduly expected.

For kicks, I introduced psyco into the mix. Predictably, it worked better with the most basic approach:
traditional_test: 0.0489661693573
comprehension_test: 0.831726789474
flatten_test: 2.478703022
Curiously, the more complex tests actually performed worse than normal. The for loop approach, however, had a 1000% improvement in efficiency. Again, predictable, but not too shabby.

I tried comparing it against a C program that did the same thing, and I got:
c_test: 0.026326
Which is only twice as fast as the psyco-enhanced traditional_test. Considering that arrays in C are essentially just direct memory access, whilst lists in Python are considerably more complex, this is pretty good.
Arevos is offline   Reply With Quote
Old Jul 11th, 2006, 8:27 AM   #19
DaWei
Resident Grouch
 
DaWei's Avatar
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
As you probably picked up on, I'm writing this for the Python and not for the functionality. I probably picked the wrong old code for the purpose. I wrote an image analyzer, in C++, a couple of years ago. I wrote all my own classes even though there were some available in the Windows API. For this exercise, I'm using Python and wxWidgets. Not all wxWidget functionality is available in the Python port. For one thing, not all image constructors are available, and for another, there is no direct access to the image data. Access to the data is via sub image, which fits well with the use of a Sobel grid. The result of the gradient approximation cannot be stuffed directly into an output image, however, but requires construction from a flattened collection of pixels. This also requires conversion between character/numeric values for the pixels. It is indeed slow with the image access mixed in with the processing. At this point I'm considering writing my own image class in order to gain direct access to the data. This thread is very helpful and interesting. I'll have to look up psyco.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
DaWei is offline   Reply With Quote
Old Jul 11th, 2006, 9:38 AM   #20
Arevos
Programming Guru
 
Arevos's Avatar
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Quote:
Originally Posted by DaWei
I'll have to look up psyco.
Psyco is essentially a specializing JIT compiler for Python code. It's limited to x86 architectures, and has some memory overhead, but can result in large efficiency gains in certain types of algorithms. One could make it optional like so:
try:
    from psyco import proxy as psyco
except ImportError:
    def psyco(func): return func

@psyco
def foobar():
    ...
On x86 architectures with psyco installed, foobar would be optimised. Otherwise, nothing would be changed.
Arevos is offline   Reply With Quote
Reply

Bookmarks

« Previous Thread in Forum | Next Thread in Forum »

Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump




DaniWeb IT Discussion Community
All times are GMT -5. The time now is 7:39 PM.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC