Benchmarking - Kamaelia vs Stackless
November 21, 2007 at 11:48 PM | categories: python, oldblog
Interesting post by rhonabwy on comparing Kamaelia to Stackless. The benchmark figures there are pretty grim from my perspective, but a useful starting point:
10 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(10,1000)"
10 loops, best of 3: 127 msec per loop
100 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(100,1000)"
10 loops, best of 3: 587 msec per loop
1000 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(1000,1000)"
10 loops, best of 3: 6.05 sec per loop
10000 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(10000,1000)"
10 loops, best of 3: 60.4 sec per loop
The grim part of course is the scaling aspect here. Handily though, rhonabwy posted the code as well. I noticed that a key change we often make in mature code - to pause - was missing, so I changed the main loop slightly from this:
def main(self):
    yield 1
    while 1:
        if self.dataReady('inbox'):
To this:
def main(self):
    yield 1
    while 1:
        while not self.anyReady():
            self.pause()
            yield 1
        while self.dataReady('inbox'): # Note change from "if"
Green is new code, blue indicates a change - that is, the anyReady()/pause() loop is new, and the "if" on dataReady has become a "while". So, how does this perform? Well, first of all, running the unchanged code I get this:
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(10,1000)"
10 loops, best of 3: 182 msec per loop
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(100,1000)"
10 loops, best of 3: 820 msec per loop
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(1000,1000)"
10 loops, best of 3: 9.23 sec per loop
I lost patience at that point. It does show the same bad scaling properties though. So I then added in the changes above and reran it. This is what I got:
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(10,1000)"
10 loops, best of 3: 206 msec per loop
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(100,1000)"
10 loops, best of 3: 267 msec per loop
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(1000,1000)"
10 loops, best of 3: 816 msec per loop
Now I don't actually have that much memory on my machine so going above this causes my machine to start swapping, but even factoring that in, this is the next level up:
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(5000,1000)"
10 loops, best of 3: 2.77 sec per loop
Now, clearly this still isn't as good as Stackless - which isn't surprising: Stackless removes a whole layer of stack frame shenanigans from the entire system, and it also implements its scheduler & channel handling in C. I also don't know how well the Stackless scheduler is optimised. Probably better than ours - ours implements a very simple round robin scheduler.
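To make "very simple round robin scheduler" concrete, the heart of the idea is just a loop that gives each active generator one timeslice per pass - a simplified sketch rather than Axon's actual scheduler (which also has to deal with pausing, newly activated components and shutdown):

def roundrobin(generators):
    active = list(generators)            # the run queue
    while active:
        for g in list(active):
            try:
                g.next()                 # give this generator one timeslice (Python 2)
            except StopIteration:
                active.remove(g)         # finished generators drop out of the run queue

def worker(name, count):
    for i in xrange(count):
        # do a small piece of work, then hand control back to the scheduler
        yield 1

roundrobin([worker("a", 3), worker("b", 5)])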
However, despite all this, after this small optimisation the scaling properties are significantly better and, to my mind rather importantly, much more in line with the scaling properties that Stackless exhibits. A key point from this: with the optimisation in place, the figures imply that we DO have better scaling properties using generators than with threads. Woo :-)
More seriously, this does imply that a natural way to optimise Kamaelia systems could be to simply create a new base class Axon.TaskletComponent.taskletcomponent that uses Stackless channels to implement inboxes/outboxes, and scale out that way. It'd mean that you could take the bulk of your code with you and make small changes to gain further optimisations.
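As a rough illustration of the idea - a hypothetical sketch, not real Axon code, assuming Stackless Python for stackless.channel and stackless.tasklet - inboxes backed by channels might look something like this:

import stackless

class taskletcomponent(object):
    # Hypothetical base class: each inbox is backed by a Stackless channel,
    # so delivery and wakeup are handled by Stackless rather than by the
    # component polling dataReady() and pausing.
    def __init__(self):
        self.inboxes = { "inbox" : stackless.channel() }
    def recv(self, boxname="inbox"):
        return self.inboxes[boxname].receive()   # blocks until something arrives
    def activate(self):
        stackless.tasklet(self.main)()

class echo(taskletcomponent):
    def main(self):
        while 1:
            msg = self.recv("inbox")
            if msg == "exit":
                return
            print "echo:", msg

if __name__ == "__main__":
    e = echo()
    e.activate()
    def kick():
        e.inboxes["inbox"].send("hello")   # cf. inboxes['inbox'].append(msg) in the code below
        e.inboxes["inbox"].send("exit")
    stackless.tasklet(kick)()
    stackless.run()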
(aside: I'm a bit irritated by the memory consumption though; I know I don't have a huge amount of memory and I was running other applications on the system, but I did expect better. I'll have to look into that, I think. When I get a round tuit.)
Overall though, despite not performing as well as Stackless (which I did really expect, understanding the changes it makes), I am very pleased with the scaling properties being similar :-)
My full version of the code:
#!/usr/bin/python

import Axon
import time
import random
import sys

class hackymsg:
    def __init__(self,name):
        self.name = name

class counter:
    def __init__(self):
        self.count = 0
    def inc(self):
        self.count +=1

class hackysacker(Axon.Component.component):
    def __init__(self,name,circle,cntr,loops):
        Axon.Component.component.__init__(self)
        self.cntr = cntr
        self.name = name
        self.loops = loops # terminating condition
        self.circle = circle # a list of all the hackysackers
        circle.append(self)

    def main(self):
        yield 1
        while 1:
            while not self.anyReady():
                self.pause()
                yield 1
            while self.dataReady('inbox'):
                msg = self.recv('inbox')
                if msg == 'exit':
                    return
                if self.cntr.count > self.loops:
                    for z in self.circle:
                        z.inboxes['inbox'].append('exit')
                    return
                #print "%s got hackysack from %s" % (self.name, msg.name)
                kickto = self.circle[random.randint(0,len(self.circle)-1)]
                while kickto is self:
                    kickto = self.circle[random.randint(0,len(self.circle)-1)]
                #print "%s kicking hackysack to %s" %(self.name, kickto.name)
                msg = hackymsg(self.name)
                kickto.inboxes['inbox'].append(msg)
                self.cntr.inc()
                #print self.cntr.count
            yield 1

def runit(num_hackysackers=5,loops=100):
    cntr = counter()
    circle=[]
    first_hackysacker = hackysacker('1',circle,cntr,loops)
    first_hackysacker.activate()
    for i in range(num_hackysackers):
        foo = hackysacker(`i`,circle,cntr,loops)
        foo.activate()

    # throw in the first sack...
    msg = hackymsg('me')
    first_hackysacker.inboxes['inbox'].append(msg)

    Axon.Component.scheduler.run.runThreads()

if __name__ == "__main__":
    runit(num_hackysackers=1000,loops=1000)
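And for comparison, this is roughly what the Stackless side of the benchmark looks like - a sketch in the spirit of rhonabwy's version rather than their exact code, with one tasklet and one channel per hackysacker (requires Stackless Python):

import stackless
import random

def hackysacker(mybox, circle, cntr, loops):
    while 1:
        msg = mybox.receive()                  # block until the sack (or 'exit') arrives
        if msg == 'exit':
            return
        if cntr[0] > loops:
            for box in circle:                 # enough kicks - tell everyone else to stop
                if box is not mybox:
                    box.send('exit')
            return
        kickto = random.choice(circle)         # kick to a random other player
        while kickto is mybox:
            kickto = random.choice(circle)
        cntr[0] += 1
        kickto.send('sack')

def runit(num_hackysackers=5, loops=100):
    cntr = [0]                                 # shared kick counter
    circle = [stackless.channel() for i in range(num_hackysackers)]
    for box in circle:
        stackless.tasklet(hackysacker)(box, circle, cntr, loops)
    circle[0].send('sack')                     # throw in the first sack...
    stackless.run()

if __name__ == "__main__":
    runit(num_hackysackers=1000, loops=1000)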