Schedulers matter
March 04, 2009 at 10:26 AM | categories: python, oldblog
Improving the scheduler is something we've avoided for quite a while in Kamaelia, but Simon Wittber's recent post benchmarking Fibra vs Kamaelia vs Stackless is interesting. His key result - that "Stackless is 7x faster than Fibra, and Fibra is 10x faster than Kamaelia" - is cool for him :-) ... and naturally led me to think "why". Looking at the code, it struck me that he's doing something more interesting with the scheduler, given these code forms:
scheduler.install(self.messageLoop())
# self.messageLoop is a regular python generator
...
yield self.incrementCounter()
yield kickTo.messageQueue.push(self)

If you delve inside the fibra scheduler (which doesn't appear to be here, unexpectedly) you see the following core:
def tick(self):
    """Iterates the scheduler, running all tasks and calling all
    handlers.
    """
    active = False
    for handler in self.handlers:
        active = handler.is_waiting() or active
    active = (len(self.tasks) > 0) or active
    tasks = []
    while True:
        try:
            task, send_value = self.tasks.pop(0)
        except IndexError, e:
            break
        try:
            if isinstance(send_value, Exception):
                r = task.throw(send_value)
            else:
                r = task.send(send_value)
        except StopIteration, e:
            r = e
        if r is None:
            tasks.append((task, None))
        else:
            try:
                handler = self.type_handlers[type(r)]
            except KeyError:
                raise RuntimeError("Don't know what to do with yielded type: %s" % type(r))
            handler.handle(r, task)
    self.tasks[:] = tasks
    return active

The core of this appears to be "when I'm done, do this next". If you think that's familiar, it should be - it's very similar (at least conceptually) to what twisted does with deferreds. It's not identical, but similar.
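To make that concrete, here's a minimal sketch of the pattern. This is not fibra's actual API (fibra yields handler-specific objects, as in the messageQueue.push snippet above), and the toy_run/player names are mine; it just shows the shape of "yield the thing that should run next":

from collections import deque

def toy_run(*tasks):
    # Only tasks that something has explicitly asked to run are runnable;
    # everything else stays parked.
    runnable = deque(tasks)
    while runnable:
        task = runnable.popleft()
        try:
            nxt = next(task)          # run the task up to its next yield
        except StopIteration:
            continue                  # task finished, drop it
        if nxt is None:
            runnable.append(task)     # bare "yield": run me again later
        else:
            runnable.append(nxt)      # yielded a generator: run *that* one next;
                                      # this task stays paused until kicked back

def player(name, count, other):
    # 'other' is a one-element list so the two players can refer to each
    # other before both generators exist.
    for i in range(count):
        print("%s %d" % (name, i))
        yield other[0]                # wake the other player, then pause

ping_box, pong_box = [], []
ping = player("ping", 3, pong_box)
pong = player("pong", 3, ping_box)
ping_box.append(ping)
pong_box.append(pong)
toy_run(ping)   # prints ping 0, pong 0, ... pong 2; only whoever was
                # explicitly kicked ever gets scheduled

In this toy the scheduler never loops over idle generators; work only happens where the previous task pointed it.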
So what does this mean for the hackysack demo? Well, if we look at self.tasks after each run, by changing:

def run(self):
    while self.tick():
        pass

to:
def run(self):
    while self.tick():
        print "TASKS", [x[0] for x in self.tasks]

It becomes apparent what's happening (though it's fairly obvious from above):
TASKS [<generator object at 0xb7b04fcc>]
TASKS []
TASKS [<generator object at 0xb780f94c>]
TASKS []
TASKS [<generator object at 0xb7a1b04c>]
TASKS []
TASKS [<generator object at 0xb776fcac>]
TASKS []
TASKS [<generator object at 0xb79327ac>]
TASKS []
TASKS [<generator object at 0xb7b0ff2c>]
TASKS []

Fundamentally, it's quicker for a few reasons:
- It always knows which generator is ready to run next.
- It also defaults to pausing the generator unless it specifically asks to be run, i.e. the tasks default to being passive.
- The sender knows who the receiver is. This allows the sender to schedule the receiver explicitly.
Kamaelia's scheduler, by comparison (see the sketch after this list):
- Is effectively a round robin scheduler - essentially for simplicity. The key question we wanted to ask was "does a component model based around inboxes/outboxes make it easier to write/reuse concurrent software", rather than "how can we build a faster, more responsive scheduler". As a result, the round robin scheduler was always a compromise. There's a little bit of intelligence in there, but not much.
- Kamaelia's components default to active, rather than passive. That is, they're expected to run continuously unless explicitly paused. This design decision impacts the scheduler as a whole.
- The sender does not know who the receiver is. This requires something different to happen in the scheduler. On the upside, it means that code can be tested more easily in isolation, or reused in situations it wasn't originally expected to be used within.
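For contrast, here's an equally rough sketch of the round robin style described above - again a toy, not Kamaelia's actual Axon scheduler. Every unfinished task gets a turn on every cycle, and because the sender just drops data into a shared mailbox without knowing who reads it, the scheduler can't tell which task is actually ready to do work:

from collections import deque

def round_robin_run(tasks):
    # Toy round robin: every task gets a time slice each cycle, whether or
    # not it has any work to do. Tasks are "active" by default and only
    # drop out when they finish.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)                 # give the task its time slice
        except StopIteration:
            continue                   # finished tasks are dropped
        ready.append(task)             # everything else goes straight back on

def producer(outbox, items):
    for item in items:
        outbox.append(item)            # the producer has no idea who reads this
        yield

def consumer(name, inbox, expected):
    got = 0
    while got < expected:
        if inbox:                      # polled every cycle, even when empty
            print("%s got %r" % (name, inbox.pop(0)))
            got += 1
        yield

mailbox = []
round_robin_run([producer(mailbox, ["hello", "world"]),
                 consumer("sink", mailbox, 2)])

In a toy like this the scheduler keeps cycling over components whether or not they have anything to do, which is the overhead the comparison above is getting at.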
Beyond all that though, fibra looks neat :)