The web is stateless, please

December 12, 2007 at 12:50 AM | categories: python, oldblog

Arrgh. If I hit "delete" on a form, I'd like the thing I'm deleting to be encoded in the form rather than associated with an opaque sessionid which "helpfully" remembers what I last looked at. The web is meant to be stateless after all. kthxbye.

Simple assistive short term memory

December 10, 2007 at 10:54 PM | categories: python, oldblog

OK, here's a challenge. Build a tool to assist with short term memory loss. The idea is this:
  • You go to the top of the stairs to do something
  • You get there, and wonder what you were there for
It'd be neat if you could do this:
  • Wear a simple-looking pendant, which is a short-term memory assistive device
  • Simply mutter to yourself what you're doing
  • When you forget what you're doing you press/tap a button & hear a repeat of the last muttering
    • Press it again in rapid succession, and you get the one before, and so on. (giving you context "what was I doing before...")
  • Aesthetically, it would be nice if the device looked like a pendant or similar.
  • The playback/recording could be a simple earpiece/microphone (ala mobile phone handsfree).
  • It should be sufficiently low-power to run happily for a day or more.

Image (c) 2007 daniboi1977 (Flickr), original license CC-NC-ND.
Technologically, this sounds like it should be really quite simple to make today.
  • Now here's the thing - you could do this today with a notebook & pencil hanging from a necklace - except that you replace muttering with scribbling, and pressing a button with looking at the notebook. Or perhaps hanging a dictaphone round your neck and remembering to hit "record" every time you say something.
  • However, not everyone who might find such a thing useful (i.e. most of us when we get older) would be comfortable with the notebook-on-a-necklace or dictaphone-on-a-necklace approach, but may find a short-term memory pendant assist socially acceptable.
So, the challenge is - could you make this today (almost certainly), and given that, how would you do so? This isn't entirely a thought experiment - people with Alzheimer's may find such a beast useful, and pressing a button to hear what you were doing, then double-tapping to hear what you were doing before, is probably something learnable. (Maybe - dunno in that context really.)
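As a rough sketch of just the interaction logic (entirely hypothetical - the class, the names and the two-second tap window are all made up for illustration, and real audio capture/playback is waved away), the press/double-tap behaviour might look like this in Python:

import time

class MemoryPendant(object):
    TAP_WINDOW = 2.0  # seconds within which taps count as "rapid succession"

    def __init__(self, max_clips=20):
        self.clips = []             # most recent muttering first
        self.max_clips = max_clips
        self.last_tap = 0.0
        self.offset = 0

    def mutter(self, clip):
        # In the real device this would be a recorded audio clip.
        self.clips.insert(0, clip)
        del self.clips[self.max_clips:]   # oldest mutterings silently expire

    def tap(self):
        now = time.time()
        if now - self.last_tap < self.TAP_WINDOW and self.clips:
            # Rapid successive taps step back through earlier mutterings.
            self.offset = min(self.offset + 1, len(self.clips) - 1)
        else:
            self.offset = 0   # a fresh tap replays the most recent muttering
        self.last_tap = now
        if self.clips:
            return self.clips[self.offset]   # would be played to the earpiece
        return None

pendant = MemoryPendant()
pendant.mutter("get the scissors from upstairs")
pendant.mutter("put them back in the kitchen drawer")
print pendant.tap()   # most recent muttering
print pendant.tap()   # rapid second tap: the one before that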

I suppose in a way it's a bit like a personal audio Twitter with an absolutely minimal interface and zero connectivity.

Incidentally, imagine what you could use such a device for beyond this...

Interesting - Transparent messages instead of modal "monolog" boxes

December 10, 2007 at 01:37 AM | categories: python, oldblog

Wins bonus points for looking straight out of a film :-)
http://humanized.com/weblog/2006/09/11/monolog_boxes_and_transparent_messages/
Their point is somewhat less shallow than this, but it does remind me of film interfaces... :-)


Widefinder in Kamaelia?

December 09, 2007 at 05:55 PM | categories: python, oldblog

Tim Bray's widefinder project is an interesting little problem. How do you find the top 10 most popular accesses to your website in a way that gains performance when you use more CPUs? Having done log analysis in the past, I know there's always a tradeoff. Looking at the original Ruby source:
counts = {}
counts.default = 0

ARGF.each_line do |line|
  if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
    counts[$1] += 1
  end
end

keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
keys_by_count[0 .. 9].each do |key|
  puts "#{counts[key]}: #{key}"
end
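For reference, a direct single-process translation into Python (mine, not from the project, so treat it as a sketch) looks something like this:

import re
import sys
from collections import defaultdict

pattern = re.compile(r"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) ")

counts = defaultdict(int)
for line in sys.stdin:
    match = pattern.search(line)
    if match:
        counts[match.group(1)] += 1

# Top 10 most popular accesses, most popular first
for key in sorted(counts, key=counts.get, reverse=True)[:10]:
    print "%s: %s" % (counts[key], key)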

This really looks fairly Kamaelia friendly. It requires support not yet written ( :) ), but it certainly fits Kamaelia's model. (The code that needs writing is relatively minimal now though, given we have basic multicore support.)

So, what would this need to look like? Well, to my mind it'd look something like this: (pseudo kamaelia)
Pipeline(
    readfile,
    SplitWorkers(8,            # create 8 Analyser workers, ideally on different CPUs
        Pipeline(Analyser,     # builds a heap of counts, emits the heap
                 Serialiser),  # outputs values in order from the heap
    ),                         # output from a worker is a tuple (id, value)
    merge(8),                  # merge heap streams (can't start until it has values from all workers)
    Head(10),                  # output the first 10 items
    ConsoleEchoer(),
)

Even then, for this problem, this is suboptimal - we want the merge to not actually output the merged heap, but rather the values from the heaps, so that it doesn't do unnecessary work. Either way though, an interesting problem. May get round to implementing this after I've dealt with STM.
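To sketch that merge idea concretely (this is plain Python, not Kamaelia code, and it glosses over the detail that the same key can appear in more than one worker's stream, so counts would really need summing), a k-way merge of descending-count streams could look like this:

import heapq

def kway_merge(streams):
    # k-way merge of streams already sorted by descending count.
    # Counts are negated because Python's heapq is a min-heap.
    iters = [iter(s) for s in streams]
    heap = []
    for i, it in enumerate(iters):
        try:
            count, key = it.next()
            heapq.heappush(heap, (-count, key, i))
        except StopIteration:
            pass
    while heap:
        negcount, key, i = heapq.heappop(heap)
        yield -negcount, key                  # emit values, not a merged heap
        try:
            count, key = iters[i].next()
            heapq.heappush(heap, (-count, key, i))
        except StopIteration:
            pass

# Head(10) equivalent: take only the first ten merged results.
streams = [[(40, "a"), (12, "b")], [(33, "c"), (5, "d")]]
for n, (count, key) in enumerate(kway_merge(streams)):
    if n == 10:
        break
    print "%s: %s" % (count, key)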

Aside from distribution across CPUs and collation of results, this looks relatively simple to implement. Mainly blogging about this whilst I remember. The other thing to note here is that the results would never need to be completely sorted in this version. In fact, the more CPUs you have, the less sorted the data would be!

Minimalistic Software Transactional Memory

December 08, 2007 at 11:03 PM | categories: python, oldblog

I've just posted this message on comp.lang.python looking for comments, but I'll post it here in case there's any feedback this way :-)

I'm interested in writing a simple, minimalistic, non persistent (at this stage) software transactional memory (STM) module. The idea being it should be possible to write such a beast in a way that can be made threadsafe fairly easily.

For those who don't know, STM is a really fancy way of saying variables with version control (as far as I can tell :-) ), designed to enable threadsafe shared data.

I'm starting with the caveat here that the following code is almost certainly not threadsafe (not put any real thought into that as yet), and I'm interested in any feedback on the following:
  • Does the API look simple enough?
  • Are there any glaring mistakes in the code ? (It's always harder to see your own bugs)
  • What key areas appear least thread-safe, and any general suggestions around that.
Even if I get no feedback, I hope this is of interest. Since these things get archived, if you're reading this a month, 6 months, a year or more from now, I'll still be interested in feedback...

OK, API.

First of all we need to initialise the store:
S = Store()
We then want to get a value from the store such that we can use the value, and do stuff with it:
greeting = S.using("hello")
Access the value:
print repr(greeting.value)
Update the value:
greeting.set("Hello World")
Commit the value back to the store:
greeting.commit()
If you have concurrent updates of the same value, the following exception gets thrown:
ConcurrentUpdate
cf:
>>> S = Store()
>>> greeting = S.using("hello")
>>> par = S.using("hello")
>>> greeting.set("Hello World")
>>> par.set("Woo")
>>> greeting.commit()
>>> par.commit()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 12, in commit
  File "<stdin>", line 11, in set
__main__.ConcurrentUpdate
That's pretty much the simplest API I can come up with. (I've tried a few others but didn't really like them)
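As an aside, the natural way to use this API in the face of ConcurrentUpdate is a read-modify-write retry loop - something like this sketch (retry_update is my hypothetical helper here, not part of the module):

def retry_update(store, key, modify):
    # Hypothetical helper: retry a read-modify-write until the commit wins.
    while 1:
        val = store.using(key)     # fresh copy at the current version
        val.set(modify(val.value))
        try:
            val.commit()
            return
        except ConcurrentUpdate:
            pass                   # someone else committed first; re-read, retry

retry_update(S, "hello", lambda v: (v or "") + "!")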

The way this works is we have a Store that manages Values. (Perhaps a better name may be "variables", to avoid clashing with Python's parlance of labels and values?)

Anyhow, you can ask the store for a Value, which it will give you. The Value knows what it's called and where it's from, so when you tell it to commit, it can try to do so. The little detail here is that you get a copy of the Value, not the stored Value. (This is to avoid accidental concurrent update of the same actual object.)

As a result, Store looks like this:
class Store(object):
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store[key].clone()

    def set(self, key, value):
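        # A commit succeeds only if the stored version is not newer than the
        # version this copy was cloned from; otherwise someone else got in first.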
        if not (self.store[key].version > value.version):
            self.store[key] = Value(value.version+1, value.value, self, key)
            value.version= value.version+1
        else:
            raise ConcurrentUpdate

    def using(self, key):
        try:
            return self.get(key)
        except KeyError:
            self.store[key] = Value(0, None,self,key)
            return self.get(key)

    def dump(self):
        for k in self.store:
            print k, ":", self.store[k]
You'll note that the set method is the greatest candidate for any possible race hazard here - though I can see a possible boundary issue in "using". (I think :-)

Otherwise I think the above code is relatively straightforward. You'll note that this API allows this:
greeting.set("Hello")
greeting.commit()
greeting.set("Hello World")
greeting.commit()
greeting.set("Hello World. Game")
greeting.commit()
greeting.set("Hello World. Game Over")
greeting.commit()
The other class is Value, which looks like this:
class Value(object):
    def __init__(self, version, value,store,key):
        self.version = version
        self.value = value
        self.store = store
        self.key = key

    def __repr__(self):
        return "Value"+repr((self.version,self.value,self.store,self.key))

    def set(self, value):
        self.value = value

    def commit(self):
        self.store.set(self.key, self)

    def clone(self):
        return Value(self.version, self.value,self.store,self.key)

To me this looks like a pretty complete minimalistic thing, which does seem to work OK, but I'm interested in the three points I mention above - if anyone is willing to comment - specifically:
  • Does the API look simple enough?
  • Are there any glaring mistakes in the code ? (It's always harder to see your own bugs)
  • What key areas appear least thread-safe, and any general suggestions around that.
If anyone comments - thanks!

Full code below.
class ConcurrentUpdate(Exception): pass

class Value(object):
    def __init__(self, version, value,store,key):
        self.version = version
        self.value = value
        self.store = store
        self.key = key

    def __repr__(self):
        return "Value"+repr((self.version,self.value))

    def set(self, value):
        self.value = value

    def commit(self):
        self.store.set(self.key, self)

    def clone(self):
        return Value(self.version, self.value,self.store,self.key)

class Store(object):
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store[key].clone()

    def set(self, key, value):
        if not (self.store[key].version > value.version):
            self.store[key] = Value(value.version+1, value.value, self, key)
            value.version= value.version+1
        else:
            raise ConcurrentUpdate

    def using(self, key):
        try:
            return self.get(key)
        except KeyError:
            self.store[key] = Value(0, None,self,key)
            return self.get(key)

    def dump(self):
        for k in self.store:
            print k, ":", self.store[k]

S = Store()
greeting = S.using("hello")
print repr(greeting.value)
greeting.set("Hello World")
greeting.commit()
# ------------------------------------------------------
print greeting
S.dump()
# ------------------------------------------------------
par = S.using("hello")
par.set("Woo")
par.commit()
# ------------------------------------------------------
print greeting
S.dump()
# ------------------------------------------------------
greeting.set("Woo")
greeting.commit()

print repr(greeting), repr(greeting.value)

S.dump()


Fascinating new definition of Free Software

December 05, 2007 at 07:37 PM | categories: python, oldblog

On the backstage mailing list, out of the interminable rantings and ravings about whether DRM is a good or bad idea, whether people should call Linux GNU/Linux or "that piece of software I use" or just Tux, or indeed on a dozen other things, a (not too hot) flame war about BSD licensing vs GPL licensing came about. Now I've seen this argument played back and forth dozens of times (at least) over the past 10-15 years and it just gets repetitive, with the same arguments thrashed out on both sides.

Well, this time, it surprised me. Someone - one Vijay Chopra specifically - came up with an argument I'd never seen before, and whether or not you agree with it, it's just stunning to see something new on this whole issue. Vijay's position is in support of BSD-style licensing over the GPL. Vijay was challenged with the following standard response that you'd expect:
> My usual response to this argument is that essentially you are asking
> for the freedom to restrict the freedom. This is patently absurd.
Which has always struck me as odd. Using the BSD license doesn't restrict anyone - it just doesn't prevent anyone else from adding restrictions either. This is often denigrated for not having a strong copyleft provision. Well, there are lots of possible next moves in this exchange, and Vijay's was totally unexpected:
> Actually I'd compare free speech; it's not free speech unless it's difficult
> to hear what I'm saying. Similarly it's not software freedom unless it's
> hard to bear what I'm doing to your code.
Think about it. The argument is this: unless someone has the ability to do something with your speech which upsets you, they don't have free speech. If they don't have the ability to do something with code you wrote which upsets you, they don't have free coding.

Whether or not you agree with it (I'm not totally sure I do), it's an absolutely fascinating argument due to its simplicity and completeness. It's also radically different from taking the perspective "there should be a right to do X" and then applying Kant's universality principle to that rule. It also naturally embodies why freedom in coding may have good reason for restriction, because freedom of speech is often curtailed as well (cf. hate speech, for example).

Fascinating. Didn't think I'd ever see a new argument in this arena :-)


Multiple Windows in Pygame, using true concurrency in python via Kamaelia

November 27, 2007 at 02:11 AM | categories: python, oldblog

First of all, the proof that this works in Kamaelia, right now:

OK, this is using the new code for running multiple Kamaelia type systems in multiple processes relatively simply. Specifically, the "interesting" code looks like this:

class SecondProcessBasedComponent(SimplestProcessComponent):
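    # main() runs in its own process (via SimplestProcessComponent), so each
    # instance creates its own Pygame display - one window per process.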
    def main(self):
        from Kamaelia.UI.Pygame.Display import PygameDisplay
        from Kamaelia.UI.Pygame.MagnaDoodle import MagnaDoodle

        X=PygameDisplay(width=200,height=200).activate()
        PygameDisplay.setDisplayService(X)
        MagnaDoodle().run()
        yield 1

exchange = SecondProcessBasedComponent().activate()
R = []
for _ in xrange(7):
     R.append(SecondProcessBasedComponent().activate())

The upshot? If the above code ran on a machine with 8 CPUs, it would use all 8 CPUs. With NO change to the pre-existing Kamaelia components. I find that pretty neat :-D

The first Truly Concurrent Kamaelia Component

November 25, 2007 at 10:20 PM | categories: python, oldblog

OK, this is using a massively simplified version of the primitives needed for concurrency in Kamaelia, but the following is the first component that will happily run completely in parallel with the rest of the system.
class FirstProcessBasedComponent(SimplestProcessComponent):
    def main(self):
        while 1:
            yield 1
            time.sleep(0.3)
            if self.dataReady():
                print time.time(),"main : RECEIVE FROM CHANNEL", self.recv()
            else:
                print time.time(),"main: CHANNEL NOT READY"


As you can see this is pretty much identical to the traditional Kamaelia model. Indeed, change the baseclass & you get a single threaded component, though you'd probably want to change the time.sleep behaviour.

The advantage here, of course, is that given a bit more work we should be able to take the entirety of Kamaelia's component set and simply parallelise it where it makes sense. The most obvious way would be as a specialised Chassis. (Other chassis include Pipeline, Graphline and Carousel (which is a bit brain-numbing :) ).)

Full code, which is currently in /Sketches, looks like this:
import pprocess
import time

class SimplestProcessComponent(object):
    def __init__(self):
        self.exchange = pprocess.Exchange()
        self.channel = None
        self.inbound = []

    def activate(self):
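        # Spawn self.run in a child process via pprocess, and hand back an
        # exchange the parent process can use to talk to it.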
        channel = pprocess.start(self.run, None, None, named1=None, named2=None)
        exchange = pprocess.Exchange()
        exchange.add(channel)
        return exchange

    def run(self, channel, arg1, arg2, named1=None, named2=None):
        self.exchange.add(channel)
        self.channel = channel
        for i in self.main():
            pass

    def dataReady(self):
        return self.exchange.ready(timeout=0)

    def recv(self):
        if self.dataReady():
            for ch in self.exchange.ready(timeout=0):
                D = ch.receive()
                self.inbound.append(D)
        return self.inbound.pop(0)

    def main(self):
        yield 1

class FirstProcessBasedComponent(SimplestProcessComponent):
    def main(self):
        while 1:
            yield 1
            time.sleep(0.3)
            if self.dataReady():
                print time.time(),"main : RECEIVE FROM CHANNEL", self.recv()
            else:
                print time.time(),"main: CHANNEL NOT READY"

exchange = FirstProcessBasedComponent().activate()

while 1:
    time.sleep(0.7)
    print time.time(),"__main__ : SENDING TO CHANNEL"
    if exchange.ready(timeout=0):
        for ch in exchange.ready():
            ch.send({"hello":"X"})
Personally, I find this idea of true but simple concurrency really quite a nice, fun one :-)

Kamaelia based (Extended) Entity Relationship Modelling Tool

November 25, 2007 at 01:52 AM | categories: python, oldblog

This weekend's hack - a tool to make my life easier, by making the creation, modelling and playing with extended entity relationship diagrams simpler (of the kind found in Elmasri & Navathe). The tool is currently sitting in my /Sketches/MPS area, named somewhat imaginatively ERTopologyTool.py, and I think it's really quite funky. I'm using it to make it easier to communicate ideas I've got on the participate project (which is taking most of my dayjob time at the moment), and it can take a textual description of the schema that looks like this:
entity missionagent
entity person(missionagent)
entity team(missionagent)

entity missionitem:
     simpleattributes visible

entity activemission

relation participatesin(activemission,missionagent)
relation creates(missionagent,missionitem)

It then internally maps this first to an AST and then that's transformed into commands for a modified version of the TopologyVisualiser which look like this:
ADD NODE missionagent missionagent auto entity
ADD NODE person person auto entity
ADD NODE ISA1 isa auto isa
ADD NODE team team auto entity
ADD NODE missionitem missionitem auto entity
ADD NODE visible visible auto attribute
ADD NODE activemission activemission auto entity
ADD NODE participatesin participatesin auto relation
ADD NODE creates creates auto relation
ADD LINK ISA1 missionagent
ADD LINK person ISA1
ADD LINK team ISA1
ADD LINK missionitem visible
ADD LINK activemission participatesin
ADD LINK missionagent participatesin
ADD LINK missionagent creates
ADD LINK missionitem creates

And then finally that's rendered; the resulting diagram looks like this:


... and I personally think that's really kinda sweet. I wrote the code to compile a little language into the visualiser's little language largely because this will simplify adding the attributes to all the entities in the main version of the model I'm working with (the above is a kind of fragment). The vaguely amusing thing is that this inherits the auto-layout goodness of earlier work, results in a program with relatively high levels of natural concurrency, and took the best part of a day to write - but not all day. That's amusing because such things as auto-layout, high levels of concurrency in a simple user application, and EER diagram modelling tools used to be the sort of thing you'd expect of a third year project at one time... Whereas here, it's an afternoon/day's hack.
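To give a flavour of how small such a compiler can be, here's a rough sketch (my reconstruction for illustration, not the actual ERTopologyTool code) that turns the schema language above into the visualiser's commands:

def compile_schema(text):
    cmds = []
    isa = {}         # parent entity -> its shared ISA node name
    current = None   # entity whose attribute block we're inside
    for line in text.splitlines():
        words = line.replace("(", " ").replace(")", " ").replace(",", " ").replace(":", " ").split()
        if not words:
            current = None
            continue
        if words[0] == "entity":
            name = words[1]
            cmds.append("ADD NODE %s %s auto entity" % (name, name))
            for parent in words[2:]:   # supertypes, e.g. person(missionagent)
                if parent not in isa:
                    isa[parent] = "ISA%d" % (len(isa) + 1)
                    cmds.append("ADD NODE %s isa auto isa" % isa[parent])
                    cmds.append("ADD LINK %s %s" % (isa[parent], parent))
                cmds.append("ADD LINK %s %s" % (name, isa[parent]))
            current = name
        elif words[0] == "simpleattributes":
            for attr in words[1:]:
                cmds.append("ADD NODE %s %s auto attribute" % (attr, attr))
                cmds.append("ADD LINK %s %s" % (current, attr))
        elif words[0] == "relation":
            rel = words[1]
            cmds.append("ADD NODE %s %s auto relation" % (rel, rel))
            for entity in words[2:]:   # participating entities
                cmds.append("ADD LINK %s %s" % (entity, rel))
    return cmds

schema = """entity missionagent
entity person(missionagent)

entity missionitem:
     simpleattributes visible

relation creates(missionagent,missionitem)"""

print "\n".join(compile_schema(schema))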

So, what's next?
  • An interesting possibility I have here, which I may or may not do next, is adding automated table generation - this sort of diagram normally contains sufficient information to allow direct creation of a database schema in Boyce-Codd normal form - though I'm more likely to work on adding in layering (which is more directly useful for API building and UI abstractions).
  • Other things I need to add: cardinality constraints, key indicators, composite & repeated attributes, attributes of relations, weak entities, "must" constraints for relations, and disjunction, conjunction and union subtyping (at the moment "isa" just implies disjunction).
  • Packaging up as a separate/standalone package/tool ala Kamaelia Grey.
Well, in practical terms what's next is actually finishing off integrating two data models using this, and maybe adding in SVG export I suspect... Or adding in a better way to export an image of the diagram... :-)


Benchmarking - Kamaelia vs Stackless

November 21, 2007 at 11:48 PM | categories: python, oldblog

Interesting post by rhonabwy on comparing Kamaelia to Stackless. The benchmark figures there are pretty grim from my perspective, but a useful starting point:

10 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(10,1000)"

10 loops, best of 3: 127 msec per loop

100 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(100,1000)"

10 loops, best of 3: 587 msec per loop

1000 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(1000,1000)"

10 loops, best of 3: 6.05 sec per loop

10000 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(10000,1000)"

10 loops, best of 3: 60.4 sec per loop
The grim part of course is the scaling aspect here. Handily though, rhonabwy posted the code as well. I noticed that a key change we often make in mature code - to pause - was missing, so I changed the main loop slightly from this:
    def main(self):
        yield 1
        while 1:
            if self.dataReady('inbox'):

To this:
    def main(self):
        yield 1
        while 1:
            while not self.anyReady():
                self.pause()
                yield 1

            while self.dataReady('inbox'): # Note change from "if"
(In the changed version, the "while not self.anyReady()" block is new code, and the "if" became a "while", as the comment notes.) So, how does this perform? Well, first of all, running the unchanged code I get this:

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(10,1000)"
10 loops, best of 3: 182 msec per loop

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(100,1000)"
10 loops, best of 3: 820 msec per loop

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(1000,1000)"
10 loops, best of 3: 9.23 sec per loop
I lost patience at that point. It does show the same bad scaling properties though. So I then added in the changes above and reran it. This is what I got:
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(10,1000)"
10 loops, best of 3: 206 msec per loop

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(100,1000)"
10 loops, best of 3: 267 msec per loop

~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(1000,1000)"
10 loops, best of 3: 816 msec per loop

Now I don't actually have that much memory on my machine so going above this causes my machine to start swapping, but even factoring that in, this is the next level up:
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(5000,1000)"
10 loops, best of 3: 2.77 sec per loop

Now, clearly this still isn't as good as Stackless - which isn't surprising: Stackless removes a whole layer of stack frame shenanigans from the entire system, and it also implements its scheduler & channel handling in C. I also don't know how well the Stackless scheduler is optimised. Probably better than ours - ours implements a very simple round robin scheduler.

However, despite all this, after this small optimisation the scaling properties are significantly better and - to my mind rather importantly - much more in line with the scaling properties that Stackless exhibits. A key point from this: with the optimisation in place, it implies that we DO have better scaling properties using generators than with threads. Woo :-)

More seriously, this does imply that a natural way to optimise Kamaelia systems could be to simply create a new base class Axon.TaskletComponent.taskletcomponent that uses channels to implement inboxes/outboxes and scale out that way. It'd mean that one could take the bulk of their code with them and make small changes to gain further optimisations.

(aside: I'm a bit irritated by the memory consumption though, I know I don't have a huge amount of memory and I was running other applications on the system, but I did expect better. I'll have to look into that I think. When I get a round tuit. )

Overall though, despite it not performing as well as Stackless (which I did really expect, understanding the changes Stackless makes), I am very pleased with the scaling properties being similar :-)

My full version of the code:
#!/usr/bin/python

import Axon
import time
import random
import sys

class hackymsg:
    def __init__(self,name):
        self.name = name

class counter:
    def __init__(self):
        self.count = 0
    def inc(self):
        self.count += 1

class hackysacker(Axon.Component.component):
    def __init__(self,name,circle,cntr,loops):
        Axon.Component.component.__init__(self)
        self.cntr = cntr
        self.name = name
        self.loops = loops # terminating condition
        self.circle = circle # a list of all the hackysackers
        circle.append(self)

    def main(self):
        yield 1
        while 1:
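            # Pause (sleep) until a message arrives, rather than busy-looping.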
            while not self.anyReady():
                self.pause()
                yield 1
            while self.dataReady('inbox'):
                msg = self.recv('inbox')
                if msg == 'exit':
                    return
                if self.cntr.count > self.loops:
                    for z in self.circle:
                        z.inboxes['inbox'].append('exit')
                    return
                #print "%s got hackysack from %s" % (self.name, msg.name)
                kickto = self.circle[random.randint(0,len(self.circle)-1)]
                while kickto is self:
                    kickto = self.circle[random.randint(0,len(self.circle)-1)]
                #print "%s kicking hackysack to %s" %(self.name, kickto.name)
                msg = hackymsg(self.name)
                kickto.inboxes['inbox'].append(msg)
                self.cntr.inc()
                #print self.cntr.count
            yield 1

def runit(num_hackysackers=5,loops=100):
    cntr = counter()
    circle=[]
    first_hackysacker = hackysacker('1',circle,cntr,loops)
    first_hackysacker.activate()
    for i in range(num_hackysackers):
        foo = hackysacker(`i`,circle,cntr,loops)
        foo.activate()

    # throw in the first sack...
    msg = hackymsg('me')
    first_hackysacker.inboxes['inbox'].append(msg)

    Axon.Component.scheduler.run.runThreads()

if __name__ == "__main__":
    runit(num_hackysackers=1000,loops=1000)


