Fascinating new definition of Free Software
December 05, 2007 at 07:37 PM | categories: python, oldblog | View Comments
On the backstage mailing list, out of the interminable rantings and ravings about whether DRM is a good or bad idea, whether people should call Linux GNU/Linux, or "that piece of software I use", or just Tux, or indeed on a dozen other things, a (not too hot) flame war about BSD licensing vs GPL licensing came about. Now I've seen this argument played back and forth dozens of times (at least) over the past 10 - 15 years and it just gets repetitive, with the same arguments thrashed back and forth on both sides.
Well, this time, it surprised me. Someone - one Vijay Chopra specifically - came up with an argument I'd never seen before and whether or not you agree with it, it's just stunning to see something new on this whole issue. Vijay's position is in support of the BSD style licensing over GPL. Vijay was challenged with the following standard response that you'd expect:
> My usual response to this argument is that essentially you are asking
> for the freedom to restrict the freedom. This is patently absurd.
Which has always struck me as odd. Using the BSD license doesn't restrict anyone. It just means that it doesn't prevent anyone else from doing so either. This is often denigrated for not having a strong copyleft provision. Well, there's lots of possible next moves in this, and Vijay's was totally unexpected:
> Actually I'd compare free speech; it's not free speech unless it's difficult
> to hear what I'm saying. Similarly it's not software freedom unless it's
> hard to bear what I'm doing to your code.
Think about it. The argument is this: unless someone has the ability to do something with your speech which upsets you, they don't have free speech. If they don't have the ability to do the same with code you wrote, they don't have free coding.
Whether or not you agree with it (not totally sure if I do), it's an absolutely fascinating argument due to its simplicity and completeness. It's also radically different from taking the perspective "There should be a right to do X" and then applying Kant's Universality Principle to that rule. It also naturally embodies the reason why freedom in coding may have good reason for restriction, because freedom of speech is often curtailed as well. (cf hate speech for example).
Fascinating. Didn't think I'd ever see a new argument in this arena :-)
Multiple Windows in Pygame, using true concurrency in python via Kamaelia
November 27, 2007 at 02:11 AM | categories: python, oldblog | View Comments
First of all, the proof that this works in Kamaelia, right now:
OK, this is using the new code for running multiple Kamaelia type systems in multiple processes relatively simply. Specifically, the "interesting" code looks like this:
class SecondProcessBasedComponent(SimplestProcessComponent):
    def main(self):
        from Kamaelia.UI.Pygame.Display import PygameDisplay
        from Kamaelia.UI.Pygame.MagnaDoodle import MagnaDoodle
        X=PygameDisplay(width=200,height=200).activate()
        PygameDisplay.setDisplayService(X)
        MagnaDoodle().run()
        yield 1

exchange = SecondProcessBasedComponent().activate()
R = []
for _ in xrange(7):
    R.append(SecondProcessBasedComponent().activate())
The upshot? If the above code ran on a machine with 8 CPUs, it would use all 8 CPUs. With NO change to the pre-existing Kamaelia components. I find that pretty neat :-D
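(As an aside, if you wanted to size this to the machine automatically rather than hard-coding eight, something along these lines ought to work - a sketch only, and os.sysconf("SC_NPROCESSORS_ONLN") is a Linux-specific assumption on my part rather than anything from the example above:)
# Sketch: scale the number of process-based components to the CPU count.
# os.sysconf("SC_NPROCESSORS_ONLN") is Linux-specific; fall back to 1 elsewhere.
import os

try:
    ncpus = os.sysconf("SC_NPROCESSORS_ONLN")
except (ValueError, OSError, AttributeError):
    ncpus = 1

processes = [ SecondProcessBasedComponent().activate() for _ in xrange(ncpus) ]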
The first Truly Concurrent Kamaelia Component
November 25, 2007 at 10:20 PM | categories: python, oldblog | View Comments
OK, this is using a massively simplified version of the primitives needed for concurrency in Kamaelia, but the following is the first component that will happily run completely in parallel with the rest of the system.
class FirstProcessBasedComponent(SimplestProcessComponent):
    def main(self):
        while 1:
            yield 1
            time.sleep(0.3)
            if self.dataReady():
                print time.time(),"main : RECEIVE FROM CHANNEL", self.recv()
            else:
                print time.time(),"main: CHANNEL NOT READY"
As you can see this is pretty much identical to the traditional Kamaelia model. Indeed, change the baseclass & you get a single threaded component, though you'd probably want to change the time.sleep behaviour.
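For comparison, here's a rough sketch of what the equivalent single threaded version might look like against the normal Axon.Component.component baseclass - the component name and the decision to drop the sleep in favour of just yielding are my choices for illustration, not code from the Sketches directory:
# Sketch only: the same idea as a standard generator-based Axon component.
import time
import Axon

class FirstGeneratorBasedComponent(Axon.Component.component):
    def main(self):
        while 1:
            yield 1   # a real component would normally self.pause() when idle
            if self.dataReady("inbox"):
                print time.time(), "main : RECEIVE FROM INBOX", self.recv("inbox")
            else:
                print time.time(), "main: INBOX NOT READY"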
The advantage here, of course, is that given a bit more work we should be able to take the entirety of Kamaelia's component set and simply parallelise it where it makes sense. The most obvious way being as a specialised Chassis. (Other chassis are Pipeline, Graphline and Carousel (which is a bit brain numbing :) ).)
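To give a flavour of what I mean, a purely hypothetical usage sketch - a "ProcessPipeline" chassis doesn't exist at the time of writing, and the component names are just the ones from Kamaelia's familiar ogg-playing example:
# Hypothetical sketch: same shape as Pipeline, but each child component would
# be activated in its own process rather than on the same scheduler.
ProcessPipeline(
    TCPClient("localhost", 1500),
    VorbisDecode(),
    AOAudioPlaybackAdaptor(),
).run()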
Full code, which is currently in /Sketches, looks like this:
import pprocess
import time

class SimplestProcessComponent(object):
    def __init__(self):
        self.exchange = pprocess.Exchange()
        self.channel = None
        self.inbound = []

    def activate(self):
        channel = pprocess.start(self.run, None, None, named1=None, named2=None)
        exchange = pprocess.Exchange()
        exchange.add(channel)
        return exchange

    def run(self, channel, arg1, arg2, named1=None, named2=None):
        self.exchange.add(channel)
        self.channel = channel
        for i in self.main():
            pass

    def dataReady(self):
        return self.exchange.ready(timeout=0)

    def recv(self):
        if self.dataReady():
            for ch in self.exchange.ready(timeout=0):
                D = ch.receive()
                self.inbound.append(D)
        return self.inbound.pop(0)

    def main(self):
        yield 1

class FirstProcessBasedComponent(SimplestProcessComponent):
    def main(self):
        while 1:
            yield 1
            time.sleep(0.3)
            if self.dataReady():
                print time.time(),"main : RECEIVE FROM CHANNEL", self.recv()
            else:
                print time.time(),"main: CHANNEL NOT READY"

exchange = FirstProcessBasedComponent().activate()

while 1:
    time.sleep(0.7)
    print time.time(),"__main__ : SENDING TO CHANNEL"
    if exchange.ready(timeout=0):
        for ch in exchange.ready():
            ch.send({"hello":"X"})
Personally, I find this idea of true, but simple concurrency really quite a nice, fun idea :-)
Kamaelia based (Extended) Entity Relationship Modelling Tool
November 25, 2007 at 01:52 AM | categories: python, oldblog | View Comments
This weekend's hack - a tool to make my life easier - makes creating, modelling and playing with extended entity relationship diagrams simpler (of the kind found in Elmasri & Navathe). The tool is currently sitting in my /Sketches/MPS area, named somewhat imaginatively ERTopologyTool.py, and I think it's really quite funky. I'm using it to make it easier to communicate ideas I've got on the participate project (which is taking most of my dayjob time at the moment), and it can take a textual description of the schema that looks like this:
entity missionagent
entity person(missionagent)
entity team(missionagent)
entity missionitem:
simpleattributes visible
entity activemission
relation participatesin(activemission,missionagent)
relation creates(missionagent,missionitem)
It then internally maps this first to an AST and then that's transformed into commands for a modified version of the TopologyVisualiser which look like this:
ADD NODE missionagent missionagent auto entity
ADD NODE person person auto entity
ADD NODE ISA1 isa auto isa
ADD NODE team team auto entity
ADD NODE missionitem missionitem auto entity
ADD NODE visible visible auto attribute
ADD NODE activemission activemission auto entity
ADD NODE participatesin participatesin auto relation
ADD NODE creates creates auto relation
ADD LINK ISA1 missionagent
ADD LINK person ISA1
ADD LINK team ISA1
ADD LINK missionitem visible
ADD LINK activemission participatesin
ADD LINK missionagent participatesin
ADD LINK missionagent creates
ADD LINK missionitem creates
And then finally that's rendered, and the final rendered version looks like this:
... and I personally think that's really kinda sweet. I wrote the code to compile a little language into the visualiser's little language, largely because this will simplify adding the attributes to all the entities in the main version of the model I'm working with (the above is a kind of fragment). The vaguely amusing thing about this is that it inherits the auto-layout goodness of earlier work, results in a program with relatively high levels of natural concurrency, and took the best part of a day to write - but not all day. That's amusing because such things as auto-layout, high levels of concurrency in a simple user application, and EER diagram modelling tools used to be the stuff of a third year project at one time... Whereas here, it's an afternoon/day's hack.
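For the curious, the translation step is conceptually very simple. The sketch below is illustrative only - it is not the code in ERTopologyTool.py - but it shows the sort of mapping involved, using the "entity name(parent)" and "relation name(a,b)" forms from the example above:
# Illustrative sketch (not the real tool): map schema items to visualiser commands.
def entity(name, parents=()):
    cmds = ["ADD NODE %s %s auto entity" % (name, name)]
    for parent in parents:
        isa = "ISA_%s" % name                 # the real tool numbers these ISA1, ISA2, ...
        cmds.append("ADD NODE %s isa auto isa" % isa)
        cmds.append("ADD LINK %s %s" % (isa, parent))
        cmds.append("ADD LINK %s %s" % (name, isa))
    return cmds

def relation(name, participants):
    cmds = ["ADD NODE %s %s auto relation" % (name, name)]
    for p in participants:
        cmds.append("ADD LINK %s %s" % (p, name))
    return cmds

print "\n".join(entity("person", parents=["missionagent"]) +
                relation("creates", ["missionagent", "missionitem"]))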
So, what's next?
- An interesting possibility here, which I may or may not do next, is adding automated table generation - this sort of diagram normally contains sufficient information to allow direct creation of a database schema in Boyce-Codd normal form - though I'm more likely to work on adding in layering (which is more directly useful for API building and UI abstractions).
- Other things I need to add in: cardinality constraints, key indicators, composite & repeated attributes, attributes of relations, weak entities, "must" constraints for relations, disjunction, conjunction and union subtyping (at the moment "isa" just implies disjunction).
- Packaging it up as a separate/standalone package/tool, a la Kamaelia Grey.
Benchmarking - Kamaelia vs Stackless
November 21, 2007 at 11:48 PM | categories: python, oldblog | View Comments
Interesting post by rhonabwy on comparing Kamaelia to Stackless. The benchmark figures there are pretty grim from my perspective, but a useful starting point:
10 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(10,1000)"
10 loops, best of 3: 127 msec per loop
100 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(100,1000)"
10 loops, best of 3: 587 msec per loop
1000 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(1000,1000)"
10 loops, best of 3: 6.05 sec per loop
10000 concurrent objects, 1000 loops
python timeit.py -s "import hackysack" "hackysack.runit(10000,1000)"
10 loops, best of 3: 60.4 sec per loop
The grim part of course is the scaling aspect here. Handily though, rhonabwy posted the code as well. I noticed that a key change we often make in mature code - to pause - was missing, so I changed the main loop slightly, from this:
def main(self):
    yield 1
    while 1:
        if self.dataReady('inbox'):
To this:
def main(self):
    yield 1
    while 1:
        while not self.anyReady():
            self.pause()
            yield 1
        while self.dataReady('inbox'): # Note change from "if"
Green is new code, blue indicates a change (the new lines are the anyReady/pause loop; the change is the "while" that was previously an "if"). So, how does this perform? Well, first of all, running the unchanged code I get this:
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(10,1000)"
10 loops, best of 3: 182 msec per loop
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(100,1000)"
10 loops, best of 3: 820 msec per loop
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(1000,1000)"
10 loops, best of 3: 9.23 sec per loop
I lost patience at that point. It does show the same bad scaling properties though. So I then added in the changes above and reran it. This is what I got:
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(10,1000)"
10 loops, best of 3: 206 msec per loop
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(100,1000)"
10 loops, best of 3: 267 msec per loop
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(1000,1000)"
10 loops, best of 3: 816 msec per loop
Now I don't actually have that much memory on my machine so going above this causes my machine to start swapping, but even factoring that in, this is the next level up:
~> python /usr/lib/python2.5/timeit.py "import hackysack" "hackysack.runit(5000,1000)"
10 loops, best of 3: 2.77 sec per loop
Now, clearly this still isn't as good as stackless - which isn't surprising, Stackless removes a whole layer of stack frame shenanigans from the entire system and it also implements its scheduler & channel handling in C. I also don't know how well the Stackless scheduler is optimised. Probably better than ours - ours implements a very simple round robin scheduler.
However, despite all this, after this small optimisation the scaling properties are significantly better and, to my mind rather importantly, much more in line with the scaling properties that Stackless exhibits. A key point: with the optimisation I made, it implies that we DO have better scaling properties using generators than with threads. Woo :-)
More seriously, this does imply that a natural way to optimise Kamaelia systems could be to simply create a new base class Axon.TaskletComponent.taskletcomponent that uses channels to implement inboxes/outboxes and scale out that way. It'd mean that one could take the bulk of their code with them and make small changes to gain further optimisations.
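To make that thought slightly more concrete, here's a purely hypothetical sketch of what such a base class might look like - none of this exists in Axon, the class and method names are simply chosen to mirror the existing component API, and it assumes Stackless Python's stackless module:
# Hypothetical sketch only - NOT real Axon code. Inboxes/outboxes become
# Stackless channels; main() stays a generator, driven by a tasklet.
import stackless

class taskletcomponent(object):
    Inboxes  = ["inbox", "control"]
    Outboxes = ["outbox", "signal"]

    def __init__(self):
        self.channels = dict((name, stackless.channel())
                             for name in self.Inboxes + self.Outboxes)

    def dataReady(self, boxname="inbox"):
        # balance > 0 means at least one sender is blocked waiting on this channel
        return self.channels[boxname].balance > 0

    def recv(self, boxname="inbox"):
        return self.channels[boxname].receive()

    def send(self, data, boxname="outbox"):
        self.channels[boxname].send(data)

    def activate(self):
        stackless.tasklet(self._run)()
        return self

    def _run(self):
        for _ in self.main():
            stackless.schedule()   # cooperative yield to the other tasklets

    def main(self):
        yield 1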
(aside: I'm a bit irritated by the memory consumption though, I know I don't have a huge amount of memory and I was running other applications on the system, but I did expect better. I'll have to look into that I think. When I get a round tuit. )
Overall though, despite not performing as well as Stackless (which I did really expect, understanding the changes it makes), I am very pleased with the scaling properties being similar :-)
My full version of the code:
#!/usr/bin/python

import Axon
import time
import random
import sys

class hackymsg:
    def __init__(self,name):
        self.name = name

class counter:
    def __init__(self):
        self.count = 0
    def inc(self):
        self.count +=1

class hackysacker(Axon.Component.component):
    def __init__(self,name,circle,cntr,loops):
        Axon.Component.component.__init__(self)
        self.cntr = cntr
        self.name = name
        self.loops = loops   # terminating condition
        self.circle = circle # a list of all the hackysackers
        circle.append(self)

    def main(self):
        yield 1
        while 1:
            while not self.anyReady():
                self.pause()
                yield 1
            while self.dataReady('inbox'):
                msg = self.recv('inbox')
                if msg == 'exit':
                    return
                if self.cntr.count > self.loops:
                    for z in self.circle:
                        z.inboxes['inbox'].append('exit')
                    return
                #print "%s got hackysack from %s" % (self.name, msg.name)
                kickto = self.circle[random.randint(0,len(self.circle)-1)]
                while kickto is self:
                    kickto = self.circle[random.randint(0,len(self.circle)-1)]
                #print "%s kicking hackysack to %s" %(self.name, kickto.name)
                msg = hackymsg(self.name)
                kickto.inboxes['inbox'].append(msg)
                self.cntr.inc()
                #print self.cntr.count
            yield 1

def runit(num_hackysackers=5,loops=100):
    cntr = counter()
    circle=[]
    first_hackysacker = hackysacker('1',circle,cntr,loops)
    first_hackysacker.activate()
    for i in range(num_hackysackers):
        foo = hackysacker(`i`,circle,cntr,loops)
        foo.activate()

    # throw in the first sack...
    msg = hackymsg('me')
    first_hackysacker.inboxes['inbox'].append(msg)

    Axon.Component.scheduler.run.runThreads()

if __name__ == "__main__":
    runit(num_hackysackers=1000,loops=1000)
Pitivi
November 21, 2007 at 10:07 PM | categories: python, oldblog | View Comments
So, I've been recommended to use "pitivi" a few times, so I figured, why not give it a go. Well, I'm running the latest version of openSUSE, just released - what can go wrong, eh? So I download it. And then I start downloading the dependencies it asks me to install along the way. Then it finally starts. Then I get told I need to install another dependency (gnonlin). OK, grab that. And after all that then what? It then says that the version of gstreamer I've got is too old. I'm sorry, but at this point I'm giving up.
I mean we're talking over 2000 files from 7 archives already:
~> for d in pitivi-0.11.1 pygobject-2.14.0 pycairo-1.4.0 pygtk-2.12.0 gst-python-0.10.8 zope.interface-3.3.0 gnonlin-0.10.9; do (cd $d; find -type f) ;done|wc -l
2194
I mean, that's around 3/4 million lines of code:
~> for d in pitivi-0.11.1 pygobject-2.14.0 pycairo-1.4.0 pygtk-2.12.0 gst-python-0.10.8 zope.interface-3.3.0 gnonlin-0.10.9; do (cd $d; find -type f)|( cd $d; while read file; do cat $file; done) ;done|wc -l
778041
I guess I'll try it again the next time I upgrade my distribution.
I'd consider changing to Ubuntu, but the last 2 times I tried it, it just ended up being just way too annoying...
Using Python to push files using bluetooth
November 19, 2007 at 09:32 AM | categories: python, oldblog | View Comments
Just jotting this down, really. You need Lightblue installed, which in turn requires pybluez (hence bluez) and openobex. OK, without further ado:
>>> from lightblue import *
>>>
>>> finddevices()
[('00:11:9F:C3:7E:A7', u'Nokia 6630', 5243404)]
>>> services = findservices("00:11:9F:C3:7E:A7")
>>> services
[('00:11:9F:C3:7E:A7', None, u'SDP Server'),
('00:11:9F:C3:7E:A7', 1, u'Hands-Free Audio Gateway'),
('00:11:9F:C3:7E:A7', 2, u'Headset Audio Gateway'),
('00:11:9F:C3:7E:A7', 10, u'OBEX File Transfer'),
('00:11:9F:C3:7E:A7', 11, u'SyncMLClient'),
('00:11:9F:C3:7E:A7', 12, u'Nokia OBEX PC Suite Services'),
('00:11:9F:C3:7E:A7', 9, u'OBEX Object Push'),
('00:11:9F:C3:7E:A7', 3, u'Dial-Up Networking'),
('00:11:9F:C3:7E:A7', 15, u'Imaging')]
>>>
>>> D = [ (x,y,z) for x,y,z in services if "obex" in z.lower() and "push" in z.lower()]
>>> D
[('00:11:9F:C3:7E:A7', 9, u'OBEX Object Push')]
>>> (device, channel, description) = D[0]
>>> obex.sendfile(device, channel, "/media/usbdisk/Kamaelia/cat-trans.png")
[btobexclient_connect] Connecting transport...
[btobexclient_connect] Connecting OBEX session...
[btobexclient_done] Request 0x00 successful
[btobexclient_put] Sending file 'cat-trans.png' (116671 bytes)...
[btobexclient_done] Request 0x02 successful
[btobexclient_disconnect] Disconnecting...
[btobexclient_done] Request 0x01 successful
[btobexserver_cleanup] entry.
>>>
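Wrapped up as a function (a sketch based purely on the calls used above - push_file is my name, not part of lightblue), the same thing looks something like this:
# Sketch: push a file to the first "OBEX Object Push" service on a device.
# Assumes lightblue (and its dependencies) are installed, as above.
from lightblue import findservices, obex

def push_file(device_address, filename):
    services = findservices(device_address)
    push = [(addr, channel, name) for (addr, channel, name) in services
            if name and "obex" in name.lower() and "push" in name.lower()]
    if not push:
        raise RuntimeError("No OBEX Object Push service found on %s" % device_address)
    (addr, channel, name) = push[0]
    obex.sendfile(addr, channel, filename)

# Example:
# push_file("00:11:9F:C3:7E:A7", "/media/usbdisk/Kamaelia/cat-trans.png")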
The Selfish Programmer
October 01, 2007 at 01:18 AM | categories: python, oldblog | View Comments
I have the distinct pleasure of chairing the EBU Open Source workshop later today, having been invited to do so and give the opening welcome/talk. For a while I've been wracking my brains about "how do you open a workshop on open source where there's going to be a lot said on a lot of interesting topics without prejudging them or jumping the gun?". I decided therefore to do the only thing you can - to talk about what it means to me, why I develop open source, and what speaks to me. After all, many people distrust the concept of "something for nothing" (even though that's not really what open source is!).
One thing that speaks to me is the IBM advert for Linux from a few years ago. I was stunned at the time at them getting such a good metaphor. (In the slides where it says (video) it's referring to this advert)
1. The Selfish Programmer - EBU Open Source Workshop - Michael Sparks, Senior Research Engineer, BBC Research
2. Care - Open source software is created when someone cares about a problem enough to do something about it.
3. Share - Open source is created when someone shares their solution to a problem in a way that allows others to solve more problems.
4. Give - By giving back code patches you gain better code. By giving back documentation you get better software.
5. Selfish - We want to solve the problem easily. We want to lighten the load, for all. We give and in turn receive.
6. Teach - We're teaching the software to be more useful. It stays adaptable to us, adaptable by us. Software parts, not upgrades.
7. (video)
8. Be Selfish - Take the code, see if it works for you, what have you got to lose?
Recently published RFCs
September 25, 2007 at 12:38 PM | categories: python, oldblog | View Comments
RFC 4861 Neighbor Discovery for IP version 6 (IPv6)
This document specifies the Neighbor Discovery protocol for IP Version 6. IPv6 nodes on the same link use Neighbor Discovery to discover each other's presence, to determine each other's link-layer addresses, to find routers, and to maintain reachability information about the paths to active neighbors.
http://www.rfc-editor.org/rfc/rfc4861.txt Draft Standard Protocol.
RFC 4862 IPv6 Stateless Address Autoconfiguration
This document specifies the steps a host takes in deciding how to autoconfigure its interfaces in IP version 6. The autoconfiguration process includes generating a link-local address, generating global addresses via stateless address autoconfiguration, and the Duplicate Address Detection procedure to verify the uniqueness of the addresses on a link.
http://www.rfc-editor.org/rfc/rfc4862.txt Draft Standard Protocol.
RFC 4941 Privacy Extensions for Stateless Address Autoconfiguration in IPv6
Nodes use IPv6 stateless address autoconfiguration to generate addresses using a combination of locally available information and information advertised by routers. Addresses are formed by combining network prefixes with an interface identifier. On an interface that contains an embedded IEEE Identifier, the interface identifier is typically derived from it. On other interface types, the interface identifier is generated through other means, for example, via random number generation. This document describes an extension to IPv6 stateless address autoconfiguration for interfaces whose interface identifier is derived from an IEEE identifier. Use of the extension causes nodes to generate global scope addresses from interface identifiers that change over time, even in cases where the interface contains an embedded IEEE identifier. Changing the interface identifier (and the global scope addresses generated from it) over time makes it more difficult for eavesdroppers and other information collectors to identify when different addresses used in different transactions actually correspond to the same node.
http://www.rfc-editor.org/rfc/rfc4941.txt Draft Standard Protocol.
RFC 4942 IPv6 Transition/Co-existence Security Considerations
The transition from a pure IPv4 network to a network where IPv4 and IPv6 coexist brings a number of extra security considerations that need to be taken into account when deploying IPv6 and operating the dual-protocol network and the associated transition mechanisms. This document attempts to give an overview of the various issues grouped into three categories:
o issues due to the IPv6 protocol itself,
o issues due to transition mechanisms, and
o issues due to IPv6 deployment.
http://www.rfc-editor.org/rfc/rfc4942.txt Informational
RFC 4943 IPv6 Neighbor Discovery On-Link Assumption Considered Harmful
This document describes the historical and background information behind the removal of the "on-link assumption" from the conceptual host sending algorithm defined in Neighbor Discovery for IP Version 6 (IPv6). According to the algorithm as originally described, when a host's default router list is empty, the host assumes that all destinations are on-link. This is particularly problematic with IPv6-capable nodes that do not have off-link IPv6 connectivity (e.g., no default router). This document describes how making this assumption causes problems and how these problems outweigh the benefits of this part of the conceptual sending algorithm. This memo provides information for the Internet community.
http://www.rfc-editor.org/rfc/rfc4943.txt Informational
RFC 4959 IMAP Extension for Simple Authentication and Security Layer (SASL) Initial Client Response
To date, the Internet Message Access Protocol (IMAP) has used a Simple Authentication and Security Layer (SASL) profile which always required at least one complete round trip for an authentication, as it did not support an initial client response argument. This additional round trip at the beginning of the session is undesirable, especially when round-trip costs are high.
This document defines an extension to IMAP which allows clients and servers to avoid this round trip by allowing an initial client response argument to the IMAP AUTHENTICATE command. http://www.rfc-editor.org/rfc/rfc4959.txt Proposed Standard Protocol.
RFC 4960 Stream Control Transmission Protocol
This document obsoletes RFC 2960 and RFC 3309. It describes the Stream Control Transmission Protocol (SCTP). SCTP is designed to transport Public Switched Telephone Network (PSTN) signaling messages over IP networks, but is capable of broader applications.
SCTP is a reliable transport protocol operating on top of a connectionless packet network such as IP. It offers the following
services to its users:
o acknowledged error-free non-duplicated transfer of user data,
o data fragmentation to conform to discovered path MTU size,
o sequenced delivery of user messages within multiple streams, with an option for order-of-arrival delivery of individual user messages,
o optional bundling of multiple user messages into a single SCTP packet, and
o network-level fault tolerance through supporting of multi-homing at either or both ends of an association.
The design of SCTP includes appropriate congestion avoidance behavior and resistance to flooding and masquerade attacks.
http://www.rfc-editor.org/rfc/rfc4960.txt Proposed Standard Protocol.
RFC 4981 Survey of Research towards Robust Peer-to-Peer Networks: Search Methods
The pace of research on peer-to-peer (P2P) networking in the last five years warrants a critical survey. P2P has the makings of a disruptive technology -- it can aggregate enormous storage and processing resources while minimizing entry and scaling costs.
Failures are common amongst massive numbers of distributed peers, though the impact of individual failures may be less than in conventional architectures. Thus, the key to realizing P2P's potential in applications other than casual file sharing is robustness.
P2P search methods are first couched within an overall P2P taxonomy. P2P indexes for simple key lookup are assessed, including those based on Plaxton trees, rings, tori, butterflies, de Bruijn graphs, and skip graphs. Similarly, P2P indexes for keyword lookup, information retrieval and data management are explored. Finally, early efforts to optimize range, multi-attribute, join, and aggregation queries over P2P indexes are reviewed. Insofar as they are available in the primary literature, robustness mechanisms and metrics are highlighted throughout. However, the low-level mechanisms that most affect robustness are not well isolated in the literature. Recommendations are given for future research.
http://www.rfc-editor.org/rfc/rfc4981.txt Informational
RFC 4994 DHCPv6 Relay Agent Echo Request Option
This memo defines a Relay Agent Echo Request option for the Dynamic Host Configuration Protocol for IPv6 (DHCPv6). The option allows a DHCPv6 relay agent to request a list of relay agent options that the server echoes back to the relay agent.
http://www.rfc-editor.org/rfc/rfc4994.txt Proposed Standard Protocol.
RFC 5004 Avoid BGP Best Path Transitions from One External to Another
In this document, we propose an extension to the BGP route selection rules that would avoid unnecessary best path transitions between external paths under certain conditions. The proposed extension would help the overall network stability, and more importantly, would eliminate certain BGP route oscillations in which more than one external path from one BGP speaker contributes to the churn.
http://www.rfc-editor.org/rfc/rfc5004.txt Proposed Standard Protocol.
RFC 5005 Feed Paging and Archiving
This specification defines three types of syndicated Web feeds that enable publication of entries across one or more feed documents. This includes "paged" feeds for piecemeal access, "archived" feeds that allow reconstruction of the feed's contents, and feeds that are explicitly "complete".
http://www.rfc-editor.org/rfc/rfc5005.txt Proposed Standard Protocol.
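(As an aside, since this is a Python blog: paged feeds of this kind are easy to walk by following rel="next" links - a rough sketch, assuming the feedparser library and that the feed advertises its pages as per the RFC:)
# Sketch: iterate over every entry of a paged feed (RFC 5005) via rel="next" links.
import feedparser

def all_entries(start_url):
    url = start_url
    while url:
        d = feedparser.parse(url)
        for entry in d.entries:
            yield entry
        nexts = [l.href for l in d.feed.get("links", []) if l.get("rel") == "next"]
        url = nexts[0] if nexts else None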
RFC 5006 IPv6 Router Advertisement Option for DNS Configuration
This document specifies a new IPv6 Router Advertisement option to allow IPv6 routers to advertise DNS recursive server addresses to IPv6 hosts. This memo defines an Experimental Protocol for the Internet community. http://www.rfc-editor.org/rfc/rfc5006.txt Experimental
RFC 5010 The Dynamic Host Configuration Protocol Version 4 (DHCPv4) Relay Agent Flags Suboption
This memo defines a new suboption of the Dynamic Host Configuration Protocol (DHCP) relay agent information option that allows the DHCP relay to specify flags for the forwarded packet. One flag is defined to indicate whether the DHCP relay received the packet via a unicast or broadcast packet. This information may be used by the DHCP server to better serve clients based on whether their request was originally broadcast or unicast.
http://www.rfc-editor.org/rfc/rfc5010.txt Proposed Standard Protocol.
RFC 5014 IPv6 Socket API for Source Address Selection
The IPv6 default address selection document (RFC 3484) describes the rules for selecting source and destination IPv6 addresses, and indicates that applications should be able to reverse the sense of some of the address selection rules through some unspecified API. However, no such socket API exists in the basic (RFC 3493) or advanced (RFC 3542) IPv6 socket API documents. This document fills that gap partially by specifying new socket-level options for source address selection and flags for the getaddrinfo() API to specify address selection based on the source address preference in accordance with the socket-level options that modify the default source address selection algorithm. The socket API described in this document will be particularly useful for IPv6 applications that want to choose between temporary and public addresses, and for Mobile IPv6 aware applications that want to use the care-of address for communication. It also specifies socket options and flags for selecting Cryptographically Generated Address (CGA) or non-CGA source addresses. This memo provides information for the Internet community.
http://www.rfc-editor.org/rfc/rfc5014.txt Informational
RFC 5018 Connection Establishment in the Binary Floor Control Protocol (BFCP)
This document specifies how a Binary Floor Control Protocol (BFCP) client establishes a connection to a BFCP floor control server outside the context of an offer/answer exchange. Client and server authentication are based on Transport Layer Security (TLS).
http://www.rfc-editor.org/rfc/rfc5018.txt Standards Track
RFC 5019 The Lightweight Online Certificate Status Protocol (OCSP) Profile for High-Volume Environments
This specification defines a profile of the Online Certificate Status Protocol (OCSP) that addresses the scalability issues inherent when using OCSP in large scale (high volume) Public Key Infrastructure (PKI) environments and/or in PKI environments that require a lightweight solution to minimize communication bandwidth and client-side processing.
http://www.rfc-editor.org/rfc/rfc5019.txt Proposed Standard Protocol.
RFC 5022 Media Server Control Markup Language (MSCML) and Protocol
Media Server Control Markup Language (MSCML) is a markup language used in conjunction with SIP to provide advanced conferencing and interactive voice response (IVR) functions. MSCML presents an application-level control model, as opposed to device-level control models. One use of this protocol is for communications between a conference focus and mixer in the IETF SIP Conferencing Framework.
http://www.rfc-editor.org/rfc/rfc5022.txt Informational
RFC 5029 Definition of an IS-IS Link Attribute Sub-TLV
This document defines a sub-TLV called "Link-attributes" carried within the TLV 22 and used to flood some link characteristics.
http://www.rfc-editor.org/rfc/rfc5029.txt Proposed Standard Protocol.
RFC 5061 Stream Control Transmission Protocol (SCTP) Dynamic Address Reconfiguration
A local host may have multiple points of attachment to the Internet, giving it a degree of fault tolerance from hardware failures. Stream Control Transmission Protocol (SCTP) (RFC 4960) was developed to take full advantage of such a multi-homed host to provide a fast failover and association survivability in the face of such hardware failures. This document describes an extension to SCTP that will allow an SCTP stack to dynamically add an IP address to an SCTP association, dynamically delete an IP address from an SCTP association, and to request to set the primary address the peer will use when sending to an endpoint.
http://www.rfc-editor.org/rfc/rfc5061.txt Proposed Standard Protocol.
RFC 5062 Security Attacks Found Against the Stream Control Transmission Protocol (SCTP) and Current Countermeasures
This document describes certain security threats to SCTP. It also describes ways to mitigate these threats, in particular by using techniques from the SCTP Specification Errata and Issues memo (RFC 4460). These techniques are included in RFC 4960, which obsoletes RFC 2960. It is hoped that this information will provide some useful background information for many of the newest requirements spelled out in the SCTP Specification Errata and Issues and included in RFC 4960. This memo provides information for the Internet community.
http://www.rfc-editor.org/rfc/rfc5062.txt Informational
RFC 5072 IP Version 6 over PPP
The Point-to-Point Protocol (PPP) provides a standard method of encapsulating network-layer protocol information over point-to-point links. PPP also defines an extensible Link Control Protocol, and proposes a family of Network Control Protocols (NCPs) for establishing and configuring different network-layer protocols.
This document defines the method for sending IPv6 packets over PPP links, the NCP for establishing and configuring the IPv6 over PPP, and the method for forming IPv6 link-local addresses on PPP links.
It also specifies the conditions for performing Duplicate Address Detection on IPv6 global unicast addresses configured for PPP
links either through stateful or stateless address autoconfiguration.
This document obsoletes RFC 2472.
http://www.rfc-editor.org/rfc/rfc5072.txt Draft Standard Protocol
Greylisting using Kamaelia
September 19, 2007 at 10:49 PM | categories: python, oldblog | View Comments
I've written a greylisting server using Kamaelia, and it's turned my mail back into something usable. I've been running this server for 52 hours now & it's processed over 5000 mails. 94% of those have been rejected as spam, leaving a handful of spams coming through from mailing lists. It's a spectacular change for me.
How does it work? Well, at its core, when someone connects, a mail handler is created, which is managed by this main loop:
def main(self):
    brokenClient = False
    self.handleConnect()
    self.gettingdata = False
    self.client_connected = True
    self.breakConnection = False
    while (not self.gettingdata) and (not self.breakConnection):
        yield WaitComplete(self.getline(), tag="_getline1")
        try:
            command = self.line.split()
        except AttributeError:
            brokenClient = True
            break
        self.handleCommand(command)
    if not brokenClient:
        if (not self.breakConnection):
            EndOfMessage = False
            self.netPrint('354 Enter message, ending with "." on a line by itself')
            while not EndOfMessage:
                yield WaitComplete(self.getline(), tag="getline2")
                if self.lastline():
                    EndOfMessage = self.endOfMessage()
            self.netPrint("250 OK id-deferred")
    self.send(producerFinished(),"signal")
    if not brokenClient:
        yield WaitComplete(self.handleDisconnect(),tag="_handleDisconnect")
    self.logResult()
handleCommand then results in a bunch of SMTP commands being dealt with, and dispatched:
def handleCommand(self,command):
    if len(command) < 1:
        self.netPrint("500 Sorry we don't like broken mailers")
        self.breakConnection = True
        return
    if command[0] == "HELO": return self.handleHelo(command) # RFC 2821 4.5.1 required
    if command[0] == "EHLO": return self.handleEhlo(command) # RFC 2821 4.5.1 required
    if command[0] == "MAIL": return self.handleMail(command) # RFC 2821 4.5.1 required
    if command[0] == "RCPT": return self.handleRcpt(command) # RFC 2821 4.5.1 required
    if command[0] == "DATA": return self.handleData(command) # RFC 2821 4.5.1 required
    if command[0] == "QUIT": return self.handleQuit(command) # RFC 2821 4.5.1 required
    if command[0] == "RSET": return self.handleRset(command) # RFC 2821 4.5.1 required
    if command[0] == "NOOP": return self.handleNoop(command) # RFC 2821 4.5.1 required
    if command[0] == "VRFY": return self.handleVrfy(command) # RFC 2821 4.5.1 required
    if command[0] == "HELP": return self.handleHelp(command)
    self.netPrint("500 Sorry we don't like broken mailers")
    self.breakConnection = True
In practical terms that MailHandler is subclassed by a ConcreteMailHandler that effectively enforces the normal sequence of commands of SMTP. However part of it has a core hook when we receive the DATA command:
def handleData(self, command):
    if not self.seenRcpt:
        self.error("503 valid RCPT command must precede DATA")
        return
    if self.shouldWeAcceptMail():
        self.acceptMail()
    else:
        self.deferMail()
Clearly the main hook here is "shouldWeAcceptMail", which defaults in ConcreteMailHandler to returning False.
In the actual class we instantiate to handle connections - GreyListingPolicy which subclasses ConcreteMailHandler - we customise shouldWeAcceptMail as follows:
def shouldWeAcceptMail(self):
    if self.sentFromAllowedIPAddress():
        return True  # Allowed hosts can always send to anywhere through us
    if self.sentFromAllowedNetwork():
        return True  # People on trusted networks can always do the same
    if self.sentToADomainWeForwardFor():
        try:
            for recipient in self.recipients:
                if self.whiteListed(recipient):
                    return True
                if not self.isGreylisted(recipient):
                    return False
        except Exception, e:
            print "Whoops", e
        return True  # Anyone can always send to hosts we own
    # print "NOT ALLOWED TO SEND, no valid forwarding"
    return False
Finally the actual core code for handling greylisting looks like this:
def isGreylisted(self, recipient):
    max_grey = 3000000
    too_soon = 180
    min_defer_time = 3600
    max_defer_time = 25000
    IP = self.peer
    sender = self.sender

    def _isGreylisted(greylist, seen, IP,sender,recipient):
        # If greylisted, and not been there too long, allow through
        if greylist.get(triplet,None) is not None:
            greytime = float(greylist[triplet])
            if (time.time() - greytime) > max_grey:
                del greylist[triplet]
                try:
                    del seen[triplet]
                except KeyError:
                    # We don't care if it's already gone
                    pass
                print "REFUSED: grey too long"
            else:
                print "ACCEPTED: already grey (have reset greytime)" ,
                greylist[triplet] = str(time.time())
                return True

        # If not seen this triplet before, defer and note triplet
        if seen.get( triplet, None) is None:
            seen[triplet] = str(time.time())
            print "REFUSED: Not seen before" ,
            return False

        # If triplet retrying waaay too soon, reset their timer & defer
        last_tried = float(seen[triplet])
        if (time.time() - last_tried) < too_soon:
            seen[triplet] = str(time.time())
            print "REFUSED: Retrying waaay too soon so resetting you!" ,
            return False

        # If triplet retrying too soon generally speaking just defer
        if (time.time() - last_tried) < min_defer_time :
            print "REFUSED: Retrying too soon, deferring" ,
            return False

        # If triplet hasn't been seen in aaaages, defer
        if (time.time() - last_tried) > max_defer_time :
            seen[triplet] = str(time.time())
            print "REFUSED: Retrying too late, sorry - reseting you!" ,
            return False

        # Otherwise, allow through & greylist them
        print "ACCEPTED: Now added to greylist!" ,
        greylist[triplet] = str(time.time())
        return True

    greylist = anydbm.open("greylisted.dbm","c")
    seen = anydbm.open("attempters.dbm","c")
    triplet = repr((IP,sender,recipient))
    result = _isGreylisted(greylist, seen, IP,sender,recipient)
    seen.close()
    greylist.close()
    return result
All of which is pretty compact, and I suspect is pretty OK for people to follow. The rest of the code in the file is really about dealing with errors and abuse of the SMTP code. (The reaction to which is to disconnect, telling the sender to retry later.)
At present I'm ironing out some remaining issues (some people simply don't disconnect and need booting), and the code also depends on versions of Axon & Kamaelia that are sitting on my Scratch branch. All that said, you can check out the code (link is to web svn) here using this command line:
svn co https://kamaelia.svn.sourceforge.net/svnroot/kamaelia/trunk/Sketches/MPS/Grey Grey
You can get the Axon & Kamaelia versions you need from this command line:
svn co https://kamaelia.svn.sourceforge.net/svnroot/kamaelia/branches/private_MPS_Scratch Kamaelia
Install the contents of the Axon directory, then the Kamaelia directory, by doing "python setup.py install" in each.
You can then configure the greylisting code, by changing the class GreylistServer, which for me looks like this:
class GreylistServer(MoreComplexServer):
    socketOptions=(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    port = 25
    class protocol(GreyListingPolicy):
        servername = "mail.cerenity.org"    # Server name we greet the world with
        serverid = "MPS-SMTP 1.0"           # Server type we declare ourselves to be
        smtp_ip = "192.168.2.9"             # SMTP server we forward to
        smtp_port = 8025                    # SMTP server port we forward to
        allowed_senders = ["127.0.0.1"]
        allowed_sender_nets = ["192.168.2"] # Yes, only class C network style
        allowed_domains = [ "private.thwackety.com",
                            "thwackety.com",
                            "yeoldeclue.com",
                            ... other domains snipped ...
                            "kamaelia.org",
                            "owiki.org",
                            "cerenity.org"
                          ]
        whitelisted_triples = [
            # IP, claimed sender (MAIL FROM:), recipient from "RCPT TO:"
            ( "213.38.186.202", "<post@mx1.redcats.co.uk>", "<...email censored...>"),
        ]
        whitelisted_nonstandard_triples = [
            # claimed hostname, IP prefix (can be full IP), recipient from "RCPT TO:"
            ("listmail.artsfb.org.uk", "62.73.155.19", "<...email censored...>"),
            ("domainwithborkedmailer.com", "204.15.20", "<...email censored...>"),
            ("adomainwithborkedmailer.com", "204.15.20", "<...email censored...>"),
            ("yetanotherdomainwithborkedmailer.com", "204.15.20", "<...email censored...>"),
            ("andanotherdomainwithborkedmailer.com", "204.15.20", "<...email censored...>"),
        ]
I've blanked out the email addresses, since there's no point in encouraging more spam... :-)
I'll be packaging this up properly at some point when I'm happy with the code. In the meantime if anyone grabs it and uses it from SVN, I'd be interested in hearing how you get on :-)