Flying Spaghetti Monster shows his Noodly Appendages in Germany!
November 11, 2006 at 07:57 PM | categories: python, oldblog
- http://www.youtube.com/watch?v=vL7FcvEydqg&eurl=
Technical Debt (links)
November 10, 2006 at 05:05 PM | categories: python, oldblog
Technical debt links I want to come back to:
(I'd heard about technical debt a while back you see, but recently hit what I think can best be described as management debt! :)
Microsoft will recommend SUSE Linux Enterprise for Windows-Linux solutions
November 03, 2006 at 12:12 AM | categories: python, oldblog
Running on someone else's platform - scammers uploading and running PHP phishing on the cheap
November 02, 2006 at 10:19 AM | categories: python, oldblog
Yep, I received a phishing email, and the fake login.php appears to have been placed on an innocent third party's website - because it allowed uploads (anonymously or otherwise), and the upload location will run arbitrary PHP files. (Probably due to a bad gallery style application in this case.)
How long have people been doing this? Dunno. I tend to file all such scam mails in the circular inbox, but it's a new one on me. Checking inside there shows that all the others I still have copies of use cgi-bins, which is a lot harder than just uploading a file and saying "run this".
The scammer needs no serving resources and no resources for sending the emails; in fact the barrier to entry is almost non-existent - just find a site that allows uploads and runs arbitrary uploaded code. People complain about Windows machines allowing arbitrary execution of code, but this is far worse - this is directly akin to saying "hey, upload random code to me and do what you like". It's more or less the modern equivalent of putting a machine on the internet running telnet or ssh, but without any password set for root.
It's also rather nasty - someone who runs a server in such a mode, also probably won't be able to do the correct forensics to track down where things came from. Delightful. If anyone's curious, the subject line of this phishing scam is "Please authenticate and update your Amazon.com account by checking the link below immediately" - you've probably got a copy.
As a result, unless your system for uploads is designed for handling code, please have some way of ensuring that whatever is uploaded cannot be run. This comment comes around with every mechanism for sharing, and I almost can't believe you have to say it, and yet you do. Hopefully though someone from Django & TurboGears will go "ooh, if we provide a safe mechanism for doing that, everyone will want to compete with us and implement it too" :-) It won't solve all the problems, but it'd be a cool start.
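As a rough sketch of what "cannot be run" might look like in practice: store the upload outside the web root under a name the server chooses, accept only known non-code extensions, and never set the execute bit. The function name, the whitelist and the directory layout below are my own assumptions for illustration, not any framework's API.

```python
import os
import secrets

# Hypothetical whitelist: only these extensions are accepted.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def store_upload(data, client_filename, upload_dir):
    """Save uploaded bytes so that they cannot be executed by name.

    upload_dir is assumed to live OUTSIDE the web server's document
    root and to be served (if at all) as static files only.
    """
    # Only look at the final extension, and only accept whitelisted ones.
    ext = os.path.splitext(client_filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("file type not allowed: %r" % ext)

    # Never reuse the client's name: pick a random one on the server side.
    stored_name = secrets.token_hex(16) + ext
    path = os.path.join(upload_dir, stored_name)

    # O_EXCL refuses to overwrite an existing file; mode 0o640 requests
    # read/write permissions only, with no execute bit.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o640)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

This doesn't replace configuring the web server so that the upload directory is never handed to an interpreter, but it does mean a file called login.php is rejected outright rather than stored under that name.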
One of these days, people will learn :-/
Computer Science Unplugged - Non Computer Based Activities For Children
October 29, 2006 at 11:25 AM | categories: python, oldblog
That said, I still think the best single "textbook" on computer science is the Cartoon Guide to Computer Science, however it looks like this "unplugged" book is much more practical for showing ideas to a group or as inspiration for other approaches. Think: Royal Institution Christmas Lectures, or "Think of a Number", rather than "traditional lecture".
Interestingly I came across that site via Lulu (the unplugged book is here) - a publishing site which was apparently founded by Bob Young (one of the founders of Red Hat). Incidentally Lulu also appears to have 3 books on Ubuntu that look quite nice - on desktops, servers and one on packaging. I don't know what they're like, but given their aggregate cost is about the same as 3 beers in London, that's pretty good going!
Open Source at the BBC - when, why, why not and how
October 28, 2006 at 03:24 PM | categories: python, oldblog
Well, this is the text version of the presentation I gave at Linux World. It should be borne in mind that this has a mixture of both personal opinion, and statements about the BBC. It tries to be as impartial as it can be and is written around the point of view that it should be interesting to businesses (of all kinds), rather than push any particular agenda. Furthermore it's also based on arguments I've seen taken as good business arguments outside the BBC as well as inside. As a result this post can't be taken as BBC policy, but is hopefully a useful snapshot of how open source is viewed in a number of areas of the BBC.
A corollary of this however, is if it looks like there's anything controversial here, it is extremely likely that my description is inaccurate and a formal BBC position would be somewhat different!
Part of the reason for this is that there is no special policy with regard to open source, and if one is formulated, it obviously may not mesh with what's said here. That said, at Linux World, many people did find the presentation useful and interesting, perhaps because it aimed to be more business focussed and impartial than is perhaps normal. It's probably useful to people running a business, and to engineers working inside one, as a source of good reasons for when to use, improve or originate open source, and when not to do so.
Open Source At The BBC
This document is based on the presentation given in the "Business Briefings" track at Linux World London Olympia 2006. Whilst the presentation was formally approved and represents one of the BBC's opinions, this document may contain personal opinions which are not BBC opinions, despite the attempt at impartiality. Any feedback on this document is extremely welcome, and any arguments are presented as business reasons.
It's probably worth also noting up front that many of the arguments presented, whilst positive for open source, should not be taken as negatives for proprietary systems, since a number of the arguments presented apply to proprietary systems as well. Indeed this actually shows some of the similarities from a business perspective between open source and proprietary systems, and should be useful when deciding when, why, why not and how to use, improve or originate open source software.
The rest of this is more or less an after the fact transcription of the slides and what I said.
Open Source at the BBC, Michael Sparks, Senior Research Engineer, Linux World London, 2006
Who am I?
Well, as noted I'm currently a senior research engineer at BBC Research and have been working here so far for 4 years. Previously I've worked as a software engineer and network systems engineer at a variety of public and private sector organisations. This talk is based on my experience in both, and as such hopes to be applicable to both public and private sector organisations. However, I'm not a business person; these are arguments and reasons that I've found businesses find valid as well as neutral. If you find this jars with your experience or views, please feel free to correct me.
This talk
As noted on the day, this is the summary of the talk, and should contain pretty much everything in the slides and that was said on the day. My day (and by choice evening) open source project is Kamaelia, and I mentioned that this would be linked from the Kamaelia project's blog.
Themes
Running through this document are 4 repeating themes - When, Why, Why Not, and How. We talk about a variety of subjects, and these are examined through each of these positions from the viewpoint of a business, as well as from the perspective of how the subjects fit in with the BBC. Where appropriate these themes are also applied to what I've seen inside the BBC.
Overview of the rest of this document
First of all we cover the basic context for the talk - the status of open source at the BBC - how it's viewed essentially, what we mean by open source, and why open source. We then cover using, improving and originating open source through the themes already mentioned.
Context
The BBC:
- Uses and creates open source software
- Uses and creates proprietary systems. This may surprise people, but one example here is the NICAM stereo system. This originated inside the BBC and was licensed to manufacturers, but is at the end of the day a proprietary system. However, because of this the UK License Fee Payer benefited from the availability of stereo broadcasts, and of equipment to receive digital stereo broadcasts, through licensing of the system.
The question arises - why? Why does the BBC use & create open source? Why does the BBC use & create proprietary systems? The answer to this however is simple and obvious: for good business and public service reasons. The rest of this document explores these reasons.
Terminology
Many people use a variety of phrases relating to open source - free software, libre software, open source, OSS, FLOSS (good for your teeth?). Generally speaking the BBC (and specifically this document) tends to use the phrase open source. The reason is because "open source", as defined by the OSI, focusses on the approach not on politics. In the talk I did not go into why the BBC focusses on approach rather than politics, and why that's appropriate, but hopefully it should be obvious why we specifically avoid politics.
Use of Open Source
This is the simplest of all areas to deal with, and probably the easiest for businesses to take note of.
The obvious question that someone can ask is this: why use open source software?
Well, the BBC has no specific policy for or against open source software - tools are evaluated on their merits. This is simply good business practice. One key evaluation point from a business perspective is that software that is adaptable to the business is inherently more useful. Also it's worth noting that whilst proprietary software is often adaptable to a greater or lesser extent through the use of macros and plugins, open source is by definition adaptable.
So whilst some proprietary systems can to differing extents be adapted to suit a business, open source defaults to always being as adaptable as you are willing to put in time and effort. Whether this is appropriate or necessary to your business is dependent on your specific needs.
However a better question to ask is why would the BBC avoid open source software? After all you would not ask the same for a proprietary system. During my talk I noted that actually a number of people at Linux World might deliberately avoid proprietary software, but generally speaking in a business proprietary systems and software are extremely common and expected.
- ... would limit the BBC's choices unnecessarily
- ... cut the BBC off from useful technologies
- ... cut the BBC off from community developments
- ... and would even mean avoiding some Apple and Microsoft products.
This last point surprised some people, and may surprise readers of this document. The reason it would cut you off from Apple products is because large chunks of Apple's operating system are actually based on open source technology. Apple have in the past used this as a positive selling point for their systems. Similarly parts of some Microsoft products, and indeed products like Iron Python, are open source. The TCP/IP stack for some versions of Windows is based on the BSD TCP/IP stack, and in the past[1] a Microsoft Developer Evangelist[2] has proudly stated "and the best thing about Iron Python is that it's open source".
- [1] ACCU Python Conference, April 2005
- [2] aka "sales guy" - his own translation
What this boils down to is the fact that open source is in fact really quite difficult to avoid even if you were trying, and this is especially true with regard to the internet. If you use the Internet you are almost certainly using open source software. The domain name system - which translates server names (the name after "http") to server locations, and without which the internet would not function - has at its core BIND, which is open source. People like to point at Apache (an open source webserver) as the dominant webserver with over 50% of market share as open source, but this misses that a significant chunk of the servers in the remaining market share are also open source. The same goes for email, and a number of other core protocols.
As a result, if you use the Internet, you use open source.
So coming to our themes...
When should you evaluate using an open source solution?
The answer to this is blindingly obvious - when you are looking for a new solution. You should evaluate it alongside any other solution that exists on the market. The key difference however, is that whilst a proprietary system will have a marketing force, it's entirely likely that the open source solution will not.
A trite example here is you can happily get a backup provider or replication company to come talk to you to sell their product, however you are unlikely to get the same for rsync - a great little tool for efficient cross site (or even on disk) backup. (For example, did you know you can build essentially a versioned file system using rsync?) As a result it is up to you to seek out an open source solution. If you don't, you may not have the solution most suitable for your business. (Then again, by luck, you might.)
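For the curious, the "versioned file system" trick usually relies on rsync's --link-dest option: each snapshot is a complete directory tree, but files unchanged since the previous snapshot are hard links into it rather than fresh copies. A minimal sketch follows; the function name and the paths are invented for illustration, and it only builds the command line rather than running it.

```python
import datetime
import os


def snapshot_command(source, backup_root, previous=None, now=None):
    """Build an rsync command line for one dated snapshot of `source`.

    If `previous` (an earlier snapshot directory) is given, unchanged
    files are hard-linked against it via --link-dest, so every snapshot
    looks like a full copy but only changed files use new disk space.
    """
    now = now or datetime.datetime.now()
    target = os.path.join(backup_root, now.strftime("%Y-%m-%d_%H%M%S"))
    command = ["rsync", "-a"]
    if previous is not None:
        command.append("--link-dest=" + os.path.abspath(previous))
    # Trailing slash on the source: copy its contents, not the directory itself.
    command += [source.rstrip("/") + "/", target]
    return command
```

The resulting list can be handed to subprocess.run(command, check=True) from a nightly job, passing the most recent snapshot directory as previous.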
Why would you, as a business, choose to use an open source solution?
Similarly the answer here is obvious - when it fulfills the business need in an appropriate way. That may be based on cost. It may be based on availability of expertise in the marketplace for personnel, it may be based on being the most efficient, secure solution or simply based on adaptability.
Why would you, as a business, choose to not use an open source solution?
Well, this is the flipside. If it doesn't provide for the business need in an appropriate way. This may sound a trite reversal of the previous point, and it is. But the point remains the same - if it doesn't fulfill your needs and it doesn't make business sense to spend the time on making it sensible to apply in the business, is it really wise to use an inappropriate solution?
It should also be obvious that these reasons for why/why not both apply to proprietary systems as well, and the point is that you as a business should treat open source as you would any other software. There are differences, but fundamentally, to a business it's what it does, not how it's made, that matters.
How do you go about using open source software?
For some businesses this seems to be a surprisingly difficult thing to grasp, but in practice it boils down to being exactly the same as a proprietary system - you obtain the software and install it. This may mean downloading it from a website, getting it from a cover disk, or even going into a shop and buying it (eg Linux distros), but either way the way you use it is identical - obtain, install, use.
If your business is more than a few people you probably hire someone capable of doing this for you - many companies have a systems administrator or IT staff who should be more than capable of performing this function. Similarly if you're installing a completely new type of software you may hire a consultant, and large companies will involve an IT services company. At the end of the day however, this is exactly the same as for a proprietary solution - there is nothing strange about open source!
So how does the BBC use open source?
Well, to go into detail would take too long and could be a talk in itself, so some highlights based on three areas - network infrastructure, desktop applications and desktops.
In network infrastructure, the BBC depends on Apache and Perl for the serving and other aspects of delivery of bbc.co.uk. If you removed Apache from the equation, the BBC would have a number of problems (no matter what webserver you changed to - open source or proprietary) due to the amount the BBC has adapted Apache to the BBC's needs. Other open source technologies the BBC uses include MySQL, Linux and Python. It's also worth noting that the BBC does use proprietary systems as well in the network infrastructure - including for example Solaris & Oracle.
Neither of these two lists of technologies is anywhere near exhaustive.
In the world of desktop applications, like many large corporations, the BBC has an approved "BBC Desktop" which has a number of applications for use by the majority of BBC staff. There are some locations which don't use the BBC desktop as their primary desktop, and there tends to be a greater use of open source on those desktops. It should be noted however that the vast majority of applications used by the BBC are proprietary software.
That said, what do some parts of the BBC use? Well Firefox and the Gimp are approved for use on the BBC desktop, and many parts of the BBC (especially inside BBC Research) that don't use the BBC desktop by default also use tools like Open Office.
In the world of desktops, open source is actually slightly more widely in use, though perhaps less obviously. The reason for this is that there is an approved BBC Desktop for Mac OS X. Whilst Mac OS X is a proprietary OS from Apple, large chunks of the system are based on open source technologies and the core of the system is derived from FreeBSD. Darwin - the part of OS X that Apple maintains as open source - feeds back into OS X.
Similarly Linux is used as a primary desktop by a large number of engineers inside BBC Research, since it is very often the most sensible tool for the job.
Benefits to the BBC
So, in the areas that the BBC does use open source, what benefits does the BBC gain? Reasons include:
- Adaptability
- Stability
- Security
- Maintainability
- Standards
- Market expertise
The last one may raise some eyebrows, but is highly relevant. The ability to replace staff when they move on, as they do, with people with the appropriate expertise is extremely important. In the networking world, Unix has long been the Lingua Franca, and that's one area that Linux has been able to capitalise on. As a result using systems based on open source in those realms is sensible since there is a readily available source of expertise.
Improving Open Source Software
This boils down to contributing back to the community. This is something that some businesses get badly wrong, or find it difficult to understand why it makes sense.
However one of the key reasons why it makes sense to a business to contribute back is maintenance.
If you have a problem that you've not seen before and the community hasn't seen before, and you fix it or report it to the community then you're contributing. By contributing a fix or a solution back to the community, rather than having to locally maintain that fix, you allow others to take on that maintenance burden. For the community it may be as simple as merging the fix or changing the documentation, and be a low workload for them.
If you maintain it locally however, every time a release is made you need to apply your changes to that code base. The same holds for proprietary systems as well, except often at best you can have a workaround rather than a fix. Also unlike a proprietary solution you don't have to wait for a new release - you can fix the problem, use it locally and move on. Even better is that the community can adapt your changes to work better with the system it sits inside.
Similarly by contributing back, you encourage others to do the same. The upshot of this is that it leads to the situation that the workload is spread. Each bug fixed by another group is a bug you won't encounter.
So, what is contribution?
Well, the most obvious contribution is code. Out of code contributions, the most welcome code contribution is a bugfix. Actual new features are treated more warily - they might take the project in a direction its current developers aren't interested in, they might incur a large maintenance overhead, or simply just be undesirable. As a result, that's often why bugfixes are met with open arms and new features with care. (Patches with large maintenance overheads are often transformed into code with low maintenance overheads however!)
Contribution goes beyond code. A bug report is contributing back. Documentation errors are bugs. If the documentation says "the code works like this", then unless it does, either the code is broken or the documentation is broken. Either way, something needs fixing. Intelligent questions can lead to both improved documentation and sometimes better code. There is a great article about "how to ask smart questions", and you're encouraged to read it!
Finally, talking about usage is in itself also contributing back. When I was working at the Janet Web Cache Service - a service then provided to the Higher Education community - we were looking at load balancing solutions, and trialled Linux Virtual Servers. We found it worked and did precisely what was claimed for it, it did it efficiently and was worth using. A £400 Linux box was able to provide the same benefit to us there as a £10,000 load balancing switch, and was more flexible in our problem domain of large scale distributed network caching. We then rolled this out to cover the whole service, and in the end used it as our primary load distribution tool.
By talking about our usage of LVS and how stable we found it, we encouraged other people to try the system, since they could see the scale at which we were using it. In short it helped people understand that rather than just being some random project, this tool was tried and tested. Indeed, we only trialled the software because there were some smaller scale systems that existed and a community that shared its experiences.
Lastly on this point, even sharing a negative experience is useful. It is often painful for a project to hear "your code doesn't work or scale for our environment", however in practice this is no different from any other bug. The project might simply not have considered your problem domain and may think "cool, we could do that!".
So why does the BBC contribute back?
Well, we don't contribute back as often as we would perhaps like, but we do contribute back. So why? Well, for all the reasons previously mentioned, but there's also a rather special one for the BBC.
Open source methodology is essentially the software equivalent of the traditional open standardisation process as used for decades, mainly in the hardware arena. All of these standards bodies operate on the basic principle that any interested and capable party is welcome to participate, and this ethos is exactly the same with open source. If you can contribute, you're welcome.
Also, the BBC has a long history in working with standards bodies, from the EBU through to the ITU, etc, through to the development of specific standards like PAL, DVB (Freeview) & DAB (digital replacement for FM radio) and DRM (Digital Radio Mondiale - the digital replacement for shortwave).
So the question arises: what's the difference between a traditional standards body and an open source project - they both have the same basic mode of operation. The answer is really "very little". For traditional standards bodies, engineers inside the BBC have been encouraged for a long time to contribute to projects which benefit the BBC. Similarly given you can't really draw a clear line between traditional standards bodies and open source - largely because standards bodies have often used open implementations - there is a standing position that engineers can contribute to any project, where it benefits the BBC (and hence the license fee payer) to do so.
Contributing back as a Business
So when could you as a business consider contributing back?
When you've solved a problem (which could be using the software, or fixing the software), or need to solve a problem. It may be that your problem is relatively simple to solve (bugs, deployment) or may be a completely new area. As noted above the forms of problem and contribution can be extremely varied. All of these help to improve the system.
Why contribute back as a business?
The fundamental reason is that by doing so, and by encouraging others to do so, you help the product you're using get better, which in turn makes it more valuable to your business.
Why would you not contribute back as a business?
Probably the biggest, most fundamental reason would relate to when doing so would share business secrets, confidential or personal information. For example, if you're under an NDA regarding (say) how a protocol works, sharing that information with an open source project is bad both for you and for that project.
How to contribute back as a business?
Unlike using open source, businesses are often right to pause before doing this, because doing so badly can be a time sink for both the project and for the business. When businesses do it well, it's great, and when they do it badly, it can be painful for all involved.
Contributing bug reports is perhaps the most common way a business contributes back. If you do so, make them repeatable - unless the developers can repeat your bugs, they can't fix them. Make them clear, focus on what specifically doesn't work, and the environment it occurs within. Also focus on one bug per report. If you find a handful of bugs, you can make them all reference each other - this is useful to identify if there is an underlying problem. However, it is equally likely that they're simply independent, and hence independently fixable.
Also, when reporting bugs, you should bear in mind that you can have no expectation of the bugs being fixed, unless you put effort into fixing it. This can be through fixing it yourself, or hiring a consultant to help you. Whilst the bug you need fixing may be critical to your business, you have to bear in mind that the developers often will have the viewpoint that you have benefited from the release rather than them, and that unless you put effort into fixing the bug they have no reason to fix your bugs.
Not all projects have this ethos (indeed many don't) and some will jump forward to fix the bug - especially if it is a security issue - but it is a very good idea to come in with no expectation of fix unless you're willing to contribute to that fix.
And so we come to discussing bug fixes. The best way to contribute these is for them to be clear & focussed. Contributing back bug fixes is a large area in itself, but some guidelines worth noting:
- Make them focussed. If you contribute a bug fix, fix one bug in one patch. Ten 100 line patches are far easier for a project to deal with than one 1000 line patch.
- Follow the project culture. Do they have a preferred patch format? Do they prefer entire files? Do they like them emailed? Notified of a location over IRC? Attached to a wiki page, embedded in a wiki page? Do they use tabs or spaces? etc
- Make them easy to apply.
- Make them relevant. If you do a patch based on a version 2 years old, it's a lot less relevant than a patch against the most recent available version - for example against subversion, CVS or developer snapshot.
- Include documentation. If your patch changes functionality it's your responsibility to provide the updated documentation. You can do this after the code patch has been agreed as suitable, but make the offer - and then back it up if they agree to merge your patch.
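To make "one bug, one patch" concrete, here is a small sketch using Python's standard difflib module to produce a unified diff for a single fix. The file name and the bug are invented for illustration; a real project will usually want the output of its own version control tool (svn diff, cvs diff and so on) instead.

```python
import difflib

# The file as released (hypothetical example).
original = [
    "def mean(values):\n",
    "    return sum(values) / len(values)\n",
]

# The same file with exactly one bug fixed: empty input no longer crashes.
fixed = [
    "def mean(values):\n",
    "    if not values:\n",
    "        return 0.0\n",
    "    return sum(values) / len(values)\n",
]


def make_patch(old_lines, new_lines, filename):
    """Return a unified diff (as one string) turning old_lines into new_lines."""
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="a/" + filename,
        tofile="b/" + filename,
    )
    return "".join(diff)


patch = make_patch(original, fixed, "stats.py")
print(patch)
```

The point is the size of the result: two added lines and nothing else, which a maintainer can read and judge in seconds.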
If you don't know how to do any of these things, but do have a bug fix to offer - ask. People will often be more than willing to assist you in fixing bugs in their project, and if you do things completely wrong, you will often look extremely stupid. Conversely if you ask and get it all right, you will be seen to be trying to work with the project rather than against the project, and people will both respect you for doing so, and also be more willing to accept your patch. This has the same fringe benefits for a company (respect for your company) as it does for individuals.
However it's worth noting that open source is often a meritocracy - conceptual contributions in the form of code & documentation are often more important than monetary contribution. Put bluntly, start talking about money (in the wrong way) and you will often alienate developers. This isn't universally true, but is true sufficiently often that it's worth bearing in mind.
Yes, the work done by open source developers is often priceless, and yes, developers like money as much as the next person, but it is simply that money rarely directly contributes to a project's development. Some companies however are extremely good at using money to contribute - such as Fluendo - but you often find they also contribute directly in other ways as well.
TIP: At the end of the day however, if you're unsure how your company can contribute back, it's well worth the effort of liaising through an open source consultancy. Many of these exist, both large and small, and they'll be more than willing to help you. This can be as small as one developer who works with a project, all the way through to a large scale business like the IBMs of the world.
Project Origination
Moving on, we come to starting new projects and releasing new code.
Why?
The key reasons for many businesses really boil down to two main points - it's not their core business and it's not a saleable product (however useful). For example, the BBC's core business is making great programmes and distributing them to the UK public for their entertainment, education and information. If we produce software we're using inhouse, it makes sense to release it, and we'll come to why shortly. Other reasons include encouragement of standards and development of standards.
For example the BBC has collaborated for many years on standards called AAF and MXF. Part of the collaboration has actually occurred on sourceforge as an open source project. The use of an open source approach makes this sort of external collaboration a lot simpler.
From a research perspective, the aspects of peer review and repeatability are vitally important, and an important part of the scientific process. Release of your code makes this process a lot simpler.
However the key one for many businesses we've already alluded to: it's when the development of the software will happen anyway, but you can't sell it. A common example here is inhouse tools. If you release these tools as open source, then if you get no feedback, you have no loss. You're still going to develop the tools to suit your business. However if you release those tools, any feedback you get is a benefit which you would not have gained without release.
Less obviously, by doing so, depending on your market sector, you increase the viability of your market sector for your business. This is a bold claim, and unsurprisingly is probably met by the reader here with incredulity, as it was by the audience at Linux World. However, this is precisely what happened with the internet. Long before the free software foundation existed, and before the GNU manifesto saw the light of day, the internet was under development using what we now call open source processes.
People were creating the email systems which would become dominant today as far back as the very beginning of the eighties. By sharing the code, their independent areas became more and more viable as a business market until we reached the point where now we can't actually imagine going back to removing the internet.
BBC Open Source
So we now come to BBC Open Source. The BBC does have a formal web presence for the projects it has originated as open source, which can be found at http://www.bbc.co.uk/opensource/. However it's worth noting that this merely contains pointers to where the code is hosted, which leads to the obvious question of why?
It's a recognition that once you release code, you are no longer in control of the fate of that code. You will probably have respect from those who take that code and use it for a number of reasons, and have de facto control of what will be considered the primary release. However if you abuse the resulting trust, or simply make what the users of that code consider bad decisions, anyone who uses your code can take the code in a completely different direction. If you use community hosting, you make it clear that you understand this, and that the code is essentially "owned" by those who use the code, even if initially the only user is the person who originated it.
Finally, and in the case of code released by companies this is important, by using a third party hosting site you're making it clear that the code both won't and can't be taken away, since management structures do change. Specifically, it's interesting to see companies like Google who provide their own hosting facilities for open source projects use sourceforge to host a number of their open source projects.
What has the BBC originated as open source projects?
Well, at the end of the day, quite a lot - a snapshot:
- Kamaelia, Dirac, TV-Anytime API, Betsie, Media Dispatch, MXF File Test Engine, Video Shot Change Detector, Media Lounge, Pony, 2 Apache Modules, 10 CPAN Modules, BAP Tools, ID3v2 Chapter tools, Flash tools, AFFEditPack, ....
This isn't an exhaustive list, and actually contains more projects than are listed on the BBC Open Source website. Each project here is either an overhead for the BBC, or has some clear reasons for release based on R&D reasons, opening up options for how the BBC can evolve, and so on.
I could go through all of these projects one by one, but that would be laborious and painful, and quite frankly the best resource for each of these projects is linked from the website after all. However I will pull out two projects which I'm personally familiar with as to the aims of the projects, and the resulting benefits those projects have gained by release.
Dirac
I'm not a member of the Dirac team, so there may be some minor inaccuracies here, but generally speaking this should give you a clear idea.
The key aim for Dirac boils down to the desire to have a royalty free video codec that we can do what we like with - for online delivery, offline delivery etc. To be able to provide encoders and decoders to the audience in addition to use within the BBC. Clearly we recognised that there are natural benefits to the community by releasing this in the manner we have done - since it allows both vendors and the audience to do interesting things.
However, when Dirac was started and later released, it looked like the only suitable video codecs would have prohibitively high costs - greater than the cost of developing a new codec - for these sorts of tasks, which were becoming apparent as desirable.
Among the benefits of release, there has been input and assistance with support for different platforms. The project has received useful peer review, and there's also been interest from groups like the W3C who have been seeking a royalty free codec for standardisation as a W3C standard. In order for this to happen, there need to be multiple independent implementations of Dirac. As a result a company called Fluendo is in the process of sponsoring an independent implementation called Schrodinger. This is something that simply would not have happened without the initial release of Dirac.
Kamaelia
This is my project, so as a result I could probably spend all afternoon talking about this. So I'll be brief! The original aim of the Kamaelia project was as follows - "to develop tools for scalable long term delivery of all BBC content". The glib way of summarising that is to say "how do you deliver the BBC's archive online to every UK household, what do you need, and what breaks?". In practice, the answer is "pretty much everything" and it's actually not quite as simple as that. However it gives you an idea of the problem. We released Kamaelia to allow for experimentation with different approaches, based on the recognition that any input from the community would be welcome, but that we would need to do this work anyway.
Community collaboration has been interesting, and is helping the project fulfill some possibilities we suspected it would be useful for. ie some things that we suspected the code could be useful for, but that we would not have the time/resource to work on have been explored by the community. This is helping to slowly evolve the project into a general purpose networked multimedia toolkit.
In recognition that these increased capabilities are realistic, we're revising the project goals for Kamaelia (or rather the BBC's goals for the Kamaelia Project) to the following: "To do for software systems what IKEA has done for furniture, and spreadsheets have done for traditional business, but for the BBC's business of storytelling and distribution" (ie creating and telling great stories). OK, this is a big goal, but it now seems realistic, and still recognises that from our perspective, we're still interested in scalable long term delivery of all BBC content, but we also recognise that Kamaelia may well also have a role in the production of that content, and that its flexibility may be especially useful.
As a result, the benefits of release of Kamaelia as open source include peer review (people have pointed out dumb things in our code as well as good), a much more general purpose and useful system, validation of some theories of applicability of the project, and perhaps most interestingly dissemination of the project into the business. Getting R&D work out from the labs and into use in the wider organisation can be difficult in any large organisation.
In the case of Kamaelia, the first major group of people to pick up Kamaelia off sourceforge and do something interesting with it were Radio & Music Interactive, who used it for prototyping and a system to assist in the creation of podcasts of BBC stations, which ended up changing some viewpoints in these areas. Only after they had done this project and shown it to a variety of groups of people did we, by chance, find out that they had done this.
From an R&D perspective, this is a great success - work was done, the business picked it up, found it useful, and perhaps most interestingly did not have to come back to R&D for support.
And so on. I could go on, but I'll stop there, except for one point.
Appropriate Licensing
Both Kamaelia & Dirac use the Mozilla "tri-license", which can often lead to the question of "why"? Let's look at the 3 licenses briefly to see why.
- The MPL is useful because it contains an explicit patent grant. In the case of Dirac this was particularly useful because wavelet based video codecs can be a patent minefield, and one way of protecting the codec is to attempt to patent various aspects of Dirac. If the patents are granted, that affords the codec specific protections, and if they aren't, it says that the codec's basics aren't patentable. Either way this provides useful protection. However if patents were granted, it is important to explicitly say "you can use this", which is essentially what the patent grant says. (Read Lawrence Rosen's excellent book on open source licensing if you want to understand in more detail what specifically is granted.)
- The GPL is the most widely used license in open source projects, and interoperation there is extremely important, or else the project becomes an evolutionary niche.
- Finally, the LGPL is extremely useful since at the end of the day, these projects are essentially libraries, and if a hardware vendor wanted to include our code in a set top box, and didn't make any modifications, the LGPL license essentially says they can include the code without releasing any code they write that uses the library - they just have to share their changes to the library.
For both Dirac & Kamaelia, interoperation with the open source world (via the GPL) and with the proprietary software world (via the LGPL) were extremely important, and the explicit patent grant (as well as non-political nature) of the MPL were considered extremely important.
OK, so now let's move on.
Originating Open Source for Businesses.
When do you originate an open source project - when is the time ripe to release something as open source? Well, probably the most compelling time is when you are developing software inhouse that is inherently a money sink - ie an overhead, rather than a money source (eg a product).
However, this isn't sufficient reason to release; it is just one of the most common scenarios where it makes sense to evaluate release. So the question arises - why originate a project? Well, one reason is when you have that overhead - usually an inhouse tool - that you think others will find useful. Another reason would be when it will benefit your market sector, making it more attractive and viable.
A nice example is what happens when you're developing a web application - such as Basecamp - extract the core code that makes it work as an application framework, release that as open source, and call it Ruby on Rails. How has that affected the market sector viability of development using Ruby, for example? The interesting corollary of this is that by doing so, they helped create a community of companies, and as a result it should be clear that communities of companies can generate wealth for all concerned.
The other good reason to originate a project is when you're seeking to gain some level of competitive advantage through the release of what essentially becomes an open platform. Whilst on the surface of things this is counter intuitive, what you're actually doing is increasing the size of the potential audience that will be interested in you - since generally speaking, consumers prefer open platforms.
So, given all this, why would you not originate an open source project? Well, the most obvious answer is when you gain a real competitive advantage by not doing so - for example when all that you sell is software. If you're such a company, and only sell software, and not software+something else - such as a webservice, support services, consultancy services (eg an API, or bespoke customisation), then originating open source looks difficult to justify.
Some other pragmatic, non-exhaustive, reasons not to release as open source are:
- If you're not prepared to accept community contributions
- If you're not willing to risk letting go of control of the codebase
- If your code contains secrets of any kind. These secrets may be personal information, business confidential information, material covered by NDAs, or things like private keys.
So, given all this, suppose your company does choose to release a project as open source, how would you go about originating an open source project? Well, this is a large area in itself, and it's impossible to do it justice in this short document (and the short session at Linux World), but some important tips for a company considering release as open source:
- First and foremost, create a project on an independent hosting site - be it Sourceforge, Google Code, Berlios, etc. The reasons for this, aside from the comments already mentioned, include the fact that if you abandon the project, these sites have processes in place for allowing abandoned projects to be taken over.
- Secondly, choose an appropriate license. Understand the implications of the license. Does it allow the code to be closed, which, aside from some obvious risks, makes it more attractive to hardware manufacturers? Does it require people who use your code to release their code? Are there potential users who you think would find this difficult, and does that matter to your business?
- Create a contributor agreement. This is as much about protecting your contributors as it is about protecting you, and the former is something you should bear in mind. It's very easy to try to shift everything off to the contributor, but unless your contributor agreement provides a benefit to contributors, there's little point in them agreeing to sign it. Probably the biggest benefit you can provide as a company is that of being the entity whose license is being infringed if someone breaches the project license. This does put an element of responsibility on the company in this scenario, however it's a responsibility the company is likely to want to have, since it will be, at least initially, the primary contributor to the project.
- Create mailing lists - you don't need to provide support for your code, but not providing community mechanisms for self support is not wise. Specific mailing lists you should create:
  - A general discussion list, for self support, discussion of project direction and so on.
  - An announcements list
  - A list for receiving emails from the version control system the project uses - ie cvs/svn commit emails. This list is important since it allows people to see the activity on the project.
- Similarly create a project blog, and use it (at minimum) whenever anything notable happens. Whilst blogs are not necessarily everyone's cup of tea, and can sometimes come across as very "self promoting" or "self important" to the UK eye, they are more widely viewed internationally as a mechanism for noting your project's very existence, to the extent that certain groups almost appear to believe that if you don't have a blog, you don't exist.
- Release something usable. It doesn't have to be complete, or perfect, or do everything you want it to do, but it must be useful for something, however basic. Vapourware in code form is as interesting as vapourware in press release form.
- Regarding any release announcements, focus on the code, and functionality of the code & project, not on you or your company. Generally speaking people aren't the least bit interested in your company beyond the fact that you're working on the project. They do want to know who's behind the project, especially if a company is involved, but you should keep comments, if any, about your company to a minimum.
Almost finally, don't expect instantaneous success. Your initial users are like gold dust, treat them with respect, and thank them for their input. Whilst you might think they should treat you like gold dust, they didn't actually have to give you feedback. If they do it's incredibly useful to you, and your very first users help your project get over misconceptions, paving the way for more users and potential contributors later down the line. They might not make that transition themselves, but people can, do and will read mailing lists to see your behaviour with regard to contributors.
Read up on how other people have done this, both specifically, distilled and in terms of the greater market. 3 useful sources of information:
- The Cathedral and Bazaar by Eric S Raymond. (book and online essay)
- Producing Open Source Software by Karl Fogel (book)
- Hackers and Painters by Paul Graham (an online essay as well as a book of a collection of essays)
Once again however, don't expect instantaneous success!
Finally, to conclude, this document is based on a talk entitled "open source at the BBC". Proprietary software is as widely used, if not significantly more widely used (in differing areas), inside the BBC. Nothing in this talk is intended to say anything negative about proprietary systems, merely to highlight a number of aspects of open source that deserve examination. If you're interested in using proprietary software, the entire business world is already geared up to help you. If you want to improve proprietary software, one traditional approach is to start a new business, and entire business schools exist to assist you there. Similarly if you want to originate new proprietary software, then the world is already set up to help you.
Hopefully this document assists you with these aspects of open source - in terms of when, why, why not and how.
If you're a private business, you have the option of ideology as a merit - and when that happens, it can be very positive (or negative), and very interesting to see.
However, a public service does not have that option. Yes they serve the public, which can put them under the whims of the politics of the time, but it's not the place of a public service to originate those politics.
And on that note, if you've reached this point - thanks for reading, I hope you found it useful.
More information:
- BBC Open Source website
- BBC OSS FAQ's
- Joel on Software - Strategy Letter V
This is an essay on the popular Joel on Software site, and is hopefully an interesting foil to this document, from a business perspective, about how businesses can view release of code as open source.
- Cathedral & Bazaar
- Hackers and Painters (book)
- Producing Open Source Software
LONIX Users Meeting
October 25, 2006 at 04:02 PM | categories: python, oldblog | View Comments
Linux World, London, Olympia 25-26th October
October 25, 2006 at 12:20 AM | categories: python, oldblog | View Comments
Kamaelia in this months Linux Format
October 19, 2006 at 11:25 PM | categories: python, oldblog | View Comments
Shared Markov Chain Chatterbox
October 17, 2006 at 12:47 AM | categories: python, oldblog | View Comments

import Axon, random

# The chain is keyed on pairs of words; a pair of newlines marks the start.
nlnl = '\n', '\n'
key = nlnl

def new_key(key, word):
    # Slide the two-word window along, resetting it when a newline is seen.
    if word == '\n': return nlnl
    else: return (key[1], word)

class Chatty(Axon.Component.component):
    data = {}  # class attribute, so every connection shares one chain

    def updateChain(self, message):
        # Record which word followed each pair of words in the message.
        key = nlnl
        for word in message.split():
            self.__class__.data.setdefault(key, []).append(word)
            key = new_key(key, word)

    def response(self):
        # Walk the chain from the start key; an unknown key yields "\n",
        # which ends the response.
        key, result, word = nlnl, [], None
        while word != "\n":
            word = random.choice(self.__class__.data.get(key, nlnl))
            key = new_key(key, word)
            result.append(word)
        return " ".join(result)

    def main(self):
        while 1:
            if self.dataReady("inbox"):
                message = self.recv("inbox")
                self.updateChain(message)
                self.send(self.response(), "outbox")
            yield 1

if __name__ == "__main__":
    from Kamaelia.Chassis.ConnectedServer import SimpleServer
    SimpleServer(protocol=Chatty, port=1500).run()

And that's pretty much all there is to it. As you'd imagine (I hope), a Chatty component is created to handle any accepted connection on port 1500, and anything the user types is received on the inbox "inbox", used to update the class's Markov chain DB, and then used to generate a response which is sent to the outbox "outbox" (meaning it gets sent to the socket). The upshot is the more people who connect, the more the database gets updated.
The nice thing about this is that the bulk of the code here focusses on the logic that's desired, not on any networking details. OK, this example isn't ideal because it misses some important things like shutdown and what happens if the connection disappears, but it also is interesting because you can test the component in isolation as well:

from Kamaelia.Chassis.Pipeline import Pipeline
from Kamaelia.Util.Console import ConsoleReader, ConsoleEchoer

Pipeline(
    ConsoleReader(),
    Chatty(),
    ConsoleEchoer(),
).run()

Which is a nice thing to be able to do! If you wanted to train the Markov chain server you could also do that as follows:

from Kamaelia.Chassis.Pipeline import Pipeline
from Kamaelia.File.ReadFileAdaptor import ReadFileAdaptor
from Kamaelia.Internet.TCPClient import TCPClient
from Kamaelia.Util.Console import ConsoleEchoer

Pipeline(
    ReadFileAdaptor("SomeTrainingMaterial"),
    TCPClient("127.0.0.1", 1500), # assuming localhost
    ConsoleEchoer(), # May as well see the deranged output :)
).run()

The fun thing about this trainer is that you can see the output from the Markov chain during testing as well :-)
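If you want to poke at the chain logic without Axon or Kamaelia installed, here is a minimal standalone sketch of the same two-word Markov chain idea. It is not the component above: the names update_chain, generate, END, START and max_words are made up for illustration. It also differs in two ways: it stores an explicit end marker after each message, and it caps the response length, since a message like "a a a" could otherwise send the walk round in circles.

```python
import random

END = "\n"            # hypothetical end-of-message marker
START = (END, END)    # the key every message starts from


def next_key(key, word):
    """Slide the two-word window forward; reset it at the end of a message."""
    return START if word == END else (key[1], word)


def update_chain(chain, message):
    """Record, for each pair of consecutive words, which word followed it."""
    key = START
    for word in message.split() + [END]:
        chain.setdefault(key, []).append(word)
        key = next_key(key, word)


def generate(chain, rng=random, max_words=50):
    """Walk the chain from START until END is drawn or max_words is reached."""
    key, words = START, []
    while len(words) < max_words:
        word = rng.choice(chain.get(key, [END]))
        if word == END:
            break
        words.append(word)
        key = next_key(key, word)
    return " ".join(words)


chain = {}
update_chain(chain, "the cat sat on the mat")
update_chain(chain, "the cat ate the fish")
print(generate(chain))
```

Because the chain is just a dictionary of lists, sharing it between users is a matter of passing the same dictionary around, which is what the class attribute does in the component.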