Parsing XML in Kamaelia using an expat parser
January 13, 2008 at 03:05 PM | categories: python, oldblog
Generally we've used SAX for parsing XML, but it's useful to show how to do it with a parser like expat, which works by calling back into your code. The trick, as usual with anything long-running, is to put the long-running call into a thread and have it send a message each time a callback fires. The following is a minimal example:
import time
import Axon
import xml.parsers.expat
from Kamaelia.Chassis.Pipeline import Pipeline
from Kamaelia.Util.Console import ConsoleEchoer

class Parser(Axon.ThreadedComponent.threadedcomponent):
    data = "<h1> Default </h1>" # Can be overridden by kwargs as normal

    # expat callbacks: each one forwards the event as a tuple on "outbox"
    def start_element(self, name, attrs):
        self.send(("START", name, attrs), "outbox")

    def end_element(self, name):
        self.send(("END", name), "outbox")

    def char_data(self, data):
        data = data.strip()
        self.send(("DATA", data), "outbox")

    def main(self):
        # Runs in its own thread, so the blocking Parse call is safe here
        p = xml.parsers.expat.ParserCreate()
        p.StartElementHandler = self.start_element
        p.EndElementHandler = self.end_element
        p.CharacterDataHandler = self.char_data
        p.Parse(self.data, 1) # 1 marks this as the final chunk of input
        time.sleep(1) # Pause briefly before signalling that we're done
        self.send(Axon.Ipc.producerFinished(), "signal")

Pipeline(
    Parser(data="<body><h1>Hello</h1> world <p>Woo</p></body>"),
    ConsoleEchoer(),
).run()

This generates the following output:

('START', u'body', {})('START', u'h1', {})('DATA', u'Hello')('END', u'h1')('DATA', u'world')('START', u'p', {})('DATA', u'Woo')('END', u'p')('END', u'body')

The nice thing about this, of course, is that it allows you to test the component that consumes this information in isolation from the XML handling code. Indeed, it allows for a much simpler test harness overall.
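
To make the testing point concrete, here is a rough sketch using only the standard library, so it runs without Kamaelia or threads. The names `extract_text` and `parse_events` are hypothetical and not part of the original example. The only thing assumed is the tuple format the Parser component sends: `("START", name, attrs)`, `("END", name)` and `("DATA", text)`.

```python
import xml.parsers.expat

def extract_text(events):
    """Hypothetical consumer: collect the non-empty text from DATA messages."""
    return [msg[1] for msg in events if msg[0] == "DATA" and msg[1]]

def parse_events(xml_text):
    """Build the same tuples as the Parser component, but into a plain list."""
    events = []
    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = lambda name, attrs: events.append(("START", name, attrs))
    p.EndElementHandler = lambda name: events.append(("END", name))
    p.CharacterDataHandler = lambda data: events.append(("DATA", data.strip()))
    p.Parse(xml_text, True)  # True marks this as the final chunk of input
    return events

# The consumer can be tested with hand-written messages: no XML involved.
fake_events = [("START", "h1", {}), ("DATA", "Hello"), ("END", "h1")]
assert extract_text(fake_events) == ["Hello"]

# The same consumer then works unchanged on real parser output.
real_events = parse_events("<body><h1>Hello</h1> world <p>Woo</p></body>")
print(extract_text(real_events))
```

Because the consumer only ever sees tuples, the test for it needs no parser, no thread and no pipeline. In a Kamaelia system the same logic would sit in a component reading from its inbox.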