XMLSerializable interface implementation

This document explains how to turn the objects of your model into XML-serializable objects by implementing the XMLSerializable interface of X-O lite.

How to use XMLSerializable objects to parse and generate XML (in other words, to serialize/deserialize or marshal/unmarshal your objects to and from XML) is explained in the 'Usage' document.

Prerequisites: although X-O lite is based on the Simple API for XML (SAX), this document does not re-explain SAX usage. To learn more about SAX itself, see http://en.wikipedia.org/wiki/Simple_API_for_XML or http://www.saxproject.org/.

The XMLSerializable interface

To support XML parsing/serialization with X-O lite, Java objects must implement the XMLSerializable interface. This simple interface contains only three methods: two for parsing (XML to objects) and one for serializing (objects to XML).

The X-O lite API is based on SAX:

  • for parsing, the two methods are simplified SAX events (startElement and endElement);
  • for serialization, you fire simplified SAX events.

    public interface XMLSerializable {
    
        public void startElement(String uri, String localName, XMLEventParser parser) throws SAXException;
        public void endElement(String uri, String localName, XMLEventParser parser) throws SAXException;
    
        public void serialize(XMLSerializer serializer) throws XMLSerializeException;
    
    }
            

'Simplified' SAX means that your objects only have to deal with a small subset of the SAX events (no startPrefixMapping or locator management) and that they only receive events corresponding to their own data. The XMLEventParser implementation handles internally a lot of the 'parsing state' that SAX usually pushes into the client code (the defined prefixes and namespaces, accumulated text data, locator ...).

The rest of this document explains how to implement these three methods of the XMLSerializable interface in your Java objects. The 'Usage' document explains how to use XMLSerializable objects to parse/serialize XML.

From XML to Java objects

Simple objects

The typical way of parsing an object with X-O lite is the following:

1) the object is created by its external context (you'll see how later);

2) the first time startElement is called on the object, it is for the XML tag representing the object itself. At that time, the parser method isFirstEvent() returns true;

3) some startElement/endElement calls are made for the object's content;

4) when the end of that first element is reached, endElement is called one last time. At that time, the parser method isLastEvent() returns true.

Example:

With the following XML fragment:

    ...
       <foo id="123">
          <bar>blah</bar>
       </foo>
    ...         

Assume that we have a Foo object with two fields, id and bar, whose values are mapped respectively to the "id" attribute and the "bar" sub-element. The sequence of methods called on the Foo XMLSerializable when parsing will be:

  • startElement for foo (parser.isFirstEvent() == true)
  • startElement for bar
  • endElement for bar
  • endElement for foo (parser.isLastEvent() == true)

In the startElement notification, you can ask the XMLEventParser for attribute values (just like in usual SAX parsing). The difference with plain SAX is that the XMLEventParser is passed as a parameter to the startElement method, so you don't have to keep a reference to it elsewhere.

In the endElement notification, you can ask the XMLEventParser for the text content of this element (while with plain SAX you have to accumulate it yourself).

So the code necessary to parse this simple XML fragment into a Foo object is:

public class Foo implements XMLSerializable {

    private int id;
    private String bar;

    
    public void startElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("foo")) {
            id=Attributes.getMandatoryInt("id", parser);
        }
    }
   
    
    public void endElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("bar")) {
            bar=ElementText.getString("<none>", parser);
        }
    }

    (...)

}

The Attributes and ElementText classes are helper classes that let you easily get and format attribute and element text values. For each type (String, int, boolean, double, date ...) they offer get-and-format methods for mandatory or optional values. If a mandatory value is not defined, they throw an exception with a clear error message. For optional values, a default value must be provided.

Note that, in the code snippet above, the condition localName.equals("foo") could have been replaced by parser.isFirstEvent() because, for this simple structure, the first element notified to a Foo object must always have 'foo' as its local name. It's only in more complex cases, like mapping the same object to several elements or defining a recursive structure (see the 'directory' example below), that the isFirstEvent() capability of the parser makes a real difference.
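
For comparison, here is a minimal sketch of the same mapping written against plain SAX, using only JDK classes. It is not part of X-O lite; the class name PlainSaxFooHandler and its fields are illustrative assumptions. It shows the text buffer that the handler has to manage itself, which is the bookkeeping the XMLEventParser is described as taking over.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical plain-SAX handler for the foo fragment (illustration only). */
class PlainSaxFooHandler extends DefaultHandler {
    int id;
    String bar;
    // With plain SAX, the handler must accumulate element text itself.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start a fresh buffer for each element
        if (qName.equals("foo")) {
            id = Integer.parseInt(atts.getValue("id"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("bar")) {
            bar = text.toString();
        }
    }

    /** Parses the given XML string and returns the populated handler. */
    static PlainSaxFooHandler parse(String xml) {
        try {
            PlainSaxFooHandler handler = new PlainSaxFooHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling PlainSaxFooHandler.parse on the foo fragment above fills id and bar, but the handler carries a field (the text buffer) that exists only for parsing.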

Composite Java objects

This works so far for simple Java objects, but an XML document is usually mapped to a composite tree of Java objects. So what should be done with the following XML fragment?

    ...
       <fooHolder>
          <name>first holder</name>
          <foo id="123">
             <bar>blah</bar>
          </foo>
       </fooHolder>
    ...         

Here we have a FooHolder object that has a name and contains a Foo. The idea of X-O lite is to transfer the parsing control from object to object:

  • The FooHolder is being parsed and receives the parsing events.
  • At one point, the FooHolder receives a startElement event for a foo element. At this point, it creates a Foo object and, by calling the delegateParsingTo method, tells the parser that this Foo object is now the one being parsed.
  • The Foo object is parsed as described in the preceding section.
  • When the last element of the Foo object is reached, the control of the parsing is given back to the FooHolder object.
  • The FooHolder receives the parsing events until it is itself terminated.

In the example described here, the sequence of methods called on the FooHolder XMLSerializable when parsing will be:

  • startElement for fooHolder (parser.isFirstEvent() == true)
  • startElement for name
  • endElement for name
  • startElement for foo (in this method, the FooHolder creates a Foo and delegates parsing to it)
  • endElement for foo
  • endElement for fooHolder (parser.isLastEvent() == true)

You can see that the startElement and endElement events corresponding to the <foo> and </foo> XML tags are sent to both the FooHolder and the Foo objects. But the content of the Foo object is not notified to the FooHolder. This maps well to the object-oriented principle that objects only see the interface, not the data, of the objects they contain.

The code necessary to parse the FooHolder is:

 public class FooHolder implements XMLSerializable {


    private String name;
    private Foo foo;


    public void startElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("foo")) {
            foo = new Foo();
            parser.delegateParsingTo(foo);
        }
    }


    public void endElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("name")) {
            name = ElementText.getMandatoryString(parser);
        }
    }


    (...)

 }

Of course, deciding how your XML is mapped to objects is your choice. In the current example, mapping the proposed fragment to a single FooHolder class (without an inner Foo) that has name, id and bar fields is also a perfectly valid choice. In that case, all the parsing code would be in the FooHolder class.
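
To make the hand-over of control more concrete, the following sketch shows one way such delegation could be built on top of plain SAX with a stack of targets. It is an illustration under assumptions, not X-O lite's actual implementation: EventTarget, DelegatingHandler and DelegationDemo are hypothetical names, and attributes, text and namespaces are left out.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for an object that receives parsing events. */
interface EventTarget {
    void startElement(String name, DelegatingHandler parser);
    void endElement(String name, DelegatingHandler parser);
}

/** Routes SAX events to the target on top of a stack (illustration only). */
class DelegatingHandler extends DefaultHandler {
    private final Deque<EventTarget> targets = new ArrayDeque<>();
    private final Deque<Integer> ownDepths = new ArrayDeque<>(); // depth of each target's own element
    private int depth = 0;
    private String currentName;

    DelegatingHandler(EventTarget root) {
        targets.push(root);
        ownDepths.push(1); // the root object owns the document element
    }

    /** Makes 'child' the current target and replays the current start event to it. */
    void delegateParsingTo(EventTarget child) {
        targets.push(child);
        ownDepths.push(depth);
        child.startElement(currentName, this); // the child's first event
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        currentName = qName;
        targets.peek().startElement(qName, this);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        targets.peek().endElement(qName, this); // the current target's event
        if (targets.size() > 1 && ownDepths.peek() == depth) {
            // The delegated object's own element just ended: give control back
            // to the parent and notify it of the same end event.
            targets.pop();
            ownDepths.pop();
            targets.peek().endElement(qName, this);
        }
        depth--;
    }
}

class DelegationDemo {
    /** Parses the fooHolder fragment and returns the ordered log of notifications. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        EventTarget foo = new EventTarget() {
            public void startElement(String n, DelegatingHandler p) { log.add("Foo start " + n); }
            public void endElement(String n, DelegatingHandler p) { log.add("Foo end " + n); }
        };
        EventTarget holder = new EventTarget() {
            public void startElement(String n, DelegatingHandler p) {
                log.add("FooHolder start " + n);
                if (n.equals("foo")) p.delegateParsingTo(foo);
            }
            public void endElement(String n, DelegatingHandler p) { log.add("FooHolder end " + n); }
        };
        String xml = "<fooHolder><name>first holder</name>"
                + "<foo id=\"123\"><bar>blah</bar></foo></fooHolder>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DelegatingHandler(holder));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }
}
```

In the resulting log, the holder sees the start and end of foo but never sees bar, while the Foo target sees foo and bar only, which matches the two event sequences listed in this document.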

From the root

Using this method, all the objects are created by their parent object during the parsing process. The only object remaining is the root one.

The root is usually created by the code initiating the parsing. Most of the time, when you parse an XML document you know what its root element is, so you can instantiate the corresponding object and give it to the XMLEventParser.

In this example we assume that the root XML object will be a FooHolder:

        
        SaxXMLEventParser parser = new SaxXMLEventParser();
        FooHolder root = new FooHolder();
        InputSource xmlSource = ...
        parser.parse(xmlSource, root);
                

If you cannot tell in advance what the XML root element will be, then you have to use an XMLObjectFactory (see below).

Parsing collections

Parsing collections is no exception: it is done in exactly the same way.

Let's say that you have a collection of FooHolder called FooCollection:

 <fooCollection name="my collection">
    <holders>
       <fooHolder>
          <name>first holder</name>
          <foo id="123">
             <bar>blah</bar>
          </foo>
       </fooHolder>
       ... (more fooHolders)
       <fooHolder>
          <name>first holder</name>
          <foo id="123">
             <bar>blah</bar>
          </foo>
       </fooHolder>
    </holders>
 </fooCollection>

When parsing your FooCollection, each time a startElement for a fooHolder is notified, you create a FooHolder instance, put it in your favorite Java collection (List, Map, Set ...) and delegate parsing to it.

 public class FooCollection implements XMLSerializable {


    private List<FooHolder> holders=new ArrayList<FooHolder>();
    private String name;
    

    public void startElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("fooCollection")) {
            name = Attributes.getMandatoryString("name", parser);
            holders.clear();
        } else if (localName.equals("fooHolder")) {
            FooHolder holder = new FooHolder();
            parser.delegateParsingTo(holder);
            holders.add(holder);
        }
    }

    (...)

 }

Here we used a List as the inner collection to store the holders; we could have used a Map or a Set. In that case, we must take care of the moment at which the object is added to the Map or the Set, because we usually cannot put the object in a Map or a Set before it is fully parsed:

  • For the Map, the attributes used to build the key must be parsed before the object can be put in the map.
  • For the Set (and especially the HashSet), the object must be fully parsed before being put in the set. Otherwise we cannot compare it to the objects already inside the set (and we cannot calculate its hash code).
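
The HashSet case can be reproduced without any XML at all. The sketch below uses only java.util; NamedHolder is a hypothetical stand-in for a parsed object whose equals and hashCode depend on a field that is filled in later, as would happen during parsing.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical object whose identity depends on a field set during "parsing". */
class NamedHolder {
    private String name;

    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof NamedHolder && Objects.equals(name, ((NamedHolder) o).name);
    }

    @Override
    public int hashCode() { return Objects.hashCode(name); }
}

class HashSetTimingDemo {
    /** Adds the object before its name is known, then looks it up again. */
    static boolean foundWhenAddedTooEarly() {
        Set<NamedHolder> set = new HashSet<>();
        NamedHolder holder = new NamedHolder();
        set.add(holder);                 // stored under the hash code of a null name
        holder.setName("first holder");  // the hash code changes after insertion
        return set.contains(holder);     // looked up under the new hash code
    }

    /** Adds the object only once it is complete. */
    static boolean foundWhenAddedAfterParsing() {
        Set<NamedHolder> set = new HashSet<>();
        NamedHolder holder = new NamedHolder();
        holder.setName("first holder");
        set.add(holder);
        return set.contains(holder);
    }
}
```

In the first method the set can no longer find the very object it contains, because the object was hashed before its state was complete. In the second, the lookup succeeds.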

The event-based parsing can be misleading in this case. For example, suppose I try to store my FooHolders in a Map like this:

 public class FooCollection implements XMLSerializable {


    private Map<String, FooHolder> holders=new HashMap<String, FooHolder>();
    private String name;
    

    public void startElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("fooCollection")) {
            name = Attributes.getMandatoryString("name", parser);
            holders.clear();
        } else if (localName.equals("fooHolder")) {
            FooHolder holder = new FooHolder();
            parser.delegateParsingTo(holder);
            holders.put(holder.getName(), holder); // NOT WORKING !!!
        }
    }

    (...)

 }

It will not work because, at the time you try to put the holder into the map, the parser has only called a single startElement method on it, for the <fooHolder> element. At that time, the startElement and endElement for the <name> element have not yet been called, so the holder name is still null.

One quick solution is to make 'name' an attribute of the <fooHolder> element (so that it is available just after the delegateParsingTo method is called), but this is not always possible (and it still won't work for HashSets). The definitive solution is to put the object in your collection only after it is completely parsed, that is, when you receive an endElement event for it.

So the correct implementation (without changing the XML) is:

 public class FooCollection implements XMLSerializable {


    private Map<String, FooHolder> holders=new HashMap<String, FooHolder>();
    private String name;
    

    public void startElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("fooCollection")) {
            name = Attributes.getMandatoryString("name", parser);
            holders.clear();
        } else if (localName.equals("fooHolder")) {
            parser.delegateParsingTo(new FooHolder());
        }
    }

    public void endElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("fooHolder")) {
            FooHolder holder = (FooHolder)parser.getLastParsedObject();
            holders.put(holder.getName(), holder);
        }
    }

    (...)

 }

Recursion

Parsing a recursive XML structure is no exception either: it comes naturally with X-O lite.

The only point of attention here is that, if an XML element can contain itself, the objects must check the parser.isFirstEvent() method to differentiate between the tag mapped to the object itself and the tag of a sub-element.

Take the following recursive XML:

        ...
           <directory name="temp">
              <directory name="bin">
                 <file name="run.sh" />
              </directory>      
              <directory name="docs">
                 <file name="chiliRecipe.odt" />
                 <file name="myCV.odt" />
              </directory>      
              <file name="sssggh.dat" />
           </directory>      
        ...

The directory object implementation will look like:

 public class Directory implements XMLSerializable {


    private String name;
    private List<Directory> subDirectories = new ArrayList<Directory>();
    private List<File> files = new ArrayList<File>();


    public void startElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("directory")) {
            if (parser.isFirstEvent()) {
                name = Attributes.getMandatoryString("name", parser);
                subDirectories.clear();
                files.clear();
            } else {
                Directory subDir = new Directory();
                subDirectories.add(subDir);
                parser.delegateParsingTo(subDir);
            }
        } else if (localName.equals("file")) {
            File f = new File();
            files.add(f);
            parser.delegateParsingTo(f);
        }
    }


    public void endElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
    }

    (...)
 
 }      

When startElement is called for a directory, if it's the first event, it means that the XML element corresponding to the object itself has started, so we initialize the collections and get the name from the attribute. If it is not the first event, it is the start of a directory element inside this element, hence we create a sub-directory object and delegate parsing to it.

XML object factory

In object-oriented programs, most of the time, objects don't know the exact class of the objects they contain (or, more generally, of the objects around them) but just their interfaces. When this is the case, an object cannot instantiate a sub-object itself before delegating parsing to it. The instantiation must be delegated to a factory that hides the concrete type of the created objects behind an interface.

X-O lite provides a factory interface (and a concrete implementation) to handle this pattern. The interface is called XMLObjectFactory and defines just a single method to instantiate an object on the basis of the information available in a startElement method.

        
    public interface XMLObjectFactory {

        public XMLSerializable createObject(String namespaceUri, String localName, XMLEventParser parser)
            throws XMLParseException;

    }
        

The XMLEventParser provides a getter and a setter for this factory. The factory is null by default, so if you want to use it, you first have to provide an implementation of the factory before starting to parse. Then, while parsing, you can call parser.getFactory().createObject(uri, localName, parser) at any time to create an XMLSerializable object.

Imagine that we add new types of files to the directory/file example above. All the file types extend the File class and they can add some data specific to the file type in the XML. For example, here we add an ImageFile that puts the image width and height in the XML and a ZipFile that puts the list of entries in the XML:

     
 <?xml version="1.0" ?>
 <directory name="temp" xmlns="xo-lite.sf.net/site/examples/fileSystem">
        <directory name="bin">
                <file name="run.sh" />
                <zipFile name="logs.zip">
                        <entry name="bootstrap.log" />
                        <entry name="error.log" />
                        <entry name="debug.log" />
                </zipFile>
        </directory>
        <directory name="docs">
                <file name="chiliRecipe.odt" />
                <imageFile name="me.jpeg" width="1024" height="800" />
        </directory>
        <file name="sssggh.dat" />
 </directory>      
     

You can see that the file objects can have different XML structures (a "simple" element with only attributes or a complex element with sub-elements). Of course you don't want to put the knowledge of all the File sub-types and their structure in the Directory object. All a directory has to know is that it contains files. This is easy with the XMLObjectFactory; the Directory startElement method becomes even simpler:

    public void startElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("directory")) {
            if (parser.isFirstEvent()) {
                name = Attributes.getMandatoryString("name", parser);
                subDirectories.clear();
                files.clear();
            } else {
                Directory subDir = new Directory();
                subDirectories.add(subDir);
                parser.delegateParsingTo(subDir);
            }
        } else {
            File f = (File)parser.getFactory().createObject(uri, localName, parser);
            parser.delegateParsingTo(f);
            files.add(f);
        }
    }

Of course, you have to define an XMLObjectFactory. Here we use the provided ClassObjectFactory, a default implementation of XMLObjectFactory. You just have to give it the mappings (uri + localName) --> class, and it will instantiate the class using the default (empty) constructor.

The code to set up the parser and launch the parsing is:

    public void run() throws Exception {
        SaxXMLEventParser parser = new SaxXMLEventParser();
        ClassObjectFactory factory=new ClassObjectFactory();
        factory.defineElement(Directory.FILE_SYSTEM_NAMESPACE, "file", File.class);
        factory.defineElement(Directory.FILE_SYSTEM_NAMESPACE, "imageFile", ImageFile.class);
        factory.defineElement(Directory.FILE_SYSTEM_NAMESPACE, "zipFile", ZipFile.class);
        parser.setFactory(factory);
        InputSource src = new InputSource(getClass().getResourceAsStream("DirectoryAndFilesExample.xml"));
        Directory dir = new Directory();
        parser.parse(src, dir);
        System.out.println("Got directory: " + dir);
        ConsoleXMLSerializer.dump(dir);
    }
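
The mapping idea behind such a factory is small enough to sketch with JDK classes only. The code below is not the ClassObjectFactory source; TinyClassFactory and its key format are assumptions made for illustration. It keeps a map from (uri + localName) to a class and instantiates that class through its no-argument constructor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical (uri + localName) --> class registry, for illustration only. */
class TinyClassFactory {
    private final Map<String, Class<?>> mappings = new HashMap<>();

    /** Registers the class to instantiate for the given element name. */
    void defineElement(String uri, String localName, Class<?> type) {
        mappings.put(key(uri, localName), type);
    }

    /** Creates a new instance of the mapped class using its empty constructor. */
    Object createObject(String uri, String localName) {
        Class<?> type = mappings.get(key(uri, localName));
        if (type == null) {
            throw new IllegalArgumentException("No class mapped to {" + uri + "}" + localName);
        }
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    private static String key(String uri, String localName) {
        return "{" + uri + "}" + localName;
    }

    /** Small demo using JDK classes as stand-ins for File, ImageFile, etc. */
    static Object demo() {
        TinyClassFactory factory = new TinyClassFactory();
        factory.defineElement("urn:example", "list", ArrayList.class);
        factory.defineElement("urn:example", "text", StringBuilder.class);
        return factory.createObject("urn:example", "list");
    }
}
```

Because the lookup key includes the namespace URI, an element with the right local name but the wrong namespace is not mapped, which is why the run() method above passes Directory.FILE_SYSTEM_NAMESPACE with every definition.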

Factory for root

If you don't know exactly what the root element of your XML file is, you can also rely on the factory to instantiate it. Just call the 'parse' method of your XMLEventParser without a root object to have one created by the parser factory.

TODO: example

Validation

In most of the examples above, the proposed code doesn't worry about input validation. There's a good reason for that! It is not the job of the XML-to-Java object mapping code to perform input validation. We can delegate validation to the underlying XML parser using an XML schema.

Writing a schema for your data is a good idea, and not only for validation. It acts as commonly understood documentation. It is useful for developers trying to understand the XML content, but also for tools like XML editors, test data generators, diagram generators ...

Setting up an XMLEventParser to validate input using a schema is straightforward if you use the Sun JRE default parser (Apache Xerces). Just create a string containing the schema URI and a URL pointing to the actual schema location on your system (separated by whitespace), give it to the SaxXMLEventParser constructor and you're done.

For other parsers, see your parser's documentation to learn how to configure a validating XMLReader, and pass it (fully configured) to the SaxXMLEventParser constructor.

You can see a working validating parser setup in the examples.
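
As an illustration of the "fully configured XMLReader" route, the sketch below builds a schema-validating XMLReader with standard JAXP calls only. The inline schema and the class name ValidatingReaderSketch are assumptions made for this example; the X-O lite side (handing the reader to the SaxXMLEventParser constructor, as described above) is deliberately not shown.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidatingReaderSketch {
    // Hypothetical schema for the foo fragment: one 'bar' child, mandatory int 'id'.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='foo'><xs:complexType><xs:sequence>"
        + "<xs:element name='bar' type='xs:string'/></xs:sequence>"
        + "<xs:attribute name='id' type='xs:int' use='required'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    /** Builds an XMLReader that validates its input against XSD. */
    static XMLReader newValidatingReader() throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Validation errors are only reported to the ErrorHandler; rethrow them
        // so that invalid input actually stops the parsing.
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        return reader;
    }

    /** Returns true if the XML string parses and validates against XSD. */
    static boolean isValid(String xml) {
        try {
            newValidatingReader().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note the error handler: without one, schema violations are typically only reported as warnings and parsing continues, so the mapping code would still receive the invalid data.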

What to validate

The only thing an object should validate is its own state. So, only validate what is important for the parsed object, that is, the object contract (a.k.a. the object invariant). This contract can range from a simple rule saying that some fields are not null to complex business rules based on the state of many fields (or even many objects).

A good place to check those rules is when the object finishes its parsing. This event is easy to catch because, in endElement, the method parser.isLastEvent() tells you whether the parsing of the object is finished. So your validation hook is:


    public void endElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (parser.isLastEvent()) {
            // validate here
        }
    }

Without XML schemas

If, for some reason, you have to work without an XML schema, you can easily add some validation checks, but complete validation is almost impossible to achieve without a schema.

A simple validation that can be helpful (when working without a schema) is to check the namespace URI and all the possible element tags that can be notified to the object. If unexpected URIs or local names are found, you can use the XMLEventParser exception support to throw meaningful exceptions. Checking the attributes can also be done by iterating over all the defined attributes, but this quickly becomes very cumbersome.

Simple element tag checking lets you catch all 'typo' errors in element names and most of the XML structure errors. Those errors would otherwise have been silently ignored by the (non-validating) parsing.

For example, let's take the Foo class above again. The same code with validation looks like:

 public class Foo implements XMLSerializable {

    public static final String FOO_NAMESPACE_URI="xo-lite/examples/foo";

    private int id;
    private String bar;


    public void startElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (!FOO_NAMESPACE_URI.equals(uri)) parser.throwUnexpectedNamespaceException(FOO_NAMESPACE_URI);
        if (localName.equals("foo")) {
            id = Attributes.getMandatoryInt("id", parser);
        } else if (localName.equals("bar")) {
            // nothing to do
        } else {
            parser.throwUnexpectedElementException("'foo' or 'bar'");
        }
    }
    

    public void endElement(String uri, String localName, XMLEventParser parser) throws XMLParseException {
        if (localName.equals("foo")) {
            // nothing to do
        } else if (localName.equals("bar")) {
            bar = ElementText.getString("bar_undefined", parser);
        } else {
            parser.throwUnexpectedElementException("'foo' or 'bar'");
        }
    }

    (...)

 }

The point is that you have to write, in both the startElement and endElement methods, a big if-elseif-...-else statement checking all the possible tags received by the object (even when you have nothing to do with them) and throwing an exception in the last else.

General rules for parsing

Here are some rules that apply to X-O lite parsing.

  • The parser holds all the parsing state: your objects should not contain variables that are used only during parsing. If they do, consider parsing your XML into smaller objects.
  • Leave validation to the schema: do not try to validate the XML while parsing, but write a W3C schema to define your data structure (unless you have a very good reason).
  • Some tags are totally ignored: some XML tags are there just for XML readability (like the <holders> tag in fooCollection above). They can be ignored while parsing; the XML schema will be the only one to take care of them.
  • Attributes are handled in startElement while element text is handled in endElement.

From Java objects to XML

Generating XML from objects (a.k.a. serializing or marshalling) is the inverse operation of parsing, and X-O lite emphasizes the symmetry between the two operations. So, to generate XML, the object triggers SAX-like events on an interface. Fortunately, the event-driven model is very easy when you are on the 'event generator' side because you control the application flow.

The interface receiving those SAX-like events is XMLSerializer. So, to be able to serialize an XMLSerializable object to XML, just implement its serialize method by calling the correct SAX-like events on the XMLSerializer received as a parameter.

The principal methods available in the XMLSerializer interface are:

    public void startDocument() throws XMLSerializeException;
    public void startPrefixMapping(String prefix, String namespaceUri) throws XMLSerializeException;
    public void startElement(String namespaceUri, String localName) throws XMLSerializeException;
    public void endElement(String namespaceUri, String localName) throws XMLSerializeException;
    public void characters(String text) throws XMLSerializeException;
    public void endDocument() throws XMLSerializeException;

    public void attribute(String localName, String value) throws XMLSerializeException;
    public void attribute(String namespaceUri, String localName, String value) throws XMLSerializeException;

    (...)
         

If you know SAX, most of the methods of XMLSerializer are familiar to you because they mimic the SAX ContentHandler interface. You have to call them on the XMLSerializer in the order in which they would have been called on the ContentHandler if the corresponding XML fragment were parsed by a regular SAX parser.

The biggest difference is the attribute methods used to notify attributes. They are not present in the SAX ContentHandler because, unlike for elements, SAX uses a pull interface for attributes. The usage of those methods is nevertheless straightforward: just call attribute(...) for each attribute, right after startElement and before any characters or endElement.
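
The reason for this ordering rule becomes clear when looking at what a push-style writer has to do. The sketch below is a deliberately tiny, namespace-free stand-in (TinyXmlWriter is a hypothetical name, not an X-O lite class): it keeps the start tag open until the next non-attribute event, so an attribute can only be written while the tag is still open.

```java
/** Hypothetical minimal push-style XML writer (no namespaces), for illustration. */
class TinyXmlWriter {
    private final StringBuilder out = new StringBuilder();
    private boolean startTagOpen = false;

    void startElement(String name) {
        closeStartTag();
        out.append('<').append(name);
        startTagOpen = true; // attributes may still follow
    }

    void attribute(String name, String value) {
        if (!startTagOpen) {
            throw new IllegalStateException("attribute() must directly follow startElement()");
        }
        out.append(' ').append(name).append("=\"").append(escape(value)).append('"');
    }

    void characters(String text) {
        closeStartTag();
        out.append(escape(text));
    }

    void endElement(String name) {
        closeStartTag();
        out.append("</").append(name).append('>');
    }

    String result() {
        closeStartTag();
        return out.toString();
    }

    // Any event other than attribute() finishes the pending start tag.
    private void closeStartTag() {
        if (startTagOpen) {
            out.append('>');
            startTagOpen = false;
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Fires the events for the foo fragment in the required order. */
    static String demo() {
        TinyXmlWriter w = new TinyXmlWriter();
        w.startElement("foo");
        w.attribute("id", "123");
        w.startElement("bar");
        w.characters("blah");
        w.endElement("bar");
        w.endElement("foo");
        return w.result();
    }
}
```

Once characters or a child element has been fired, the start tag is already closed in the output, so a late attribute can no longer be attached to it. This sketch rejects that case with an exception.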

The startDocument and endDocument methods are not called by the XMLSerializable objects because an XMLSerializable object should not know whether it is the root of the XML tree or is re-used inside another object. Usually, those methods are called by the XMLSerializer implementation, so you don't have to worry about them.

The startPrefixMapping method is optional. If you don't call it, the XMLSerializer will choose a prefix for you when needed. If you call it, the first request for a prefix mapping wins and all subsequent calls to map the same URI or prefix are ignored. So you can call it from your XMLSerializable to associate your object's URI with your favorite prefix (just like, most of the time, the "http://www.w3.org/2001/XMLSchema" URI is associated with the "xs" prefix).

Simple serialization

Let's write the serialize method for our Foo class already used above.

         

    public void serialize(XMLSerializer serializer) throws XMLSerializeException {
        serializer.startPrefixMapping("f", FOO_NAMESPACE_URI);
        serializer.startElement(FOO_NAMESPACE_URI, "foo");
        serializer.attribute("id", String.valueOf(id));
        if (bar!=null) serializer.simpleElement(FOO_NAMESPACE_URI, "bar", bar);
        serializer.endElement(FOO_NAMESPACE_URI, "foo");
    }
         

Here we tell the serializer to preferentially use "f" as prefix for the used namespace. It will be used if nobody mapped the namespace to another prefix before.

The method simpleElement is just a shorthand for startElement + characters + endElement.

With that implementation, to serialize a Foo, just instantiate a concrete XMLSerializer class (for example StreamXMLSerializer to serialize to a stream) and ask it to serialize your object.

Example:

         
        FileOutputStream out = new FileOutputStream("/Temp/FooTest.xml");
        StreamXMLSerializer serializer = new StreamXMLSerializer(out);
        Foo foo = new Foo(123, "Hello");
        serializer.serializeObject(foo);
         

This code serializes the created Foo object into a little XML file on the local drive.

Serialization of composite

If an XMLSerializable contains other XMLSerializable objects, the serialize method implementation stays very simple: just call serialize on the sub-objects at the right moment.

For example, the serialize method of FooHolder will be:

         
    public void serialize(XMLSerializer serializer) throws XMLSerializeException {
        serializer.startPrefixMapping("f", Foo.FOO_NAMESPACE_URI);
        serializer.startElement(Foo.FOO_NAMESPACE_URI, "fooHolder");
        if (name != null) serializer.simpleElement(Foo.FOO_NAMESPACE_URI, "name", name);
        if (foo != null) foo.serialize(serializer);
        serializer.endElement(Foo.FOO_NAMESPACE_URI, "fooHolder");
    }