XML and Java


This page will walk you through the basics of XML with Java.

First we'll look at simple parsing of XML, then we'll look at saving and loading objects as XML.


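In Java you have two major choices for reading and writing XML: tree-based parsing, which reads the whole document into memory as a tree of objects, and stream-based parsing, which works through the document piece by piece as it is read.

In addition, Stream-based Parsing is divided into push (event-based) parsing, where the parser calls your code as it meets each element, and pull parsing, where your code asks the parser for each new element.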


Thus, Java has three broad APIs that match these divisions.


DOM (Document Object Model): javax.xml.parsers

A tree-based parsing API: You get a parser and give it an InputStream. Once it has read the XML, it returns the whole document as a Document object held in memory. You can then read and modify that in-memory XML with methods such as getElementsByTagName and createElement. The key class is DocumentBuilder, which is obtained from a DocumentBuilderFactory. The factory has various methods to set up the parser, including setValidating, if you wish to check the XML against its DTD (the parser always checks that the XML is well formed). (For writing DOM data to an actual XML file, see TrAX, below).
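Here's a minimal sketch of the DOM steps just described. The class name DomSketch, its method names and the points/point XML are all made up for illustration, and the XML is read from a String rather than a file to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
class DomSketch {

    // Parse XML text into a DOM tree held in memory.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            return builder.parse(in);
        } catch (Exception e) {
            // Wrapped so callers need not declare checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Read: count the elements with the given tag name anywhere in the tree.
    static int count(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    // Write: append a new <tag>text</tag> child to the root element.
    // This changes only the in-memory tree, not any file.
    static void append(Document doc, String tag, String text) {
        Element child = doc.createElement(tag);
        child.setTextContent(text);
        doc.getDocumentElement().appendChild(child);
    }

    public static void main(String[] args) {
        Document doc = parse("<points><point>1,2</point><point>3,4</point></points>");
        append(doc, "point", "5,6");
        System.out.println(count(doc, "point") + " point elements");
    }
}
```

Note that the whole document is in memory before any of your code sees it, which is convenient for small files but can be costly for very large ones.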


SAX (Simple API for XML): javax.xml.parsers

A stream and push/event-based parsing API: You build a handler that implements a set of interfaces, and register the handler with a parser (connecting the parser to an InputStream at the same time). When the parser hits an element it calls the relevant handler method. Key classes are SAXParser and DefaultHandler. The former is obtained from a SAXParserFactory, which has various methods to set up the parser, including setValidating, if you wish to check the XML against its DTD (the parser always checks that the XML is well formed). (For writing SAX data to an XML file, see TrAX, below).
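As a rough sketch of the handler approach, the code below extends DefaultHandler and records the name of each element as the parser reports it. SaxSketch, NameCollector and elementNames are hypothetical names chosen for this example, and the XML again comes from a String.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxSketch {

    // The handler: the parser calls startElement each time it meets an opening tag.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            names.add(qName);
        }
    }

    // Run the parser over the XML and return the element names in document order.
    static List<String> elementNames(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            NameCollector handler = new NameCollector();
            // Registering the handler and connecting the input happen in one call.
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.names;
        } catch (Exception e) {
            // Wrapped so callers need not declare checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<points><point x='1'/><point x='2'/></points>"));
    }
}
```

DefaultHandler supplies empty versions of all the callback methods, so you only override the ones you care about (endElement and characters are the other common ones).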


StAX (Streaming API for XML): javax.xml.stream

A stream-based pull-parsing API: You ask the parser for each new element, and then request its attributes. The key classes are XMLStreamReader and XMLStreamWriter, though there are also slightly more event-based versions (XMLEventReader and XMLEventWriter). The readers are obtained from an XMLInputFactory, while the writers are obtained from an XMLOutputFactory.
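The pull style might look something like the sketch below, which writes a small document with XMLStreamWriter and reads it back with XMLStreamReader. The class StaxSketch, its two methods and the points/point/x names are assumptions made for the example, not part of any library.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, for illustration only.
class StaxSketch {

    // Write <points><point x="..."/>...</points>, one point per value.
    static String writePoints(List<String> xs) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("points");
            for (String x : xs) {
                writer.writeEmptyElement("point");
                writer.writeAttribute("x", x);
            }
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    // Pull each event from the reader; collect one attribute from each matching element.
    static List<String> readAttribute(String xml, String element, String attribute) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> values = new ArrayList<>();
            while (reader.hasNext()) {
                int event = reader.next(); // our code asks for the next event
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(element)) {
                    values.add(reader.getAttributeValue(null, attribute));
                }
            }
            reader.close();
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = writePoints(Arrays.asList("1", "2"));
        System.out.println(xml);
        System.out.println(readAttribute(xml, "point", "x"));
    }
}
```

The difference from SAX is who drives the loop: here your code calls next() when it is ready for more, rather than the parser calling your handler.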


Other Java XML stuff:


Marshalling

Marshalling is the saving of Java objects as XML in a text file, for later unmarshalling back into working Java objects. This is a bit like serialisation (the saving of objects to binary files) but somewhat more constrained.

JAXB (Java Architecture for XML Binding): javax.xml.bind

Used for generating classes from XML schema, and marshalling objects of these classes to XML files / unmarshalling these back into objects again. The key thing to remember is that objects can only be saved if their data can be mapped to the core XML element types.

The key stages are:

1) Write an XML Schema representing the classes you want Java to work with. This has to be just as Java likes it, but there's actually very little information on what that form is. The examples on these webpages work. Other well-formed XML won't necessarily work. You might want to start by adapting the XML here.

2) Use this schema, plus the Java tool "xjc", to generate Java files representing the schema. The generated file will be for a class representing the root element, but will contain classes for the sub-elements as well. Sub-element class objects will be stored in the root element object (or other sub-elements) in a List. Also produced will be an ObjectFactory, which can be used to create specific objects from the classes if the objects aren't going to be read in from XML (for example, if you want to just write data out). Compile all these files.

3) Use the ObjectFactory in your own Java code to generate objects from the classes and then use their set/get methods to fill them with data, OR read in pre-existing objects stored in XML files using an unmarshaller, at which point you have objects built from the XML.

4) If you want to write the objects out as XML (for example, having changed the data), write the root-element object (plus any sub-element objects it contains) to an XML file using a marshaller.

Example: As StAX is relatively new, and fully-working JAXB examples that work in the way JDK 1.6 suggests are relatively rare, here's an example application containing a fully worked example of both, which uses the XML found in these webpages: XMLExamples.java. It allows you to read objects in from XML and write objects out to XML. Run the code as described in the Docs, and look at how it works. Use these files: test.xml and test.xsd, copied to the same directory as the Java file.


Good sources

Processing XML with Java: Things have moved on since it was written, but apart from the fact that many of the libraries are now in the javax packages, the examples still hold true.

XML and Java for Scientists/Engineers: Growing resource for Scientific XML and Java. Includes SVG data presentation.

The Java Web Services Tutorial: Sun's tutorial covering the newer elements of the XML processing packages.

For information on JAXB specifically, your best bet is the javax.xml.bind API, as most other examples are out of date. However, you can glean a lot from: Details and examples; Simple example; Simple example.

Also worth checking out is the Unofficial JAXB Guide which, for example, explains about dealing with cyclic references between classes.