The joy of JAX-B and SAX
I decided to post this after an interesting journey to build an XML parsing component that would allow me to produce
really good validation messages, both during deserialization and when post-processing those kinds of rules you
can’t express in a schema. Typically, if I want good error/validation messages with location information, I would
reach for SAX, the Simple API for XML, and build a ContentHandler to handle the events from the SAX parser. In Java the
parser calls the handler’s setDocumentLocator method to provide a Locator object that allows me to determine the
line/column in the XML content at any time during my handler.
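As a minimal sketch of that SAX approach, the handler below keeps the Locator it is given and notes the line number of every start tag. The names LocatingHandler and LocatorDemo are made up for illustration and are not part of the project.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records "name@line" for every start tag it sees.
class LocatingHandler extends DefaultHandler {
    private Locator locator;
    final List<String> seen = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        // Parsers that supply a locator do so before any other event
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The locator reports the parser's current position, which for a
        // start tag is approximately where that tag ends
        int line = (locator != null) ? locator.getLineNumber() : -1;
        seen.add(localName + "@" + line);
    }
}

class LocatorDemo {
    // Parses the given XML text and returns the recorded "name@line" entries.
    static List<String> elementLines(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // so localName is populated
            XMLReader reader = spf.newSAXParser().getXMLReader();
            LocatingHandler handler = new LocatingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementLines("<order>\n  <item/>\n</order>"));
    }
}
```

Even in this tiny case the handler is tied to what it wants to collect, which is exactly the maintenance problem with hand-written SAX handlers.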
So, what’s the problem? Well, two things: I cannot use this generically, as each content handler has to be coded to
the type I want to deserialize, and SAX handlers can get complex to write and maintain. So, what about JAX-B, which
provides a nice simple API to deserialize XML directly into a Java object model? Well, the problem is that while the
implementation does usually provide location information for deserialization errors, not all errors that could be
caught are caught by JAX-B, so we need to post-process, and at that time we no longer have any location information,
just our object model. To gather up all the errors generated by JAX-B we need to provide an implementation of the
ValidationEventHandler and pass it to our unmarshaller using the method setEventHandler. Our implementation is pretty
simple (and a commonly documented pattern) and only records all validation events into a list, ensuring that the
handleEvent method always returns true, because returning false tells JAX-B to treat the error as fatal and stop
parsing. Note that we actually transform the JAX-B ValidationEvent into our own ValidationError class, more on this
later.
So, now that we have captured all the validation events, we need somehow to persist the line/column information for our
object model so that when we do additional validation/consistency checking we can provide good, meaningful errors.
Ideally we want to record the location information outside the object model; we don’t want to have to add a line and
column field to each model object. Our solution is simple: we keep a Map of parsed object to Location so that we know
the line/column of the start of the XML element that was used to deserialize the object. But how? After all, JAX-B is
pretty much a black box: we ask it to deserialize an input source and we give it the expected type of the root object.
For a start, JAX-B does allow you to peek into the process via the setListener method, which takes an instance of the
Listener class and allows you to take action on beforeUnmarshal and afterUnmarshal events. Interestingly, though, JAX-B
has a mechanism whereby you can retrieve from your Unmarshaller the internal SAX handler used by JAX-B and then use the
SAX API to drive the deserialization. The advantage of this is that we can wrap the JAX-B handler in our own handler
before passing it to the SAX parser, overriding any SAX events we want. In actual fact we only need to override the
setDocumentLocator call and keep a reference to the Locator object. The resulting class, DelegatingHandlerImpl, extends
the JAX-B Listener and implements the JAX-B UnmarshallerHandler (which in turn extends the SAX ContentHandler); the code
for our handler is shown below. By combining the beforeUnmarshal method with the SAX locator we can build an internal
map that tracks the location of each parsed object.
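class DelegatingHandlerImpl extends Listener implements UnmarshallerHandler {
    private final UnmarshallerHandler unmarshallerHandler;
    private Locator locator;
    private final Map<Object, LocationImpl> locationMap;

    DelegatingHandlerImpl(UnmarshallerHandler unmarshallerHandler, Map<Object, LocationImpl> locationMap) {
        this.unmarshallerHandler = unmarshallerHandler;
        this.locationMap = locationMap;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void setDocumentLocator(Locator locator) {
        // Keep a reference so beforeUnmarshal can read the current position
        this.locator = locator;
        this.unmarshallerHandler.setDocumentLocator(locator);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void beforeUnmarshal(Object target, Object parent) {
        super.beforeUnmarshal(target, parent);
        if (target != null && this.locator != null) {
            this.locationMap.put(target, new LocationImpl(this.locator.getLineNumber(), this.locator.getColumnNumber()));
        }
    }

    // The remaining ContentHandler methods and getResult() simply delegate
    // to unmarshallerHandler and are omitted here.
}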
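The delegation half of that class can be sketched on its own using nothing but SAX. This is an illustration rather than the project’s code: LocatorCapturingHandler, CountingHandler and DelegationDemo are hypothetical names, and the wrapped handler is a plain ContentHandler standing in for the UnmarshallerHandler so that the example runs without JAX-B on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical wrapper: forwards every SAX event and remembers the Locator on the way through.
class LocatorCapturingHandler implements ContentHandler {
    private final ContentHandler delegate;
    private Locator locator;

    LocatorCapturingHandler(ContentHandler delegate) { this.delegate = delegate; }

    Locator getLocator() { return locator; }

    // The only event we intercept: keep the locator, then pass it on
    @Override public void setDocumentLocator(Locator loc) { this.locator = loc; delegate.setDocumentLocator(loc); }

    // Everything else is plain delegation
    @Override public void startDocument() throws SAXException { delegate.startDocument(); }
    @Override public void endDocument() throws SAXException { delegate.endDocument(); }
    @Override public void startPrefixMapping(String prefix, String uri) throws SAXException { delegate.startPrefixMapping(prefix, uri); }
    @Override public void endPrefixMapping(String prefix) throws SAXException { delegate.endPrefixMapping(prefix); }
    @Override public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException { delegate.startElement(uri, local, qName, atts); }
    @Override public void endElement(String uri, String local, String qName) throws SAXException { delegate.endElement(uri, local, qName); }
    @Override public void characters(char[] ch, int start, int length) throws SAXException { delegate.characters(ch, start, length); }
    @Override public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException { delegate.ignorableWhitespace(ch, start, length); }
    @Override public void processingInstruction(String target, String data) throws SAXException { delegate.processingInstruction(target, data); }
    @Override public void skippedEntity(String name) throws SAXException { delegate.skippedEntity(name); }
}

// Stand-in for the wrapped handler: just counts start tags.
class CountingHandler extends DefaultHandler {
    int elements;
    @Override public void startElement(String uri, String local, String qName, Attributes atts) { elements++; }
}

class DelegationDemo {
    // Parses the XML through the wrapper and reports what both handlers saw.
    static String run(String xml) {
        try {
            CountingHandler inner = new CountingHandler();
            LocatorCapturingHandler outer = new LocatorCapturingHandler(inner);
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(outer);
            reader.parse(new InputSource(new StringReader(xml)));
            return "elements=" + inner.elements + " locator=" + (outer.getLocator() != null);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<a><b/><b/></a>"));
    }
}
```

The wrapped handler still sees every event, so it behaves exactly as if it had been registered with the parser directly.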
The parser code is shown below. It is more complex than you would expect to see if you are used to SAX, and certainly
more than you would expect of JAX-B, but mostly we’re just setting up the JAX-B/SAX combination and plugging in our two
additional classes, the DelegatingHandlerImpl and the ValidationEventHandlerImpl. We construct both the list of errors
and the object-to-location map at the top of the parse method so that we can make them available to the caller after
parsing is complete.
public T parse(final InputSource input, final String schemaPath, final Class<T> classOfT)
        throws ParserConfigurationException, IOException {
    this.result = null;
    this.events = new LinkedList<ValidationError>();
    this.locationMap = new HashMap<Object, LocationImpl>();
    try {
        // Standard JAX-B
        final JAXBContext context = JAXBContext.newInstance(classOfT);
        final Unmarshaller unmarshaller = context.createUnmarshaller();
        // Setup schema validation if required
        if (schemaPath != null) {
            final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            final InputStream schemaIS = ClassLoader.getSystemResourceAsStream(schemaPath);
            final Schema schema = sf.newSchema(new StreamSource(schemaIS));
            unmarshaller.setSchema(schema);
        }
        // Now retrieve the SAX handler that JAX-B uses
        final UnmarshallerHandler unmarshallerHandler = unmarshaller.getUnmarshallerHandler();
        // Wrap it in our own handler
        final DelegatingHandlerImpl actualHandler = new DelegatingHandlerImpl(unmarshallerHandler, this.locationMap);
        // Now create and add an error handler
        final ValidationEventHandlerImpl errorHandler = new ValidationEventHandlerImpl(this.events);
        unmarshaller.setEventHandler(errorHandler);
        // Add a listener for before/after unmarshal events
        unmarshaller.setListener(actualHandler);
        // Now setup SAX
        final SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        // Start the SAX parser but using *our* new handler
        final XMLReader xmlReader = spf.newSAXParser().getXMLReader();
        xmlReader.setContentHandler(actualHandler);
        xmlReader.parse(input);
        // Retrieve the result from the handler, note that this is actually
        // the bridge back to JAX-B
        this.result = (T) unmarshallerHandler.getResult();
    } catch (UnmarshalException ex) {
        // ignore, these are reported in the validation errors.
    } catch (JAXBException ex) {
        this.events.add(new ValidationErrorImpl(Severity.FATAL, "JAX-B configuration exception", ex));
    } catch (SAXException ex) {
        this.events.add(new ValidationErrorImpl(Severity.FATAL, "SAX error parsing InputSource", ex));
    }
    return this.result;
}
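Once the caller has the object-to-location map, post-processing rules can report positions without the model knowing anything about them. The sketch below is an assumption about how such a check might look, not code from the project: Location stands in for LocationImpl, Range is an invented model class, and the rule (end must not precede start) is just an example of a cross-field constraint that XML Schema 1.0 cannot express. It also uses an IdentityHashMap rather than the HashMap shown above, on the assumption that model classes may override equals and hashCode, in which case two equal objects would otherwise share one entry.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the post's LocationImpl.
final class Location {
    final int line;
    final int column;
    Location(int line, int column) { this.line = line; this.column = column; }
}

// Hypothetical model object; note it carries no line/column fields of its own.
final class Range {
    final int start;
    final int end;
    Range(int start, int end) { this.start = start; this.end = end; }
}

class PostValidation {
    // Checks the cross-field rule and prefixes each error with the recorded location.
    static List<String> check(List<Range> ranges, Map<Object, Location> locations) {
        List<String> errors = new ArrayList<>();
        for (Range r : ranges) {
            if (r.end < r.start) {
                Location at = locations.get(r);
                String where = (at == null)
                        ? "unknown location"
                        : "line " + at.line + ", column " + at.column;
                errors.add(where + ": end " + r.end + " is before start " + r.start);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Range good = new Range(1, 2);
        Range bad = new Range(4, 1);
        // In the real parser this map would be filled in by beforeUnmarshal
        Map<Object, Location> locations = new IdentityHashMap<>();
        locations.put(good, new Location(2, 5));
        locations.put(bad, new Location(3, 5));
        List<Range> ranges = new ArrayList<>();
        ranges.add(good);
        ranges.add(bad);
        for (String error : check(ranges, locations)) {
            System.out.println(error);
        }
    }
}
```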
I’ve posted all the source for the parser implementation, as well as some basic (for now) test cases, on GitHub in
the validating-jaxb repo.