I decided to post this after an interesting journey building an XML parsing component that produces really good validation messages, both during deserialization and when post-processing the kinds of rules you can’t express in a schema. Typically, if I want good error/validation messages with location information, I reach for SAX (the Simple API for XML) and build a ContentHandler to handle the events from the SAX parser. In Java the parser calls the handler’s setDocumentLocator method with a Locator object, and I can ask that locator for the current line/column in the XML content at any time during my handler.
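
For reference, here is a minimal sketch of that SAX pattern, using only the parser that ships with the JDK. The class name LocationRecordingHandler and the locate helper are made up for illustration. One thing worth knowing: SAX describes the locator position as that of the first character after the markup that triggered the event, so inside startElement it points at the end of the start tag rather than at its opening bracket.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: records "localName@line" for every element start. */
class LocationRecordingHandler extends DefaultHandler {

    final List<String> seen = new ArrayList<>();
    private Locator locator;

    @Override
    public void setDocumentLocator(Locator locator) {
        // Called by the parser before any other event, if it supplies a locator
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // A parser is not obliged to supply a locator, so guard against null
        int line = (this.locator != null) ? this.locator.getLineNumber() : -1;
        this.seen.add(localName + "@" + line);
    }

    /** Parses the given XML text and returns the recorded element locations. */
    static List<String> locate(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // guarantees localName is populated
            XMLReader reader = spf.newSAXParser().getXMLReader();
            LocationRecordingHandler handler = new LocationRecordingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.seen;
        } catch (Exception ex) {
            // Checked parser exceptions are rethrown unchecked to keep the sketch short
            throw new IllegalStateException(ex);
        }
    }
}
```

Calling LocationRecordingHandler.locate("<a>\n  <b/>\n</a>") with the JDK's default parser records a@1 and b@2.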

So, what’s the problem? Well, two things: I cannot use this generically, as each content handler has to be coded to the type I want to deserialize, and SAX handlers can get complex to write and maintain. So, what about JAX-B, which provides a nice simple API to deserialize XML directly into a Java object model? The problem is that while the implementation does usually provide location information for deserialization errors, JAX-B does not catch every error it could, so we need to post-process, and at that point we no longer have any location information, just our object model.

To gather up all the errors generated by JAX-B we provide an implementation of ValidationEventHandler and pass it to our unmarshaller using the setEventHandler method. Our implementation is pretty simple (and a commonly documented pattern): it records every validation event in a list and makes sure that handleEvent always returns true, because returning false tells JAX-B to treat the error as fatal and stop parsing. Note that we actually transform the JAX-B ValidationEvent into our own ValidationError class, more on this later.

So, now that we have captured all the validation events, we need some way to keep the line/column information for our object model, so that additional validation/consistency checking can produce good, meaningful errors. Ideally we want to record the location information outside the object model; we don’t want to add line and column fields to each model object. Our solution is simple: we keep a Map of parsed object to Location, so that we know the line/column of the start tag of the XML element each object was deserialized from.

But how? After all, JAX-B is pretty much a black box: we ask it to deserialize an input source and give it the expected type of the root object. For a start, JAX-B does let you peek into the process via the setListener method, which takes an instance of the Listener class and lets you act on beforeUnmarshal and afterUnmarshal events. More interestingly, JAX-B lets you retrieve from your Unmarshaller the internal SAX handler it uses, and then drive the deserialization through the SAX API. The advantage is that we can wrap the JAX-B handler in our own handler before passing it to the SAX parser, intercepting any SAX events we want. In fact we only need to intercept the setDocumentLocator call and keep a reference to the Locator object.

The resulting class, DelegatingHandlerImpl, extends the JAX-B Listener and implements the JAX-B UnmarshallerHandler (which in turn extends the SAX ContentHandler). The code for our handler is shown below; by combining the beforeUnmarshal callback with the SAX locator we build a map that tracks the location of each parsed object.

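import java.util.Map;
import javax.xml.bind.Unmarshaller.Listener;
import javax.xml.bind.UnmarshallerHandler;
import org.xml.sax.Locator;

class DelegatingHandlerImpl extends Listener implements UnmarshallerHandler {

    // The JAX-B handler that SAX events are forwarded to
    private final UnmarshallerHandler unmarshallerHandler;
    // Supplied by the SAX parser; reports the current line/column
    private Locator locator;
    // Shared with the parser: parsed object -> location
    private final Map<Object, LocationImpl> locationMap;

    DelegatingHandlerImpl(UnmarshallerHandler unmarshallerHandler, Map<Object, LocationImpl> locationMap) {
        this.unmarshallerHandler = unmarshallerHandler;
        this.locationMap = locationMap;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void setDocumentLocator(Locator locator) {
        // Keep our own reference, then pass the locator on to JAX-B
        this.locator = locator;
        this.unmarshallerHandler.setDocumentLocator(locator);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public void beforeUnmarshal(Object target, Object parent) {
        super.beforeUnmarshal(target, parent);
        // Record where the parser currently is for the object about to be populated
        if (target != null && this.locator != null) {
            this.locationMap.put(target, new LocationImpl(this.locator.getLineNumber(), this.locator.getColumnNumber()));
        }
    }

    // The remaining ContentHandler methods and getResult() are not shown;
    // each one forwards to this.unmarshallerHandler.
}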
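
To make the payoff concrete, here is a small sketch of how a post-processing rule might consult such a location map. Everything in it is hypothetical: Location stands in for LocationImpl, Item for a JAX-B model class, and the "quantity must be positive" rule is just an example of a check a schema could not express. The sketch also uses an IdentityHashMap, on the assumption that model classes may override equals/hashCode and that two equal-looking objects parsed from different lines should keep separate locations.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for LocationImpl: an immutable line/column pair. */
final class Location {
    final int line;
    final int column;

    Location(int line, int column) {
        this.line = line;
        this.column = column;
    }
}

/** Hypothetical model class, of the kind JAX-B would populate. */
final class Item {
    final String name;
    final int quantity;

    Item(String name, int quantity) {
        this.name = name;
        this.quantity = quantity;
    }
}

/** Hypothetical post-processing check that reports errors with locations. */
final class PostValidator {

    static List<String> validate(List<Item> items, Map<Object, Location> locations) {
        List<String> errors = new ArrayList<>();
        for (Item item : items) {
            if (item.quantity <= 0) {
                // Look up where this object came from; it may be missing
                Location loc = locations.get(item);
                String where = (loc == null)
                        ? "unknown location"
                        : "line " + loc.line + ", column " + loc.column;
                errors.add(where + ": quantity must be positive for item '" + item.name + "'");
            }
        }
        return errors;
    }

    /** Creates a map keyed on object identity rather than equals(). */
    static Map<Object, Location> newLocationMap() {
        return new IdentityHashMap<>();
    }
}
```

The model objects stay free of line and column fields; the validator only needs the map that was filled in during parsing.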

The parser code is shown below. It is more complex than you would expect if you are used to SAX, and certainly more than you would expect of JAX-B, but mostly we’re just setting up the JAX-B/SAX combination and plugging in our two additional classes, DelegatingHandlerImpl and ValidationEventHandlerImpl. We construct both the list of errors (this.events) and the object-to-location map (this.locationMap) in the parser so that we can make them available to the caller after parsing is complete.

public T parse(final InputSource input, final String schemaPath, final Class<T> classOfT)
        throws ParserConfigurationException, IOException {
    this.result = null;
    this.events = new LinkedList<ValidationError>();
    this.locationMap = new HashMap<Object, LocationImpl>();
    try {
         
        // Standard JAX-B
        final JAXBContext context = JAXBContext.newInstance(classOfT);
        final Unmarshaller unmarshaller = context.createUnmarshaller();
        // Setup schema validation if required
        if (schemaPath != null) {
            final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            final InputStream schemaIS = ClassLoader.getSystemResourceAsStream(schemaPath);
            final Schema schema = sf.newSchema(new StreamSource(schemaIS));
            unmarshaller.setSchema(schema);
        }
        // Now retrieve the SAX handler that JAX-B uses
        final UnmarshallerHandler unmarshallerHandler = unmarshaller.getUnmarshallerHandler();
        // Wrap it in our own handler
        final DelegatingHandlerImpl actualHandler = new DelegatingHandlerImpl(unmarshallerHandler, this.locationMap);
 
        // Now create and add an error handler
        final ValidationEventHandlerImpl errorHandler = new ValidationEventHandlerImpl(this.events);
        unmarshaller.setEventHandler(errorHandler);
        // Add a listener for before/after unmarshall events
        unmarshaller.setListener(actualHandler);
 
        // Now setup SAX
        final SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
         
        // Start the SAX parser but using *our* new handler
        final XMLReader xmlReader = spf.newSAXParser().getXMLReader();
        xmlReader.setContentHandler(actualHandler);
        xmlReader.parse(input);
 
        // Retrieve the result from the handler, note that this is actually
        // the bridge back to JAX-B
        this.result = (T)unmarshallerHandler.getResult();
         
    } catch (UnmarshalException ex) {
        // ignore, these are reported in the validation errors.
    } catch (JAXBException ex) {
        this.events.add(new ValidationErrorImpl(Severity.FATAL, "JAX-B configuration exception", ex));
    } catch (SAXException ex) {
        this.events.add(new ValidationErrorImpl(Severity.FATAL, "SAX error parsing the InputSource", ex));
    }
     
    return this.result;
}

I’ve posted all the source for the parser implementation, as well as some basic (for now) test cases, on GitHub in the validating-jaxb repo.