Sunday, May 8, 2011

Validating an XML document with an ISO-Schematron schema on OSX

Schema languages let you check that your XML documents are valid. But not all schema languages are equal. ISO-Schematron is great, but validation isn't as simple as right-clicking on your XML document in Eclipse and clicking Validate: Eclipse doesn't support ISO-Schematron natively!

Saxon is a good XSLT and XQuery toolkit. You can use its XSLT processor to validate XML documents against ISO-Schematron schemas. There are three editions of Saxon: HE (Home Edition), PE (Professional Edition) and EE (Enterprise Edition). HE is the free, open-source edition, and for validating the occasional document it's all you need. At the time of writing, the latest version is 9.3, which you can download here (saxonhe9-3-0-4j.zip).

Here are a few tips for first time Saxon users.

  1. When you've finished downloading Saxon, don't unzip it with OSX's built-in Archive Utility. There is a bug in Apple's Archive Utility that affects the way Java .jar files are handled (it "helpfully" extracts the contents of the .jar file). Unzip it with StuffIt Expander (available in the Mac App Store) instead.

  2. The command line usage of Saxon on OSX is slightly different to Windows. The command

    java -jar saxon9he.jar -o output.xsl -s mySchema.sch iso_svrl_for_xslt2.xsl

    will work on Windows but throws an error on OSX ("Command line option -o requires a value"). You'll need to modify your command slightly: add a colon after each option (-o and -s) and remove the space.

    java -jar saxon9he.jar -o:output.xsl -s:mySchema.sch iso_svrl_for_xslt2.xsl
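
If you'd rather stay in Terminal for tip 1, the command-line unzip tool only extracts the outer zip, so the .jar files inside it come out as single files. Below is a minimal sketch that extracts the download and checks that saxon9he.jar survived as a file rather than being expanded into a folder. It assumes saxon9he.jar sits at the top level of the zip; the function names jar_is_intact and extract_saxon are made up for this example.

```shell
#!/bin/sh
# Sketch: extract Saxon-HE from Terminal instead of Archive Utility.
# Assumption: saxon9he.jar sits at the top level of the downloaded zip.

# Succeeds only if $1 is a regular file, i.e. the .jar was neither
# expanded into a folder nor left out altogether.
jar_is_intact() {
  [ -f "$1" ]
}

# $1 = downloaded zip, $2 = destination folder
extract_saxon() {
  # unzip extracts the outer archive only; nested .jar files stay whole
  unzip -q -o "$1" -d "$2" || return 1
  jar_is_intact "$2/saxon9he.jar"
}
```

For example, `extract_saxon saxonhe9-3-0-4j.zip saxonhe && echo "saxon9he.jar is ready"` unpacks into a folder called saxonhe and only prints the message if the jar is intact.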

Now, let's test an XML document against an ISO-Schematron schema.

Saxon doesn't simply spit out a 'Your XML document makes no sense' report. It takes two steps to validate an XML document against an ISO-Schematron schema. Step one involves three files.
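  1. iso_schematron_skeleton_for_saxon.xsl - this contains the ISO-Schematron schema definition/rules of war! This comes with Saxon. You never reference it directly in the command: iso_svrl_for_xslt2.xsl imports it, so keep both files in the same folder.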
  2. iso_svrl_for_xslt2.xsl - SVRL is the Schematron Validation Report Language. It prepares your report and shows where you screwed up.
  3. mySchema.sch - this is the schema you've written. It defines valid content.
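
If you don't have a schema handy, the snippet below writes a tiny one to try the commands on. The element names (library, book, title, author) and the file contents are hypothetical; the http://purl.oclc.org/dsdl/schematron namespace and queryBinding="xslt2" are standard ISO-Schematron. It also writes a myXMLDocument.xml in which the second book has no title, so the assertion has something to catch.

```shell
#!/bin/sh
# Writes a hypothetical Schematron schema and a test document
# into the current folder.

cat > mySchema.sch <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <title>Hypothetical book rules</title>
  <pattern>
    <!-- Every book element must contain a title element -->
    <rule context="book">
      <assert test="title">A book must have a title.</assert>
    </rule>
  </pattern>
</schema>
EOF

cat > myXMLDocument.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book><title>A titled book</title></book>
  <book><author>Anonymous</author></book>
</library>
EOF
```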
We need to transform mySchema.sch with iso_svrl_for_xslt2.xsl to create yet another XSLT stylesheet (let's call it output.xsl). To do this, execute the following command:

java -jar saxon9he.jar -o:output.xsl -s:mySchema.sch iso_svrl_for_xslt2.xsl

If your schema was invalid, you'll get a "Transformation failed: Run-time errors were reported" message. Luckily, the error message is verbose and will tell you which line in your schema is invalid and why. If your schema was valid, you'll now have an output.xsl file.
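
Since step two is pointless if step one failed, a script can check the result before moving on. A minimal sketch, assuming Saxon exits with a non-zero status when the transformation fails and that iso_svrl_for_xslt2.xsl (plus the skeleton it imports) is in the current folder; compile_schema is a made-up name.

```shell
#!/bin/sh
# Step one as a reusable function (hypothetical name).
# $1 = Saxon jar, $2 = your .sch schema, $3 = generated XSLT
compile_schema() {
  # Remove any stale output so an old output.xsl can't pass for a fresh one
  rm -f "$3"
  # Note the -o: and -s: colon syntax
  if ! java -jar "$1" -o:"$3" -s:"$2" iso_svrl_for_xslt2.xsl; then
    echo "Schema compilation failed; see Saxon's error message above." >&2
    return 1
  fi
  # The generated stylesheet must actually exist
  [ -f "$3" ]
}
```

In a script, `compile_schema saxon9he.jar mySchema.sch output.xsl || exit 1` stops before step two if the schema doesn't compile.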

Now, this output.xsl file you've generated is special. It contains a mashup of your schema with the SVRL. If you transform any XML document with the output.xsl file, you'll get an XML report detailing whether it is valid against your schema! How clever! Let's transform myXMLDocument.xml.

java -jar saxon9he.jar -o:whereDidIScrewUp.xml -s:myXMLDocument.xml output.xsl

In this case, whereDidIScrewUp.xml is the validation report of your XML document.
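
To tell at a glance whether the document passed, you can count the svrl:failed-assert elements in the report. This is a rough sketch that assumes the report uses the svrl: prefix, as output from iso_svrl_for_xslt2.xsl normally does; grep is not an XML parser, so treat it as a quick check only. Only failed assertions are counted (report rules produce svrl:successful-report elements instead). The function names are made up for this example.

```shell
#!/bin/sh
# Counts failed assertions in an SVRL report.
# grep -o prints each match on its own line, so several elements
# on one line are each counted.
count_failed_asserts() {
  # "|| true": grep exits 1 when it finds nothing, which is fine here
  { grep -o '<svrl:failed-assert' "$1" || true; } | wc -l | tr -d ' '
}

# Prints a one-line verdict; returns 1 if any assertion failed,
# 2 if there is no report at all.
report_status() {
  if [ ! -f "$1" ]; then
    echo "No report found at $1" >&2
    return 2
  fi
  failures=$(count_failed_asserts "$1")
  if [ "$failures" -eq 0 ]; then
    echo "valid"
  else
    echo "$failures failed assertion(s)"
    return 1
  fi
}
```

Running `report_status whereDidIScrewUp.xml` after step two gives a one-line summary, and its exit status can drive a build script.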

I hope this helps! I might write an Eclipse plugin if I get some time.

UPDATE: I've been beaten to the punch! Castle Systems have released Schematron-EP (Eclipse Plugin). I've yet to test it.
