Sunday, July 15, 2007

XProc: The "Maven" of XML Developers

XProc, the XML Pipeline Language, is currently a W3C Working Draft that could become for XML developers what Apache Maven is for Java developers.

According to the specification:

"An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output."

Let's see how XProc can be used to generate an S1000D IETP from a collection of data modules. The process typically involves the following steps:

  1. Get the collection of applicable data modules to be included in the IETP by issuing a query (hopefully XQuery-based) against the CSDB
  2. Include XML fragments using XInclude
  3. Validate data modules against S1000D XML Schemas
  4. Validate data modules against Schematron schemas
  5. Generate identifiers for elements that will be the targets for links
  6. Generate XLink attributes on elements that will be the sources of links
  7. Generate RDF or Dublin Core Metadata
  8. Transform the data modules from XML into HTML with XSLT
  9. Transform the data modules from XML into PDF using XSLT and XSL-FO

XProc can be used to wire all these steps together, so that they can be executed from a single command. XProc provides the following interesting steps:

  • p:xquery
  • p:xinclude
  • p:validate-relax-ng
  • p:validate-xml-schema
  • p:label-elements (with a unique xml:id)
  • p:rename (renames elements, attributes, or processing-instruction targets)
  • p:string-replace
  • p:xslt2
  • p:xsl-formatter
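
To make the idea of wiring steps together more concrete, here is a minimal Python sketch of a pipeline as a sequence of steps, each taking a list of XML documents and returning a list of XML documents. It is not XProc and not how any XProc processor is implemented. The functions label_elements, rename, and run_pipeline are names made up for this illustration; they only loosely imitate p:label-elements and p:rename, and the sample data module is invented.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def label_elements(docs, tag):
    """Loose analogue of p:label-elements: add a unique xml:id to each matching element."""
    counter = 0
    for doc in docs:
        for elem in doc.iter(tag):
            if XML_ID not in elem.attrib:
                counter += 1
                elem.set(XML_ID, f"id-{counter}")
    return docs


def rename(docs, old, new):
    """Loose analogue of p:rename, restricted to element names."""
    for doc in docs:
        # Materialize the matches first, since we change tags while walking.
        for elem in list(doc.iter(old)):
            elem.tag = new
    return docs


def run_pipeline(docs, steps):
    """Feed the output documents of each step into the next step."""
    for step in steps:
        docs = step(docs)
    return docs


# Hypothetical input: one tiny data module with two paragraphs.
source = [ET.fromstring("<dmodule><para>One</para><para>Two</para></dmodule>")]

steps = [
    lambda docs: label_elements(docs, "para"),
    lambda docs: rename(docs, "para", "p"),
]

result = run_pipeline(source, steps)
print(ET.tostring(result[0], encoding="unicode"))
```

In a real pipeline the steps would be declared in an XProc document and run by the processor. The sketch only shows the shape of the idea: documents in, documents out, one step after another.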

XProc also allows you to add logic with elements such as:

  • p:for-each
  • p:viewport
  • p:choose
  • p:group
  • p:try/p:catch
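
The control-flow elements above can be pictured with a similar Python sketch. Again, this is an analogy under my own assumptions and not XProc syntax: for_each, choose, and try_catch are hypothetical helper names, the "error" element is a stand-in I invented for whatever a real processor reports, and p:viewport and p:group are left out.

```python
import xml.etree.ElementTree as ET


def for_each(items, subpipeline):
    """Like p:for-each: run the subpipeline once per item and collect the results."""
    return [subpipeline(item) for item in items]


def choose(doc, branches, otherwise):
    """Like p:choose: run the first branch whose test matches, else the fallback."""
    for test, branch in branches:
        if test(doc):
            return branch(doc)
    return otherwise(doc)


def try_catch(item, attempt, recover):
    """Like p:try/p:catch: run the recovery branch if the attempt fails."""
    try:
        return attempt(item)
    except Exception as error:
        return recover(item, error)


def parse_or_report(text):
    # On a parse failure, return a placeholder <error/> element (hypothetical).
    return try_catch(text, ET.fromstring, lambda t, e: ET.Element("error"))


def classify(doc):
    return choose(
        doc,
        [
            (lambda d: d.tag == "dmodule", lambda d: "data module"),
            (lambda d: d.tag == "error", lambda d: "parse error"),
        ],
        otherwise=lambda d: "other",
    )


# Hypothetical inputs: one good data module, one malformed document, one unrelated document.
raw = [
    "<dmodule><para>ok</para></dmodule>",
    "<dmodule><para>broken</dmodule>",
    "<note/>",
]

docs = for_each(raw, parse_or_report)
kinds = for_each(docs, classify)
print(kinds)  # ['data module', 'parse error', 'other']
```

The point is that a failing document does not have to stop the whole run: the catch branch turns the failure into something later steps can inspect.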

Norman Walsh has released an implementation called the XML Pipeline Processor, and there is another implementation called Yax. One feature of Apache Maven POM files that I would like to see in XProc is dependency management.


spycomponents said...

Hi Joel!

Sure, using XProc you can describe the process and have various possibilities to transform, validate, and combine XML data.

However, I always wonder whether this is really a question of which tools I'm using. Do you want to use several different tools for your XSLT transformation, validation, and Relax-NG? Is there a single tool you can use for everything XProc supports?

Joel Amoussou said...

There are tools that provide you with a parser, an XSLT engine, and even an XQuery processor all in one tool.

Your XProc implementation should allow you to configure any tool of your choice for executing the steps in the pipeline. Apache Maven does this easily with its XML POM files, and the dependencies are even downloaded automatically to a local repository. Maven can also bring in the dependencies of those dependencies (transitive dependencies).

spycomponents said...

Being able to configure any tool I would like to use means the XProc processor either needs to know all tools or has to provide a clever layer that maps the options and parameters to the tool-specific ones (if the tool supports them at all).

Seems we need some XML standard for XML tools configuration. :-)