Sunday, July 22, 2007


About a year ago, I published a white paper entitled "Beyond S1000D: an SOA Enabled Interoperability Framework for the Aerospace Industry".

The white paper proposed a framework called "Integrated Documentation Environment for Aircraft Support (IDEAS)" for the interoperability of enterprise content management and publishing systems within the aerospace industry. The goal was to enable new capabilities such as remote access to library services and cross-repository exchange, aggregation, and observation.

Global aerospace organizations acquire technical publications from multiple suppliers and business partners. They must address the following challenges:

  • The elimination of the high costs associated with paper libraries and the shipping of physical products such as paper, CDs, and DVDs.

  • The safety and regulatory compliance concerns related to the slow distribution of supplements to field sites.

  • The need for a single point of access to the multitude of technical documents needed to maintain and operate aerospace equipment.

The IDEAS concept was created to address current inefficiencies in technical data management processes within the industry by taking advantage of Service-Oriented Architecture (SOA) and emerging content management standards such as the JSR 170 Content Repository for Java Technology API.

On the Java EE platform, JSR 170 is enjoying a lot of success in terms of adoption and implementation. In the Open Source world, the Apache Jackrabbit project continues to evolve, and there is now a Spring JSR 170 Module to simplify development with the very popular Spring Framework.

For cross-platform interoperability, SOA-based solutions have traditionally relied on web services standards such as SOAP, WSDL, and UDDI. However, in today's Web 2.0 world, alternative approaches such as the Representational State Transfer (REST) architectural style and the OpenSearch specification (for federated searches) are getting a lot of attention for their simplicity and scalability.

REST is based on the notion that resources on the web are URI-addressable and that all CRUD (Create, Retrieve, Update and Delete) operations on those resources can be implemented through a generic interface (e.g., HTTP GET, POST, PUT, DELETE). In contrast, RPC-based mechanisms such as SOAP use many custom methods and expose a single endpoint URI or only a few. It turns out that the requirements for interoperable enterprise content management systems are more amenable to the REST architectural style.
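A minimal Python sketch of that uniform interface follows. It assumes an in-memory dictionary in place of a real content repository and hypothetical URIs such as `/library/manuals`; every resource is reached through the same four verbs, and only the URI changes.

```python
from http import HTTPStatus


class ResourceStore:
    """In-memory stand-in for a content repository, keyed by URI path."""

    def __init__(self):
        self._resources = {}
        self._next_id = 0

    def handle(self, method, uri, body=None):
        """Serve a request through the generic HTTP interface.

        Returns a (status, payload) tuple. The same verbs apply to every
        resource; only the URI changes from one resource to the next.
        """
        if method == "POST":  # Create: the server assigns the child URI
            self._next_id += 1
            child = f"{uri.rstrip('/')}/{self._next_id}"
            self._resources[child] = body
            return HTTPStatus.CREATED, child
        if method == "GET":  # Retrieve
            if uri not in self._resources:
                return HTTPStatus.NOT_FOUND, None
            return HTTPStatus.OK, self._resources[uri]
        if method == "PUT":  # Update (or create at a client-chosen URI)
            created = uri not in self._resources
            self._resources[uri] = body
            return (HTTPStatus.CREATED if created else HTTPStatus.OK), uri
        if method == "DELETE":  # Delete
            if uri not in self._resources:
                return HTTPStatus.NOT_FOUND, None
            del self._resources[uri]
            return HTTPStatus.NO_CONTENT, None
        return HTTPStatus.METHOD_NOT_ALLOWED, None
```

POST to a collection lets the server choose the new URI, while PUT targets a URI the client already knows; the status codes are the standard `http.HTTPStatus` values.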

The resurgence of REST can be felt across the application development landscape. Struts 2 introduced a REST-style improvement to action mapping called Restful2ActionMapper (itself inspired by the REST support in Ruby on Rails). Support for RESTful web applications has been added to JSF through the RestFaces project. REST APIs are also easy to implement with scripting and templating languages such as JavaScript and FreeMarker.

The technical documentation needed to operate and maintain an airline's fleet is supplied by several manufacturers, including aircraft, engine, and component manufacturers. Government agencies such as the FAA and the NTSB also publish documents, including Advisory Circulars (ACs), Airworthiness Directives (ADs), and various forms and regulations. If all these organizations expose their content repositories via OpenSearch, an airline technician will be able to perform a federated search across all of those repositories to obtain technical information about a particular piece of equipment. The results could be formatted in Atom so that the technician can receive updates through a web feed.
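Such a federated search could work roughly as sketched below. The sketch assumes that each repository publishes an OpenSearch 1.1 URL template (parameters in braces, with `?` marking optional ones) and returns Atom results; the repository URLs and the injected `fetch` function are hypothetical, and a real client would issue HTTP requests and handle paging.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

ATOM = "{http://www.w3.org/2005/Atom}"
# {name} is a required parameter, {name?} an optional one.
TEMPLATE_PARAM = re.compile(r"\{([^{}?]+)(\??)\}")


def expand_template(template, values):
    """Fill an OpenSearch URL template with percent-encoded values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # unsupplied optional parameters become empty
        raise KeyError(f"missing required parameter: {name}")
    return TEMPLATE_PARAM.sub(substitute, template)


def federated_search(templates, terms, fetch):
    """Query every repository and merge the Atom entry titles, newest first.

    `fetch` takes a URL and returns an Atom feed as a string; injecting it
    keeps the sketch independent of any HTTP library.
    """
    entries = []
    for template in templates:
        url = expand_template(template, {"searchTerms": terms})
        feed = ET.fromstring(fetch(url))
        entries.extend(feed.findall(f"{ATOM}entry"))
    # Assumes every atom:updated value uses the same UTC "Z" form, so that
    # string order matches chronological order.
    entries.sort(key=lambda e: e.findtext(f"{ATOM}updated", ""), reverse=True)
    return [e.findtext(f"{ATOM}title") for e in entries]
```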

To expose a library service with a REST-style API, a content management system would typically need to provide the following:

  1. A description of the service, including URI templates, HTTP method binding, authentication, transaction, response content types, and response status codes

  2. The specification of the code (script or Java) that is executed when the URI is invoked

  3. Response templates
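Those three ingredients can be sketched in Python as a small descriptor plus a dispatcher. Everything here is hypothetical (the `ServiceDescriptor` class, the `invoke` function, the URI layout); authentication is declared but not enforced, and transactions and richer status handling are omitted.

```python
import re
from dataclasses import dataclass
from string import Template
from typing import Callable


def _compile_uri_template(template):
    """Turn '/library/{manual}/{dmc}' into a regex with one named group per variable."""
    # re.split with a capturing group alternates literal text and variable names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        f"(?P<{part}>[^/]+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(pattern)


@dataclass
class ServiceDescriptor:
    """Hypothetical description of one library service exposed over REST."""

    uri_template: str                # 1. URI template ...
    method: str                      #    ... and HTTP method binding
    authentication: str              #    declared here, not enforced in this sketch
    content_type: str                #    response content type
    handler: Callable[[dict], dict]  # 2. code executed when the URI is invoked
    response_template: Template      # 3. response template filled from the handler's result

    def bind(self, method, path):
        """Return the URI variables if this request binds to the service, else None."""
        if method != self.method:
            return None
        match = _compile_uri_template(self.uri_template).fullmatch(path)
        return match.groupdict() if match else None


def invoke(services, method, path):
    """Find the bound service, run its code, and render its response template."""
    for service in services:
        variables = service.bind(method, path)
        if variables is not None:
            model = service.handler(variables)
            body = service.response_template.substitute(model)
            return 200, service.content_type, body
    return 404, "text/plain", "no service bound to this method and URI"
```

Keeping the description declarative means the same dispatcher can serve any number of services; only the descriptors, the handler code, and the templates differ.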

JSR 311, the Java API for RESTful Web Services, will define a set of Java APIs for developing Web services according to the REST architectural style.

Sunday, July 15, 2007

XProc: The "Maven" of XML Developers

XProc, the XML Pipeline Language, is currently a W3C working draft that could become for XML developers what Apache Maven is for Java developers.

According to the specification:

"An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output."
Let's see how XProc can be used to generate an S1000D IETP from a collection of data modules. The process typically involves the following steps:

  1. Get the collection of applicable data modules to be included in the IETP by issuing a query (ideally XQuery-based) against the CSDB (Common Source Database)
  2. Include XML fragments using XInclude
  3. Validate data modules against S1000D XML Schemas
  4. Validate data modules against Schematron schemas
  5. Generate identifiers for elements that will be the targets for links
  6. Generate XLink attributes on elements that will be the sources of links
  7. Generate RDF or Dublin Core Metadata
  8. Transform the data modules from XML into HTML with XSLT
  9. Transform the data modules from XML into PDF using XSLT and XSL-FO

XProc can be used to wire all these steps together so that they can be executed from a single command. XProc provides the following interesting steps:

  • p:xquery
  • p:xinclude
  • p:validate-relax-ng
  • p:validate-xml-schema
  • p:label-elements (labels elements with a unique xml:id)
  • p:rename (renames elements, attributes, or processing-instruction targets)
  • p:string-replace
  • p:xslt2
  • p:xsl-formatter
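To make the idea of wiring steps together concrete, the following Python sketch chains a few of them as plain functions, each taking a document and returning a document. It is not XProc: the fragment store, the step names, and the tag set are hypothetical, XInclude comes from the standard library's `xml.etree.ElementInclude`, and the validation step is only a placeholder because XML Schema, Schematron, and XSLT processing need libraries outside the standard library.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def xinclude_step(fragments):
    """Step 2: resolve xi:include elements from an in-memory fragment store."""
    def loader(href, parse, encoding=None):
        text = fragments[href]
        return ET.fromstring(text) if parse == "xml" else text

    def step(doc):
        ElementInclude.include(doc, loader=loader)
        return doc
    return step


def require_step(path):
    """Placeholder for steps 3-4: fail if a required element is missing."""
    def step(doc):
        if doc.find(path) is None:
            raise ValueError(f"validation failed: {path} not found")
        return doc
    return step


def label_step(tags):
    """Step 5: give every link-target element a unique xml:id."""
    def step(doc):
        used = {e.get(XML_ID) for e in doc.iter() if e.get(XML_ID)}
        counter = 0
        for elem in doc.iter():
            if elem.tag in tags and elem.get(XML_ID) is None:
                candidate = None
                while candidate is None or candidate in used:
                    counter += 1
                    candidate = f"{elem.tag}-{counter}"
                used.add(candidate)
                elem.set(XML_ID, candidate)
        return doc
    return step


def run_pipeline(doc, steps):
    """Each step's output document becomes the next step's input."""
    for step in steps:
        doc = step(doc)
    return doc
```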
XProc also allows you to add logic with elements such as:

  • p:for-each
  • p:viewport
  • p:choose
  • p:group
  • p:try/p:catch
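These control structures map naturally onto ordinary program flow. In the hypothetical Python sketch below, a loop plays the role of `p:for-each`, an `if`/`else` the role of `p:choose`, and `try`/`except` the role of `p:try`/`p:catch`, so that one malformed data module yields an error document instead of stopping the whole run. `p:viewport` and `p:group` are not modeled, and the `status="draft"` convention is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def render_html(dm):
    """Hypothetical stand-in for an XSLT step that turns a data module into HTML."""
    html = ET.Element("html")
    ET.SubElement(html, "h1").text = dm.findtext("title", "untitled")
    return html


def process_collection(sources):
    """Apply per-document logic to (name, xml_text) pairs without aborting the batch."""
    results = []
    for name, text in sources:                  # p:for-each
        try:                                    # p:try
            dm = ET.fromstring(text)
            if dm.get("status") == "draft":     # p:choose / p:when
                result = ET.Element("skipped", name=name)
            else:                               # p:otherwise
                result = render_html(dm)
        except ET.ParseError as err:            # p:catch
            result = ET.Element("error", name=name)
            result.text = str(err)
        results.append(result)
    return results
```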
Norman Walsh has released an implementation called the XML Pipeline Processor, and there is another implementation called Yax. One feature of Apache Maven POM files that I would like to see in XProc is dependency management.

Friday, July 13, 2007

S1000D Business Rules

Having been involved in the exchange and use of digital publications in the aerospace industry over the last ten years, I have come to realize the importance of specifying well-defined business rules and, most importantly, of validating XML documents against those business rules.

The S1000D TPSMG is currently reviewing two Change Proposal Forms (CPFs) that will help S1000D implementers in the area of business rules:
  • CPF-2007-048DE: Business Rules (BR) Categories and Layers
  • CPF-2006-033CA: Schematron for Business Rules

CPF-2007-048DE (written by Victoria Ichizli-Bartels and Mike Day) proposes breaking business rules down into the following 10 categories:
  1. General business rules
  2. Product definition business rules
  3. Maintenance philosophy and concepts of operations business rules
  4. Security business rules
  5. Business process business rules
  6. Data creation business rules
  7. Data exchange business rules
  8. Data integrity and management business rules
  9. Legacy data conversion, management, and handling business rules
  10. Data output business rules

CPF-2007-048DE also proposes a layering of S1000D business rules, which will help implementers create a comprehensive and well-organized set of business rules for their projects.

The second CPF, CPF-2006-033CA (which I proposed and which has been accepted for inclusion in S1000D 3.x), suggested ISO Schematron as the mechanism for exchanging business rules and validating S1000D documents against them. While ISO Schematron cannot validate every S1000D project-specific business rule (e.g., verifying that a paragraph is written according to the rules of Simplified English), it does an excellent job of providing valuable reports and diagnostic information about the content of an XML document.

ISO Schematron declares assertions about arbitrary patterns in XML documents and then reports on the presence or absence of these patterns. Schematron schemas use XPath for specifying the node that is the subject of the assertion and for testing the assertion itself.

Very complex assertions can be expressed by using new XPath 2.0 constructs such as regular expressions, conditional expressions, sequence expressions, type expressions, and the extensive function library.
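The assert/report semantics can be sketched in Python as follows. Python's `ElementTree` supports only a small subset of XPath, so the tests here are Python predicates rather than XPath expressions; a real ISO Schematron schema evaluated with XPath 2.0 would use a function such as `matches()` for the pattern check. The element names, the `dmc` attribute, and the code pattern are hypothetical and not taken from the S1000D schemas.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    context: str                        # path selecting the rule's context nodes
    kind: str                           # "assert" or "report"
    test: Callable[[ET.Element], bool]  # evaluated once per context node
    message: str


def validate(doc, checks):
    """Collect diagnostics: an assert speaks when its test fails, a report when it holds."""
    messages = []
    for check in checks:
        for node in doc.iterfind(check.context):
            outcome = check.test(node)
            failed_assert = check.kind == "assert" and not outcome
            fired_report = check.kind == "report" and outcome
            if failed_assert or fired_report:
                messages.append(f"{check.kind} {check.context}: {check.message}")
    return messages


# Hypothetical project rules; the regex plays the role of XPath 2.0 matches().
DM_CODE = re.compile(r"DMC-[A-Z0-9]+(?:-[A-Z0-9]+)*")
CHECKS = [
    Check(".//warning", "assert", lambda w: w.find("para") is not None,
          "a warning must contain at least one para"),
    Check(".//ref", "assert", lambda r: DM_CODE.fullmatch(r.get("dmc", "")) is not None,
          "ref/@dmc does not follow the project's data module code pattern"),
    Check(".//para", "report", lambda p: len(p.text or "") > 200,
          "paragraph longer than 200 characters"),
]
```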

Today, the combined validation power of XML Schema and ISO Schematron and the query and data manipulation capabilities of XQuery have made the maxim "The document is the database" a reality.

Thursday, July 12, 2007

S1000D Core

The lack of extensibility in S1000D is cited as one of its main drawbacks and could be a deterrent to potential adopters. The main strength of the Darwin Information Typing Architecture (DITA) as compared to S1000D is its extensibility mechanism referred to as specialization.

One of the benefits of DITA specialization is that it not only allows users to extend the vocabulary to satisfy their unique needs, but also enables the reuse of processing code (e.g., XSLT stylesheets) across specializations through a fallback mechanism to base types. The DITA specialization mechanism uses an elaborate scheme based on DTDs and XSLT 1.0.
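The fallback idea can be sketched in Python. DITA records an element's ancestry in its `class` attribute, from the base type to the most specific specialization (for example `- topic/p hazard/safetypara `), and processing code matches on those tokens. The `hazard/safetypara` specialization and the handler tables below are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


def class_ancestry(class_value):
    """Split a DITA @class value into 'module/element' tokens, most specific first."""
    tokens = class_value.split()
    if tokens and tokens[0] in ("-", "+"):  # '-' structural, '+' domain specialization
        tokens = tokens[1:]
    return tokens[::-1]


# Processing code written only for base types.
BASE_HANDLERS = {
    "topic/p": lambda e: f"<p>{escape(e.text or '')}</p>",
    "topic/ph": lambda e: f"<span>{escape(e.text or '')}</span>",
}


def render(elem, handlers):
    """Use the most specific handler available, falling back toward the base type."""
    for token in class_ancestry(elem.get("class", "")):
        if token in handlers:
            return handlers[token](elem)
    raise LookupError(f"no handler for {elem.tag}")
```

A specialization that needs nothing special is rendered by the base `topic/p` handler; adding a `hazard/safetypara` entry later overrides it without touching the base code.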

S1000D should learn from DITA’s experience and success by providing an extensibility framework that allows any party to add extensions that are needed to satisfy their unique requirements. An S1000D extensibility framework will also reduce the number of Change Proposal Forms (CPFs) submitted to the TPSMG by allowing organizations and communities of interest to adopt S1000D without "polluting" the S1000D core specification.

The combination of XML Schema’s element substitution and type inheritance with XSLT 2.0’s schema-aware processing can provide a more robust extensibility mechanism for S1000D.

Efasoft has submitted a CPF (CPF-2007-006CA) to the TPSMG to evaluate and implement such a framework.

Information quality in NIEM exchanges

The National Information Exchange Model (NIEM) is emerging as the standard for information sharing between government agencies. At the same time, the issue of information quality has been receiving significant attention. Data exchange initiatives based on the NIEM standard will increase the need for information quality in XML data exchanges. The government has the obligation to ensure information quality in XML exchanges not only for fulfilling its mission of protecting its citizens, but also for protecting citizens’ rights.

Ensuring information quality requires a multidimensional approach based on policy, process, technology, and governance. Standards-based user interface and data validation technologies such as XForms and ISO Schematron can help improve the quality of the data at the point where users enter it into the systems participating in the exchange.

XForms is an XML application for next-generation Web forms. It implements the model-view-controller (MVC) pattern by splitting forms into three parts: the XForms model, instance data, and the user interface. The benefits are strong data typing and validation, less client-side scripting, and device independence. Compared to other modern MVC frameworks such as Struts 2 and JavaServer Faces (JSF), XForms is a declarative solution: a complete application can be created without a single line of Java code.

ISO Schematron declares assertions about arbitrary patterns in XML documents and then reports on the presence or absence of these patterns. Schematron schemas are very useful for expressing and validating data exchange business rules because they can define constraints that are beyond the validation capabilities of XML Schema.

XForms can be generated automatically from an exchange schema or from a WSDL file. The XForms application enforces the constraints expressed in the exchange schema by displaying an error message to the user when the value entered in a field is invalid. In addition, Schematron rules can be applied to the form data to enforce business rules on what the user enters. When the user submits the form, the application generates XML data that is valid not only against the exchange schema but also against the business rules defined for the exchange. For example, in the context of law enforcement, the interface can validate that the activity date and the subject's birth date are valid dates according to the XSD definition. If the business rules prohibit entering juvenile data into the system, it can also ensure that the birth date falls at least 18 years before the activity date. While Struts 2 and JSF have validation features, the combination of XForms and ISO Schematron offers much more powerful validation capabilities.
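The two layers of validation in that example can be sketched in Python. In a real deployment the date checks would be XForms binds of type `xs:date` and the age rule a Schematron assertion; here both are plain functions, the field names `SubjectBirthDate` and `ActivityDate` are hypothetical rather than NIEM element names, and the date check ignores the optional time zone that `xs:date` allows.

```python
import re
from datetime import date

_XSD_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_date(value):
    """Simplified xs:date check (no time zone): return a date, or None if invalid."""
    if not _XSD_DATE.fullmatch(value):
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:  # e.g. month 13 or 30 February
        return None


def age_on(birth, day):
    """Whole years elapsed from birth to day.

    A 29 February birthday is counted as reached on 1 March in common years;
    a project's business rules could adopt a different convention.
    """
    return day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))


def validate_submission(instance, minimum_age=18):
    """Type checks (XForms bind style) followed by a business rule (Schematron style)."""
    errors = []
    birth = parse_date(instance.get("SubjectBirthDate", ""))
    activity = parse_date(instance.get("ActivityDate", ""))
    if birth is None:
        errors.append("SubjectBirthDate is not a valid date")
    if activity is None:
        errors.append("ActivityDate is not a valid date")
    if birth is not None and activity is not None and age_on(birth, activity) < minimum_age:
        errors.append(f"juvenile data: subject was under {minimum_age} on the activity date")
    return errors
```

The business rule only runs once both type checks pass, mirroring how the schema constraints guard the data before the Schematron rules see it.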

XForms and ISO Schematron are declarative languages, which allows non-programmers to contribute to the specification of the user interface and the business rules for the project. An additional benefit is that XForms can generate valid XML data directly upon form submission. This eliminates the challenges traditionally associated with mapping relational data to a NIEM-compliant format. Today, most implementations use an XML binding framework such as Castor to unmarshal the XML data into Java objects and an Object Relational Mapping (ORM) framework such as Hibernate to persist those objects into relational tables. Such a mapping can quickly become unwieldy.

Rather than shredding the XML data into relational tables, an XForms application can be integrated with an XML database that stores the data natively as XML. This approach will become practical as native XML databases become more transactional and scalable.
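The difference between shredding and native storage can be illustrated with the standard library's `sqlite3` module. SQLite is not an XML database; it only stands in for storage here, and the report vocabulary is hypothetical rather than NIEM-conformant. The shredded table keeps only the fields someone remembered to map, while the stored document can still be queried for any element later.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incident report, not a NIEM-conformant instance.
REPORT = """<Report>
  <Subject><Name>J. Doe</Name><BirthDate>1980-01-02</BirthDate></Subject>
  <Activity><Date>2007-07-12</Date><Description>Traffic stop</Description></Activity>
</Report>"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report_shredded "
                 "(subject_name TEXT, birth_date TEXT, activity_date TEXT)")
    conn.execute("CREATE TABLE report_native (doc TEXT)")
    return conn


def shred(conn, xml_text):
    """Relational approach: copy each mapped field into its own column."""
    root = ET.fromstring(xml_text)
    conn.execute("INSERT INTO report_shredded VALUES (?, ?, ?)", (
        root.findtext("Subject/Name"),
        root.findtext("Subject/BirthDate"),
        root.findtext("Activity/Date"),
    ))
    # Activity/Description has no column, so it is silently lost.


def store_native(conn, xml_text):
    """Native approach (simulated): keep the whole document as XML."""
    conn.execute("INSERT INTO report_native VALUES (?)", (xml_text,))


def native_query(conn, path):
    """Evaluate a path against every stored document.

    A native XML database would index the documents and run XQuery instead
    of parsing each row in application code.
    """
    rows = conn.execute("SELECT doc FROM report_native")
    return [ET.fromstring(doc).findtext(path) for (doc,) in rows]
```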