Saturday, May 24, 2008

S1000D Content Reuse for Aircraft Documentation

One of the justifications for moving to an XML-based S1000D content management system (CMS) is the ability to reduce cost and improve quality by reusing content. In the aerospace industry, hundreds of thousands of pages of maintenance and operation documentation are produced and maintained for every new aircraft project. Warnings and cautions are a good example of reuse in aerospace documentation. They describe hazards that may cause injury, death, or damage to the aircraft. For product liability reasons, these warnings and cautions are carefully reviewed and approved by qualified personnel. Technical authors may be required to reuse these warnings and cautions verbatim across all documents. In this post, I will discuss some principles and practices that facilitate S1000D content reuse.

From a technical perspective, the key to successful reuse in S1000D is the W3C XInclude specification. The S1000D specification does not make reference to XInclude because earlier versions of S1000D were based on SGML. Some S1000D CMSs still rely on the SGML/XML 1.0 external parsed entity mechanism for implementing reuse. This approach has several limitations and should be avoided. The preferred approach in modern XML content applications is to use XInclude, which allows the transclusion not only of whole chunks of XML content but also of individual elements within those chunks (addressed using XPointer). The following are some examples:

<xi:include href="dm.xml"/>
<xi:include href="dm.xml" xpointer="warning-001"/>

In the first example, a data module file named dm.xml is included. In the second example, an element with ID value "warning-001" within the data module is included.
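
To make the two forms concrete, here is a minimal Python sketch of what an XInclude processor does with them. It is an illustration under stated assumptions, not a conforming implementation: the DOCUMENTS dictionary stands in for files in the repository, the element names and sample text are invented, and the xpointer value is simply matched against an attribute literally named id (a real processor relies on ID typing from a DTD or schema). Included content is not expanded recursively.

```python
import copy
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical in-memory stand-in for files stored in the repository.
DOCUMENTS = {
    "dm.xml": (
        "<dmodule>"
        '<warning id="warning-001">Disconnect electrical power.</warning>'
        '<para id="para-001">Remove the access panel.</para>'
        "</dmodule>"
    ),
}


def resolve_target(href, xpointer=None):
    """Return the element that an xi:include with these attributes points at."""
    root = ET.fromstring(DOCUMENTS[href])
    if xpointer is None:
        return root  # whole-document inclusion
    # Assumption: a bare xpointer value names the target's "id" attribute.
    for element in root.iter():
        if element.get("id") == xpointer:
            return copy.deepcopy(element)
    raise KeyError(f"no element with id {xpointer!r} in {href}")


def expand_includes(parent):
    """Replace every xi:include below parent, in place, with its target."""
    for index, child in enumerate(list(parent)):
        if child.tag == XI_INCLUDE:
            target = resolve_target(child.get("href"), child.get("xpointer"))
            target.tail = child.tail  # keep any text that followed the include
            parent[index] = target
        else:
            expand_includes(child)


host = ET.fromstring(
    '<procedure xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="dm.xml" xpointer="warning-001"/>'
    "<step>Drain the hydraulic reservoir.</step>"
    "</procedure>"
)
expand_includes(host)
print(ET.tostring(host, encoding="unicode"))
```

After the call, the first child of the procedure is the warning element itself rather than a reference to it, which is the behavior the second example above relies on.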

Using XInclude in an S1000D content application requires some modifications to the XML Schema used for the authoring of data modules to allow the insertion of xi:include elements. However, these modifications will still produce valid S1000D documents since you're not altering the structure of your documents, but rather simply modularizing the content.

While we are on the subject of inclusion, the XLink specification can be used as a simpler alternative to the XML 1.0 unparsed entity and notation mechanism (another concept inherited from SGML) for including illustrations into S1000D documents.

At the DocTrain 2007 conference in Boston, I gave a presentation on how to integrate training and documentation using S1000D and the Shareable Content Object Reference Model (SCORM) specification. One way to reuse S1000D content in SCORM is to assign a unique ID to all reusable elements in S1000D data modules (DMs), such as paragraphs, steps, warnings, cautions, notes, and tables. This can be done automatically using the XSLT generate-id() function. The instructional designer then searches the S1000D common source database (CSDB) to find and display relevant DMs. She can then use XInclude to include reusable elements from S1000D DMs into SCORM shareable content objects (SCOs). When this is done, the SCOs are automatically updated when the DMs are updated.
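
The ID assignment step can be sketched in a few lines of Python. This is a stand-in for the XSLT approach, not a reproduction of it: the set of reusable element names, the "dm42" prefix, and the ID format are all assumptions made for the example. One design point worth noting is that the IDs should be written back to the data module once generated, since XInclude references need the same value to exist on every later run; the sketch therefore leaves existing IDs untouched.

```python
import xml.etree.ElementTree as ET

# Element names treated as reusable; an assumption for this sketch.
REUSABLE_TAGS = {"para", "step", "warning", "caution", "note", "table"}


def assign_ids(root, prefix):
    """Give every reusable element that lacks an id a unique, readable one."""
    used = {el.get("id") for el in root.iter() if el.get("id")}
    counters = {}
    for element in root.iter():
        if element.tag not in REUSABLE_TAGS or element.get("id"):
            continue  # not reusable, or already addressable
        while True:
            counters[element.tag] = counters.get(element.tag, 0) + 1
            candidate = f"{prefix}-{element.tag}-{counters[element.tag]:03d}"
            if candidate not in used:
                break
        element.set("id", candidate)
        used.add(candidate)
    return root


dm = ET.fromstring(
    "<dmodule>"
    "<warning>Wear eye protection.</warning>"
    '<step id="keep-me">Open the panel.</step>'
    "<step>Remove the filter.</step>"
    "</dmodule>"
)
assign_ids(dm, "dm42")
print([el.get("id") for el in dm])
```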

Successful S1000D reuse requires adherence to the principle of context-agnostic content. For example, to make it possible to reuse a warning across multiple documents in different contexts, one should avoid formulations such as "refer to the illustration in the next section" inside the warning.

Enforcing the principle of context-agnostic content can be semi-automated using an assertion-based schema language like ISO Schematron to report the occurrence of keywords such as "previous", "next", "below", etc. The warning shall be routed through a comprehensive review and approval workflow provided by the CMS before final publication. The principle of business rules definitions and enforcement ensures that reusable content is of the highest quality. Consider a dual-purpose data module that is written to be reused by both training and publications. A business rule could require the use of a certain language style (e.g. active as opposed to passive voice) for the dual-purpose data module.
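
The keyword check itself is simple enough to sketch. The following Python fragment is an approximation of what such a Schematron rule would report, not Schematron itself; the keyword list, the element names, and the sample data module are assumptions chosen for illustration. It only flags candidates for human review, which is why the approach is semi-automated.

```python
import re
import xml.etree.ElementTree as ET

# Keywords suggesting the text depends on its surroundings (illustrative list).
CONTEXT_WORDS = ("previous", "next", "above", "below", "following")
PATTERN = re.compile(r"\b(" + "|".join(CONTEXT_WORDS) + r")\b", re.IGNORECASE)


def find_context_dependencies(root, tags=("warning", "caution")):
    """Return (element id, keyword) pairs for reusable elements to review."""
    findings = []
    for element in root.iter():
        if element.tag not in tags:
            continue
        text = "".join(element.itertext())  # include text of nested elements
        for match in PATTERN.finditer(text):
            findings.append((element.get("id"), match.group(1).lower()))
    return findings


dm = ET.fromstring(
    "<dmodule>"
    '<warning id="w1">Refer to the illustration in the next section.</warning>'
    '<warning id="w2">Hydraulic fluid is toxic. Wear gloves.</warning>'
    '<caution id="c1">See the torque values listed below.</caution>'
    "</dmodule>"
)
for element_id, keyword in find_context_dependencies(dm):
    print(f"{element_id}: contains '{keyword}'")
```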

Another principle that can help when the content cannot be context-agnostic is the parameterization of reusable content. With parameterization, you include variable references in the reusable content that are resolved at run time. The eXist XML database has an elegant way of handling this using a combination of XInclude and XQuery, as in the following example:

<xi:include href="warning.xq?var1=material&amp;var2=process"/>

Here warning.xq is a stored XQuery which is compiled and executed by eXist to return the root element of the warning. The content of the warning depends on the material and process used to carry out the maintenance procedure. var1 and var2 are passed as global external variables to the XQuery.
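
The idea of resolving variable references at run time can be illustrated outside the database as well. The Python sketch below is only an analogy for the stored query, using the standard library's string templates; the warning wording, the function name, and the sample values are all invented for the example.

```python
from string import Template
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Reusable warning with two variable references; the wording is illustrative.
WARNING_TEMPLATE = Template(
    "<warning><para>$material is hazardous during $process. "
    "Wear approved protective equipment.</para></warning>"
)


def render_warning(material, process):
    """Resolve the variable references and return the warning element."""
    xml_text = WARNING_TEMPLATE.substitute(
        material=escape(material),  # escape so values cannot break the markup
        process=escape(process),
    )
    return ET.fromstring(xml_text)


warning = render_warning("Sealant", "curing")
print(warning.find("para").text)
```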

The issue of content granularity is directly related to the principle of context-agnostic content. Although the data module is the basic unit of information in S1000D, content can be managed at a lower level of granularity. An interesting feature of some XML editors is the ability to select an element inside an XML document and convert that element into an XIncluded file. So while a technical author is writing a warning inside a data module, she can pull out that warning as an XIncluded XML file if she determines that the warning could be reusable in other publications.
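
What such an editor feature does behind the scenes can be approximated as follows. This is a rough Python sketch, not the behavior of any particular editor; the function name, the file name, and the sample content are assumptions, and the extracted chunk is returned as a string rather than written to disk.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"
ET.register_namespace("xi", XI_NS)  # serialize with the conventional prefix


def extract_to_include(parent, child, href):
    """Swap child for an xi:include and return the chunk to store at href."""
    index = list(parent).index(child)
    include = ET.Element(f"{{{XI_NS}}}include", {"href": href})
    include.tail = child.tail  # text after the element stays in the host
    child.tail = None
    parent[index] = include
    return ET.tostring(child, encoding="unicode")


dm = ET.fromstring(
    "<dmodule>"
    "<warning>Depressurize the system.</warning>"
    "<step>Loosen the fitting.</step>"
    "</dmodule>"
)
chunk = extract_to_include(dm, dm.find("warning"), "warning-depressurize.xml")
print(chunk)
print(ET.tostring(dm, encoding="unicode"))
```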

Another area where XQuery facilitates reuse is the dynamic assembly of content based on product attributes such as applicability, security, and skill level. S1000D has a comprehensive metadata facility called IDSTATUS that can be leveraged to filter content. A good example is applicability filtering. In the case of an aircraft, the applicability of an S1000D maintenance or operation procedure can depend on the following attributes and conditions (among others):

  1. Manufacturer serial number
  2. Aircraft registration number
  3. Service bulletin incorporation
  4. Location of maintenance
  5. Aviation regulations
  6. Temperature, wind speed, and sandy conditions.

XInclude and XQuery can be used together to package content into S1000D publication modules by executing queries that filter content based on metadata in the IDSTATUS.
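
As a rough illustration of applicability filtering, the Python sketch below selects data modules for one aircraft by serial number and incorporated service bulletins. The markup is deliberately simplified and hypothetical: the csdb, applic, serialFrom, serialTo, and serviceBulletin names are invented for the example and are not the S1000D schema, and in the setup described here the filtering would be expressed as an XQuery over the real metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata; not the actual S1000D markup.
CSDB = ET.fromstring(
    "<csdb>"
    '<dmodule code="DM-001"><applic serialFrom="1" serialTo="100"/></dmodule>'
    '<dmodule code="DM-002">'
    '<applic serialFrom="101" serialTo="250" serviceBulletin="SB-32-01"/>'
    "</dmodule>"
    '<dmodule code="DM-003"><applic serialFrom="50" serialTo="150"/></dmodule>'
    "</csdb>"
)


def is_applicable(dmodule, serial, incorporated_bulletins):
    """True if the module applies to this serial number and bulletin state."""
    applic = dmodule.find("applic")
    if applic is None:
        return True  # no restriction recorded
    low = int(applic.get("serialFrom"))
    high = int(applic.get("serialTo"))
    if not low <= serial <= high:
        return False
    bulletin = applic.get("serviceBulletin")
    return bulletin is None or bulletin in incorporated_bulletins


def select_modules(csdb, serial, incorporated_bulletins=frozenset()):
    """Return the codes of the data modules to package for one aircraft."""
    return [
        dm.get("code")
        for dm in csdb.iter("dmodule")
        if is_applicable(dm, serial, incorporated_bulletins)
    ]


print(select_modules(CSDB, 120))
print(select_modules(CSDB, 120, {"SB-32-01"}))
```

The second call shows how incorporating a service bulletin changes the set of modules that end up in the publication for the same aircraft.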

An important condition for content reuse is the principle of discoverability of reusable content. Obviously, you cannot reuse a piece of content if you don't know that it exists and where to find it. A technical author should be able to query or browse the S1000D CSDB to find relevant reusable content. To facilitate enterprise-wide content reuse, I highly recommend a CSDB based on a native XQuery-compliant XML database and deployed as a web application. That will allow authors to perform both full-text and structured queries on the CSDB. The query should return a list of data modules or reusable chunks. The author should then be able to select the reusable chunk to automatically insert an XInclude targeting that chunk.

In support of the principle of reusable content discoverability, appropriate metadata should be added to the content. The DMs already have comprehensive metadata in the IDSTATUS section. Reusable content at a lower level of granularity (like a warning) should also have appropriate metadata specified.

An XQuery-enabled native XML database can help with the governance of your reuse initiative by providing powerful reporting capabilities. For example, you can easily run an XQuery to find all documents that contain an XInclude to a particular chunk. This is important for understanding the impact of updates to that chunk. Another potential issue that could require some attention is the versioning of reusable content. Some form of notification mechanism can be helpful to alert consumers to changes to reusable content. This can take the form of an Atom feed to which consumers can subscribe.
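
The impact report mentioned above amounts to a "where used" query. Here is a small Python sketch of the logic, offered as a stand-in for the XQuery one would run against the database; the DOCUMENTS dictionary, file names, and IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
XI_DECL = 'xmlns:xi="http://www.w3.org/2001/XInclude"'

# Hypothetical documents standing in for the contents of the database.
DOCUMENTS = {
    "dm-a.xml": f'<dmodule {XI_DECL}>'
                '<xi:include href="warnings.xml" xpointer="warning-001"/>'
                "</dmodule>",
    "dm-b.xml": f'<dmodule {XI_DECL}>'
                '<xi:include href="warnings.xml" xpointer="warning-002"/>'
                "</dmodule>",
    "dm-c.xml": "<dmodule><para>No reused content.</para></dmodule>",
}


def find_consumers(documents, href, xpointer=None):
    """List the documents that include the given chunk, in name order."""
    consumers = []
    for name, text in documents.items():
        root = ET.fromstring(text)
        for include in root.iter(XI_INCLUDE):
            same_file = include.get("href") == href
            same_part = xpointer is None or include.get("xpointer") == xpointer
            if same_file and same_part:
                consumers.append(name)
                break  # one hit per document is enough for the report
    return sorted(consumers)


print(find_consumers(DOCUMENTS, "warnings.xml", "warning-001"))
print(find_consumers(DOCUMENTS, "warnings.xml"))
```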

It is important to select an XML authoring tool that has good support for XInclude. Fortunately, some commercial XML editors now have decent support for XInclude. However, these XML editors remain complex specialist tools that are often used only by professional technical authors in documentation departments. At one of our aerospace customers, manufacturing assembly and functional test procedures were used to create installation and testing procedures for service publications. To allow their engineers to contribute S1000D content, we designed a light XML authoring application based on an XForms front-end with XML data persisted in a native XML database using a RESTful API.

Any data reuse strategy should look beyond training and publication to identify opportunities to reuse data and streamline processes across the entire aircraft lifecycle.

Thursday, May 15, 2008

Spring, SCA, and OSGi

French paleontologist Pierre Teilhard de Chardin once said: “Tout Ce Qui Monte Converge” or “Everything That Rises Must Converge”. I learned this quotation from my father who mentioned to me that it was the topic of his philosophy dissertation at his university entrance exam.

The quotation accurately describes what I see happening in the world of software development with the rise of Spring, OSGi, and SCA.

During the last 30 years, the software industry has evolved from structured design to object-oriented design, POJO programming, and lately service-oriented design. With the rising complexity and costs of software systems, the ultimate goal of this evolution has been the reuse of software assets through loose coupling and service orientation.

Spring is based on the principle of inversion of control (IoC) or dependency injection. With Spring, objects (simple POJOs) are provided with their dependencies as opposed to the objects managing or looking up those dependencies themselves. Spring relies on aspect-oriented programming (AOP) to declaratively manage cross-cutting concerns such as security, transactions, and logging. From a quality and agile development perspective, one big advantage of Spring-based applications is that they are amenable to unit testing using frameworks such as JUnit or EasyMock.
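
The principle is independent of Spring and of Java, so here is a minimal sketch in Python for brevity. It shows constructor injection only, with no container and no AOP; the class names and the parts data are invented for the example. Because the service receives its collaborator from outside, a test can hand it an in-memory double instead of a real back end.

```python
class InMemoryPartsCatalog:
    """Test double standing in for a real catalog service."""

    def __init__(self, stock):
        self._stock = stock

    def quantity(self, part_number):
        return self._stock.get(part_number, 0)


class WorkOrderService:
    """Receives its catalog through the constructor instead of looking it up."""

    def __init__(self, catalog):
        self._catalog = catalog

    def can_start(self, required_parts):
        # A work order can start only if every required part is in stock.
        return all(
            self._catalog.quantity(part) >= count
            for part, count in required_parts.items()
        )


service = WorkOrderService(InMemoryPartsCatalog({"FILTER-7": 2}))
print(service.can_start({"FILTER-7": 1}))
```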

Service-oriented architecture (SOA) exposes application business logic as a set of services that are remotely accessible and reusable across platforms and programming languages. The Service Component Architecture (SCA) has been designed to facilitate service composition. The SCA Assembly Model consists of one or more service components. Service components provide business functions to other components within or outside the module. A composite contains one or more service components and specifies communication bindings and policies such as security and transactions. As in Spring, the artifacts and the dependencies between them are described using XML.

The Open Services Gateway initiative (OSGi) defines a dynamic service model where components (packaged as bundles) and their dependencies are specified in a service registry. OSGi standardizes the life cycle management of these bundles including deployment, installation/uninstallation, and updates with full versioning capabilities. Bundles can be dynamically started, stopped, or updated without the need for a reboot. OSGi defines a model for publishing, discovering, and binding to services within the same virtual machine (VM).

Spring, SCA, and OSGi are converging to create an environment that facilitates the design and the lifecycle management of software assets that are exposed as reusable services. Software vendors are actively exploring different opportunities to combine these three technologies. The combination of Spring, SCA, and OSGi is already having a transformative impact not only on service-oriented design and application development in general, but also on the application server and middleware market.

Sunday, May 4, 2008

SOA and ROA Design Principles and Patterns

I've been compiling a list of design patterns and anti-patterns on Service Oriented Architecture (SOA) and Resource Oriented Architecture (ROA). I find the following resources quite useful.

If you're looking for design patterns for building RESTful applications, the best way to start is to look at the Atom Publishing Protocol (AtomPub), which is a good embodiment of the principles of the REST architectural style. The Google Data API (GData) is a real-world implementation of AtomPub. At the XML 2007 Conference, I proposed a RESTful approach to aviation technical data management called "Integrated Documentation Environment for Aircraft Support (IDEAS)" (more on that in my previous post, RESTful IDEAS).

Another good resource is the book "RESTful Web Services" by Leonard Richardson and Sam Ruby. Chapter 8 entitled "REST and ROA Best Practices" is a must read and also addresses potential REST implementation issues such as asynchronous operations and transactions. Chapter 10 entitled "The Resource-Oriented Architecture Versus Big Web Services" offers ROA alternatives to WS-* specifications.

For SOA design patterns and anti-patterns, here are some useful resources:

I don't believe that ROA is the answer to all SOA project failures out there. However, I do believe that certain requirements and use cases are more amenable to the REST architectural style (more on that in a future post).