Saturday, January 22, 2011

XML Processing in Healthcare Applications

Meaningful Use certification requires the ability to create patient summaries in either C32 or CCR format. One of the most frequently asked questions on the HL7 Structured Document mailing list concerns processing the CDA XML schema with data binding tools such as JAXB or Castor. Initially, people cannot generate Java classes with JAXB at all. After some changes to the schema, JAXB finally works and creates hundreds of classes that are hard to work with and maintain. Then someone suggests using the Model-Driven Health Tools (MDHT) CDA tools, which are Java-based. You face additional headaches if you are not developing on the Java platform.

In a paper presented at the Balisage 2009 conference, a team of engineers who implemented the "Laika" C32 compliance testing tool described the issues with the CDA and C32 XML structure:

  • Repeated use of overly abstract data structures: The HL7 CDA defines a number of very generic objects that are used to represent information in a given document. Differing information, such as medications and conditions, are represented using the same XML elements with very subtle changes in their nesting and attributes. This makes a CDA document difficult to process.

  • Underspecified implementation, including lack of a normative schema: While there is an XML schema for the HL7 CDA, a final schema does not exist for the HITSP C32 or other CDA-based documents due to their use of attributes for selecting templates. Thus, defining schemas for these documents is impossible. As a result, CDA-based constructs such as HITSP C32 cannot be automatically validated by XML parsers; standard object mapping tools, such as XML Beans or JAXB, cannot be used.

  • Ambiguous data types: Data can be represented in multiple ways in a CDA document. Consumers of CDA documents must, therefore, write software that handles any of the numerous permutations of these data types. This leads to bloated software, or more likely, software that does not implement the full specification and experiences interoperability problems when it receives data in an unexpected format.

  • Steep and long learning curve: Mastery of the CDA and its many specifications and constructs takes an experienced software engineer many months to achieve. Once learned, it is very cumbersome to employ in robust software applications and services. These difficulties drive up the cost and time to develop and maintain health care software, thus reducing the pace of innovation.
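The first issue can be made concrete with a small sketch. In a CDA document, the medications and problems sections are both plain `section` elements in the `urn:hl7-org:v3` namespace, and only the `code` child tells them apart. The Java below uses the JDK's built-in XPath 1.0 engine to pick a section by that code. The class and method names are hypothetical, and the LOINC codes mentioned in the comments (10160-0 for medications, 11450-4 for problems) are the ones commonly seen in CCD-based documents; treat them as assumptions and check them against your implementation guide.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: finds a CDA section by the code on its <code> child.
final class CdaSectionFinder {
    static final String HL7_NS = "urn:hl7-org:v3";

    // Maps the "hl7" prefix used in the XPath expression to the CDA namespace.
    private static final NamespaceContext HL7_CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "hl7".equals(prefix) ? HL7_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return HL7_NS.equals(uri) ? "hl7" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return HL7_NS.equals(uri)
                    ? Collections.singletonList("hl7").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Returns the title of the first section whose code matches, or "" if none.
    // Assumed example codes: "10160-0" (medications), "11450-4" (problems).
    static String sectionTitle(String cdaXml, final String sectionCode) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required, CDA elements are namespaced
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(cdaXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(HL7_CONTEXT);
        // Bind $code through a resolver instead of concatenating strings.
        xpath.setXPathVariableResolver(name -> sectionCode);
        return xpath.evaluate("//hl7:section[hl7:code/@code = $code]/hl7:title", doc);
    }
}
```

The same expression shape serves both section types; only the code value changes, which is exactly the kind of subtle difference the authors describe.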

In a previous post entitled "The Future of Healthcare Data Exchange Standards", I suggested some ideas on how to develop standard XML schemas that support the software development process as opposed to hindering it. Since we're not there yet, in this post I will suggest some ideas on dealing with the complexity of the CDA schema and C32 generation process.

The key is to leverage the power of XML-related technologies such as XPath2, XSLT2, XQuery, XProc, ISO Schematron, and even XML Schema 1.1 (for assertions or business rule constraints) to simplify the task. First, generate a simple and perhaps flat XML representation (let's call it simpleC32) of the patient summary from your domain objects or database (through a data transfer object, or DTO, for example). That simpleC32 contains all the content that is needed to populate the C32 templates and generate a valid C32 document. You can create your own XML schema for your simpleC32 and use it for validation and data binding.
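As a rough illustration of that first step, the sketch below serializes a DTO into a flat simpleC32 document with the JDK's StAX writer. Everything about the shape is an assumption made for this example: the DTO classes, the element names (`simpleC32`, `patient`, `medications`), and the `code` attribute are hypothetical, and a real simpleC32 would carry far more content.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical writer that turns a DTO into a flat "simpleC32" document.
final class SimpleC32Writer {

    /** Hypothetical DTO for one medication entry. */
    static final class Medication {
        final String code; // placeholder drug code, not a real terminology value
        final String name;
        Medication(String code, String name) {
            this.code = code;
            this.name = name;
        }
    }

    /** Hypothetical DTO holding only what the patient summary needs. */
    static final class PatientSummary {
        final String id, givenName, familyName, birthDate;
        final List<Medication> medications;
        PatientSummary(String id, String givenName, String familyName,
                       String birthDate, List<Medication> medications) {
            this.id = id;
            this.givenName = givenName;
            this.familyName = familyName;
            this.birthDate = birthDate;
            this.medications = medications;
        }
    }

    // Writes the DTO as flat XML; the writer escapes text and attribute values.
    static String toSimpleC32(PatientSummary p) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("simpleC32");
        w.writeStartElement("patient");
        w.writeAttribute("id", p.id);
        leaf(w, "givenName", p.givenName);
        leaf(w, "familyName", p.familyName);
        leaf(w, "birthDate", p.birthDate);
        w.writeEndElement(); // patient
        w.writeStartElement("medications");
        for (Medication m : p.medications) {
            w.writeStartElement("medication");
            w.writeAttribute("code", m.code);
            w.writeCharacters(m.name);
            w.writeEndElement(); // medication
        }
        w.writeEndElement(); // medications
        w.writeEndElement(); // simpleC32
        w.close();
        return out.toString();
    }

    // Writes <name>text</name>.
    private static void leaf(XMLStreamWriter w, String name, String text)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(text);
        w.writeEndElement();
    }
}
```

Because the structure is flat and under your control, a schema for it stays small, and data binding tools that struggle with the CDA schema should have little trouble with it.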

Once you have a valid simpleC32 document, you can use XSLT2 to transform the patient summary from your simpleC32 representation into a C32 document that can be validated with the NIST Meaningful Use C32 Validator. This is roughly the idea behind the GreenCDA project. Use it as inspiration for how to create a simple representation of the C32. You can even use the GreenCDA XML schema as your simpleC32. But don't hesitate to create your own simpleC32 if GreenCDA does not work for you, because the target is still the C32, and the idea here is to have an intermediary representation (an Adapter) to make your life easier. This approach also allows you to isolate your domain model and prevent the complexity of the C32 data model from leaking into your domain layer (see my previous post on the concept of Anti-Corruption Layer in Domain-Driven Design).
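A minimal sketch of that transformation step follows. It assumes a hypothetical simpleC32 input with a `patient` element carrying an `id` attribute and `givenName` and `familyName` children, and it emits only a fragment of the CDA header (`recordTarget`), not a valid C32: template identifiers, the `root` of the `id`, and all clinical sections are left out. Note that the XSLT processor bundled with the JDK implements XSLT 1.0, so the stylesheet is written in 1.0; an XSLT 2.0 processor such as Saxon can be plugged in through the same JAXP interface.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical adapter: simpleC32 in, (partial) CDA out.
final class SimpleC32ToCda {

    // XSLT 1.0 stylesheet; the default namespace puts every literal
    // result element into the HL7 V3 namespace used by CDA.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + "    xmlns='urn:hl7-org:v3'>"
        + "<xsl:template match='/simpleC32'>"
        + "<ClinicalDocument><recordTarget><patientRole>"
        + "<id extension='{patient/@id}'/>"
        + "<patient><name>"
        + "<given><xsl:value-of select='patient/givenName'/></given>"
        + "<family><xsl:value-of select='patient/familyName'/></family>"
        + "</name></patient>"
        + "</patientRole></recordTarget></ClinicalDocument>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to a simpleC32 string and returns the result.
    static String transform(String simpleC32) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(simpleC32)),
                    new StreamResult(out));
        return out.toString();
    }
}
```

All knowledge of the CDA nesting lives in the stylesheet, so the Java side stays a few lines long and the domain model never sees the C32 structure.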

Why is this approach not used more often? Some developers who code in imperative programming languages (such as Java, C#, or JavaScript) are not comfortable with declarative programming in languages like XSLT2 and XQuery. I recently saw a Java developer use JAXB to create hundreds of classes and thousands of hard-to-maintain lines of code for a simple transformation from the CDA to a different target XML schema.

The basic difference between declarative (and functional) programming languages and imperative languages is that the former specify the "what" (the intent) as opposed to the "how" (the algorithm). Declarative programming with XSLT2 and XQuery can nonetheless be mastered through training and practice; see my previous posts entitled "In Defense of XSLT", "Why XProc Rocks", and "Putting XQuery to Work in Healthcare".

While Java and C# are general-purpose languages, processing languages like XSLT2, XQuery, and XProc are based on the XQuery 1.0 and XPath 2.0 Data Model (XDM) and are specifically designed for manipulating XML documents. This is particularly helpful when dealing with a complex and deep structure such as the HL7 CDA and other HL7 V3 messages. These XML-centric processing languages use XPath2 to navigate the XML tree. In general, consider using them in the following cases:

  • Applications that require dealing with a complex industry data exchange XML schema which is not easy to process with your data binding and other development tools. In that case, create an intermediary simple XML representation and map it to the industry data exchange XML schema using XSLT2 or XQuery (XQuery is not just for querying native XML databases; it is also a powerful language for processing XML documents).

  • Applications that require translation from an XML schema to another target XML schema (for example a mapping from the HL7 CCD to the ASTM CCR or from the C32 to XHTML).

  • Applications that require translation between an XML representation and a non-XML representation, including round trips (for example HL7 v2.x to HL7 V3, C32 XML to JSON, or C32 to a non-XML serialization of RDF).

  • Applications that need to chain multiple XML processing steps such as: query a data source with XQuery, expand XIncludes, validate against an XML schema, validate against a Schematron schema, transform with XSLT2, generate a PDF document with XSL-FO, and so on. In that case, consider using XProc.
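XProc expresses such a chain declaratively. The plain-Java sketch below is not XProc; it only approximates two of the steps listed above (XML Schema validation followed by an XSLT transformation) with the JDK's JAXP APIs, to show the fail-fast behavior a pipeline gives you. The class and method names are hypothetical, and the schema and stylesheet are passed in as strings purely to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

// Hypothetical two-step pipeline: validate, then transform.
final class MiniPipeline {

    static String validateThenTransform(String inputXml, String xsd, String xslt)
            throws Exception {
        // Step 1: validate against the XML schema. With no custom error
        // handler, the first validation error is thrown as a SAXException,
        // so an invalid document never reaches the transformation step.
        Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        schema.newValidator()
              .validate(new StreamSource(new StringReader(inputXml)));

        // Step 2: transform the validated document with the stylesheet
        // (XSLT 1.0 with the JDK's built-in processor).
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(inputXml)),
                    new StreamResult(out));
        return out.toString();
    }
}
```

Each additional step (Schematron validation, XSL-FO rendering, and so on) would add more glue code of this kind, which is the argument for moving the orchestration into an XProc pipeline once the chain grows.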

The Universal Exchange Language proposed by the PCAST Report could be an opportunity to address the issues listed above.