Sunday, February 27, 2011

The Greening of the HL7 CDA

I attended the HIMSS 2011 Conference this week in Orlando, FL. The GreenCDA was one of the big themes at the HL7 booth. The goal of the HL7 GreenCDA project is to provide a simple intermediary XML representation of the CDA to facilitate quick learning and ease of use for developers building healthcare data exchange solutions. Using the GreenCDA should not require prior knowledge of the HL7 Reference Information Model (RIM) and the associated model refinement process.

Developers should be able to generate code from the GreenCDA XML schema using data binding tools in the programming language of their choice. It should also be possible to create a round-trip transformation between the GreenCDA and the CDA. These requirements also apply to CDA implementations such as the HITSP C32. The GreenCDA will be available as an HL7 Implementation Guide, and the HL7 Structured Documents Work Group recently issued a GreenCDA wire format position statement.
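To make the round-trip requirement more concrete, here is a minimal Python sketch under stated assumptions. The flat `patient`, `givenName`, and `familyName` elements on the "green" side are hypothetical names invented for illustration, not taken from the GreenCDA specification. The CDA side follows the familiar `recordTarget/patientRole/patient/name` path in the `urn:hl7-org:v3` namespace, heavily trimmed. The point is only that each direction is a mechanical mapping, so the two representations can be converted back and forth without loss.

```python
import xml.etree.ElementTree as ET

CDA_NS = "urn:hl7-org:v3"  # namespace used by CDA documents


def q(tag: str) -> str:
    """Qualify a local name with the CDA namespace (ElementTree Clark notation)."""
    return f"{{{CDA_NS}}}{tag}"


def green_to_cda(green: ET.Element) -> ET.Element:
    """Expand a flat, hypothetical green <patient> into a trimmed CDA recordTarget."""
    record_target = ET.Element(q("recordTarget"))
    patient_role = ET.SubElement(record_target, q("patientRole"))
    patient = ET.SubElement(patient_role, q("patient"))
    name = ET.SubElement(patient, q("name"))
    ET.SubElement(name, q("given")).text = green.findtext("givenName")
    ET.SubElement(name, q("family")).text = green.findtext("familyName")
    return record_target


def cda_to_green(record_target: ET.Element) -> ET.Element:
    """Collapse the nested CDA path back into the flat green <patient> element."""
    name = record_target.find(f"{q('patientRole')}/{q('patient')}/{q('name')}")
    green = ET.Element("patient")
    ET.SubElement(green, "givenName").text = name.findtext(q("given"))
    ET.SubElement(green, "familyName").text = name.findtext(q("family"))
    return green


if __name__ == "__main__":
    original = ET.fromstring(
        "<patient><givenName>Jane</givenName><familyName>Doe</familyName></patient>"
    )
    restored = cda_to_green(green_to_cda(original))
    # Compare the serialized forms of the original and the round-tripped element.
    print(ET.tostring(restored) == ET.tostring(original))
```

A real mapping would cover far more of the CDA header and body, but it would keep the same shape: one function per direction, each driven by a fixed correspondence between green elements and CDA paths.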

In a previous post entitled "XML Processing in Healthcare Applications", I described some of the issues with the HL7 CDA and HITSP C32 XML structure and suggested some ways of dealing with the complexity of the CDA schema and the C32 generation process. In this post, I will share some thoughts on what can be done to ensure that the GreenCDA lives up to its full potential as the answer to the simplification challenge in healthcare data exchange standards.

XML Schemas In the Software Development Lifecycle

The XML schema is an important part of the service contract in Service Oriented Architecture (SOA). Service contracts also include the WSDL and WS-Policy documents. With the recommended contract-first approach to web services development, developers generate both client and server code from the contract using the tools and APIs of their native programming language and framework. Even when no pre-existing industry XML schema is used, the contract-first approach allows developers to decouple the service contract from platform-specific idiosyncrasies and to adhere to cross-platform interoperability standards such as the WS-I Basic Profile.

On the Java platform, JAX-WS and JAXB allow developers to generate code from the WSDL and XML schema with tools such as wsimport and xjc, or WSDL2Java in frameworks like Apache CXF and Apache Axis2.

On the .NET platform, Windows Communication Foundation (WCF) and Visual Studio provide data binding tools out of the box, such as the ServiceModel Metadata Utility Tool (Svcutil.exe). There is also an open source tool called WSCF.blue that is specifically designed to facilitate contract-first web services development on the .NET platform.
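The idea behind these tools can be illustrated with a deliberately tiny Python sketch, which is in no way a substitute for xjc, wsimport, or Svcutil: every named `xsd:complexType` whose content is a plain `xsd:sequence` of elements becomes a Python dataclass. The `AddressType` schema and the small built-in type mapping are assumptions made for the example; real data binding tools also handle namespaces, cardinality, derivation, and much more.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Assumed mapping from a few XSD built-ins to Python types. It matches the
# literal "xsd:" prefix used in the sample schema; unknown types fall back to str.
BUILTINS = {"xsd:string": "str", "xsd:int": "int", "xsd:boolean": "bool"}

# Hypothetical sample schema with one named complex type.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="AddressType">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="postalCode" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def generate_dataclasses(schema_text: str) -> str:
    """Emit Python source with one dataclass per top-level named complexType."""
    root = ET.fromstring(schema_text)
    lines = ["from dataclasses import dataclass", ""]
    for ctype in root.findall(f"{XSD}complexType"):
        lines.append("@dataclass")
        lines.append(f"class {ctype.get('name')}:")
        for element in ctype.findall(f"{XSD}sequence/{XSD}element"):
            py_type = BUILTINS.get(element.get("type"), "str")
            lines.append(f"    {element.get('name')}: {py_type}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_dataclasses(SCHEMA))
```

Because the generator keys each class on the type's `name` attribute, a type defined anonymously inside an element declaration has no natural class name, which is one practical reason named types matter for code generation.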

The GreenCDA XML schema could also be used in support of the "Canonical Data Model" enterprise integration pattern, in which applications exchange messages through a common, application-independent data format. Enterprise data architects typically extend industry XML schema components to satisfy custom needs.

Finally, the PCAST Report released in December 2010 recommended a universal exchange language that is "structured as individual data elements, together with metadata that provide an annotation for each data element". The report suggests that the metadata attached to each of these data elements

"...would include (i) enough identifying information about the patient to allow the data to be located (not necessarily a universal patient identifier), (ii) privacy protection information—who may access the mammograms, either identified or de-identified, and for what purposes, (iii) the provenance of the data—the date, time, type of equipment used, personnel (physician, nurse, or technician), and so forth."

Put together, these requirements argue in favor of a GreenCDA XML schema that supports the following:

  • Reusability
  • Extensibility
  • A well-defined versioning strategy
  • Seamless code generation in a variety of programming languages and development frameworks
  • A metadata facility per the PCAST recommendations


Designing for Reuse and Extensibility

I suggest that the GreenCDA use only global, named simple and complex types to facilitate reuse and extensibility. In other words, anonymous type definitions should be avoided. Extensibility is typically implemented through the <xsd:extension> element. Reuse can also be achieved by assembling logically related schema components into separate schema documents and using the <xsd:include> and <xsd:import> constructs.
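As a rough illustration of the named-versus-anonymous distinction, the following Python sketch (standard library only) scans a schema for declarations with inline, anonymous types and lists the named types derived through `<xsd:extension>`. The `PersonType`, `PatientType`, and `Provider` components are hypothetical examples, not GreenCDA definitions.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: PatientType reuses PersonType by extension, while the
# Provider element defines an anonymous type that nothing else can reuse.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="PersonName" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="PatientType">
    <xsd:complexContent>
      <xsd:extension base="PersonType">
        <xsd:sequence>
          <xsd:element name="BirthDate" type="xsd:date"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="Provider">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="ProviderName" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def anonymous_types(schema_text: str) -> list[str]:
    """Names of element/attribute declarations that define an inline (anonymous) type."""
    root = ET.fromstring(schema_text)
    found = []
    for kind in ("element", "attribute"):
        for decl in root.iter(f"{XSD}{kind}"):
            inline = decl.find(f"{XSD}complexType") is not None or decl.find(f"{XSD}simpleType") is not None
            if inline:
                found.append(decl.get("name"))
    return found


def derived_types(schema_text: str) -> dict[str, str]:
    """Map each named complex type built by xsd:extension to its base type."""
    root = ET.fromstring(schema_text)
    result = {}
    for ctype in root.iter(f"{XSD}complexType"):
        ext = ctype.find(f"{XSD}complexContent/{XSD}extension")
        if ctype.get("name") and ext is not None:
            result[ctype.get("name")] = ext.get("base")
    return result


if __name__ == "__main__":
    print("anonymous:", anonymous_types(SCHEMA))
    print("derived:", derived_types(SCHEMA))
```

A check like this could run as part of a schema build, so that an inline type slipping into the GreenCDA is caught before it reaches implementers.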

Common XML schema components (also called core components), such as the HL7 datatypes and types for person, address, and organization, should live in a separate schema document, ideally under a namespace different from the target namespace of the GreenCDA itself.


Component Naming and Documentation

It would be nice to have different naming conventions for types versus elements and attributes. Also, schema component names should be spelled out for readability. A component name like "ivlTs" (the HL7 IVL_TS datatype, an interval of time stamps) means little to someone who is not familiar with HL7 datatypes.
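Naming conventions are easiest to keep when they are checked automatically. The Python sketch below enforces one hypothetical convention: type names are UpperCamelCase and end in "Type", element names are UpperCamelCase without that suffix, and attribute names are lowerCamelCase. The rules and the sample schema are assumptions for illustration; NIEM's NDR uses a similar "Type" suffix rule, but this is not the NIEM tooling.

```python
import re
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")

# Hypothetical schema: an abbreviated "ivlTs" type next to a spelled-out equivalent.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="ivlTs">
    <xsd:sequence>
      <xsd:element name="low" type="xsd:dateTime"/>
      <xsd:element name="high" type="xsd:dateTime"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="TimeIntervalType">
    <xsd:sequence>
      <xsd:element name="StartDateTime" type="xsd:dateTime"/>
      <xsd:element name="EndDateTime" type="xsd:dateTime"/>
    </xsd:sequence>
    <xsd:attribute name="precision" type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>
"""


def naming_violations(schema_text: str) -> list[str]:
    """Report named components that break the (assumed) naming convention."""
    root = ET.fromstring(schema_text)
    problems = []
    for kind in ("complexType", "simpleType"):
        for node in root.iter(f"{XSD}{kind}"):
            name = node.get("name")
            if name and not (UPPER_CAMEL.match(name) and name.endswith("Type")):
                problems.append(f"type {name}")
    for node in root.iter(f"{XSD}element"):
        name = node.get("name")
        if name and (not UPPER_CAMEL.match(name) or name.endswith("Type")):
            problems.append(f"element {name}")
    for node in root.iter(f"{XSD}attribute"):
        name = node.get("name")
        if name and not LOWER_CAMEL.match(name):
            problems.append(f"attribute {name}")
    return problems


if __name__ == "__main__":
    for problem in naming_violations(SCHEMA):
        print(problem)
```

A regular expression cannot tell whether a name is spelled out rather than abbreviated, so a real rule set would likely pair checks like these with an approved-terms list and human review.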

Each type, element, or attribute should be required to have an <xsd:annotation> child element whose <xsd:documentation> child describes the semantics of the component. In other words, all schema components should be documented.
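The documentation requirement is also mechanically checkable. This Python sketch reports every named type, element, or attribute that lacks a non-empty `<xsd:annotation>/<xsd:documentation>` child. The `AddressType` schema is a made-up example, and the check deliberately skips anonymous types and `ref`-based element references.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema in which PostalCode has no documentation.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="AddressType">
    <xsd:annotation>
      <xsd:documentation>A postal address for a person or organization.</xsd:documentation>
    </xsd:annotation>
    <xsd:sequence>
      <xsd:element name="City" type="xsd:string">
        <xsd:annotation>
          <xsd:documentation>The city or town name.</xsd:documentation>
        </xsd:annotation>
      </xsd:element>
      <xsd:element name="PostalCode" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def undocumented(schema_text: str) -> list[str]:
    """Names of components without a non-empty xsd:annotation/xsd:documentation."""
    root = ET.fromstring(schema_text)
    missing = []
    for kind in ("complexType", "simpleType", "element", "attribute"):
        for node in root.iter(f"{XSD}{kind}"):
            name = node.get("name")
            if name is None:
                continue  # anonymous types and ref-based declarations are skipped
            doc = node.find(f"{XSD}annotation/{XSD}documentation")
            if doc is None or not (doc.text or "").strip():
                missing.append(name)
    return missing


if __name__ == "__main__":
    print(undocumented(SCHEMA))
```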


Support for Data Binding Tools

Certain features of the XML Schema language, such as mixed content models, <xsd:choice>, and dynamic type substitution with xsi:type, are not well supported by many XML data binding tools. The need to use these constructs to accurately express the GreenCDA XML data structure should be balanced against the ability to generate code seamlessly from the GreenCDA XML schema with those tools.
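One way to keep that balance visible is to flag the risky constructs during schema reviews. The Python sketch below reports complex types with mixed content or an `xsd:choice`, and base types that have derived types. Because xsi:type appears in instance documents rather than in the schema, the sketch treats "has derived types" as a proxy for where substitution may occur; it does not consider the `block` attribute. The sample types are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema exercising each construct the check looks for.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="NarrativeTextType" mixed="true">
    <xsd:sequence>
      <xsd:element name="Emphasis" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="ObservationValueType">
    <xsd:choice>
      <xsd:element name="Quantity" type="xsd:decimal"/>
      <xsd:element name="CodedValue" type="xsd:string"/>
    </xsd:choice>
  </xsd:complexType>
  <xsd:complexType name="ParticipantType">
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="AuthorType">
    <xsd:complexContent>
      <xsd:extension base="ParticipantType">
        <xsd:sequence>
          <xsd:element name="Time" type="xsd:dateTime"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
"""


def binding_hazards(schema_text: str) -> list[str]:
    """List constructs that data binding tools often handle poorly."""
    root = ET.fromstring(schema_text)
    hazards = []
    for ctype in root.iter(f"{XSD}complexType"):
        label = ctype.get("name", "(anonymous)")
        if ctype.get("mixed") in ("true", "1"):
            hazards.append(f"{label}: mixed content")
        if ctype.find(f".//{XSD}choice") is not None:
            hazards.append(f"{label}: xsd:choice")
    # Base types of complex-content extensions can be replaced via xsi:type.
    bases = set()
    for content in root.iter(f"{XSD}complexContent"):
        ext = content.find(f"{XSD}extension")
        if ext is not None and ext.get("base"):
            bases.add(ext.get("base"))
    for base in sorted(bases):
        hazards.append(f"{base}: has derived types (xsi:type substitution possible)")
    return hazards


if __name__ == "__main__":
    for hazard in binding_hazards(SCHEMA):
        print(hazard)
```

Each flagged item is a prompt for discussion rather than an error: the construct may well be needed, but the reference implementations should confirm that the generated code handles it.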

Before the GreenCDA is released for production use, I suggest building at least two open source reference implementations on two different development platforms (such as Java and .NET), covering the end-to-end web services development cycle with the tooling provided by each platform.


What Can Be Learned From the National Information Exchange Model (NIEM)

The ONC Standards and Interoperability Framework is leveraging the NIEM from a process perspective. However, I believe there is also much to be learned from the design of the NIEM as an XML data exchange standard. This does not imply that the GreenCDA should use the NIEM Core. It simply means that the healthcare domain can leverage certain NIEM design principles that are not only backed by research in XML schema modeling at the Georgia Tech Research Institute (GTRI), but are also proven in practice by the numerous government agencies using the NIEM.

The NIEM embodies recognized XML Schema design patterns in its Naming and Design Rules (NDR). The NIEM also provides a Schematron-based tool to automatically validate XML schemas against the rules defined in the NDR. For example, Schematron rules can enforce component naming conventions or the requirement to document every schema component.

The PCAST Report says:
"We think that a universal exchange language must facilitate the exchange of metadata tagged elements at a more atomic and disaggregated level, so that their varied assembly into documents or reports can itself be a robust, entrepreneurial marketplace of applications."

The NIEM defines an extensible metadata facility for attaching metadata to any data element, in the spirit of the PCAST recommendations. The NIEM itself supports the exchange of "data items" at any level of granularity. These XML Schema design patterns are universal and can be applied to any domain, including healthcare.
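A metadata-by-reference design can be sketched in a few lines of Python. In this hypothetical instance, loosely inspired by the NIEM approach rather than copied from it, each data item carries a `metadataRef` attribute pointing at a shared `Metadata` block whose children mirror the three PCAST items: patient locator, privacy policy, and provenance. All element names and values are invented for illustration, and only a single reference per item is handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical exchange: two fine-grained data items share one metadata block.
INSTANCE = """
<Exchange>
  <Metadata id="m1">
    <PatientLocator>MRN-0001 at Example Hospital</PatientLocator>
    <PrivacyPolicy>treatment-only</PrivacyPolicy>
    <Provenance>2011-02-20T10:15, mammography unit, technician</Provenance>
  </Metadata>
  <Mammogram metadataRef="m1">
    <ImageReference>study-42</ImageReference>
  </Mammogram>
  <BloodPressure metadataRef="m1">120/80</BloodPressure>
</Exchange>
"""


def metadata_for_items(xml_text: str) -> list[tuple[str, dict[str, str]]]:
    """Pair each data item with the metadata block its metadataRef points to."""
    root = ET.fromstring(xml_text)
    # Index metadata blocks by id, flattening their children into a dict.
    blocks = {
        block.get("id"): {child.tag: child.text for child in block}
        for block in root.findall("Metadata")
    }
    items = []
    for item in root:
        ref = item.get("metadataRef")
        if ref is not None:
            items.append((item.tag, blocks[ref]))
    return items


if __name__ == "__main__":
    for tag, metadata in metadata_for_items(INSTANCE):
        print(tag, metadata["PrivacyPolicy"])
```

Referencing metadata by ID keeps each data item small and lets many items share one privacy or provenance statement, which fits the PCAST goal of exchanging atomic, metadata-tagged elements.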

1 comment:

Diego B. said...

So is GreenCDA XML Schema already available or it will be hold until it is ready for deployment?
visiting http://wiki.hl7.org/index.php?title=GreenCDA_Project lets you download a GreenCCD (???) Schema. Is this supposed to be also the schema for GreenCDA?