Sunday, February 27, 2011

The Greening of the HL7 CDA

I attended the HIMSS 2011 Conference this week in Orlando, FL. The GreenCDA was one of the big themes at the HL7 booth. The goal of the HL7 GreenCDA project is to provide a simple intermediary XML representation of the CDA to facilitate quick learning and ease of use for developers building healthcare data exchange solutions. Using the GreenCDA should not require prior knowledge of the HL7 Reference Information Model (RIM) and the associated model refinement process.

Developers should be able to generate code from the GreenCDA XML schema using data binding tools in any programming language of their choice. It should also be possible to create a round-trip transformation between the GreenCDA and the CDA. These requirements also apply to CDA implementations such as the HITSP C32. The GreenCDA will be available as an HL7 Implementation Guide and the HL7 Structured Document Working Group recently issued a GreenCDA wire format position statement.

In a previous post entitled "XML Processing in Healthcare Applications", I described some of the issues with the HL7 CDA and HITSP C32 XML structure and suggested some ideas on dealing with the complexity of the CDA schema and C32 generation process. In this post, I will share some thoughts on what can be done to ensure that the GreenCDA lives up to its full potential as the answer to the simplification challenge in healthcare data exchange standards.

XML Schemas In the Software Development Lifecycle

The XML schema is an important part of the service contract in Service Oriented Architecture (SOA). Service contracts also include the WSDL and WS-Policy documents. Using the recommended contract-first approach to web services development, developers generate client as well as server code using various tools and APIs in their native programming language and framework. Even when not using a pre-existing industry XML schema, the contract-first approach allows developers to decouple the service contract from platform-specific idiosyncrasies and adhere to cross-platform interoperability standards such as the WS-I Basic Profile.

On the Java platform, JAX-WS and JAXB allow developers to generate code from the WSDL and XML schema with tools like WSDL2Java.

On the .NET platform, Windows Communication Foundation (WCF) and Visual Studio provide out-of-the-box data binding tools such as Svcutil.exe. There is also an open source tool called WSCF.blue specifically designed to facilitate contract-first web services development on the .NET platform.

The GreenCDA XML schema could also be used in support of the "Canonical Data Model" enterprise integration pattern. Enterprise data architects typically extend industry XML schema components to satisfy custom needs.

Finally, the PCAST Report released in December 2010 recommended a universal exchange language that is "structured as individual data elements, together with metadata that provide an annotation for each data element". The report suggests that the metadata attached to each of these data elements

"...would include (i) enough identifying information about the patient to allow the data to be located (not necessarily a universal patient identifier), (ii) privacy protection information—who may access the mammograms, either identified or de-identified, and for what purposes, (iii) the provenance of the data—the date, time, type of equipment used, personnel (physician, nurse, or technician), and so forth."

Put together, these requirements argue in favor of a GreenCDA XML schema that supports the following:

  • Reusability
  • Extensibility
  • A well-defined versioning strategy
  • Seamless code generation in a variety of programming languages and development frameworks
  • A metadata facility per the PCAST recommendations.


Designing for Reuse and Extensibility

I suggest that the GreenCDA should only use global and named simple and complex types to facilitate reuse and extensibility. In other words, anonymous type definitions should be avoided. Extensibility is typically implemented through the <xsd:extension> element. Reuse can also be achieved by assembling logically related schema components into separate schema documents and using the <xsd:include> and <xsd:import> constructs.
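
As a rough illustration of the point above, here is a minimal Python sketch that scans a schema for anonymous (inline, unnamed) type definitions. The schema fragment is hypothetical: the type and element names (PersonType, PatientType, address, and so on) are mine and are not taken from the GreenCDA. It shows a named type reused through <xsd:extension>, next to an element with an anonymous type that the check reports.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment; the names are illustrative, not GreenCDA's.
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="familyName" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="PatientType">
    <xsd:complexContent>
      <xsd:extension base="PersonType">
        <xsd:sequence>
          <xsd:element name="medicalRecordNumber" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="patient" type="PatientType"/>
  <xsd:element name="address">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="city" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def find_anonymous_types(schema_text):
    """Return the names of elements/attributes that declare an inline, unnamed type."""
    root = ET.fromstring(schema_text)
    offenders = []
    for decl in root.iter():
        if decl.tag not in (XSD + "element", XSD + "attribute"):
            continue
        for child in decl:
            is_type = child.tag in (XSD + "complexType", XSD + "simpleType")
            if is_type and "name" not in child.attrib:
                offenders.append(decl.get("name"))
    return offenders


print(find_anonymous_types(SCHEMA))  # → ['address']
```

Moving the inline definition into a global, named type (say, an AddressType) would make it available for reuse and extension, and the check would then return an empty list.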

Common XML schema components (also called core components) such as HL7 datatypes as well as person, address, and organization should be in a separate schema file, ideally under a different namespace than the target namespace of the GreenCDA itself.


Component Naming and Documentation

It would be nice to have different naming conventions for types vs. elements and attributes. Also, schema component names should be spelled out for readability. A component name like "ivlTs" is not obvious to someone who is not familiar with HL7 datatypes.

Each type, element, or attribute should have a required <xs:annotation> child element which describes the semantics of the element in its child <xs:documentation> element. In other words, all schema components should be documented.
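
A documentation rule like this is easy to check mechanically. The following Python sketch, which is only one possible way to do it, lists the named components that lack a non-empty <xs:documentation>. The schema fragment and its names (EffectiveTimeIntervalType, low, high) are hypothetical, chosen as a spelled-out alternative to a name like "ivlTs".

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"
COMPONENTS = {XSD + t for t in ("element", "attribute", "complexType", "simpleType")}

# Hypothetical schema fragment; names are illustrative only.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="EffectiveTimeIntervalType">
    <xs:annotation>
      <xs:documentation>An interval between two points in time.</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element name="low" type="xs:dateTime">
        <xs:annotation>
          <xs:documentation>Start of the interval.</xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:element name="high" type="xs:dateTime"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def undocumented_components(schema_text):
    """Return names of named components with no non-empty xs:documentation."""
    root = ET.fromstring(schema_text)
    missing = []
    for node in root.iter():
        if node.tag in COMPONENTS and node.get("name"):
            doc = node.find(XSD + "annotation/" + XSD + "documentation")
            if doc is None or not (doc.text or "").strip():
                missing.append(node.get("name"))
    return missing


print(undocumented_components(SCHEMA))  # → ['high']
```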


Support for Data Binding Tools

Certain features of the XML Schema language such as mixed content models, <xsd:choice>, and dynamic type substitution with xsi:type are not well supported by various XML data binding tools. The need to use these constructs to accurately express the GreenCDA XML data structure should be balanced against the ability to seamlessly generate code from the GreenCDA XML schema using various XML data binding tools.
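
One way to keep that balance visible during schema design is to flag the constructs in question so that each use is a deliberate decision. The Python sketch below is a simple illustration under my own assumptions: the schema fragment and its type names are hypothetical, and abstract types are used as a proxy for xsi:type, since an element of an abstract type can only appear in an instance with an xsi:type naming a derived type.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment; names are illustrative only.
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="NarrativeType" mixed="true">
    <xsd:sequence>
      <xsd:element name="paragraph" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="ValueType" abstract="true"/>
  <xsd:complexType name="ResultType">
    <xsd:choice>
      <xsd:element name="quantity" type="xsd:decimal"/>
      <xsd:element name="code" type="xsd:string"/>
    </xsd:choice>
  </xsd:complexType>
  <xsd:complexType name="AddressType">
    <xsd:sequence>
      <xsd:element name="city" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def binding_risks(schema_text):
    """Return (type name, construct) pairs for constructs that may trouble binding tools."""
    root = ET.fromstring(schema_text)
    findings = []
    for ctype in root.iter(XSD + "complexType"):
        name = ctype.get("name", "(anonymous)")
        if ctype.get("mixed") in ("true", "1"):
            findings.append((name, "mixed content"))
        if ctype.get("abstract") in ("true", "1"):
            findings.append((name, "abstract type (instances need xsi:type)"))
        if next(ctype.iter(XSD + "choice"), None) is not None:
            findings.append((name, "xsd:choice"))
    return findings


for type_name, construct in binding_risks(SCHEMA):
    print(type_name, "->", construct)
```

A report like this does not say the constructs are wrong. It only points at the places where generated code should be tried out in each target toolchain.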

Before the GreenCDA is released for production use, I suggest at least two open source reference implementations in two different development platforms (such as Java and .NET) covering the end-to-end web services development cycle using the specific tooling provided by the respective platforms.


What Can Be Learned From the National Information Exchange Model (NIEM)

The ONC Standards and Interoperability Framework is leveraging the NIEM from a process perspective. However, I believe there is much to be learned from the design of the NIEM as an XML data exchange standard. This does not imply that the GreenCDA should use the NIEM Core. It simply means that the healthcare domain can leverage certain NIEM design principles that are not only backed by advanced research (at Georgia Tech Research Institute) in XML schema modeling, but are also proven by the numerous government agencies using the NIEM.

The NIEM embodies recognized XML Schema design patterns in its Naming and Design Rules (NDR). The NIEM provides a Schematron-based tool to automatically validate XML schemas against the rules defined in the NDR. For example, the Schematron schema can enforce component naming conventions or the requirement to document every schema component.

The PCAST Report says:
"We think that a universal exchange language must facilitate the exchange of metadata tagged elements at a more atomic and disaggregated level, so that their varied assembly into documents or reports can itself be a robust, entrepreneurial marketplace of applications."

The NIEM defines an extensible metadata facility for adding metadata to any data element in the spirit of the PCAST recommendations. The NIEM itself supports the exchange of "data items" at any level of granularity. These XML Schema design patterns are universal and can be applied to any domain, including the healthcare domain.
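
To make the idea of per-element metadata concrete, here is a small Python sketch in which a data element points to its metadata by identifier. Everything in it is an assumption made for illustration: the element names, the "id" and "metadata" attributes, and the sample values are hypothetical and are not the NIEM vocabulary or real patient data.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; all names and values are illustrative only.
INSTANCE = """\
<record>
  <result id="r1" metadata="m1 m2">
    <test>Hemoglobin A1c</test>
    <value unit="%">6.8</value>
  </result>
  <provenanceMetadata id="m1">
    <recordedBy>Dr. Example</recordedBy>
    <recordedOn>2011-02-01</recordedOn>
  </provenanceMetadata>
  <privacyMetadata id="m2">
    <permittedUse>treatment</permittedUse>
  </privacyMetadata>
</record>
"""


def metadata_for(root, element):
    """Resolve the space-separated id references in the element's 'metadata' attribute."""
    index = {node.get("id"): node for node in root.iter() if node.get("id")}
    resolved = {}
    for ref in element.get("metadata", "").split():
        meta = index[ref]
        resolved[meta.tag] = {child.tag: child.text for child in meta}
    return resolved


root = ET.fromstring(INSTANCE)
result = root.find("result")
print(metadata_for(root, result))
```

Because the metadata travels by reference, the same provenance or privacy block can annotate a single value or a whole section, which is the kind of granularity the PCAST Report asks for.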

Thursday, February 3, 2011

A Therapeutic Layered Cake

With all the talk about the PCAST Report, I've been doing some Systems thinking on semantic interoperability in healthcare IT. Trying to put all the pieces together, I remembered Tim Berners-Lee's "Semantic Web Layer Cake".

[Figure: the Semantic Web Layer Cake]

The Semantic Web Layer Cake has gone through several iterations over the years (see James Hendler's presentation on that subject). However, I think it can still be very helpful in visualizing a unified framework for addressing the challenges of semantic interoperability in Healthcare IT.

As we move to Stage 2 of Meaningful Use, I believe Clinical Decision Support (CDS) will take center stage. Beyond currently used XML-based data structures (such as HL7 v3 messages), this will put an increased emphasis on medical terminologies, ontologies, and knowledge representation in OWL. For example, ICD-11 is being developed using OWL to allow consistency checking and linking to other biomedical terminologies and ontologies. Equally important to knowledge representation, but not shown in the layer cake above is the Simple Knowledge Organization System (SKOS) specification.

In a report entitled "Semantic Interoperability Deployment and Research Roadmap", Alan Rector summarized the difference between the notions of ontology, knowledge representation, and data model:

  • Ontology – A representation of what is universally true, including what is true by definition

  • Knowledge Representation or "Background knowledge resource" – a representation of what is generally true, or widely known to be true in some specific instance. In general, the knowledge representation is formulated in terms of and indexed by the Ontology.

  • Information model or Data model – a model of how information is structured in a given software system, message, or electronic health record. In general, the data structures carry codes for the ontology as their content.

Clinical guidelines are published in the form of narrative text, sometimes with an evaluation algorithm. The translation of those guidelines into an executable representation is a complex and costly process. Several formalisms and standards have been proposed such as the Arden Syntax, GLIF, GELLO, and GEM. However, none of these standards has been widely adopted. Developed with inputs from the Business Rules, Logic Programming, and Semantic Web communities, the W3C Rule Interchange Format (RIF) can help with the interchange of executable Clinical Decision Support (CDS) rules in addition to adding reasoning capabilities to patient records. This example shows how decision support rules could be exchanged between two rules engines (Drools and Jess) using the RIF PRD syntax, a standard XML serialization format for production rule languages.

Existing patient records marked up in XML HITSP C32 or ASTM CCR can be lifted into RDF statements (with XSLT or XQuery for example) and queried using SPARQL.
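
The lifting step can be sketched in a few lines of Python. This is a simplified stand-in, not a C32 or CCR transformation: the input record, the http://example.org/ namespace, the property names, and the codes are all hypothetical. The triples are serialized as N-Triples, and the small match() helper only imitates a single SPARQL triple pattern. With an RDF library, the N-Triples output could be loaded into a graph and queried with real SPARQL.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace for this sketch

# Hypothetical, heavily simplified record; not the C32 or CCR structure.
RECORD = """\
<patientRecord id="patient-1">
  <problem code="P-001" display="Hypertension"/>
  <problem code="P-002" display="Type 2 diabetes"/>
  <medication code="M-001" display="Metformin"/>
</patientRecord>
"""


class Literal(str):
    """Marks an object value as a plain literal rather than an IRI."""


def lift(record_xml):
    """Turn each child entry of the record into two (subject, predicate, object) triples."""
    root = ET.fromstring(record_xml)
    patient = BASE + "patient/" + root.get("id")
    triples = []
    for entry in root:
        node = BASE + entry.tag + "/" + entry.get("code")
        triples.append((patient, BASE + "has" + entry.tag.capitalize(), node))
        triples.append((node, BASE + "display", Literal(entry.get("display"))))
    return triples


def to_ntriples(triples):
    """Serialize the triples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            obj = '"%s"' % o.replace("\\", "\\\\").replace('"', '\\"')
        else:
            obj = "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)


def match(triples, s=None, p=None, o=None):
    """Tiny stand-in for one SPARQL triple pattern; None acts as a variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


triples = lift(RECORD)
print(to_ntriples(triples))

# Roughly: SELECT ?problem WHERE { ?patient <http://example.org/hasProblem> ?problem }
for _, _, problem in match(triples, p=BASE + "hasProblem"):
    print(problem)
```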

Proof, Trust, and Cryptography are currently being addressed by various standards and specifications in the healthcare industry, notably the OASIS Cross-Enterprise Security and Privacy Authorization (XSPA) Profiles of XACML, SAML, and WS-Trust.

On the User Interface side, I see HTML5 giving both Flex and Silverlight a run for their money in the next few years. This will be driven in part by the demand for mobile health (mHealth).