Saturday, May 5, 2012

How to Add Arbitrary Metadata to Any Element of an HL7 CDA Document

There has been a lot of buzz lately about metadata tagging in the health IT community. In this blog, I describe an approach to annotating HL7 CDA documents (or any other XML documents) without actually editing the document that is being annotated. Metadata tagging is just an example of annotation. The underlying principle of this approach is that Anyone can say Anything about Anything (the AAA slogan) which is well know in the Semantic Web community. In other words, anyone (e.g., patient, care giver, physician, provider organization) should have the ability to add arbitrary metadata to any element of a CDA document. For the sake of "Separation of Concerns" which is a fundamental principle in software engineering, the metadata should be kept out of the CDA document. The benefits of keeping the metadata or annotations out of the CDA document include:
  • Reuse of the same metadata by distinct elements from potentially multiple clinical documents.
  • The ability to update the metadata without affecting the target CDA documents.
  • The ability for any individual, organization, or community of interest (e.g., privacy or medical device manufacturers) to create a metadata vocabulary without going through the process of modifying the normative CDA specification (or one of its offsprings like the CCD, the C32, or the Consolidated CDA) or the XDS metadata specifications.

History and Current Status of Metadata Standards in Health IT

The CDA specification defines some metadata in the header of a CDA document. In addition, the XD* family of specifications (XDS, XDR, and XDM) also defines a comprehensive set of metadata to be used in cross enterprise document exchange. NIEM is being used currently in several health IT projects. In a previous post titled "Toward a Universal Exchange Language for Healthcare", I described how the NIEM metadata approach could be adapted to the healthcare domain.

The President's Council of Advisors on Science and Technology (PCAST) published a report in December 2010 entitled: "Realizing the Full Potential of Health Information Technology to Improve Healthcare for Americans: The Path Forward". To describe the proposed approach to metadata tagging, the report provides an example based on the exchange of mammograms:
"The physician would be able to securely search for, retrieve, and display these privacy protected data elements in much the way that web surfers retrieve results from a search engine when they type in a simple query.
What enables this result is the metadata attached to each of these data elements (mammograms), which would include (i) enough identifying information about the patient to allow the data to be located (not necessarily a universal patient identifier), (ii) privacy protection information-who may access the mammograms, either identified or de­identified, and for what purposes, (iii) the provenance of the data-the date, time, type of equipment used, personnel (physician, nurse, or technician), and so forth."
The HIT Standards Committee (HITSC) Metadata Tiger Team made specific recommendations to the ONC in June 2011. These recommendations included the use of:

  • Policy Pointers: URLs that point to external policy documents affecting the tagged data element.
  • Content Metadata: the actual metadata with datatype (clinical category) and sensitivity (e.g., substance abuse and mental health).
  • Use of the HL7 CDA R2 with headers.

Based on those recommendations, the ONC published a Notice of Proposed Rule Making (NPRM) in August 2011 to receive comments on proposed metadata standards.

The Data Segmentation Working Group of the ONC Standards and Interoperability Framework is currently working on metadata tagging for compliance with privacy policies and consent directives.

The Annotea Protocol

The capability to add arbitrary metadata to documents without modifying them has been available in the Semantic Web for at least a decade. Indeed, it is hard to talk about metadata without a reference to the Semantic Web. I will use the W3C Annotea Protocol (which is implemented by the Amaya open source project) to demonstrate this capability. I will also show that this approach does not require the use of the Resource Description Framework (RDF) format and related Semantic Web technologies like OWL and SPARQL. The approach can be adapted to alternative representation formats such as XML, JSON, or the Atom syndication format. Let's assume that I need to add metadata tags to the CDA document below. The CDA document has only one problem entry for substance abuse disorder (SNOMED CT code 66214007) and my goal is to attach privacy metatada prohibiting the disclosure of that information (the most relevant elements are highlighted in red):

<templateId root="2.16.840.1.113883."
<templateId root=""
    assigningAuthorityName="IHE PCC"/>
<templateId root="2.16.840.1.113883." assigningAuthorityName="HL7 CCD"/>
<!--Problems section template-->
<code code="11450-4" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC"
    displayName="Problem list"/>
<entry typeCode="DRIV">
<act classCode="ACT" moodCode="EVN">
    <templateId root="2.16.840.1.113883."
        assigningAuthorityName="HITSP C83"/>
    <templateId root="2.16.840.1.113883."
    <templateId root=""
        assigningAuthorityName="IHE PCC"/>
    <templateId root=""
        assigningAuthorityName="IHE PCC"/>
    <!-- Problem act template -->
    <id root="6a2fa88d-4174-4909-aece-db44b60a3abb"/>
    <code nullFlavor="NA"/>
    <statusCode code="completed"/>
        <low value="1950"/>
        <high nullFlavor="UNK"/>
    <performer typeCode="PRF">
            <id extension="PseudoMD-2" root="2.16.840.1.113883."/>
    <entryRelationship typeCode="SUBJ" inversionInd="false">
        <observation classCode="OBS" moodCode="EVN">
            <templateId root="2.16.840.1.113883."
            <templateId root=""
                assigningAuthorityName="IHE PCC"/>
            <!--Problem observation template - NOT episode template-->
            <id root="d11275e7-67ae-11db-bd13-0800200c9a66"/>
            <code code="64572001" displayName="Condition"
                <reference value="#PROBSUMMARY_1"/>
            <statusCode code="completed"/>
                <low value="1950"/>
            <value  displayName="Substance Abuse Disorder" code="66214007" codeSystemName="SNOMED" codeSystem="2.16.840.1.113883.6.96"/>
            <entryRelationship typeCode="REFR">
                <observation classCode="OBS" moodCode="EVN">
                    <templateId root="2.16.840.1.113883."/>
                    <!-- Problem status observation template -->
                    <code code="33999-4" codeSystem="2.16.840.1.113883.6.1"
                    <statusCode code="completed"/>
                    <value  code="55561003"
                        <reference value="#PROBSTATUS_1"/>

The following is a separate annotation document containing some metadata pointing to the Substance Abuse Disorder entry in the target CDA document:

<r:RDF xmlns:r=""
        <r:type r:resource=""/>
        <r:type r:resource=""/>
        <a:annotates r:resource=""/>
        <d:title>Sample Metadata Tagging</d:title>
        <d:creator>Bob Smith</d:creator>
        <a:body>Do Not Disclose</a:body>

Please note a few interesting facts about the annotation document:

  • As explained by the original specification: "The Annotea protocol works without modifying the original document; that is, there is no requirement that the user have write access to the Web page being annotated."
  • The annotation itself has metadata using the well known Dublin Core metadata specification to specify who created this annotation and when.
  • The document being annotated is cda.xml located at This is described by the element <a:annotates r:resource=""/>.
  • The specific element that is being annotated within the target CDA document is specified by the context element: <a:context>[1]/section[1]/entry[1])</a:context> using XPointer, a specification described by the W3C as "the language to be used as the basis for a fragment identifier for any URI reference that locates a resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity."
  • The XPath expression /ClinicalDocument/component/structuredBody/component[1]/section[1]/entry[1] within the XPointer is used to target the entry element in the CDA document.
  • Using XPath (1.0 or 2.0) allows us to address any element (or node) in an XML document. For example, this XPath //value[@code='66214007']/ancestor::entry will point to any entry element which contains a value element with an attribute code='66214007' (essentially targeting all entry elements which contain a Substance Abuse Observation). The combination of XPath, XPointer, and standard medical terminology codes gives the ability to attach any annotation or metadata to any element having interoperable semantics.
  • The body element contains the actual annotation: <a:body>Do Not Disclose</a:body>. However, the body of the annotation can also be located outside of the annotation (e.g., in a shared metadata registry) in which case the body element will be marked up as in the following example: <a:body r:resource=""/>

Alternative Representations


As mentioned before, for those who for one reason or another don't want to use RDF and related Semantic Web technologies, the annotation can be easily converted to a pure XML (as opposed to RDF/XML), JSON, or Atom representation. The original Annotea Protocol describes a RESTful protocol which includes the following operations: posting, querying, downloading, updating, and deleting annotations. The Atom Publishing Protocol (APP) is a newer RESTful protocol that is well adapted to the Atom syndication format.

Processing Annotations with XPointer

How the annotations are processed and consumed is only limited by the requirements of a specific application and the imagination of the developers writing it. For example, an application can read both the annotation document and the target CDA document and overlay the annotations on top of the entries in the CDA document while displaying the latter in a web browser. Another example is the enforcement of privacy policies and preferences prior to exchanging the CDA document. The issue that will be raised is how to process the XPointer fragment identifiers. XPointer uses XPath which is a well established XML addressing mechanism supported by many XML processing APIs across programming languages. For those of you who use XSLT2 to process CDA documents, there is the open source XPointer Framework for XSLT2 for use with the Saxon XSLT2 engine.