Sunday, December 19, 2010

How Not to Build A Big Ball of Mud

If you are a software developer or architect, you need to deal not only with ever-changing business requirements but also with the myriad application development frameworks and design patterns out there. There are frameworks for the User Interface (UI), Dependency Injection (DI), Aspect Oriented Programming (AOP), and Object-Relational Mapping (ORM). On top of that, you will probably need a web services framework and perhaps an Enterprise Service Bus (ESB) if you need to integrate applications. As an architect, you also need to keep an eye on scalability, availability, security, usability, industry standards, and government regulations.

In such an environment, the lack of a disciplined approach to software architecture can quickly lead to a Big Ball of Mud. In a paper presented in 1997 at the Fourth Conference on Pattern Languages of Programs, Brian Foote and Joseph Yoder describe the Big Ball of Mud:

A BIG BALL OF MUD is haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle. We’ve all seen them. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.

The Big Ball of Mud remains the most pervasive architecture today. Note that these problems can be exacerbated by an agile software development approach that leaves little or no room for design and strategic thinking (see my previous post on software architecture documentation in agile projects).

Domain-Driven Design (DDD) is a set of patterns introduced by Eric Evans in his book "Domain-Driven Design: Tackling Complexity in the Heart of Software". I won't go into the details of those patterns. I recommend that you read the book, and there are free DDD resources on the web as well. However, I will share some key DDD principles that have helped me wrap my head around software architecture complexity:

  • Collaboration between software developers and domain experts is important to create a common understanding of the concepts of the domain. Note that we're not talking about UI components such as screens or fields here, nor are we talking about computer science abstractions such as classes and objects. We are talking about what the domain is made of conceptually. These domain concepts are expressed in a Ubiquitous Language that is used not only in conversations and software documentation, but also in the code. So in essence, DDD is model-driven design: the model can be translated into code and frameworks like Naked Objects can help you do exactly that. Continuously refine the model.

  • Experimentation and rapid prototyping are an efficient way to collaborate with domain experts and business analysts. This is where the Naked Objects Framework can be very helpful. Note that you can use Naked Objects for the initial prototyping and domain modeling, and then use your preferred frameworks for the remaining layers of your application.

  • Pay attention to the correct implementation of DDD building blocks such as: entities (behaviour-rich with business rules), value objects, aggregates, aggregate roots, domain events, factories, repositories, and services. Avoid an anemic domain model and know how to recognize value objects and persist them properly. Value objects (such as money and time interval) are immutable and are manipulated through side-effect free functions. Move complexity and behaviour out of your entities into those value objects.

  • Domain concepts are grouped into modules to delineate what Eric Evans calls the "conceptual contours" of the domain. To reduce coupling, OO design principles such as the dependency inversion principle, the interface segregation principle, and the acyclic dependencies principle are applied.

  • DDD recommends the following four layers: the presentation layer, the application layer, the domain layer, and the infrastructure layer. Although the idea of a multi-tier architecture is not new, the anti-patterns are typically: a fat application layer, an anemic domain model, and a tangled mess in general. So, properly layer your architecture:

    • Repository interfaces are in the domain layer, but their implementations are in the infrastructure layer to allow "Persistence Ignorance"
    • Both the interface and implementation of factories are in the domain layer
    • Domain and infrastructure services are injected into entities using dependency injection (some argue that DDD is not possible without DI, AOP, and ORM)
    • The application layer takes care of cross-cutting concerns such as transactions and security. It can also mediate between the presentation layer and domain layer through Data Transfer Objects (DTOs).

  • DDD enables Object Oriented User Interfaces (OOUI) which expose the richness of the domain layer as opposed to obscuring it.

  • Models exist within bounded contexts and the latter should be clearly identified. In his book, Eric Evans talks about "strategic design" and "context maps" and suggests the following options for integrating applications:

    • Published language
    • Open host service
    • Shared kernel
    • Customer/supplier
    • Conformist
    • Anti-corruption layer
    • Separate ways.

    In industries such as healthcare where an XML-based data exchange standard exists, the "Published Language" approach is the pattern typically used. Each healthcare application participating in an exchange represents a separate context. On the other hand, an "Anti-Corruption Layer" can be created as an adapter to isolate the model against an industry standard model that is not considered best practice in data modeling, is inconsistent, immature, or subject to change. However, since there is tremendous value in exchanging data, we hope not to go "Separate Ways".

  • DDD is a solid foundation for next-generation architecture based on the Command Query Responsibility Segregation (CQRS) pattern. The UI sends commands which are handled by command handlers. These command handlers change the state of aggregate roots. However, the aggregate roots still define behavior and business rules and are responsible for maintaining invariants. Changes to aggregate roots generate events that are stored in an event store (this is called event sourcing). Aggregate roots are persisted by storing these streams of events in the event store. That way, the aggregate roots can be reconstructed by replaying the events from the event store. The events are published to subscribers including denormalizers for enhanced query performance. The separation of the read side from the write side allows:

    • Increased performance and scalability on the read and reporting side particularly when combined with a cloud deployment model
    • Complete audit trails through the event store
    • New data mining capabilities leveraging temporal queries
    • The opportunity to eliminate the need for Object Relational Mapping (ORM) through the use of high performance NoSQL databases.
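
To make the value object and event sourcing points above more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the names Money, Deposited, Withdrawn, Account, InMemoryEventStore, and read_model are hypothetical and do not come from Eric Evans' book or from any CQRS framework. It shows an immutable value object with side-effect-free operations, an aggregate root that guards an invariant and records its changes as events, an event store that publishes events to a denormalizer, and the reconstruction of the aggregate by replaying its event stream.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Value object: immutable and compared by value."""
    amount: Decimal
    currency: str

    def plus(self, other):
        # Side-effect free: returns a new Money, never mutates self.
        self._require_same_currency(other)
        return Money(self.amount + other.amount, self.currency)

    def minus(self, other):
        self._require_same_currency(other)
        return Money(self.amount - other.amount, self.currency)

    def _require_same_currency(self, other):
        if other.currency != self.currency:
            raise ValueError("currency mismatch")


@dataclass(frozen=True)
class Deposited:
    account_id: str
    amount: Money


@dataclass(frozen=True)
class Withdrawn:
    account_id: str
    amount: Money


class Account:
    """Aggregate root (hypothetical): enforces invariants, records events."""

    def __init__(self, account_id, currency="USD"):
        self.account_id = account_id
        self.balance = Money(Decimal("0"), currency)
        self.uncommitted = []  # events raised but not yet stored

    def deposit(self, amount):
        self._record(Deposited(self.account_id, amount))

    def withdraw(self, amount):
        # Invariant: the balance may never go negative.
        if self.balance.minus(amount).amount < 0:
            raise ValueError("insufficient funds")
        self._record(Withdrawn(self.account_id, amount))

    def _record(self, event):
        self._apply(event)
        self.uncommitted.append(event)

    def _apply(self, event):
        if isinstance(event, Deposited):
            self.balance = self.balance.plus(event.amount)
        elif isinstance(event, Withdrawn):
            self.balance = self.balance.minus(event.amount)

    @classmethod
    def from_history(cls, account_id, events):
        # Rebuild state by replaying the stored event stream.
        account = cls(account_id)
        for event in events:
            account._apply(event)
        return account


class InMemoryEventStore:
    """Toy event store that also publishes events to subscribers."""

    def __init__(self):
        self._streams = {}
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def append(self, aggregate_id, events):
        self._streams.setdefault(aggregate_id, []).extend(events)
        for event in events:
            for handler in self._subscribers:
                handler(event)

    def load(self, aggregate_id):
        return list(self._streams.get(aggregate_id, []))


read_model = {}  # denormalized view used by the query side


def denormalize(event):
    sign = 1 if isinstance(event, Deposited) else -1
    current = read_model.get(event.account_id, Decimal("0"))
    read_model[event.account_id] = current + sign * event.amount.amount


store = InMemoryEventStore()
store.subscribe(denormalize)

account = Account("acc-1")
account.deposit(Money(Decimal("100"), "USD"))
account.withdraw(Money(Decimal("30"), "USD"))
store.append(account.account_id, account.uncommitted)

rebuilt = Account.from_history("acc-1", store.load("acc-1"))
```

Note that the query side never touches the Account class: it reads only from read_model, which is what allows the read side to be scaled and stored independently of the write side.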

Monday, December 13, 2010

Toward a Universal Exchange Language for Healthcare

The US President's Council of Advisors on Science and Technology (PCAST) published a report last week entitled: "Realizing the Full Potential of Health Information Technology to Improve Healthcare for Americans: The Path Forward". The report calls for a universal exchange language for healthcare (abbreviated as UELH in this post). Specifically, the report says:

"We believe that the natural syntax for such a universal exchange language will be some kind of extensible markup language (an XML variant, for example) capable of exchanging data from an unspecified number of (not necessarily harmonized) semantic realms. Such languages are structured as individual data elements, together with metadata that provide an annotation for each data element."

First, let me say that I fully support the idea of a UELH. I've written in the past about the future of healthcare data exchange standards. The ASTM CCR and the HL7 CCD have been adopted for Meaningful Use Stage 1 and that was the right choice. In my opinion, the UELH proposed by PCAST is about the next generation healthcare data exchange standard that is yet to be built. It's part of the natural evolution and innovation that are inherent to the information technology industry. It is also a very challenging task that should be informed by the important work that has been done previously in this field including:

  • The ASTM CCR
  • The HL7 RIM, CDA, CCD, and greenCDA
  • Archetype-based EN 13606 from OpenEHR
  • The National Information Exchange Model (NIEM)
  • HITSP C32
  • Biomedical Ontologies using semantic web technologies such as OWL2, SKOS, and RDF.
  • Medical Terminologies such as SNOMED and RxNorm.

This new language should focus on identifying, addressing, and solving the issues with the use of the current set of healthcare data exchange standards. This will require a public discourse that is cordial and focused on solutions and innovative ideas. Most importantly, it will require listening to the concerns of implementers. This proposal should not be about reinventing the wheel. It should be about creating a better future by learning lessons from the past while being open-minded about new ideas and approaches to solving problems.

Note that the report talks about the syntax of this new language as some kind of an "XML variant". It also mentions that the language must be extensible. This is important in order to enable innovation in this field. For example, we've recently seen a serious challenge to XML coming from JSON in the web APIs space (Twitter and Foursquare removed support for XML in their APIs and now only provide a JSON API). Similarly, in the Semantic Web space, alternatives to the RDF/XML serialization syntax have emerged such as the N-Triples notation. This is not to say that XML is the wrong representation for healthcare data. It simply means that we should be open to innovation in this area.
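
To illustrate that a metadata-tagged data element does not depend on one particular syntax, here is a small Python sketch that serializes the same element to JSON and to XML and reads it back. The element layout, the "dataElement" and "metadata" names, and the "source" value are assumptions of mine for illustration only. They are not part of the PCAST report or of any existing standard.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical data element: a hemoglobin result plus two metadata items.
ELEMENT = {
    "code": "30313-1",
    "value": "13.2",
    "unit": "g/dl",
    "metadata": {"recorded": "200003231430", "source": "lab"},
}


def to_json(element):
    return json.dumps(element, sort_keys=True)


def from_json(text):
    return json.loads(text)


def to_xml(element):
    # The data element becomes the root; metadata becomes a child element.
    root = ET.Element("dataElement")
    for key in ("code", "value", "unit"):
        root.set(key, element[key])
    metadata = ET.SubElement(root, "metadata")
    for key, value in sorted(element["metadata"].items()):
        metadata.set(key, value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    root = ET.fromstring(text)
    element = dict(root.attrib)
    element["metadata"] = dict(root.find("metadata").attrib)
    return element
```

Both round trips return the same dictionary, which is the point: the information model is what matters, and the serialization can evolve.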

Metadata and the Semantic Web in Healthcare

Closely related to the notion of metadata is the idea of the Semantic Web. Although semantic web technologies are not widely used in healthcare today, they could help address some of the issues with current healthcare standard information models including: model consistency, reasoning, and knowledge integration across domains (e.g. the genomics and clinical domains). In a report entitled "Semantic Interoperability Deployment and Research Roadmap", Alan Rector, an authority in the field of biomedical ontologies, explains the difference between ontologies and data structures:

A second closely related notion is that of an "information model" or "model of data structures". Both Archetypes and HL7 V3 Messages are examples of data structures. Formalisms for data structures bear many resemblances to formalisms for ontologies. The confusion is made worse because the description logics are often used for both. However, there is a clear difference.

  • Ontologies are about the things being represented – patients, their diseases. They are about what is always true, whether or not it is known to the clinician. For example, all patients have a body temperature (possibly ambient if they are dead); however, the body temperature may not be known or recorded. It makes no sense to talk about a patient with a "missing" body temperature.
  • Data structures are about the artefacts in which information is recorded. Not every data structure about a patient need include a field for body temperature, and even if it does, that field may be missing for any given patient. It makes perfect sense to speak about a patient record with missing data for body temperature.

A key point is that "epistemological issues" – issues of what a given physician or the healthcare system knows – should be represented in the data structures rather than the ontology. This causes serious problems for terminologies coding systems, which often include notions such as "unspecified" or even "missing". This practice is now widely deprecated but remains common.
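
As my own illustration of the distinction in the passage quoted above, the following Python sketch models the data structure side. The VitalSignsRecord class and its field names are hypothetical. The point is that "not recorded" is a property of the record (an optional field), not a value that belongs in a terminology or ontology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VitalSignsRecord:
    """A data structure: it describes what was recorded, not the patient."""
    patient_id: str
    # None means "not recorded"; the patient still has a body temperature.
    body_temperature_celsius: Optional[float] = None


def temperature_is_recorded(record):
    return record.body_temperature_celsius is not None


without_reading = VitalSignsRecord("patient-1")
with_reading = VitalSignsRecord("patient-1", 37.2)
```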

One of the Common Terminology Services (CTS 2) submissions to the OMG is based on Semantic Web technologies such as OWL2, SKOS, and SPARQL. The UELH proposed by the PCAST should leverage the work that has been done by the biomedical ontology community.

The NIEM Approach to Metadata-Tagged Data Elements

The report goes on to say that the metadata attached to each of these data elements

"...would include (i) enough identifying information about the patient to allow the data to be located (not necessarily a universal patient identifier), (ii) privacy protection information—who may access the mammograms, either identified or de-identified, and for what purposes, (iii) the provenance of the data—the date, time, type of equipment used, personnel (physician, nurse, or technician), and so forth."

The report does not explain exactly how this should be done. So let's combine the wisdom of the NIEM, HL7 greenCDA, and OASIS XSPA (Cross-Enterprise Security and Privacy Authorization Profile of XACML for healthcare) to propose a solution. Let's assume that we need to add metadata about the equipment used for the lab result as well as patient consent directives to the following lab result entry which is marked up in greenCDA format:

<result>
<resultID root="107c2dc0-67a5-11db-bd13-0800200c9a66" />
<resultDateTime value="200003231430" />
<resultType codeSystem="2.16.840.1.113883.6.1" code="30313-1"
displayName="HGB" />
<resultStatus code="completed" />
<physicalQuantity value="13.2" unit="g/dl" />
<resultInterpretation codeSystem="2.16.840.1.113883.5.83"
code="N" />
<resultReferenceRange>M 13-18 g/dl; F 12-16</resultReferenceRange>
</result>

In the following, an s:metadata attribute is added to the root element (s:metadata is of type IDREFS and for brevity, I am not showing the namespace declarations):

<result s:metadata="equipment consent">
<resultID root="107c2dc0-67a5-11db-bd13-0800200c9a66" />
<resultDateTime value="200003231430" />
<resultType codeSystem="2.16.840.1.113883.6.1" code="30313-1"
displayName="HGB" />
<resultStatus code="completed" />
<physicalQuantity value="13.2" unit="g/dl" />
<resultInterpretation codeSystem="2.16.840.1.113883.5.83"
code="N" />
<resultReferenceRange>M 13-18 g/dl; F 12-16</resultReferenceRange>
</result>

The following is the lab test equipment metadata:

<LabTestEquipmentMetadata s:id="equipment">

And here is the patient consent directives marked in XACML XSPA format (this snippet is taken from the NHIN Access Consent Policies Specification):

<ConsentMetadata s:id="consent">
<Policy xmlns="urn:oasis:names:tc:xacml:2.0:policy:schema:os"
<Description>Sample XACML policy for NHIN</Description>
<!-- The Target element at the Policy level identifies the subject to whom the Policy applies -->
<ResourceMatch MatchId="">

<AttributeValue DataType="urn:hl7-org:v3#II"
<hl7:PatientId root="2.16.840.1.113883.3.18.103"
extension="00375" />
<ResourceAttributeDesignator AttributeId=""
DataType="urn:hl7-org:v3#II" />
<!-- This policy applies to all document query and document retrieve transactions -->
<ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType="">
<ActionAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:action"
DataType="" />
<ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType="">
<ActionAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:action"
DataType="" />
<Rule RuleId="133" Effect="Permit">
<Description>Permit access to all documents to all
physicians and nurses</Description>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for physicians -->
<AttributeValue DataType="">
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="" />
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for nurses -->
<AttributeValue DataType="">
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="" />
<!-- since there is no Resource element, this rule applies to all resources -->
<Rule RuleId="134" Effect="Permit">
<Description>Allow access Dentists and Dental Hygienists
Access from the Happy Tooth dental practice to documents
with "Normal" confidentiality for a defined time
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for dentists -->
<AttributeValue DataType="">
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="" />
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType=""></AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xspa:1.0:subject:organization-id"
DataType="" />
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for dental hygienists -->
<AttributeValue DataType="">
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="" />
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType=""></AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xspa:1.0:subject:organization-id"
DataType="" />
<ResourceMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<AttributeValue DataType="">
<ResourceAttributeDesignator AttributeId="urn:oasis:names:tc:xspa:1.0:resource:patient:hl7:confidentiality-code"
DataType="" />
<EnvironmentMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:date-greater-than-or-equal">

<AttributeValue DataType="">
<EnvironmentAttributeDesignator AttributeId=""
DataType="" />
<EnvironmentMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:date-less-than-or-equal">

<AttributeValue DataType="">
<EnvironmentAttributeDesignator AttributeId=""
DataType="" />
<Rule RuleId="135" Effect="Deny">
<Description>deny all access to documents. Since this
rule is last, it will be selected if no other rule
applies, under the rule combining algorithm of first
<Target />

Please note the following:

  • Metadata "LabTestEquipmentMetadata" asserts the equipment used for the lab test.
  • Metadata "ConsentMetadata" asserts the patient consent directives leveraging the XSPA XACML format.
  • Metadata can be declared once and reused by multiple elements.
  • An element can refer to 0 or more metadata objects.
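
As a rough sketch of how a receiving application might follow the s:metadata references described above, the Python snippet below builds an index of elements by their s:id and resolves each whitespace-separated IDREFS value against it. The namespace URI "urn:example:structures" and the "exchange" wrapper element are placeholders I made up so that the sample is well-formed. A real instance would use the actual NIEM structures namespace and its own root element.

```python
import xml.etree.ElementTree as ET

S_NS = "urn:example:structures"  # placeholder namespace (assumption)
ID_ATTR = f"{{{S_NS}}}id"
METADATA_ATTR = f"{{{S_NS}}}metadata"

# Hypothetical, heavily abbreviated instance document.
DOC = f"""<exchange xmlns:s="{S_NS}">
  <result s:metadata="equipment consent">
    <resultStatus code="completed" />
  </result>
  <LabTestEquipmentMetadata s:id="equipment" />
  <ConsentMetadata s:id="consent" />
</exchange>"""


def resolve_metadata(root):
    """Return (element, [metadata elements]) pairs for every s:metadata use."""
    by_id = {}
    for element in root.iter():
        identifier = element.get(ID_ATTR)
        if identifier is not None:
            by_id[identifier] = element

    links = []
    for element in root.iter():
        refs = element.get(METADATA_ATTR)
        if refs is None:
            continue
        try:
            # IDREFS is a whitespace-separated list of IDs.
            targets = [by_id[ref] for ref in refs.split()]
        except KeyError as missing:
            raise ValueError(f"dangling metadata reference: {missing}") from None
        links.append((element, targets))
    return links


links = resolve_metadata(ET.fromstring(DOC))
```

Because the metadata objects are looked up by ID, the same LabTestEquipmentMetadata or ConsentMetadata element can be shared by any number of results in the document.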

In NIEM, an appinfo:AppliesTo element in a metadata type declaration is used to indicate the type to which the metadata applies as in the following example (note this is not enforced by the XML schema validating parser, but can be enforced at the application level):

<xsd:complexType name="LabTestEquipmentMetadataType">
<i:AppliesTo i:name="LabResultType" />
<xsd:extension base="s:MetadataType">

<xsd:element name="LabTestEquipmentMetadata" type="LabTestEquipmentMetadataType" nillable="true"/>
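
Since the schema validating parser does not enforce AppliesTo, here is one possible way to enforce it at the application level, sketched in Python. The appinfo namespace URI "urn:example:appinfo" is a placeholder, and the xsd:annotation/xsd:appinfo nesting around i:AppliesTo is my assumption about the parts of the schema that are abbreviated above. The sketch reads the AppliesTo declarations from the schema and checks whether a given metadata type may be attached to a given target type.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
I_NS = "urn:example:appinfo"  # placeholder namespace (assumption)

# Hypothetical, abbreviated schema fragment.
SCHEMA = f"""<xsd:schema xmlns:xsd="{XSD_NS}" xmlns:i="{I_NS}">
  <xsd:complexType name="LabTestEquipmentMetadataType">
    <xsd:annotation>
      <xsd:appinfo>
        <i:AppliesTo i:name="LabResultType" />
      </xsd:appinfo>
    </xsd:annotation>
  </xsd:complexType>
</xsd:schema>"""


def applies_to_map(schema_root):
    """Map each metadata type name to the set of types it applies to."""
    rules = {}
    for ctype in schema_root.iter(f"{{{XSD_NS}}}complexType"):
        targets = {
            node.get(f"{{{I_NS}}}name")
            for node in ctype.iter(f"{{{I_NS}}}AppliesTo")
        }
        if targets:
            rules[ctype.get("name")] = targets
    return rules


def metadata_allowed(rules, metadata_type, target_type):
    # A metadata type with no AppliesTo declaration is rejected here;
    # that policy is a design choice of this sketch.
    return target_type in rules.get(metadata_type, set())


rules = applies_to_map(ET.fromstring(SCHEMA))
```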

NIEM defines a common metadata type that can be extended by any type definition that requires metadata:


<attribute name="id" type="ID"/>
<attribute name="linkMetadata" type="IDREFS"/>
<attribute name="metadata" type="IDREFS"/>
<attribute name="ref" type="IDREF"/>
<attribute name="sequenceID" type="integer"/>

<attributeGroup name="SimpleObjectAttributeGroup">
<attribute ref="s:id"/>
<attribute ref="s:metadata"/>
<attribute ref="s:linkMetadata"/>

<element name="Metadata" type="s:MetadataType" abstract="true"/>

<complexType name="ComplexObjectType" abstract="true">
<attribute ref="s:id"/>
<attribute ref="s:metadata"/>
<attribute ref="s:linkMetadata"/>

<complexType name="MetadataType" abstract="true">
<attribute ref="s:id"/>


Any type definition that needs metadata can simply extend ComplexObjectType as follows for lab result type:

<xsd:complexType name="LabResultType">
<xsd:extension base="s:ComplexObjectType">

Wednesday, October 13, 2010

Software Architecture Documentation in Agile Projects

One misconception that I often hear in Agile circles is that there is no need for software architecture documentation in Agile because "code is self-documenting". The emphasis in Agile is not on eliminating design and documentation, but on avoiding Big Design Up Front (BDUF). Design and architecture documentation are still important in Agile. However, you only need just enough design and documentation to start coding. In other words, don't over-document.

As you code and refactor, some of the software architecture documentation will quickly become obsolete and should be discarded. Use tools such as Maven, SchemaSpy, Doxygen, and UMLGraph to auto-generate up-to-date documentation from your source code. A wiki is also a good tool for publishing and sharing architecture documentation. For consistency, I recommend using a template for documenting the architecture.

Provide the documentation only if it is really needed and used by stakeholders. So, don't try to document everything. You do need to document the following:

  • Design decisions and their rationale
  • Design patterns and development frameworks used
  • The architecture viewpoints and quality attributes that cannot be easily gleaned from the code alone.

Far too often, software architecture documentation only covers the code view. This is not enough. Stakeholders are not limited to developers, but also include end users, testers, the operational staff, compliance auditors, etc. When writing software architecture documentation, I first identify all stakeholders and their concerns. To ensure that I provide a 360-degree view of the architecture, I develop the architecture documentation based on the viewpoints and perspectives described by Nick Rozanski and Eoin Woods in their book "Software Systems Architecture: Working With Stakeholders Using Viewpoints and Perspectives" (Addison-Wesley, April 2005).

The following are the Architecture Viewpoints:

  • Functional
  • Information
  • Concurrency
  • Development
  • Deployment
  • Operational

And here are the Architecture Perspectives:

  • Security
  • Performance and Scalability
  • Availability and Resilience
  • Evolution
  • Accessibility
  • Development Resource
  • Internationalization
  • Location
  • Regulation
  • Usability

These viewpoints and perspectives can be described using different notations such as UML (using stereotypes and profiles like SoaML for service oriented architecture), Business Process Modeling Notation (BPMN), and Domain Specific Languages (DSLs).

Wednesday, September 22, 2010

The Future of Healthcare Data Exchange Standards

The Meaningful Use Final Rule has finally been released and I think now is a good time to start thinking about where we want to be five years from now in terms of healthcare data exchange standards.

Listening to the Concerns of Implementers

I think it is very important that we listen to the concerns of the implementers of the current set of standards. They are the users of those standards and good software engineers like to get feedback from their end users to fix bugs and improve their software. The following post details some of the concerns out there regarding the current HL7 v3 XML schema development methodology and I believe they should not be ignored: Why Make It Simple If It Can Be Complicated?

Using an Industry Standard XML Schema: A Developer's Perspective

XML documents are not just viewed by human eyeballs through the use of an XSLT stylesheet. The XML schema has become an important part of the service contract in Service Oriented Architecture (SOA). SOA has emerged during the last few years as a set of design principles for integrating applications within and across organizational boundaries.

In the healthcare sector for example, the Nationwide Health Information Network (NHIN) and many Health Information Exchanges (HIEs) are being built on a decentralized service-oriented architecture using web services standards such as SOAP, WSDL, WS-Addressing, MTOM, and WS-Policy. The Web Services Interoperability (WS-I) Basic Profile and Basic Security Profile provide additional guidelines that should be followed to ensure cross-platform interoperability, for example between the .NET and Java EE platforms. Some of the constraints defined by the WS-I Basic Profile are related to the design of XML schemas used in web services.

An increasingly popular alternative to the WS-* stack is to use RESTful web services. The REST architectural style does not mandate the use of a web services contract such as an XML schema, WSDL, or WS-Policy. However, the Web Application Description Language (WADL) has been proposed to play the role of service contract for RESTful web services. This post will not engage in the SOAP vs. REST debate except to mention that both are used in implementation projects today.

On top of these platform-agnostic web services standards, each platform defines a set of specifications and tooling for building web services applications. In the Java world, these specifications include:

  • The Java API for XML Web Services (JAX-WS)
  • The Java Architecture for XML Binding (JAXB)
  • The Java API for RESTful Web Services (JAX-RS)
  • The Streaming API for XML (StAX).

JAX-WS and JAXB allow developers to generate a significant amount of Java code from the WSDL and XML schema with tools like WSDL2Java. The quality of a standard XML schema largely depends on how well it supports the web services development process, and that's why I believe that creating a reference implementation should be a necessary step before the release of new standards. An industry standard XML schema that is hard to use translates directly into high implementation costs, for example through development project delays.

Embracing Design Patterns

Beyond our personal preferences (such as the NIEM vs. HL7 debate), there are well established engineering practices and methodologies that we can agree on. In terms of software development, design patterns have emerged as a well known approach to building effective software solutions. For example, the following two books have had a strong influence in the fields of object-oriented design and enterprise application integration respectively (and they sit proudly on my bookshelf):

  • Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
  • Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions by Gregor Hohpe and Bobby Woolf.

An interesting design pattern from the "Enterprise Integration Patterns" book that is relevant to the current discussion on industry standard XML schemas is the "Canonical Data Model" design pattern. Enterprise data architects tasked with creating such canonical data models often reuse components from industry standard XML schemas. That approach makes sense but cannot succeed if the industry standard XML schema is not designed to support reusability, extensibility, and a clearly specified versioning strategy.

Modeling Data In Transit vs. Data at Rest

Modeling data at rest (e.g. data stored in relational databases) is a well established discipline. For example, data modeling patterns for relational data have been captured by Len Silverston and Paul Agnew in their book entitled "The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling".

There is a need to apply the same engineering rigor to modeling data in transit (e.g. data in web services messages). The XML Schema specification became a W3C Recommendation more than 9 years ago and I think there is now enough implementation experience to start building consensus around a set of XML Schema Design Patterns. The latter should address the following issues:

  1. Usability: the factors that affect the ability of an average developer to quickly learn and use an XML schema in a software development project
  2. Component Reusability
  3. Web services cross-platform interoperability constraints. Some of those constraints are defined by the WS-I Basic Profile
  4. Issues surrounding the use of XML databinding tools such as JAXB. This is particularly important since developers use those tools for code generation in creating web services applications. It is well known that existing databinding tools do not provide adequate support for all XML Schema language features
  5. Ability to manipulate instances with XML APIs such as StAX
  6. Schema extensibility, versioning, and maintainability.

These design patterns should be packaged into a Naming and Design Rules (NDR) document to ensure a consistent and proven approach to developing future XML vocabularies for the healthcare domain.
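
To give a feel for item 5 in the list above, here is a small Python sketch of streaming access to an instance document. StAX is a Java API, so this uses xml.etree.ElementTree.iterparse from the Python standard library as a rough analogue rather than StAX itself. The "results" wrapper element and the second result value are made up for the example. A schema that keeps each repeating unit small and self-contained makes this style of processing straightforward.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical instance with two abbreviated lab results.
SAMPLE = b"""<results>
  <result><physicalQuantity value="13.2" unit="g/dl" /></result>
  <result><physicalQuantity value="14.1" unit="g/dl" /></result>
</results>"""


def iter_quantities(stream):
    """Yield (value, unit) pairs one result at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "result":
            quantity = elem.find("physicalQuantity")
            yield quantity.get("value"), quantity.get("unit")
            elem.clear()  # drop the children of the result just processed


quantities = list(iter_quantities(io.BytesIO(SAMPLE)))
```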

The XML Schema 1.1 specification is currently a W3C Candidate Recommendation. It defines new features such as conditional type assignments and assertions which allow schema developers to consolidate structural and business rules constraints into a single schema. This could help alleviate some of the pain associated with the multiple layers of Schematron constraints currently specified by HITSP C32, IHE PCC, and the HL7 CCD (sometimes referred to as the "HITSP Onion"). Saxon supports some of these new features.

Developing Standards the Way We Develop Software

The final point I'd like to make is that we should start creating healthcare standards the same way we develop software. I am a proponent of agile development methodologies such as Extreme Programming and Scrum. These methodologies are based on practices such as user stories, iteration (sprint) planning, unit test first, refactoring, continuous integration, and acceptance testing. Agile programming helps create better software and I believe it can help create better healthcare standards as well.

Monday, August 30, 2010

Wednesday, August 18, 2010

Health IT Standards in Canada

In Canada, health IT standards are established by Canada Health Infoway's Standards Collaborative. Canada Health Infoway is a not-for-profit organization funded by the federal government to develop pan-Canadian health IT standards and provide incentives for the adoption of health information technologies.

HIT Standards

The following are key standards approved by the Standards Collaborative:

  • Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) for problem lists, procedures, and other clinical observations. 35,000 SNOMED CT concepts have been translated to Canadian French

  • HL7 Version 3 (HL7 V3) for clinical, financial, and administrative messaging and document exchange. Canada played a key role in the development of the HL7 v3 modeling methodology and tooling. The pan-Canadian HL7 v3 is used for the following core areas:

    • Laboratory
    • Immunization
    • Pharmacy (Drugs)
    • Client Registry (patient demographics)
    • Provider Registry
    • Shared Health Records
    • Electronic Claims
    • Public health surveillance

  • The HL7 Clinical Document Architecture (CDA) standard enables pan-Canadian EHR interoperability

  • The pan-Canadian LOINC Observation Code Database (pCLOCD) for lab test results is used by the Lab Messaging and Nomenclature and Public Health Surveillance standards. pCLOCD adds and excludes certain records from the original LOINC standard to support Canadian requirements. The Unified Code for Units of Measure (UCUM) is used for units of measure

  • Diagnostic Imaging (DI) Standards are based on DICOM and IHE XDS-I

  • The Health Canada Drug Product Database (HCDPD) provides coding for medications.


Infoway offers certification for the following classes of HIT software:

  • Client registry
  • Consumer health application
  • Consumer health platform
  • Diagnostic Imaging (DI)
  • Drug Information System (DIS)
  • Immunization registry
  • Provider registry

The assessment criteria cover functionality, privacy, security, interoperability and management and are based on the following standards:

  • Functionality – Canada Health Infoway Electronic Health Record Privacy and Security Requirements.

  • Privacy – Canada Health Infoway Electronic Health Record Infostructure (EHRi) Privacy & Security Conceptual Architecture; Government of Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA); The Canadian Standards Association’s Model Code for the Protection of Personal Information – CAN-CSA-Q830-03.

  • Security – Canada Health Infoway Electronic Health Record Infostructure (EHRi) Privacy & Security Conceptual Architecture; The International Organization for Standardization’s Code of Practice for Information Security Management – ISO/IEC 17799:2005; The National Institute of Standards and Technology’s Recommended Security Controls for Federal Information Systems – NIST SP800-53; The USA Health Insurance Portability and Accountability Act (HIPAA) Security Rule.

  • Interoperability – Canada Health Infoway pan-Canadian Standards and Conformance Profile Definitions for diagnostic imaging, laboratory, drug, shared health record, and demographic information.

  • Management – The IT Governance Institute Control Objectives for Information and Related Technology (COBIT); The Office of Government Commerce’s Information Technology Infrastructure Library (ITIL).


The following diagram from Infoway's web site depicts the high level architectural vision from an end user perspective.

My Assessment

Overall, I am pleased with the choices that have been made by Infoway's Standards Collaborative. I believe that HL7 v3 is a step forward compared to HL7 v2.x because it is based on XML, it is more amenable to a Service-Oriented Architecture (SOA), and it defines a healthcare Reference Information Model (RIM) with associated modeling methodology and tooling. Beyond ICD-9 and CPT, SNOMED CT is definitely the medical terminology language of the future. Consistency is needed for units of measure for lab test results and UCUM is a good choice.

Some flexibility will be needed in standardizing transport protocols to allow lightweight solutions such as SMTP for point-to-point connections (what our friends in the US call NHIN Direct). The e-Health certification process should include some HIT usability testing (see my previous post on the experience of the British NHS in this area).

I am not aware of any pan-Canadian standard in the area of quality reporting. I would like to see Canada Health Infoway put more effort into creating specifications (messaging, security, privacy) and open source tools for Health Information Exchanges (HIEs) at the provincial and federal levels. More work needs to be done in promoting the use of Computerized Physician Order Entry (CPOE) systems and Clinical Decision Support Systems (CDSS) for the automated execution of clinical guidelines. Finally, more guidance is needed in the area of patient consent in the context of electronic health information exchanges.

Sunday, August 1, 2010

Content Integration in the Aviation Industry Using CMIS

The recently approved Content Management Interoperability Services (CMIS) specification could play a very important role in ensuring that aircraft operators receive up-to-date maintenance and operation documentation from aviation manufacturers.

The safe and efficient maintenance and operation of air vehicles require clear, technically accurate, and up-to-date technical documentation. The technical documentation is supplied by original equipment manufacturers (OEMs), regulatory agencies, and the aircraft operator's own engineering staff. OEMs (e.g. airframe, engine, and component manufacturers) provide regular publications such as Aircraft Maintenance Manuals (AMM) and Flight Crew Operating Manuals (FCOM) as well as time-sensitive supplements such as Service Bulletins (SBs) and Temporary Revisions (TRs). Regulatory agencies like Transport Canada and the US Federal Aviation Administration (FAA) also publish technical information that affects the maintenance and operation of air vehicles and equipment. Examples are Advisory Circulars (ACs), Airworthiness Directives (ADs), and various forms and regulations.

A typical airline faces the following challenges:

  • The elimination of the high costs associated with the shipping, storage, and distribution of physical products (paper, CDs, and DVDs) containing the technical documentation.

  • The safety and regulatory compliance concerns related to the use of out-of-date technical information (currently, some airlines receive revisions to technical manuals only four times a year).

The aerospace industry is in the process of adopting the new S1000D technical publications standard. S1000D is based on the concepts of modularity, reuse, and metadata (see my post on S1000D Content Reuse). The Flight Operations Interest Group (FOIG) of the Air Transport Association (ATA) is developing a data model and XML Schema for flight deck procedures and checklists also based on the S1000D data model. While S1000D is the right payload format, the exchange between content management and publishing systems within the industry must be orchestrated in an efficient manner.

Airlines, repair stations, regulatory agencies, and original equipment manufacturers (OEMs) manage and publish technical content using proprietary content management systems (CMS), each with its own proprietary API. Some companies now provide online portals where customers can log in to get the latest documentation. However, pilots and technicians don't really want to log in to the support sites of all those content providers to find out what is new and updated. To minimize aircraft downtime, aircraft mechanics want to connect to the aircraft's health and usage monitoring system (HUMS), determine what problem needs to be fixed, and have the appropriate content aggregated (work package) and presented to them.

With CMIS, an airline or aircraft operator can create a portal to aggregate content from the repositories of its OEM suppliers using a single standardized web services interface based on either SOAP or AtomPub (the RESTful alternative). This allows the aircraft operator to keep its maintenance and operation documentation updated at all times without having to wait for a CD or paper manual to be shipped by the OEM.
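The aggregation step of such a portal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RepositoryClient` protocol, its `list_documents` method, the `DocumentInfo` fields, and the in-memory repository are hypothetical stand-ins for a real CMIS client and are not part of the CMIS specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Protocol


@dataclass(frozen=True)
class DocumentInfo:
    """Hypothetical summary of one document revision held by an OEM."""
    doc_id: str          # e.g. a manual or service bulletin identifier
    revision_date: date  # date the revision was published
    source: str          # name of the repository that supplied it


class RepositoryClient(Protocol):
    """Hypothetical wrapper around one CMIS-compliant repository."""
    def list_documents(self) -> Iterable[DocumentInfo]: ...


class InMemoryRepository:
    """Stand-in used here instead of a real CMIS connection."""
    def __init__(self, docs: Iterable[DocumentInfo]):
        self._docs = list(docs)

    def list_documents(self) -> Iterable[DocumentInfo]:
        return iter(self._docs)


def latest_revisions(repositories: Iterable[RepositoryClient]) -> Dict[str, DocumentInfo]:
    """Return the most recent revision of each document across all repositories."""
    latest: Dict[str, DocumentInfo] = {}
    for repo in repositories:
        for doc in repo.list_documents():
            current = latest.get(doc.doc_id)
            if current is None or doc.revision_date > current.revision_date:
                latest[doc.doc_id] = doc
    return latest
```

In a real portal, each `RepositoryClient` would call the OEM's CMIS endpoint (SOAP or AtomPub) instead of reading from memory; the merge logic would stay the same.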

The second scenario is distributed authoring driven by the shift to distributed aircraft manufacturing. For example, the content of the Aircraft Maintenance Manual (AMM) can be provided by different aviation manufacturers participating in a consortium to design, manufacture, and support a new aircraft. In such a scenario, a centralized CMIS-compliant content repository (hosted by the airframe manufacturer acting as the content integrator) can provide the following CMIS services to other members of the consortium:

  • Policy and ACL Services to obtain the policies (such as retention) and Access Control List (ACL) currently applied to a document

  • Navigation Services to programmatically navigate the content repository

  • Discovery Services to query content. CMIS supports SQL-92 with some extensions and full-text search and can handle federated search across multiple repositories

  • Relationship Services to obtain the relationships (such as links) associated with a document

  • Versioning Services to check-out and check-in a document

  • Object Services to obtain the properties of a document and create folders and documents

  • Filing Services to add a document to a folder.
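To make the Discovery Services item more concrete, here is a minimal Python sketch that builds a CMIS query string combining a full-text search with a date filter. The function name and parameters are hypothetical. The property names (`cmis:objectId`, `cmis:name`, `cmis:lastModificationDate`), the `CONTAINS()` predicate, and the `TIMESTAMP` literal come from the CMIS query language, but whether a given repository supports full-text search should be checked against its declared capabilities.

```python
def build_updated_documents_query(search_text: str, modified_since: str) -> str:
    """Build a CMIS query for documents matching a full-text search.

    modified_since is an ISO 8601 timestamp string such as
    '2010-06-01T00:00:00.000Z'.
    """
    for value in (search_text, modified_since):
        # Escaping rules are easy to get wrong, so this sketch simply
        # refuses input that would need escaping.
        if "'" in value or "\\" in value:
            raise ValueError("quotes and backslashes are not supported in this sketch")
    return (
        "SELECT cmis:objectId, cmis:name, cmis:lastModificationDate "
        "FROM cmis:document "
        f"WHERE CONTAINS('{search_text}') "
        f"AND cmis:lastModificationDate > TIMESTAMP '{modified_since}'"
    )
```

The resulting string would be passed to the query operation of whichever CMIS client library is in use.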

The third example use case is the ability for a SCORM-compliant Learning Management System to integrate with CMIS-compliant S1000D Common Source DataBases (CSDB) in order to repurpose technical publications content for training purposes. The International S1000D-SCORM Bridge Project is an interesting initiative to create such an integration.

In general, CMIS will enable new capabilities such as the remote access to library services, cross-repository exchange, cross-repository aggregation, and cross-repository observation (or notification).

CMIS is now supported by major CMS vendors including EMC, IBM, Alfresco, and Microsoft. A list of open source and commercial implementations of CMIS is available at this page.

Saturday, July 24, 2010

Toward An Enterprise Reference Architecture for the Meaningful Use Era

Meaningful Use is not just about financial incentives. In fact, the financial incentives provided under the HITECH Act will only cover a portion of the total costs associated with migrating the healthcare industry to health information technologies. Consider the following Meaningful Use criteria:

  • Submit electronic data to immunization registries
  • Incorporate clinical lab-test results into certified EHR technology as structured data
  • Submit electronic syndromic surveillance data to public health agencies
  • Generate and transmit permissible prescriptions electronically (eRx)
  • Provide patients with an electronic copy of their health information
  • Exchange key clinical information among providers of care and patient authorized entities electronically.

These criteria are clearly designed to bridge information silos within and across organizational boundaries in the healthcare industry. This integration is necessary to improve the coordination and quality of care, eliminate the costs associated with redundancies and other inefficiencies within the healthcare system, empower patients with their electronic health records, and enable effective public health surveillance. This integration requires a new health information network such as those provided by state and regional Health Information Exchanges (HIEs) and the Nationwide Health Information Network (NHIN) initiative. HIEs and NHIN Exchange are based on a decentralized, service-oriented architecture (SOA) whereby authorized health enterprises securely exchange health information.

Health CIOs who see Meaningful Use as part of a larger transformational effort will drive their organizations toward success. Creating a coherent and consistent Enterprise Architecture for tackling these new challenges should be a top priority. Not having a coherent Enterprise Architecture will lead to a chaotic environment with increased costs and complexity. The following are some steps that can be taken to create an Enterprise Reference Architecture that is aligned with the organization's business context, goals, and drivers such as Meaningful Use:

  1. Adopt a proven architecture development methodology such as TOGAF.

  2. Create an inventory of existing systems such as pharmacy, laboratory, radiology, patient administration, electronic medical records (EMRs), order entry, clinical decision support, etc. This exercise is necessary to gain an understanding of current functions, redundancies, and gaps.

  3. Create a target enterprise service inventory to eliminate functional redundancies and maximize service reuse and composability. These services can be layered into utility, entity, and task service layers. For example, Meaningful Use criteria such as "Reconcile Medication Lists", "Exchange Patient Summary Record", or "Submit Electronic Data to Immunization Registries" can be decomposed into two or more fine-grained services.

  4. Task services based on the composition of other reusable services can support the orchestration of clinical workflows. The latter previously required clinicians to log into multiple systems and re-enter data, or relied on point-to-point integration via the use of interface engines. These task services can also invoke services across organizational boundaries from other providers participating in a HIE in order to compose a longitudinal electronic health record (EHR) of the patient. Other services such as clinical decision support services or terminology services can be shared by multiple healthcare organizations to reduce costs.

  5. Consider wrapper services to encapsulate existing legacy applications.

  6. The Enterprise Reference Architecture should support standards that have been mandated in the Meaningful Use Final Rule, but also those required by NHIN Exchange, NHIN Direct, and the HIE in which the organization participates. These standards are related to data exchange (HITSP C32, CCR, HL7 2.x), messaging (SOAP, WSDL, REST, SMTP, WS-I Basic), security and privacy (WS-I Security, SAML, XACML), IHE Profiles (XDS, PIX, and PDQ), and various NHIN Exchange and NHIN Direct Specifications. While standards are not always perfect, healthcare interoperability is simply not possible without them.

  7. Select a SOA infrastructure (such as an Enterprise Service Bus or ESB) that supports the standards listed above. Consider both open source and commercial offerings.

  8. Consider non-functional requirements such as performance, scalability, and availability.

  9. The Enterprise Reference Architecture should also incorporate industry best practices in the areas of SOA, Security, Privacy, Data Modeling, and Usability. These best practices are captured in various specifications such as the HL7 EHR System Functional Model, the HL7 Service Aware Interoperability Framework (SAIF), and the HL7 Decision Support Service (DSS) specification.

  10. Finally, create a Governance Framework to establish and enforce enterprise-wide technology standards, design patterns, Naming and Design Rules (NDRs), policies, service metadata and their versioning, quality assurance, and Service Level Agreements (SLAs) such as the NHIN Data Use and Reciprocal Support Agreement (DURSA).
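The service layering described in steps 3 and 4 can be illustrated with a toy Python sketch in which a task service ("Reconcile Medication Lists") is composed from an entity service and a utility service. All class names, method names, and data here are hypothetical and held in memory; a real implementation would expose these as web services behind the SOA infrastructure.

```python
from typing import Dict, Iterable, List


class TerminologyService:
    """Utility layer: normalizes drug names using a synonym table."""
    def __init__(self, synonyms: Dict[str, str]):
        self._synonyms = synonyms

    def normalize(self, name: str) -> str:
        key = name.strip().lower()
        return self._synonyms.get(key, key)


class MedicationListService:
    """Entity layer: returns the medication list held by one source system."""
    def __init__(self, lists_by_patient: Dict[str, List[str]]):
        self._lists = lists_by_patient

    def get_medications(self, patient_id: str) -> List[str]:
        return list(self._lists.get(patient_id, []))


class ReconcileMedicationListsTask:
    """Task layer: composes the services above into one workflow step."""
    def __init__(self, sources: Iterable[MedicationListService],
                 terminology: TerminologyService):
        self._sources = list(sources)
        self._terminology = terminology

    def run(self, patient_id: str) -> List[str]:
        reconciled = set()
        for source in self._sources:
            for name in source.get_medications(patient_id):
                reconciled.add(self._terminology.normalize(name))
        return sorted(reconciled)
```

Because the task service depends only on the interfaces of the lower layers, the same terminology service could be reused by other task services or shared across organizations.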

Monday, July 12, 2010

Adopting the NIEM for Health Information Exchange

A National Information Exchange Model (NIEM) Information Exchange Package Documentation (IEPD) for health information exchange is being considered by the U.S. Department of Health and Human Services Office of the National Coordinator (ONC). After years of work developing the HL7 CCD and HITSP C32, will a NIEM IEPD represent a step forward in health data exchange standards? Will the NIEM IEPD process work in the healthcare domain? In adopting the NIEM process, can we leverage the work that has been done so far by organizations such as HITSP, HL7, and the ASTM?


Having previously used the NIEM standard in a project, my first reaction when I started to work with the HL7 CDA and the related Reference Information Model (RIM) refinement process was, "why not use the NIEM?". The NIEM approach is proven and has been successfully used in complex data exchange scenarios in criminal justice and other sensitive domains.

The HL7 RIM expressed as a set of UML class diagrams defines a foundation model for health care and clinical data. The HL7 Clinical Document Architecture (CDA) is derived from the HL7 RIM through a refinement process that generates a Refined Message Information Model (R-MIM) for the CDA. The HL7 Continuity of Care Document (CCD) and HITSP C32 specifications define additional constraints on the HL7 CDA. Although the HL7 CCD was in fact an original attempt to harmonize the HL7 CDA and the ASTM CCR, there is still a need to define a unified data model for health information exchange. The Meaningful Use Interim Final Rule (IFR) allows both the CCD and the CCR for data exchange.

Ease of Use

From a software development point of view, the HL7 CCD and HITSP C32 standards are complex to learn and use. Data exchange standards don't provide any value by themselves. They provide value when they are used to create software solutions that improve the quality and safety of care and reduce costs. Therefore, ease of use represents an important requirement for a health data exchange standard. How do you create an XML data exchange standard that developers can learn quickly and that is easy to use?

Something as simple as adopting a component naming convention that conveys the semantics of those components is a good start. For example, ISO 11179, Part 5 entitled "Naming and Identification Principles" defines the following convention for naming data components:

  • Object-class qualifier terms (0 or more).
  • An object class term (1).
  • Property qualifier terms (0 or more).
  • A property term (1).
  • Representation qualifier terms (0 or more).
  • A representation term (1).

An example of an element name that follows that recommendation is <PersonBirthLocation>. Not having a component naming convention or using cryptic names makes it hard for developers to understand a new XML vocabulary. Disk space is cheap these days, so there is no reason to keep using those cryptic names.
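As a small illustration of the convention, the following Python sketch assembles a component name from its terms in the order listed above. The function and its parameter names are hypothetical, and the sample terms are only one plausible way to decompose a name.

```python
from typing import Sequence


def component_name(object_class: str, property_term: str, representation_term: str,
                   object_qualifiers: Sequence[str] = (),
                   property_qualifiers: Sequence[str] = (),
                   representation_qualifiers: Sequence[str] = ()) -> str:
    """Join naming terms into an UpperCamelCase component name."""
    parts = [*object_qualifiers, object_class,
             *property_qualifiers, property_term,
             *representation_qualifiers, representation_term]
    for part in parts:
        # Each term must be a single alphabetic word in this sketch.
        if not part.isalpha():
            raise ValueError(f"invalid naming term: {part!r}")
    return "".join(part[0].upper() + part[1:] for part in parts)
```

For example, `component_name("person", "birth", "date")` yields `PersonBirthDate`, a name a developer can read without consulting a data dictionary.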

Defining components is also important. ISO 11179, Part 4 entitled "Formulation of Data Definitions" provides the following recommendations for defining data components:

  1. state the essential meaning of the concept
  2. be precise and unambiguous
  3. be concise
  4. be able to stand alone
  5. be expressed without embedding rationale, functional usage, or procedural information
  6. avoid circular reasoning
  7. use the same terminology and consistent logical structure for related definitions
  8. be appropriate for the type of metadata item being defined

In general, consistency based on a predefined XML Schema Naming and Design Rules (NDRs) document which is itself based on recognized XML Schema design patterns can contribute to ease of use.

Developing Software

Key to adoption is the ability for an average software developer to use familiar tools including existing open source software to generate and process instances of the exchange XML schema. Examples of these tools include:

  • XML databinding tools such as JAXB
  • XML APIs such as JDOM or DOM4J
  • Mapping and storing the data in existing relational databases (this is what the majority of developers know)
  • Web Services Framework such as Apache CXF.

Reference Implementation and Testing Ease of Use

Creating a reference implementation with existing open source tools and testing ease of use should be part of releasing new health data exchange standards.

The reference implementation should support a complete SOA-based exchange scenario and provide artifacts that are typically generated in a real web services project. This includes designing a data model for storage, WSDLs, precompiling the XML schema with a databinding tool such as JAXB, and using an open source web services tool such as Apache CXF.

One way to determine ease of use is to ask average developers to process instances of the XML schema using familiar XML APIs and then measure how long it took them to write the code as well as the Cyclomatic Complexity (and other code quality metrics) of the written code.

Benefits of Adopting NIEM in the Healthcare Domain

I believe that NIEM is a good choice for moving healthcare interoperability forward for the following reasons:

  • The NIEM is a proven data exchange standard that has been used in mission critical environments such as the justice and intelligence domains.
  • The NIEM embodies recognized XML Schema design patterns in its Naming and Design Rules (NDR). For example, the NIEM component naming convention is based on ISO 11179 Parts 4 and 5. Per the NIEM NDR, all schema type definitions must be global to enable reuse. The NIEM provides a Schematron-based tool to automatically validate XML schemas against the rules defined in the NDR.
  • The NIEM provides core components that can be used or extended for the healthcare domain. Core NIEM data components such as <Person>, <Organization>, and <Address> are universal components that are needed in healthcare as well. NIEM also specifies an elegant solution for modeling roles, associations between entities, and code lists.
  • The NIEM provides various tools to facilitate the IEPD process. These tools include the Schema Subset Generation Tool (SSGT), the NIEM Wayfarer, and conformance validation tools. The SSGT presents a shopping cart metaphor for assembling existing components into new exchange packages. The ONC anticipates the need for additional tools for the healthcare domain.
  • A companion specification called the Logical Entity Exchange Specification (LEXS) can be used to define a messaging model for web services. LEXS has its own set of tools.

A NIEM Concept of Operations for the Healthcare Domain

The diagram below is extracted from a presentation made at the HIT Standards Committee meeting on June 30, 2010 and entitled: Standards & Interoperability Framework Concept of Operations (ConOps) Overview.

NIEM Healthcare IEPD vs. NIEM Healthcare Domain

My personal recommendation is that the new harmonization structure (the successor to HITSP) should create a NIEM Healthcare domain as opposed to a NIEM IEPD for the healthcare domain. The healthcare domain is complex enough to warrant its own domain within the NIEM. A NIEM Healthcare domain should provide a data exchange standard for Electronic Health Records (EHRs) and leverage the work done by HITSP C32, C80, and C83 in defining required data elements and code lists (terminology) for various EHR content modules. Instead of mapping those data elements to the HL7 CCD XML schema as was done by HITSP, those elements will be mapped to existing, extended, or new NIEM components. Specific health data exchanges can then create IEPDs to satisfy their unique requirements by reusing data components from the NIEM.

It should be possible to map the CDA R-MIM into the NIEM model using the NIEM Component Mapping Tool or CMT. It should also be possible to leverage at least some components from the HL7 CCD XML schema in building a NIEM Healthcare domain or IEPD. However, these components will have to be renamed and restructured to comply with the NIEM naming and design rules.

Thursday, July 1, 2010

Patient Consent: State of the Union

The viability and sustainability of an electronic health information exchange (HIE) largely depends on the trust and desire of patients to participate in the exchange. What are the laws, regulations, standards, practices, and current directions in the field of patients' privacy consents?

State and Federal Laws

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) does not require patient consent and authorization for the exchange of health information among healthcare providers for the purpose of medical care. The patient consent is implied by her general consent to be treated.

Some states have adopted heightened laws (higher than those imposed by HIPAA) that require explicit patient consent and authorization for specially-protected health information such as sexually transmitted diseases, human immunodeficiency virus tests, viral hepatitis, genetic information, substance abuse, mental health, and developmental disabilities.

Building Trust

The lack of patient trust can be a significant barrier to the implementation of a HIE. Therefore, a common practice in HIEs is to offer individual patients the opportunity to opt-out of exchanging their health information even if patient consent is not required by existing laws and regulations. Patients are notified and informed of their consent options through an outreach program.

Consent Requirements

The HHS Office of the National Coordinator (ONC) released a Consumer Preferences Requirements Document in October 2009. The document describes consent stakeholders, functional needs, policy implications, scenarios, and processes including HIEs.

Consent Options

The ONC released a whitepaper in March 2010 entitled "Consumer Consent Options for Electronic Health Information Exchange: Policy Considerations and Analysis". The whitepaper identified the following consent options:

  • No consent. Health information of patients is automatically included—patients cannot opt out;

  • Opt-out. Default is for health information of patients to be included automatically, but the patient can opt out completely;

  • Opt-out with exceptions. Default is for health information of patients to be included, but the patient can opt out completely or allow only select data to be included;

  • Opt-in. Default is that no patient health information is included; patients must actively express consent to be included, but if they do so then their information must be all in or all out; and

  • Opt-in with restrictions. Default is that no patient health information is made available, but the patient may allow a subset of select data to be included.

The granularity of patient consent preference can be based on the type of data, healthcare provider, time range, and intended use.
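The five options above can be modeled as a small decision function. The Python sketch below is one possible reading of those definitions and handles only the "type of data" dimension of granularity; the class names, field names, and the rule that an unset data-type list means "no restriction" are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class ConsentModel(Enum):
    NO_CONSENT = "no consent"
    OPT_OUT = "opt-out"
    OPT_OUT_WITH_EXCEPTIONS = "opt-out with exceptions"
    OPT_IN = "opt-in"
    OPT_IN_WITH_RESTRICTIONS = "opt-in with restrictions"


@dataclass(frozen=True)
class PatientPreference:
    opted_out: bool = False
    opted_in: bool = False
    # Data types the patient allows; None means no restriction was expressed.
    allowed_data_types: Optional[FrozenSet[str]] = None


def may_exchange(model: ConsentModel, preference: PatientPreference,
                 data_type: str) -> bool:
    """Decide whether one type of data may be included in the exchange."""
    allowed = preference.allowed_data_types
    type_permitted = allowed is None or data_type in allowed
    if model is ConsentModel.NO_CONSENT:
        return True
    if model is ConsentModel.OPT_OUT:
        return not preference.opted_out
    if model is ConsentModel.OPT_OUT_WITH_EXCEPTIONS:
        return not preference.opted_out and type_permitted
    if model is ConsentModel.OPT_IN:
        return preference.opted_in
    if model is ConsentModel.OPT_IN_WITH_RESTRICTIONS:
        return preference.opted_in and type_permitted
    raise ValueError(f"unknown consent model: {model!r}")
```

Extending the preference with provider, time range, and intended use would follow the same pattern: each dimension adds one more condition that must hold before data is released.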


The IHE Basic Patient Privacy Consents (BPPC) provides a mechanism to record and enforce patient consents. It complements the IHE Cross-Enterprise Document Sharing (XDS) standard by specifying how an XDS affinity domain can create privacy policies and enforce those policies through the access control mechanisms of an XDS Actor such as an EHR system. Patient consent is captured in an HL7 Clinical Document Architecture (CDA) document using a scanned document (wet signature on paper) or a digital signature.

HL7 is working on a set of specifications for the creation and use of privacy consent directives.

The Nationwide Health Information Network (NHIN) Access Consent Policies Specification is based on the eXtensible Access Control Markup Language (XACML), an OASIS standard.

Health Record Banks (HRBs)

An emerging pattern is to include a Health Record Bank (HRB) containing Personal Health Records (PHRs) as a participating node in the HIE. The HRB is accessible to patients via a web portal and allows patients to exercise fine-grained control over the exchange of their health records within the HIE through a consent configuration interface.

UPDATE: On Thursday, July 8, 2010, the Department of Health and Human Services (HHS) announced proposed modifications to the HIPAA Privacy & Security Rules.

Tuesday, June 22, 2010

Health Information Exchanges (HIEs): Emerging Architectural Patterns

The following are some emerging architectural patterns in the nascent field of HIEs:

  • Decentralized and Hybrid Models utilizing a service-oriented architecture (SOA). A centralized registry stores metadata about the type and location of clinical data available in edge systems connected to the HIE. For privacy and security reasons, the clinical data itself is kept at its source as opposed to a centralized repository. Upon request, a Record Locator Service (RLS) finds the data in edge systems and securely routes the data to authorized requestors.

  • A Data Use and Reciprocal Support Agreement (DURSA) provides the legal framework for participation in the HIE.

  • Use of a master patient index (MPI) as a core infrastructure for patient matching. The MPI stores identifying information on all patients with records in participating systems.

  • Use of Integrating the Healthcare Enterprise (IHE) profiles such as Patient Identifier Cross-Referencing (PIX) and Cross-Enterprise Document Sharing (XDS.b) to facilitate patient discovery and the query and retrieval of clinical documents.

  • Health Information Event Messaging to provide the ability to subscribe to health information.

  • Interoperability with NHIN Exchange to enable exchange of data with other state and federal agencies such as the Department of Veterans Affairs (VA) and the Department of Defense (DoD). This requires support for NHIN messaging standards such as the Web Services Interoperability (WS-I) Profiles WS-I Basic v2.0 and WS-I Security v1.1. WS-I Basic specifies the use of SOAP 1.2, WSDL 1.1, WS-Addressing 1.0, WS-Policy 1.5, MTOM 1.0, and UDDI v3.0.2 for the NHIN Web Services Registry. WS-I Security defines the security standards for NHIN Exchange including TLS, AES 128, X.509, XML D-Sig, and Attachment Security. NHIN has adopted an authentication and authorization framework based on SAML 2.0.

  • Use of the HL7 Continuity of Care Document (CCD) as the data exchange standard for clinical documents. Meaningful Use criteria allow the ASTM CCR specification as well.

  • Health Record Banks (HRBs) containing Personal Health Records (PHRs) as participating nodes in the HIE. The HRBs allow patients to exercise control over their health records by granting permissions to specific providers to view those health records.

  • Ability to connect to the HIE through a local EMR or a web-based portal (for example to allow access for physicians without an EMR).

  • For simple and secure interoperability, the NHIN Direct draft proposal at the time of this writing is to use:

    • SMTP as a backbone protocol
    • S/MIME-signed and encrypted messages for security
    • IHE XDM for content and metadata packaging
    • IHE XDR, REST, and Email (POP/IMAP) as edge protocols
    • TLS (with a server certificate only) for on-the-wire security
    • XDR as the backbone for NHIN Exchange.
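The interplay between the master patient index and the Record Locator Service in the decentralized model can be sketched as follows. This is a simplified in-memory illustration in Python; the class and method names are hypothetical, and patient matching is reduced to an explicit link step, whereas a real MPI matches on demographic data. Note that the registry holds only pointers (where the data lives and what type it is), never the clinical data itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RecordPointer:
    """Metadata about clinical data held in an edge system."""
    edge_system: str
    local_patient_id: str
    document_type: str


class MasterPatientIndex:
    """Maps each (edge system, local id) pair to one enterprise-wide id."""
    def __init__(self) -> None:
        self._enterprise_ids: Dict[Tuple[str, str], str] = {}

    def link(self, edge_system: str, local_patient_id: str, enterprise_id: str) -> None:
        self._enterprise_ids[(edge_system, local_patient_id)] = enterprise_id

    def resolve(self, edge_system: str, local_patient_id: str) -> str:
        return self._enterprise_ids[(edge_system, local_patient_id)]


class RecordLocatorService:
    """Central registry of pointers to records kept at their source."""
    def __init__(self, mpi: MasterPatientIndex) -> None:
        self._mpi = mpi
        self._pointers: Dict[str, List[RecordPointer]] = {}

    def register(self, pointer: RecordPointer) -> None:
        enterprise_id = self._mpi.resolve(pointer.edge_system, pointer.local_patient_id)
        self._pointers.setdefault(enterprise_id, []).append(pointer)

    def locate(self, edge_system: str, local_patient_id: str,
               document_type: Optional[str] = None) -> List[RecordPointer]:
        """Find where a patient's records live, optionally filtered by type."""
        enterprise_id = self._mpi.resolve(edge_system, local_patient_id)
        pointers = self._pointers.get(enterprise_id, [])
        return [p for p in pointers
                if document_type is None or p.document_type == document_type]
```

A requestor would then retrieve each document directly from the edge system named in the pointer, subject to authorization and the patient's consent preferences.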

Saturday, June 12, 2010

Putting XQuery to Work in Healthcare

The following are some of the challenges that healthcare organizations will be facing during the next few years:

  • Conversion from HIPAA 4010 to 5010
  • Conversion from ICD-9 to ICD-10
  • Efficiently storing, querying, processing, and exchanging Electronic Health Records (EHRs)
  • Mapping from HL7 2.x to HL7 v3 messages
  • Assembling EHRs by aggregating data from multiple organizations participating in Health Information Exchanges (HIEs).

XQuery is not just a query language for XML data sources. It is also a very powerful declarative, strongly typed, and side-effect free programming language for processing and manipulating XML documents. XQuery is a natural solution for querying and aggregating data coming from heterogeneous sources such as relational databases, native XML databases, file systems, and legacy data formats such as EDI. Some developers will find XQuery easier to use than XSLT because XQuery has a SQL-like syntax.

Migration to HIPAA 5010 and ICD 10

Conversion from HIPAA 4010 to 5010 and ICD-9 to ICD-10 will be a priority on the agenda in the next three years (details on final compliance dates can be found on this HHS web page).

The XQuery and XQuery Update Facility specifications provide a simple and elegant solution to this conversion challenge.
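To show the shape of such a conversion without committing to a particular XQuery processor, here is a minimal sketch of the same in-place update idea written in Python with the standard `xml.etree.ElementTree` module. It is a swapped-in illustration, not the XQuery solution itself. The `<code>` element, its `system` and `value` attributes, and the two-entry mapping table are hypothetical; a real conversion would load a complete crosswalk and would need to handle ICD-9 codes that map to more than one ICD-10 code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Illustrative entries only; not a complete or authoritative crosswalk.
ICD9_TO_ICD10: Dict[str, str] = {
    "250.00": "E11.9",  # type 2 diabetes mellitus without complications
    "401.9": "I10",     # essential hypertension
}


def convert_diagnosis_codes(xml_text: str,
                            mapping: Dict[str, str]) -> Tuple[str, List[str]]:
    """Rewrite ICD-9-CM codes to ICD-10-CM and report codes with no mapping."""
    root = ET.fromstring(xml_text)
    unmapped: List[str] = []
    for code in root.iter("code"):
        if code.get("system") != "ICD-9-CM":
            continue  # leave other code systems untouched
        target = mapping.get(code.get("value", ""))
        if target is None:
            unmapped.append(code.get("value", ""))
            continue
        code.set("system", "ICD-10-CM")
        code.set("value", target)
    return ET.tostring(root, encoding="unicode"), unmapped
```

Returning the unmapped codes alongside the converted document lets a reviewer resolve them by hand rather than silently dropping them.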

Health Information Exchanges (HIEs) and the Virtual Health Record

In a HIE with multiple participating organizations, EHR data must be assembled either through a centralized, federated, or hybrid data model. The data needed to assemble a longitudinal EHR (a virtual health record) for a patient could be coming from several providers, payers, lab companies, and medical devices. XQuery was designed to handle that type of XML processing use case.

Storing, Updating, and Querying EHRs

The HL7 CCD and ASTM CCR have been retained as Meaningful Use XML data exchange standards for EHRs. Mapping between an XML HL7 CCD representation (which is derived from the HL7 UML-based Reference Information Model or RIM) and an existing relational database structure is not trivial. IBM has been granted a patent entitled "Conversion of hierarchically-structured HL7 specifications to relational databases". The HL7 RIMBAA project provides some best practices on mapping RIM objects to a relational database structure.

With the emergence of native XML databases such as Oracle XML DB and IBM pureXML, XML is no longer just a messaging format. It can be used as a format for storing and querying data as well.

This article shows sample code for updating an EHR stored in HL7 CDA format in an IBM DB2 pureXML native XML database.

Mapping from HL7 2.x to HL7 v3 messages

In countries like Canada where HL7 v3 has been adopted, a frequent challenge is mapping legacy HL7 2.x messages to HL7 v3 messages, for example for lab results. An XQuery-based transform can be used to map from an HL7 v2.x XML structure to an HL7 v3 XML structure.
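To give a feel for this kind of structural mapping, here is a minimal sketch. It is written in Python purely for illustration; the same logic could be expressed as an XQuery transform. The input and output element names are simplified assumptions loosely modeled on v2.x OBX segments and v3 observations. They are not schema-valid HL7, and the wrapper element `observationBatch` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical v2.x-style XML for one lab result (not schema-valid HL7).
V2_SAMPLE = """
<ORU_R01>
  <OBX>
    <OBX.3>2345-7</OBX.3>
    <OBX.5>95</OBX.5>
    <OBX.6>mg/dL</OBX.6>
  </OBX>
</ORU_R01>
"""


def child_text(parent, tag):
    """Return the stripped text of the first direct child with this tag, or ''."""
    for child in parent:
        if child.tag == tag:
            return (child.text or "").strip()
    return ""


def v2_to_v3(v2_xml):
    """Map each OBX segment to a simplified v3-style <observation> element."""
    source = ET.fromstring(v2_xml)
    target = ET.Element("observationBatch")  # hypothetical wrapper element
    for obx in source.iter("OBX"):
        obs = ET.SubElement(target, "observation")
        # Observation identifier becomes <code code="..."/>
        ET.SubElement(obs, "code", code=child_text(obx, "OBX.3"))
        # Result value and units become <value value="..." unit="..."/>
        ET.SubElement(
            obs,
            "value",
            value=child_text(obx, "OBX.5"),
            unit=child_text(obx, "OBX.6"),
        )
    return target


v3 = v2_to_v3(V2_SAMPLE)
print(ET.tostring(v3, encoding="unicode"))
```

A real mapping would also have to deal with data types, code systems, and the message wrappers, which is where most of the effort goes.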

An Alternative to GELLO?

In a previous post entitled "Clinical Decision Support: Crossing the Chasm", I argued that Clinical Decision Support Systems (CDSS) implementers should be free to use any programming language of their choice. GELLO is an HL7 standard which specifies an expression and query language for CDSS. The following are the requirements for a CDSS expression and query language as specified in the GELLO specification:

  • vendor-independent
  • platform-independent
  • object-oriented and compatible with the vMR
  • easy to read/write
  • side-effect free
  • flexible
  • extensible

XQuery satisfies all these requirements except the third. XQuery is a functional programming language with no side effects, as opposed to an object-oriented programming language. GELLO settled on the OMG Object Constraint Language (OCL). The following paragraph from the GELLO specification explains why XQuery (known as XQL at the time) wasn't selected:

XQL is a query language designed specifically for XML documents. XML documents are unordered, labeled trees, with nodes representing the document entity, elements, attributes, processing instructions and comments. The implied data model behind XML neither matches that of a relational data model nor that of an object-oriented data model. XQL is a query language for XML in the same sense as SQL is a query language for relational tables. Since the HL7 RIM data model and the vMR data model are both object-oriented, it is clear that XQL is not an appropriate approach for an object-oriented query and expression language.

That might have been true back in 2004 in an object-oriented world. Today, if the inputs to a CDSS are EHRs represented in HL7 CCD or ASTM CCR format, and those EHRs are stored in an XQuery compliant native XML database, then XQuery could be a strong candidate for an expression and query language for the CDSS.

Wednesday, June 9, 2010

Data Modeling for Electronic Health Records (EHR) Systems

Getting the data model right is of paramount importance for an Electronic Health Records (EHR) system. The factors that drive the data model include but are not limited to:

  • Patient safety
  • Support for clinical workflows
  • Different uses of the data such as input to clinical decision support systems
  • Reporting and analytics
  • Regulatory requirements such as Meaningful Use criteria.

Model First

Proven methodologies like contract-first web service design and model driven development (MDD) put the emphasis on deriving application code from the data model and not the other way around. Thousands of lines of code can be auto-generated from the model, so it's important to get the model right.

Requirements Gathering

The objective here is to determine the entities, their attributes, and the relationships between those entities. For example, what are the attributes that are necessary to describe a patient's condition and how do you express the fact that a condition is a manifestation of an allergy? The data modeler should work closely with clinicians to gather those requirements. Industry standards should be leveraged as well. For example, HITSP C32 defines the data elements for each EHR data module such as conditions, medications, allergies, and lab results. These data elements are then mapped to the HL7 Continuity of Care Document (CCD) XML schema.

The HL7 CCD is itself derived from the HL7 Reference Information Model (RIM). The latter is expressed as a set of UML class diagrams and is the foundation model for health care and clinical data. A simpler alternative to the CCD is the ASTM Continuity of Care Record (CCR). Both the CCD and CCR provide an XML schema for data exchange and are Meaningful Use data exchange standards. Another relevant data model is the HL7 vMR (Virtual Medical Record), which aims to define a data model for the input and output of Clinical Decision Support Systems (CDSS).

These standards can be cumbersome to use as such from a software development perspective. Nonetheless, they can inform the design of the data model for an EHR system. Alignment with the CCD and CCR will facilitate data exchange with other providers and organizations. The following are Meaningful Use criteria for data exchange:

  1. Electronically receive a patient summary record, from other providers and organizations including, at a minimum, diagnostic test results, problem list, medication list, medication allergy list, immunizations, and procedures and upon receipt of a patient summary record formatted in an alternative standard specified in Table 2A row 1, displaying it in human readable format.

  2. Enable a user to electronically transmit a patient summary record to other providers and organizations including, at a minimum, diagnostic test results, problem list, medication list, medication allergy list, immunizations, and procedures in accordance with the standards specified in Table 2A row 1.

Applying Data Modeling Patterns

Applying data modeling patterns promotes model consistency and quality. Relational data modeling is a well-established discipline. My favorite resource for relational data modeling patterns is The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling.

Some XML Schema best practices can be found here.

Data Stores

Today, options for data stores are no longer limited to relational databases. Alternatives include: native XML databases (e.g. DB2 pureXML), Entity-Attribute-Value with Classes and Relationships (EAV/CR), and Resource Description Framework (RDF) stores.

Native XML databases are more resilient to schema changes and do not require handling the impedance mismatch between XML documents, Java objects, and relational tables, a mismatch that can introduce design complexity as well as performance and maintainability issues.

Storing EHRs in an RDF store can enable the inference of new medical facts from existing explicit medical facts. Such inferences can be driven by an ontology expressed in OWL or a set of rules expressed in a rule language such as SWRL. Semantic Web technologies can also be helpful for checking the consistency of a model, for integrating data and knowledge across domains (e.g. the genomics and clinical domains), and for managing classification schemes like medical terminologies. RDF, OWL, and SWRL have been successfully implemented in Clinical Decision Support Systems (CDSS).
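The idea of deriving implicit facts from explicit ones can be sketched without any Semantic Web tooling. The following Python example stores facts as (subject, predicate, object) triples, as RDF does, and applies a single hand-written rule by forward chaining. The predicate names and the rule are hypothetical and for illustration only; a real system would use an RDF store with OWL or SWRL reasoning, and the rule is not clinical guidance.

```python
# Explicit facts as (subject, predicate, object) triples, as in RDF.
facts = {
    ("penicillin_v", "memberOf", "penicillins"),
    ("patient1", "allergicTo", "penicillins"),
}


def infer(triples):
    """Apply one hypothetical rule repeatedly until no new facts appear.

    Rule: if a patient is allergicTo a drug class and a drug is memberOf
    that class, then the drug is contraindicated for the patient.
    """
    known = set(triples)
    while True:
        new = {
            (patient, "contraindicated", drug)
            for (patient, p1, drug_class) in known
            if p1 == "allergicTo"
            for (drug, p2, cls) in known
            if p2 == "memberOf" and cls == drug_class
        } - known
        if not new:
            return known
        known |= new


inferred = infer(facts)
for triple in sorted(inferred - facts):
    print(triple)
```

The derived triple was never stored explicitly; it follows from the two explicit facts and the rule.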

The data modeling notation used should be independent of the storage model or at least compatible with the latter. For example, if native XML storage is used, then a relational modeling notation might not be appropriate. In general, UML provides the right level of abstraction for implementation-agnostic modeling.

Due Diligence

When adopting a "noSQL" storage model, it is important to ensure that (a) the database can meet performance and scalability criteria and (b) the team has the skills to develop and maintain the database. Due diligence should be performed through benchmarking using a tool such as the IBM Transaction Processing over XML (TPoX) benchmark. The team might need formal training in a new query language like XQuery or SPARQL.

A Longitudinal View of the Patient's Health

Maintaining an up-to-date and truly longitudinal view of a patient's medical history requires merging and reconciling data from heterogeneous sources including providers' EMR systems, lab companies, medical devices, and payers' claim transaction repositories. The data model should facilitate the assembly of data from such diverse sources. XML tools based on XSLT, XQuery, or XQuery Update can be used to automate the merging.
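As a rough sketch of what such merging involves, the following Python example combines entries from several feeds, drops duplicates reported by more than one source, and orders the result chronologically. The `Entry` fields, the codes, and the "first source wins" policy are assumptions made for illustration; it also assumes that patient identifiers have already been reconciled across sources, which in practice is a hard problem of its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    patient_id: str
    kind: str    # e.g. "medication", "lab", "condition"
    code: str    # hypothetical code, not from a real terminology
    date: str    # ISO 8601 date, so string order equals chronological order
    source: str


def merge_records(*feeds):
    """Merge entries from several feeds, drop duplicates, and sort by date."""
    seen = {}
    for feed in feeds:
        for entry in feed:
            key = (entry.patient_id, entry.kind, entry.code, entry.date)
            seen.setdefault(key, entry)  # first source to report an entry wins
    return sorted(seen.values(), key=lambda e: (e.date, e.kind, e.code))


provider = [
    Entry("p1", "medication", "RX-001", "2010-03-02", "provider-emr"),
    Entry("p1", "condition", "DX-010", "2010-01-15", "provider-emr"),
]
lab = [Entry("p1", "lab", "LAB-100", "2010-02-20", "lab-company")]
payer = [Entry("p1", "medication", "RX-001", "2010-03-02", "payer-claims")]

timeline = merge_records(provider, lab, payer)
for entry in timeline:
    print(entry.date, entry.kind, entry.code, entry.source)
```

The medication reported by both the provider and the payer appears once in the merged timeline.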

The Importance of Data Validation

Data validation can be performed at the database layer, the application layer, and the UI layer. The data model should support the validation of the data. The following are examples of techniques that can be used for data validation:

  • XML Schema for structural validation of XML documents
  • ISO Schematron (based on XPath 2.0 and XSLT 2.0) for business rules validation of XML documents
  • A business rules engine like Drools
  • A data processing framework like Smooks
  • The validation features of a UI framework such as JSF2
  • The built-in validation features of the database.
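The first two techniques in the list work at different levels, and a small sketch can show the difference. The Python example below performs a structural check (roughly what an XML Schema would enforce) followed by a business rule check (roughly what a Schematron assertion would express). The `medication` element and its children are hypothetical, and the code is a stand-in for real schema and Schematron validators rather than a use of them.

```python
import xml.etree.ElementTree as ET

REQUIRED_CHILDREN = ("name", "dose", "startDate")  # hypothetical structure


def validate_medication(xml_text):
    """Return a list of error messages; an empty list means the entry is valid."""
    errors = []
    med = ET.fromstring(xml_text)
    # Structural check: every required child element must be present.
    for tag in REQUIRED_CHILDREN:
        if med.find(tag) is None:
            errors.append(f"missing <{tag}>")
    # Business rule: an end date, if present, must not precede the start date.
    # ISO 8601 dates compare correctly as plain strings.
    start = med.findtext("startDate")
    end = med.findtext("endDate")
    if start and end and end < start:
        errors.append("endDate is before startDate")
    return errors


VALID = (
    "<medication><name>DrugA</name><dose>10 mg</dose>"
    "<startDate>2010-01-01</startDate><endDate>2010-02-01</endDate></medication>"
)
INVALID = (
    "<medication><name>DrugA</name>"
    "<startDate>2010-03-01</startDate><endDate>2010-02-01</endDate></medication>"
)

print(validate_medication(VALID))
print(validate_medication(INVALID))
```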

The Future: Modeling with the NIEM IEPD

The HHS ONC issued an RFP for using the National Information Exchange Model (NIEM) Information Exchange Package Documentation (IEPD) process for healthcare data exchange. The ONC will release a NIEM Concept of Operations (ConOps). The NIEM IEPD process is explained here.

Tuesday, May 25, 2010

Architecting the Health Enterprise with TOGAF 9

Several factors are currently driving the increased complexity of health information technology (HIT). These factors include: a new regulatory framework, innovations in the practice of healthcare delivery, standardization, cross-enterprise integration, usability, mobility, security, privacy, and the imperative to improve care quality and reduce costs.

A methodology and governance framework is needed for creating a coherent and consistent enterprise architecture (EA). The latter should not be driven by vendors and their offerings. Instead, health enterprises should develop an EA that is aligned with their unique overarching business context, drivers, and vision. Developing an architecture capability that is based on a proven framework should be a top priority for health IT leaders.

TOGAF 9 is an Open Group standard that defines a methodology, standardized semantics, and processes that can be used by Enterprise Architects to align IT with the strategic goals of their organization. TOGAF 9 covers the following four architecture domains:

  • Business Architecture
  • Data Architecture
  • Application Architecture
  • Technology Architecture.

The diagram below from the TOGAF 9 documentation provides an overview (click on the image to enlarge).

The Architecture Development Method (ADM) is the core of TOGAF and describes a method for developing an enterprise architecture. TOGAF 9 includes specific guidance on how the ADM can be applied to service-oriented architecture (SOA) and enterprise security (two areas of interest in health IT). The different phases of the ADM are depicted on the following diagram (click on the image to enlarge).

The Architecture Capability Framework provides guidelines and resources for establishing an architecture capability within the enterprise. This capability operates the ADM. The Content Framework specifies the artifacts and deliverables for each Architecture Building Block (ABB). These artifacts are stored in a repository and classified according to the Enterprise Continuum.

The Open Group has been working on the adoption of an open EA modeling standard called ArchiMate. ArchiMate provides a higher level view of EA when compared to modeling standards such as BPMN and UML. It can be used to depict different layers of EA including business processes, applications, and technology in a way that can be consumed by non-technical business stakeholders. A sample of an ArchiMate enterprise view of a hospital can be found here.

HL7 has published the Services-Aware Interoperability Framework (SAIF), an architectural framework for facilitating interoperability between healthcare systems. SAIF includes the following four components: the Enterprise Conformance and Compliance Framework (ECCF), the Governance Framework (GF), the Behavioral Framework (BF), and the Information Framework (IF).

For guidance on using SOA in healthcare, the Healthcare Services Specification Project (HSSP) has published the Practical Guide for SOA in Healthcare based on the TOGAF Architecture Development Method (ADM) and the SAIF ECCF. The Practical Guide for SOA in Healthcare contains a sample Reference Enterprise Architecture. The Practical Guide for SOA in Healthcare Volume II describes an immunization case study.

Also noteworthy are the HL7 EHR System Functional Model (EHR-S FM) and the HSSP Electronic Health Record (EHR) System Design Reference Model (EHR SD RM).

Saturday, May 22, 2010

Clinical Decision Support: Crossing The Chasm

Clinical Decision Support (CDS) is certainly a "meaningful use" of electronic health records (EHRs). Despite its potential to improve the quality of care, CDS is not widely used in health care delivery today. In tech marketing parlance, CDS has not crossed the chasm. There are several issues that need to be addressed, including: physician buy-in to the concept of automated execution of evidence-based clinical guidelines, seamless integration into clinical workflows, usability, standardization, and CDS software architecture and design.

When I first started to explore CDS systems, I was quite overwhelmed by the number of different competing formalisms, standards, and academic projects in the field. To move us past the current gridlock in CDS adoption, I propose an agile approach to the development of CDS software with an emphasis on:

  • Working CDS software that delivers results for providers and their patients
  • Close daily collaboration between clinicians and software developers during the development process
  • The use of agile techniques such as automated acceptance testing to facilitate the involvement of clinicians in CDS software quality assurance.

Working CDS Software

Different formalisms, methodologies, and architectures have been proposed for representing the medical knowledge in clinical guidelines for their automated execution. Examples include, but are not limited to the following:

  • The Arden Syntax
  • GLIF (Guideline Interchange Format)
  • GELLO (Guideline Expression Language Object-Oriented)
  • GEM (Guidelines Element Model)
  • PROforma
  • EON
  • Asbru
  • SAGE
  • The HL7 Decision Support Service (DSS) Functional Model Specification.

Many academic papers have been written to explain and compare these different approaches. Each of these projects represents an important contribution to the CDS field and will inform the design of future CDS software. However, it is quite easy to fall into analysis paralysis when reviewing and debating which formalism or standard is better. At the end of the day, what really matters to the pragmatic developer is tested and working CDS software that delivers results for clinicians and their patients.

Using business rule engines is not the only way to develop CDS software. However, I believe that because they are written in languages and frameworks that are accessible to mainstream software developers, business rule engines can accelerate the development, deployment, and availability of CDS software. Furthermore, many viable open source business rule engines are available today and can be leveraged.

Getting CDS Done with a Business Rule Engine

ARRA interim certification criteria for electronic health record (EHR) technology include the following CDS-related requirements:

  1. Implement automated, electronic clinical decision support rules (in addition to drug-drug and drug-allergy contraindication checking) according to speciality or clinical priorities that use demographic data, specific patient diagnoses, conditions, diagnostic test results and/or patient medication list.
  2. Automatically and electronically generate and indicate (e.g., pop-up message or sound) in real time, alerts and care suggestions based upon clinical decision support rules and evidence grade.
  3. Automatically and electronically track, record, and generate reports on the number of alerts responded to by a user.

These requirements can be satisfied with simple conditional statements in any programming language. However, it is recognized that clinical decision support rules (like other complex types of business rules) are better implemented with a business rule engine. This allows the developer to externalize the medical knowledge in the clinical guideline in the form of declarative rules as opposed to embedding that knowledge in procedural code. This approach has many benefits, notably resilience to change, ease of maintenance, and collaboration with business users (in this case clinicians).
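The difference between embedded conditionals and externalized rules can be sketched in a few lines. In the Python example below, each rule is a piece of data (a name, a condition, and a message) and a tiny engine evaluates whichever rules it is given, recording every alert that fires as a starting point for the reporting requirement. This is a toy stand-in for a real business rule engine; the class names, the patient dictionary layout, and the two rules are hypothetical and are not clinical guidance.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    message: str


@dataclass
class RuleEngine:
    rules: List[Rule]
    alert_log: List[Tuple[str, str]] = field(default_factory=list)

    def evaluate(self, patient: dict) -> List[str]:
        """Return the messages of all rules that fire and log each alert."""
        fired = [rule for rule in self.rules if rule.condition(patient)]
        for rule in fired:
            self.alert_log.append((patient["id"], rule.name))
        return [rule.message for rule in fired]


# Rules live outside the engine and can change without touching its code.
RULES = [
    Rule(
        "hba1c-missing",
        lambda p: "diabetes" in p["conditions"] and "hba1c" not in p["labs"],
        "Consider ordering an HbA1c test.",
    ),
    Rule(
        "flu-vaccine-due",
        lambda p: p["age"] >= 65 and "influenza" not in p["immunizations"],
        "Consider an influenza vaccination.",
    ),
]

engine = RuleEngine(RULES)
patient = {
    "id": "p1",
    "age": 70,
    "conditions": {"diabetes"},
    "labs": {"hba1c": 6.5},
    "immunizations": set(),
}
alerts = engine.evaluate(patient)
print(alerts)
print(engine.alert_log)
```

Adding or retiring a rule means editing the `RULES` list, not the evaluation logic, which is the property a rule engine provides at a much larger scale.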

Collaboration between Clinicians and Software Developers

One of the most challenging aspects of CDS has been the translation of the medical knowledge in clinical guidelines into executable code. There are not that many people who are experts in both the medical and software development fields. The Agile prescription to this problem is close and daily collaboration between clinicians and software developers.

Business rule engines like JBoss Drools provide features such as Excel-based decision tables or the ability to write rules in a DSL (domain specific language) to allow clinicians to actively participate in the development and maintenance of decision support rules.
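To illustrate why a decision table is approachable for clinicians, here is a minimal Python sketch in which the rules are rows of a CSV table, much like rows in a spreadsheet. It does not use Drools or its decision table format; the column names, the patient dictionary layout, and the threshold values are assumptions chosen for illustration and are not clinical guidance.

```python
import csv
import io

# Each row reads: if the patient has `condition` and `measure` is above
# `above`, then show `message`. A clinician could maintain this table.
DECISION_TABLE = """condition,measure,above,message
diabetes,hba1c,7.0,Review glycemic control
hypertension,systolic_bp,140,Review blood pressure management
"""


def load_rules(table_text):
    """Parse the decision table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(table_text)))


def evaluate(rules, patient):
    """Return the messages of all rows whose conditions match the patient."""
    messages = []
    for row in rules:
        value = patient["measures"].get(row["measure"])
        if (
            row["condition"] in patient["conditions"]
            and value is not None
            and value > float(row["above"])
        ):
            messages.append(row["message"])
    return messages


rules = load_rules(DECISION_TABLE)
patient = {
    "conditions": {"diabetes", "hypertension"},
    "measures": {"hba1c": 8.1, "systolic_bp": 130},
}
print(evaluate(rules, patient))
```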

Automated Acceptance Testing

The quality of CDS software is of paramount importance for care safety reasons. The Agile prescription here is test-driven development (TDD), particularly the automated integration and acceptance testing of the proper execution of clinical decision support rules. "FIT for Rules" is an example of an automated acceptance framework for rule engines like ILOG and Drools. Such frameworks allow both the developer and the clinician to participate in the acceptance testing process.
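The table-driven style of acceptance testing can be sketched as follows. This Python example is not "FIT for Rules"; it only imitates the idea that a clinician fills in a table of inputs and expected outcomes and a runner checks the rule against every row. The rule under test, the function names, and the table contents are hypothetical.

```python
def drug_allergy_alert(medication_class, allergies):
    """Hypothetical rule under test: alert when the class is a recorded allergy."""
    return medication_class in allergies


# Acceptance table a clinician could review or extend:
# (medication class, recorded allergies, expected alert)
ACCEPTANCE_TABLE = [
    ("penicillins", {"penicillins", "sulfonamides"}, True),
    ("macrolides", {"penicillins"}, False),
    ("penicillins", set(), False),
]


def run_acceptance(table, rule):
    """Run the rule against every row and return a description of each failure."""
    failures = []
    for medication_class, allergies, expected in table:
        actual = rule(medication_class, allergies)
        if actual != expected:
            failures.append(
                f"{medication_class} with {sorted(allergies)}: "
                f"expected {expected}, got {actual}"
            )
    return failures


failures = run_acceptance(ACCEPTANCE_TABLE, drug_allergy_alert)
print("all rows passed" if not failures else failures)
```

Because the expectations live in a table rather than in test code, a clinician can add a row for a new scenario without reading the rule implementation.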

Service-Oriented CDS

The complexity and cost of developing CDS software strongly argue in favor of a service-oriented approach whereby CDS software capabilities are exposed as a set of services that can be consumed by other client health IT systems such as EHR and Computerized Physician Order Entry (CPOE) systems. To reduce costs, these CDS software services can be shared by several health care providers.

In this regard, the HL7 Decision Support Service (DSS) Functional Model Specification represents one of the most important specifications for CDS implementers today.

Interchange Standard

The complexity and cost inherent in capturing the medical knowledge in clinical guidelines and translating that knowledge into executable code remain an impediment to the widespread adoption of CDS software. Therefore, there is still a need for a standard for the sharing and interchange of executable clinical guidelines. Several formalisms and standards have been proposed such as the Arden Syntax, GLIF, GELLO, and GEM. However, none of these standards has been widely adopted. Although there is a lot that can be learned from these standards, I believe that they are not widely used because they are complex and specific to the healthcare domain, and therefore not accessible to mainstream developers. There is also a lack of open source and even commercial implementations of some of these standards.

If business rule engines are the pragmatic path to CDS adoption, then I would argue that the Rule Interchange Format (RIF) specification might be a solution to the interchange problem. The RIF Production Rule Dialect (PRD) is designed as a common XML serialization for multiple rule languages to enable rule interchange between different business rule management systems (BRMS). RIF is currently a W3C candidate recommendation and is backed by several BRMS vendors.

UPDATE: The following are the final Meaningful Use criteria for Clinical Decision Support (CDS):

  1. Implement rules. Implement automated, electronic clinical decision support rules (in addition to drug-drug and drug-allergy contraindication checking) based on the data elements included in: problem list; medication list; demographics; and laboratory test results.

  2. Notifications. Automatically and electronically generate and indicate in real-time, notifications and care suggestions based upon clinical decision support rules.

And here are the draft stage 2 requirements:

Use CDS to improve performance on high-priority health conditions.

Establish CDS attributes for purposes of certification:

  1. Authenticated (source cited);
  2. Credible, evidence-based;
  3. Patient-context sensitive;
  4. Invokes relevant knowledge;
  5. Timely;
  6. Efficient workflow;
  7. Integrated with EHR;
  8. Presented to the appropriate party who can take action