Sunday, December 19, 2010

How Not to Build A Big Ball of Mud

If you are a software developer or architect, you have to deal not only with ever-changing business requirements but also with the myriad of application development frameworks and design patterns out there. There are frameworks for the User Interface (UI), Dependency Injection (DI), Aspect Oriented Programming (AOP), and Object-Relational Mapping (ORM). On top of that, you will probably need a web services framework and perhaps an Enterprise Service Bus (ESB) if you need to integrate applications. As an architect, you also need to keep an eye on scalability, availability, security, usability, industry standards, and government regulations.

In such an environment, the lack of a disciplined approach to software architecture can quickly lead to a Big Ball of Mud. In a paper presented in 1997 at the Fourth Conference on Pattern Languages of Programs, Brian Foote and Joseph Yoder describe the Big Ball of Mud:

A BIG BALL OF MUD is haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle. We’ve all seen them. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.


The Big Ball of Mud remains the most pervasive architecture today. Note that these problems can be exacerbated by an agile software development approach that leaves little or no room for design and strategic thinking (see my previous post on software architecture documentation in agile projects).

Domain-Driven Design (DDD) is a set of patterns introduced by Eric Evans in his book "Domain-Driven Design: Tackling Complexity in the Heart of Software". I won't go into the details of those patterns here. I recommend reading the book, and there are free DDD resources on the web as well. However, I will share some key DDD principles that have helped me wrap my head around software architecture complexity:

  • Collaboration between software developers and domain experts is important to create a common understanding of the concepts of the domain. Note that we're not talking about UI components such as screens or fields here, nor are we talking about computer science abstractions such as classes and objects. We are talking about what the domain is made of conceptually. These domain concepts are expressed in a Ubiquitous Language that is used not only in conversations and software documentation, but also in the code. So in essence, DDD is model-driven design: the model can be translated into code and frameworks like Naked Objects can help you do exactly that. Continuously refine the model.

  • Experimentation and rapid prototyping are an efficient way to collaborate with domain experts and business analysts. This is where the Naked Objects Framework can be very helpful. Note that you can use Naked Objects for the initial prototyping and domain modeling, and then use your preferred frameworks for the remaining layers of your application.

  • Pay attention to the correct implementation of DDD building blocks such as: entities (behaviour-rich with business rules), value objects, aggregates, aggregate roots, domain events, factories, repositories, and services. Avoid an anemic domain model and know how to recognize value objects and persist them properly. Value objects (such as money and time interval) are immutable and are manipulated through side-effect free functions. Move complexity and behaviour out of your entities into those value objects.

  • Domain concepts are grouped into modules to delineate what Eric Evans calls the "conceptual contours" of the domain. To reduce coupling, OO design principles such as the dependency inversion principle, the interface segregation principle, and the acyclic dependencies principle are applied.

  • DDD recommends the following four layers: the presentation layer, the application layer, the domain layer, and the infrastructure layer. Although the idea of a multi-tier architecture is not new, the anti-patterns are typically: a fat application layer, an anemic domain model, and a tangled mess in general. So, properly layer your architecture:

    • Repository interfaces are in the domain layer, but their implementations are in the infrastructure layer to allow "Persistence Ignorance"
    • Both the interface and implementation of factories are in the domain layer
    • Domain and infrastructure services are injected into entities using dependency injection (some argue that DDD is not possible without DI, AOP, and ORM)
    • The application layer takes care of cross-cutting concerns such as transactions and security. It can also mediate between the presentation layer and domain layer through Data Transfer Objects (DTOs).


  • DDD enables Object Oriented User Interfaces (OOUI) which expose the richness of the domain layer as opposed to obscuring it.

  • Models exist within bounded contexts and the latter should be clearly identified. In his book, Eric Evans talks about "strategic design" and "context maps" and suggests the following options for integrating applications:

    • Published language
    • Open host service
    • Shared kernel
    • Customer/supplier
    • Conformist
    • Anti-corruption layer
    • Separate ways.


    In industries such as healthcare where an XML-based data exchange standard exists, the "Published Language" approach is the pattern typically used. Each healthcare application participating in an exchange represents a separate context. On the other hand, an "Anti-Corruption Layer" can be created as an adapter to isolate the model against an industry standard model that is not considered best practice in data modeling, is inconsistent, immature, or subject to change. However, since there is tremendous value in exchanging data, we hope not to go "Separate Ways".

  • DDD is a solid foundation for next-generation architecture based on the Command Query Responsibility Segregation (CQRS) pattern. The UI sends commands which are handled by command handlers. These command handlers change the state of aggregate roots. However, the aggregate roots still define behavior and business rules and are responsible for maintaining invariants. Changes to aggregate roots generate events that are stored in an event store (this is called event sourcing). Aggregate roots are persisted by storing these streams of events in the event store. That way, the aggregate roots can be reconstructed by replaying the events from the event store. The events are published to subscribers including denormalizers for enhanced query performance. The separation of the read side from the write side allows:

    • Increased performance and scalability on the read and reporting side particularly when combined with a cloud deployment model
    • Complete audit trails through the event store
    • New data mining capabilities leveraging temporal queries
    • The opportunity to eliminate the need for Object Relational Mapping (ORM) through the use of high performance NoSQL databases.
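
To make the command, event, and replay flow described above more concrete, here is a minimal in-memory sketch in Python. The bank account domain and every name in it (Account, EventStore, BalanceView, handle_deposit, handle_withdraw) are hypothetical and chosen only for illustration. A real system would use a durable event store and a message bus rather than Python lists and callbacks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Deposited:
    """Immutable event: money was added to an account."""
    account_id: str
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    """Immutable event: money was taken out of an account."""
    account_id: str
    amount: int


class Account:
    """Aggregate root: defines behavior and protects the invariant."""

    def __init__(self, account_id: str) -> None:
        self.account_id = account_id
        self.balance = 0
        self.pending: List[object] = []  # events raised but not yet stored

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._raise(Deposited(self.account_id, amount))

    def withdraw(self, amount: int) -> None:
        # Invariant: the balance never goes negative.
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid withdrawal")
        self._raise(Withdrawn(self.account_id, amount))

    def apply(self, event: object) -> None:
        # State changes only happen here, so replaying events rebuilds state.
        if isinstance(event, Deposited):
            self.balance += event.amount
        elif isinstance(event, Withdrawn):
            self.balance -= event.amount

    def _raise(self, event: object) -> None:
        self.apply(event)
        self.pending.append(event)


class EventStore:
    """Append-only, in-memory event store that publishes to subscribers."""

    def __init__(self) -> None:
        self.streams: Dict[str, List[object]] = {}
        self.subscribers: List[Callable[[object], None]] = []

    def save(self, account: Account) -> None:
        stream = self.streams.setdefault(account.account_id, [])
        for event in account.pending:
            stream.append(event)
            for notify in self.subscribers:
                notify(event)
        account.pending = []

    def load(self, account_id: str) -> Account:
        # Reconstruct the aggregate root by replaying its event stream.
        account = Account(account_id)
        for event in self.streams.get(account_id, []):
            account.apply(event)
        return account


class BalanceView:
    """Denormalizer: maintains a query-friendly read model."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}

    def handle(self, event: object) -> None:
        if isinstance(event, Deposited):
            self.balances[event.account_id] = self.balances.get(event.account_id, 0) + event.amount
        elif isinstance(event, Withdrawn):
            self.balances[event.account_id] = self.balances.get(event.account_id, 0) - event.amount


def handle_deposit(store: EventStore, account_id: str, amount: int) -> None:
    """Command handler: load the aggregate, invoke behavior, store events."""
    account = store.load(account_id)
    account.deposit(amount)
    store.save(account)


def handle_withdraw(store: EventStore, account_id: str, amount: int) -> None:
    account = store.load(account_id)
    account.withdraw(amount)
    store.save(account)


store = EventStore()
view = BalanceView()
store.subscribers.append(view.handle)
handle_deposit(store, "acc-1", 100)
handle_withdraw(store, "acc-1", 30)
print(store.load("acc-1").balance)  # write side, rebuilt by replay: 70
print(view.balances["acc-1"])       # read side, kept by the denormalizer: 70
```

Note that the write side never queries the read model, and the read model never enforces business rules. That separation is what lets each side be scaled and stored independently.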

Monday, December 13, 2010

Toward a Universal Exchange Language for Healthcare

The US President's Council of Advisors on Science and Technology (PCAST) published a report last week entitled: "Realizing the Full Potential of Health Information Technology to Improve Healthcare for Americans: The Path Forward". The report calls for a universal exchange language for healthcare (abbreviated as UELH in this post). Specifically, the report says:

"We believe that the natural syntax for such a universal exchange language will be some kind of exten­sible markup language (an XML variant, for example) capable of exchanging data from an unspecified number of (not necessarily harmonized) semantic realms. Such languages are structured as individual data elements, together with metadata that provide an annotation for each data element."

First, let me say that I fully support the idea of a UELH. I've written in the past about the future of healthcare data exchange standards. The ASTM CCR and the HL7 CCD have been adopted for Meaningful Use Stage 1 and that was the right choice. In my opinion, the UELH proposed by PCAST is about the next generation healthcare data exchange standard that is yet to be built. It's part of the natural evolution and innovation that are inherent to the information technology industry. It is also a very challenging task that should be informed by the important work that has been done previously in this field including:

  • The ASTM CCR
  • The HL7 RIM, CDA, CCD, and greenCDA
  • Archetype-based models from openEHR and EN 13606
  • The National Information Exchange Model (NIEM)
  • HITSP C32
  • Biomedical Ontologies using semantic web technologies such as OWL2, SKOS, and RDF.
  • Medical Terminologies such as SNOMED and RxNorm.

This new language should focus on identifying, addressing, and solving the issues with the use of the current set of healthcare data exchange standards. This will require a public discourse that is cordial and focused on solutions and innovative ideas. Most importantly, it will require listening to the concerns of implementers. This proposal should not be about reinventing the wheel. It should be about creating a better future by learning lessons from the past while being open-minded about new ideas and approaches to solving problems.

Note that the report talks about the syntax of this new language as some kind of an "XML variant". It also mentions that the language must be extensible. This is important in order to enable innovation in this field. For example, we've recently seen a serious challenge to XML coming from JSON in the web APIs space (Twitter and Foursquare removed support for XML in their APIs and now only provide a JSON API). Similarly, in the Semantic Web space, alternatives to the RDF/XML serialization syntax have emerged such as the N-Triples notation. This is not to say that XML is the wrong representation for healthcare data. It simply means that we should be open to innovation in this area.

Metadata and the Semantic Web in Healthcare

Closely related to the notion of metadata is the idea of the Semantic Web. Although semantic web technologies are not widely used in healthcare today, they could help address some of the issues with current healthcare standard information models including: model consistency, reasoning, and knowledge integration across domains (e.g. the genomics and clinical domains). In a report entitled "Semantic Interoperability Deployment and Research Roadmap", Alan Rector, an authority in the field of biomedical ontologies, explains the difference between ontologies and data structures:

A second closely related notion is that of an "information model" or "model of data structures". Both Archetypes and HL7 V3 Messages are examples of data structures. Formalisms for data structures bear many resemblances to formalisms for ontologies. The confusion is made worse because the description logics are often used for both. However, there is a clear difference.

  • Ontologies are about the things being represented – patients, their diseases. They are about what is always true, whether or not it is known to the clinician. For example, all patients have a body temperature (possibly ambient if they are dead); however, the body temperature may not be known or recorded. It makes no sense to talk about a patient with a "missing" body temperature.
  • Data structures are about the artefacts in which information is recorded. Not every data structure about a patient need include a field for body temperature, and even if it does, that field may be missing for any given patient. It makes perfect sense to speak about a patient record with missing data for body temperature.

A key point is that "epistemological issues" – issues of what a given physician or the healthcare system knows – should be represented in the data structures rather than the ontology. This causes serious problems for terminologies coding systems, which often include notions such as "unspecified" or even "missing". This practice is now widely deprecated but remains common.
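
One way to picture the distinction Rector draws is the small Python sketch below. It is only an illustration under my own assumptions: the Condition enumeration and the PatientRecord fields are hypothetical and are not taken from any standard. The coded concepts contain no "unspecified" or "missing" member, while the record (the data structure) is where absence of knowledge is expressed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Condition(Enum):
    """Hypothetical coded concepts: statements about the patient.

    Deliberately has no UNSPECIFIED or MISSING member, because not knowing
    something is a property of the record, not of the patient.
    """
    ANEMIA = "anemia"
    DIABETES = "diabetes"


@dataclass
class PatientRecord:
    """Data structure: an artefact in which information may be absent."""
    patient_id: str
    body_temperature_celsius: Optional[float] = None  # None = not recorded
    condition: Optional[Condition] = None             # None = not recorded


def missing_fields(record: PatientRecord) -> List[str]:
    """List the fields of a record for which no value was recorded."""
    return [name for name, value in vars(record).items() if value is None]


record = PatientRecord("p1", condition=Condition.ANEMIA)
print(missing_fields(record))  # ['body_temperature_celsius']
```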

One of the Common Terminology Services (CTS 2) submissions to the OMG is based on Semantic Web technologies such as OWL2, SKOS, and SPARQL. The UELH proposed by the PCAST should leverage the work that has been done by the biomedical ontology community.

The NIEM Approach to Metadata-Tagged Data Elements

The report goes on to say that the metadata attached to each of these data elements

"...would include (i) enough identifying information about the patient to allow the data to be located (not necessarily a universal patient identifier), (ii) privacy protection information—who may access the mammograms, either identified or de-identified, and for what purposes, (iii) the provenance of the data—the date, time, type of equipment used, personnel (physician, nurse, or technician), and so forth."

The report does not explain exactly how this should be done. So let's combine the wisdom of the NIEM, HL7 greenCDA, and OASIS XSPA (Cross-Enterprise Security and Privacy Authorization Profile of XACML for healthcare) to propose a solution. Let's assume that we need to add metadata about the equipment used for the lab result as well as patient consent directives to the following lab result entry which is marked up in greenCDA format:

<result>
<resultID root="107c2dc0-67a5-11db-bd13-0800200c9a66" />
<resultDateTime value="200003231430" />
<resultType codeSystem="2.16.840.1.113883.6.1" code="30313-1"
displayName="HGB" />
<resultStatus code="completed" />
<resultValue>
<physicalQuantity value="13.2" unit="g/dl" />
</resultValue>
<resultInterpretation codeSystem="2.16.840.1.113883.5.83"
code="N" />
<resultReferenceRange>M 13-18 g/dl; F 12-16 g/dl</resultReferenceRange>
</result>

In the following, an s:metadata attribute is added to the root element (s:metadata is of type IDREFS and for brevity, I am not showing the namespace declarations):

<result s:metadata="equipment consent">
<resultID root="107c2dc0-67a5-11db-bd13-0800200c9a66" />
<resultDateTime value="200003231430" />
<resultType codeSystem="2.16.840.1.113883.6.1" code="30313-1"
displayName="HGB" />
<resultStatus code="completed" />
<resultValue>
<physicalQuantity value="13.2" unit="g/dl" />
</resultValue>
<resultInterpretation codeSystem="2.16.840.1.113883.5.83"
code="N" />
<resultReferenceRange>M 13-18 g/dl; F 12-16 g/dl</resultReferenceRange>
</result>

The following is the lab test equipment metadata:

<LabTestEquipmentMetadata s:id="equipment">
<SerialNumber>93638494749</SerialNumber>
<Manufacturer>MedLabEquipCo.</Manufacturer>
</LabTestEquipmentMetadata>

And here are the patient consent directives marked up in XACML XSPA format (this snippet is taken from the NHIN Access Consent Policies Specification):

<ConsentMetadata s:id="consent">
<Policy xmlns="urn:oasis:names:tc:xacml:2.0:policy:schema:os"
PolicyId="12345678-1234-1234-1234-123456781234"
RuleCombiningAlgId="urn:oasis:names:tc:xacml:1.0:rule-combining-algorithm:first-applicable">
<Description>Sample XACML policy for NHIN</Description>
<!-- The Target element at the Policy level identifies the subject to whom the Policy applies -->
<Target>
<Resources>
<Resource>
<ResourceMatch MatchId="http://www.hhs.gov/healthit/nhin/function#instance-identifier-equal">

<AttributeValue DataType="urn:hl7-org:v3#II"
xmlns:hl7="urn:hl7-org:v3">
<hl7:PatientId root="2.16.840.1.113883.3.18.103"
extension="00375" />
</AttributeValue>
<ResourceAttributeDesignator AttributeId="http://www.hhs.gov/healthit/nhin#subject-id"
DataType="urn:hl7-org:v3#II" />
</ResourceMatch>
</Resource>
</Resources>
<Actions>
<!-- This policy applies to all document query and document retrieve transactions -->
<Action>
<ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#anyURI">
urn:ihe:iti:2007:CrossGatewayRetrieve</AttributeValue>
<ActionAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:action"
DataType="http://www.w3.org/2001/XMLSchema#anyURI" />
</ActionMatch>
</Action>
<Action>
<ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#anyURI">
urn:ihe:iti:2007:CrossGatewayQuery</AttributeValue>
<ActionAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:action"
DataType="http://www.w3.org/2001/XMLSchema#anyURI" />
</ActionMatch>
</Action>
</Actions>
</Target>
<Rule RuleId="133" Effect="Permit">
<Description>Permit access to all documents to all
physicians and nurses</Description>
<Target>
<Subjects>
<Subject>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for physicians -->
<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">
112247003</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="http://www.w3.org/2001/XMLSchema#string" />
</SubjectMatch>
</Subject>
<Subject>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for nurses -->
<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">
106292003</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="http://www.w3.org/2001/XMLSchema#string" />
</SubjectMatch>
</Subject>
</Subjects>
<!-- since there is no Resource element, this rule applies to all resources -->
</Target>
</Rule>
<Rule RuleId="134" Effect="Permit">
<Description>Allow Dentists and Dental Hygienists from the
Happy Tooth dental practice access to documents
with "Normal" confidentiality for a defined time
period.</Description>
<Target>
<Subjects>
<Subject>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for dentists -->
<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">
106289002</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="http://www.w3.org/2001/XMLSchema#string" />
</SubjectMatch>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#anyURI">
http://www.happytoothdental.com</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xspa:1.0:subject:organization-id"
DataType="http://www.w3.org/2001/XMLSchema#anyURI" />
</SubjectMatch>
</Subject>
<Subject>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for dental hygienists -->
<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">
26042002</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="http://www.w3.org/2001/XMLSchema#string" />
</SubjectMatch>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#anyURI">
http://www.happytoothdental.com</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xspa:1.0:subject:organization-id"
DataType="http://www.w3.org/2001/XMLSchema#anyURI" />
</SubjectMatch>
</Subject>
</Subjects>
<Resources>
<Resource>
<ResourceMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">
N</AttributeValue>
<ResourceAttributeDesignator AttributeId="urn:oasis:names:tc:xspa:1.0:resource:patient:hl7:confidentiality-code"
DataType="http://www.w3.org/2001/XMLSchema#string" />
</ResourceMatch>
</Resource>
</Resources>
<Environments>
<Environment>
<EnvironmentMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:date-greater-than-or-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#date">
2009-07-01</AttributeValue>
<EnvironmentAttributeDesignator AttributeId="http://www.hhs.gov/healthit/nhin#rule-start-date"
DataType="http://www.w3.org/2001/XMLSchema#date" />
</EnvironmentMatch>
</Environment>
<Environment>
<EnvironmentMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:date-less-than-or-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#date">
2009-12-31</AttributeValue>
<EnvironmentAttributeDesignator AttributeId="http://www.hhs.gov/healthit/nhin#rule-end-date"
DataType="http://www.w3.org/2001/XMLSchema#date" />
</EnvironmentMatch>
</Environment>
</Environments>
</Target>
</Rule>
<Rule RuleId="135" Effect="Deny">
<Description>deny all access to documents. Since this
rule is last, it will be selected if no other rule
applies, under the rule combining algorithm of first
applicable.</Description>
<Target />
</Rule>
</Policy>
</ConsentMetadata>

Please note the following:

  • Metadata "LabTestEquipmentMetadata" asserts the equipment used for the lab test.
  • Metadata "ConsentMetadata" asserts the patient consent directives leveraging the XSPA XACML format.
  • Metadata can be declared once and reused by multiple elements.
  • An element can refer to 0 or more metadata objects.
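
As a rough illustration of how a receiving application might follow these references, the Python sketch below resolves each s:metadata IDREFS value to the metadata elements it points to, using only the standard library. The exchange wrapper element and the resolve_metadata function are hypothetical names of mine, and the namespace URI is the one from the NIEM structures schema shown later in this post. A dangling reference would raise a KeyError in this sketch; a real implementation would want friendlier error handling.

```python
import xml.etree.ElementTree as ET

# Namespace of the NIEM "structures" schema (s:id, s:metadata).
S_NS = "http://niem.gov/niem/structures/2.0"

# Hypothetical, heavily trimmed instance document for illustration only.
DOCUMENT = """
<exchange xmlns:s="http://niem.gov/niem/structures/2.0">
  <result s:metadata="equipment consent">
    <resultStatus code="completed" />
  </result>
  <result s:metadata="equipment">
    <resultStatus code="completed" />
  </result>
  <result>
    <resultStatus code="completed" />
  </result>
  <LabTestEquipmentMetadata s:id="equipment">
    <SerialNumber>93638494749</SerialNumber>
  </LabTestEquipmentMetadata>
  <ConsentMetadata s:id="consent" />
</exchange>
"""


def resolve_metadata(root):
    """Return, for each <result>, the tags of the metadata it references."""
    id_attr = f"{{{S_NS}}}id"
    metadata_attr = f"{{{S_NS}}}metadata"

    # Index every element that carries an s:id attribute.
    by_id = {el.get(id_attr): el for el in root.iter() if el.get(id_attr) is not None}

    resolved = []
    for result in root.iter("result"):
        # IDREFS is a whitespace-separated list of IDs; it may be absent.
        refs = result.get(metadata_attr, "").split()
        resolved.append([by_id[ref].tag for ref in refs])
    return resolved


root = ET.fromstring(DOCUMENT)
print(resolve_metadata(root))
# [['LabTestEquipmentMetadata', 'ConsentMetadata'], ['LabTestEquipmentMetadata'], []]
```

The second result reuses the same equipment metadata as the first, and the third refers to none, which matches the last two points in the list above.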

In NIEM, an appinfo:AppliesTo element in a metadata type declaration is used to indicate the type to which the metadata applies as in the following example (note this is not enforced by the XML schema validating parser, but can be enforced at the application level):

<xsd:complexType name="LabTestEquipmentMetadataType">
<xsd:annotation>
<xsd:appinfo>
<i:AppliesTo i:name="LabResultType" />
</xsd:appinfo>
</xsd:annotation>
<xsd:complexContent>
<xsd:extension base="s:MetadataType">
...
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>

<xsd:element name="LabTestEquipmentMetadata" type="LabTestEquipmentMetadataType" nillable="true"/>
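
Since the schema validator does not enforce AppliesTo, the check has to live in application code. The following Python sketch shows one possible way to do it with the standard library: it reads the appinfo annotations from a schema and answers whether a metadata type may be attached to a given type. The function names applies_to and is_allowed are hypothetical, the schema is trimmed to the annotation, and treating metadata types without an AppliesTo annotation as unrestricted is an assumption of this sketch rather than a NIEM rule.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
APPINFO_NS = "http://niem.gov/niem/appinfo/2.0"

# Trimmed version of the metadata type declaration, with namespaces declared.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:i="http://niem.gov/niem/appinfo/2.0">
  <xsd:complexType name="LabTestEquipmentMetadataType">
    <xsd:annotation>
      <xsd:appinfo>
        <i:AppliesTo i:name="LabResultType" />
      </xsd:appinfo>
    </xsd:annotation>
  </xsd:complexType>
</xsd:schema>
"""


def applies_to(schema_root):
    """Return {metadata type name: set of type names it may annotate}."""
    rules = {}
    for ctype in schema_root.iter(f"{{{XSD_NS}}}complexType"):
        targets = {
            el.get(f"{{{APPINFO_NS}}}name")
            for el in ctype.iter(f"{{{APPINFO_NS}}}AppliesTo")
        }
        if targets:
            rules[ctype.get("name")] = targets
    return rules


def is_allowed(rules, metadata_type, target_type):
    """Application-level check that the schema validator does not perform."""
    allowed = rules.get(metadata_type)
    # Assumption: no AppliesTo annotation means no restriction.
    return allowed is None or target_type in allowed


rules = applies_to(ET.fromstring(SCHEMA))
print(is_allowed(rules, "LabTestEquipmentMetadataType", "LabResultType"))  # True
print(is_allowed(rules, "LabTestEquipmentMetadataType", "PatientType"))    # False
```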

NIEM defines a common metadata type that can be extended by any type definition that requires metadata:

<schema
targetNamespace="http://niem.gov/niem/structures/2.0"
version="alpha2"
xmlns:i="http://niem.gov/niem/appinfo/2.0"
xmlns:s="http://niem.gov/niem/structures/2.0"
xmlns="http://www.w3.org/2001/XMLSchema">


<attribute name="id" type="ID"/>
<attribute name="linkMetadata" type="IDREFS"/>
<attribute name="metadata" type="IDREFS"/>
<attribute name="ref" type="IDREF"/>
<attribute name="sequenceID" type="integer"/>

<attributeGroup name="SimpleObjectAttributeGroup">
<attribute ref="s:id"/>
<attribute ref="s:metadata"/>
<attribute ref="s:linkMetadata"/>
</attributeGroup>

<element name="Metadata" type="s:MetadataType" abstract="true"/>

<complexType name="ComplexObjectType" abstract="true">
<attribute ref="s:id"/>
<attribute ref="s:metadata"/>
<attribute ref="s:linkMetadata"/>
</complexType>

<complexType name="MetadataType" abstract="true">
<attribute ref="s:id"/>
</complexType>

</schema>

Any type definition that needs metadata can simply extend ComplexObjectType, as the following lab result type does:

<xsd:complexType name="LabResultType">
<xsd:complexContent>
<xsd:extension base="s:ComplexObjectType">
<xsd:sequence>...</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>