Sunday, December 30, 2012

Prediction for 2013: Intelligent Health IT Systems (iHIT) Go Mainstream

iHIT systems represent an evolution of clinical decision support (CDS) systems. Traditionally, CDS systems have provided functionalities such as Alerts and Reminders, Order Sets, Infobuttons, and Documentation Templates. iHIT systems go beyond these basic functionalities and are poised to go mainstream in 2013. This evolution is enabled by recent developments in both computing and healthcare. Notably in computing:

  • The emergence of Big Data and massively parallel computing platforms like Hadoop.
  • The entrance of the following disciplines into the mainstream of computing: Machine Learning (a branch of Artificial Intelligence), Statistical Computing, Visual Analytics, Natural Language Processing, Information Retrieval, Rule Engines, and Semantic Web Technologies (like RDF, OWL, SPARQL, and SWRL). These disciplines have been around for many years, but have been largely confined to academia, very large organizations, and niche markets.
  • The availability of open source tools, platforms, and resources to support the technologies mentioned above. Examples include: R (a statistical engine), Apache Hadoop, Apache Mahout, Apache Jena, Apache Stanbol, Apache OpenNLP, and Apache UIMA. The number of books, courses, and conferences dedicated to these topics has increased dramatically over the last two years, signaling their entrance into the mainstream.
In addition, the healthcare industry itself is currently going through a significant transformation from a business model based on the number of patients treated to a value-based payment model. The Accountable Care Organization (ACO) is an example of this new model. This model puts an increased emphasis on meeting certain quality and performance metrics driven by the latest scientific evidence (this is called Evidence Based Practice or EBP).

Although very costly, Randomized Controlled Trials (RCTs) are considered the strongest form of evidence in EBP. Despite their inherent methodological challenges (lack of randomization leading to possible bias and confounding), observational studies (using real-world data) are increasingly recognized as complementary to RCTs and an important tool in clinical decision making and health policy. According to a report titled "Clinical Practice Guidelines (CPGs) We Can Trust" published by the Institute of Medicine (IOM):
"Randomized trials commonly have an under representation of important subgroups, including those with comorbidities, older persons, racial and ethnic minorities, and low-income, less educated, or low-literacy patients."
Investments into Comparative Effectiveness Research (CER) are increasing as well. CER, an emerging trend in Evidence Based Practice (EBP), has been defined by the Federal Coordinating Council for CER as "the conduct and synthesis of research comparing the benefits and harms of different interventions and strategies to prevent, diagnose, treat and monitor health conditions in 'real world' settings." CER is important not only for discovering what works and what doesn't work in practice, but also for an informed shared decision making process between the patient and her provider.

The use of predictive risk models for personalized medicine is becoming a common practice. These models can predict the health risks of patients based on their individual health profiles (including genetic profiles). These models often take the form of logistic regression models. Examples include models for predicting cardiovascular disease, ICU mortality, and hospital readmission (an important ACO performance measure).
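Since these models often take the form of logistic regression, scoring one reduces to a weighted sum passed through the logistic function. The sketch below illustrates the mechanics only; the features, coefficients, and intercept are entirely hypothetical and not taken from any published risk model:

```python
import math

def predict_risk(features, coefficients, intercept):
    # Logistic regression: risk = 1 / (1 + e^-(intercept + sum(b_i * x_i)))
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical readmission model: age in decades, prior admissions, comorbidity count
patient = [7.2, 2, 3]
coefficients = [0.12, 0.45, 0.30]  # made-up weights, for illustration only
risk = predict_risk(patient, coefficients, intercept=-4.0)
print(f"Predicted readmission risk: {risk:.3f}")  # about 0.208
```

In practice, the coefficients would be estimated from historical patient data using a statistical engine such as R.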

Thanks to the Meaningful Use incentive program, adoption of electronic health record (EHR) systems by providers is rapidly increasing. This translates into the availability of huge amounts of EHR data which can be harvested to provide the Practice Based Evidence (PBE) necessary to close the evidence loop. PBE is the key to a learning health system. The Institute of Medicine (IOM) released a report last year titled "Digital Infrastructure for the Learning Health System: The Foundation for Continuous Improvement in Health and Health Care". The report describes a learning health system as:
" of best practice guidance at the point of choice, continuous learning and feedback in both health and health care, and seamless, ongoing communication among participants, all facilitated through the application of IT."
Both EBP and PBE will require not only rigorous scientific methodologies, but also a computing platform suitable for the era of Big Data in medicine. As William Osler (1849-1919) famously said:
"Medicine is a science of uncertainty and an art of probability."
Lastly, to be successful, the emergence of iHIT systems will require a human-centered design approach. This will be facilitated by the use of techniques that can enhance human cognitive abilities. Examples are: Electronic Checklists (an approach that originates from the aviation industry and has been proven to save lives in healthcare delivery as well) and Visual Analytics.

Happy New Year to You and Your Family!

Saturday, December 8, 2012

A Journey into Software Excellence

I am back in the blogosphere after a seven-month hiatus. It was about time I got my blogging act together. Software development has never been so much fun. In this post, I will share some thoughts on tools, methods, and practices that can really help in your search for software excellence, from the initial prototyping of the user interface to deployment.

  1. With the rapid proliferation of mobile and desktop devices, adopt a Responsive Web Design (RWD) strategy to reach the largest audience possible.
  2. Create responsive sketches, wireframes, or mockups and apply usability guidelines during the initial prototyping. The NHS Common User Interface (CUI) Program is a good example of usability guidelines for healthcare IT applications, and other public usability resources are worth consulting as well.
  3. Perform usability testing to test your design ideas and obtain early feedback from future users of your product before actual development starts. Use metrics such as the System Usability Scale (SUS) to assess the results.
  4. Carefully select the right HTML5, CSS3, and JavaScript libraries and frameworks. The Single Page Application (SPA) architecture is becoming popular and can provide a more fluid user experience.
  5. Consider "Specification By Example" and Behaviour Driven Development (BDD) tools like Cucumber-JVM to create executable user stories.
  6. Pattern languages like Domain Driven Design (DDD) can help you avoid a "Big Ball of Mud" in architecting your software. DDD concepts such as "Strategic Design", "Bounded Context", "Published Language", and "Anti-Corruption Layer" can help you put your architecture in the right perspective, particularly if there is a need to support industry interoperability standards such as HL7 and IHE. However, beware that the practice of DDD has evolved over the last 8 years and new lessons have been learned particularly in the area of "Aggregate" design. So keep up-to-date with new developments in the field in order to leverage the experience of the community. I also found the concept of "Hexagonal Architecture" very helpful in visualizing the complexity of an architecture from different angles.
  7. Consider a peer review of the architecture using a methodology like the Architecture Tradeoff Analysis Method (ATAM).
  8. Embrace Polyglot Persistence (the use of different persistence mechanisms such as relational, document, and graph databases within the same application). However, use the right application development framework to make this easy. Beware of the peculiarities of modeling data for NoSQL databases and remember that "Persistence Ignorance" is not always easy to achieve in practice.
  9. Add a social dimension to your product by integrating the user experience with existing social networking sites that your users already belong to.
  10. Make your application more intelligent through the use of techniques such as Machine Learning (e.g., a recommendation engine), ontologies and rule engines (e.g., automated reasoning), and Natural Language Processing (NLP) (e.g., automated question answering). As Richard Hamming said: "The purpose of computing is insight, not numbers".
  11. To enhance the user experience, adopt HTML5, SVG, and JavaScript-based graphing and data visualization techniques for data-intensive applications.
  12. Consider the benefits of deploying the application to the cloud and if you decide to deploy to the cloud, factor that into your entire design and development process including the selection of development tools. Choosing the right Platform-as-a-Service (PaaS) provider can facilitate the process.
  13. Create a Continuous Delivery pipeline based on the core concept of automated testing. Leverage tools like Git (Distributed Version Control), Gradle (build), Jenkins (Continuous Integration), and Artifactory. Continuous Delivery allows you to go to market faster and with confidence in the quality of your product. Save infrastructure costs by using these tools in the cloud during development.
  14. Although there is still a place for manual testing, all tests should be automated as much as possible. In addition to the traditional unit tests (using tools like JUnit, TestNG, and Mockito), embrace automated cross-device, cross-browser, and cross-platform user interface (UI) testing using a tool like Selenium.
  15. Web services and performance testing should also become part of your build and Continuous Delivery pipeline using tools like soapUI and JMeter respectively. Performance testing should not be an afterthought.
  16. Adopt automated code quality inspection with tools like Sonar, Checkstyle, FindBugs, and PMD. This can supplement your peer code review process and can provide you with concrete code quality metrics in addition to automatically flagging bugs (including insecure code) in your code base.
  17. Write secure code by carefully studying the OWASP Top Ten. Adopt OWASP guidelines related to security testing and secure code reviews. Perform penetration testing to find vulnerabilities in your application before it is too late.
  18. Do your due diligence in protecting the privacy of your users' data. Put the users in control of their privacy in your system by adopting standards such as OAuth2, OpenID Connect, and the User Managed Access (UMA) protocol of the Kantara Initiative. Consider increasing the strength of authentication using multi-factor authentication (e.g., two-factor authentication using the user's phone).
  19. Invest in learning and training your development team. Software excellence can only be achieved by skilled professionals.
  20. Relax, have fun, and remember that software excellence is a journey.
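On the System Usability Scale mentioned in item 3: its scoring rule is simple enough to capture in a few lines. Odd-numbered (positively worded) items contribute their rating minus 1, even-numbered (negatively worded) items contribute 5 minus their rating, and the summed contributions are multiplied by 2.5 to yield a 0-100 score:

```python
def sus_score(responses):
    """Compute the System Usability Scale score from 10 item ratings (1-5).

    Odd items (positively worded): contribution = rating - 1.
    Even items (negatively worded): contribution = 5 - rating.
    The summed contributions are scaled by 2.5 to a 0-100 range.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # index 0 is item 1 (odd)
                for i, r in enumerate(responses))
    return total * 2.5

print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # best possible: 100.0
print(sus_score([4, 2, 4, 2, 4, 1, 4, 2, 5, 1]))  # 82.5
```

A SUS score above roughly 68 is commonly treated as above-average usability.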

Saturday, May 5, 2012

How to Add Arbitrary Metadata to Any Element of an HL7 CDA Document

There has been a lot of buzz lately about metadata tagging in the health IT community. In this blog, I describe an approach to annotating HL7 CDA documents (or any other XML documents) without actually editing the document that is being annotated. Metadata tagging is just an example of annotation. The underlying principle of this approach is that Anyone can say Anything about Anything (the AAA slogan), which is well known in the Semantic Web community. In other words, anyone (e.g., patient, caregiver, physician, provider organization) should have the ability to add arbitrary metadata to any element of a CDA document. For the sake of "Separation of Concerns", which is a fundamental principle in software engineering, the metadata should be kept out of the CDA document. The benefits of keeping the metadata or annotations out of the CDA document include:
  • Reuse of the same metadata by distinct elements from potentially multiple clinical documents.
  • The ability to update the metadata without affecting the target CDA documents.
  • The ability for any individual, organization, or community of interest (e.g., privacy or medical device manufacturers) to create a metadata vocabulary without going through the process of modifying the normative CDA specification (or one of its derivatives such as the CCD, the C32, or the Consolidated CDA) or the XDS metadata specifications.

History and Current Status of Metadata Standards in Health IT

The CDA specification defines some metadata in the header of a CDA document. In addition, the XD* family of specifications (XDS, XDR, and XDM) also defines a comprehensive set of metadata to be used in cross-enterprise document exchange. NIEM is currently being used in several health IT projects. In a previous post titled "Toward a Universal Exchange Language for Healthcare", I described how the NIEM metadata approach could be adapted to the healthcare domain.

The President's Council of Advisors on Science and Technology (PCAST) published a report in December 2010 entitled: "Realizing the Full Potential of Health Information Technology to Improve Healthcare for Americans: The Path Forward". To describe the proposed approach to metadata tagging, the report provides an example based on the exchange of mammograms:
"The physician would be able to securely search for, retrieve, and display these privacy protected data elements in much the way that web surfers retrieve results from a search engine when they type in a simple query.
What enables this result is the metadata attached to each of these data elements (mammograms), which would include (i) enough identifying information about the patient to allow the data to be located (not necessarily a universal patient identifier), (ii) privacy protection information-who may access the mammograms, either identified or de-identified, and for what purposes, (iii) the provenance of the data-the date, time, type of equipment used, personnel (physician, nurse, or technician), and so forth."
The HIT Standards Committee (HITSC) Metadata Tiger Team made specific recommendations to the ONC in June 2011. These recommendations included the use of:

  • Policy Pointers: URLs that point to external policy documents affecting the tagged data element.
  • Content Metadata: the actual metadata with datatype (clinical category) and sensitivity (e.g., substance abuse and mental health).
  • Use of the HL7 CDA R2 with headers.

Based on those recommendations, the ONC published a Notice of Proposed Rule Making (NPRM) in August 2011 to receive comments on proposed metadata standards.

The Data Segmentation Working Group of the ONC Standards and Interoperability Framework is currently working on metadata tagging for compliance with privacy policies and consent directives.

The Annotea Protocol

The capability to add arbitrary metadata to documents without modifying them has been available in the Semantic Web for at least a decade. Indeed, it is hard to talk about metadata without a reference to the Semantic Web. I will use the W3C Annotea Protocol (which is implemented by the Amaya open source project) to demonstrate this capability. I will also show that this approach does not require the use of the Resource Description Framework (RDF) format and related Semantic Web technologies like OWL and SPARQL. The approach can be adapted to alternative representation formats such as XML, JSON, or the Atom syndication format. Let's assume that I need to add metadata tags to the CDA document below. The CDA document has only one problem entry, for substance abuse disorder (SNOMED CT code 66214007), and my goal is to attach privacy metadata prohibiting the disclosure of that information:

<templateId root="2.16.840.1.113883."
<templateId root=""
    assigningAuthorityName="IHE PCC"/>
<templateId root="2.16.840.1.113883." assigningAuthorityName="HL7 CCD"/>
<!--Problems section template-->
<code code="11450-4" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC"
    displayName="Problem list"/>
<entry typeCode="DRIV">
<act classCode="ACT" moodCode="EVN">
    <templateId root="2.16.840.1.113883."
        assigningAuthorityName="HITSP C83"/>
    <templateId root="2.16.840.1.113883."
    <templateId root=""
        assigningAuthorityName="IHE PCC"/>
    <templateId root=""
        assigningAuthorityName="IHE PCC"/>
    <!-- Problem act template -->
    <id root="6a2fa88d-4174-4909-aece-db44b60a3abb"/>
    <code nullFlavor="NA"/>
    <statusCode code="completed"/>
        <low value="1950"/>
        <high nullFlavor="UNK"/>
    <performer typeCode="PRF">
            <id extension="PseudoMD-2" root="2.16.840.1.113883."/>
    <entryRelationship typeCode="SUBJ" inversionInd="false">
        <observation classCode="OBS" moodCode="EVN">
            <templateId root="2.16.840.1.113883."
            <templateId root=""
                assigningAuthorityName="IHE PCC"/>
            <!--Problem observation template - NOT episode template-->
            <id root="d11275e7-67ae-11db-bd13-0800200c9a66"/>
            <code code="64572001" displayName="Condition"
                <reference value="#PROBSUMMARY_1"/>
            <statusCode code="completed"/>
                <low value="1950"/>
            <value  displayName="Substance Abuse Disorder" code="66214007" codeSystemName="SNOMED" codeSystem="2.16.840.1.113883.6.96"/>
            <entryRelationship typeCode="REFR">
                <observation classCode="OBS" moodCode="EVN">
                    <templateId root="2.16.840.1.113883."/>
                    <!-- Problem status observation template -->
                    <code code="33999-4" codeSystem="2.16.840.1.113883.6.1"
                    <statusCode code="completed"/>
                    <value  code="55561003"
                        <reference value="#PROBSTATUS_1"/>

The following is a separate annotation document containing some metadata pointing to the Substance Abuse Disorder entry in the target CDA document:

<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:a="http://www.w3.org/2000/10/annotation-ns#"
       xmlns:d="http://purl.org/dc/elements/1.1/">
    <r:Description>
        <r:type r:resource="http://www.w3.org/2000/10/annotation-ns#Annotation"/>
        <r:type r:resource="http://www.w3.org/2000/10/annotationType#Comment"/>
        <a:annotates r:resource=""/>
        <a:context>#xpointer(/ClinicalDocument/component/structuredBody/component[1]/section[1]/entry[1])</a:context>
        <d:title>Sample Metadata Tagging</d:title>
        <d:creator>Bob Smith</d:creator>
        <a:body>Do Not Disclose</a:body>
    </r:Description>
</r:RDF>
Please note a few interesting facts about the annotation document:

  • As explained by the original specification: "The Annotea protocol works without modifying the original document; that is, there is no requirement that the user have write access to the Web page being annotated."
  • The annotation itself has metadata using the well known Dublin Core metadata specification to specify who created this annotation and when.
  • The document being annotated is cda.xml. This is described by the element <a:annotates r:resource=""/>.
  • The specific element that is being annotated within the target CDA document is specified by the context element <a:context>#xpointer(/ClinicalDocument/component/structuredBody/component[1]/section[1]/entry[1])</a:context> using XPointer, a specification described by the W3C as "the language to be used as the basis for a fragment identifier for any URI reference that locates a resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity."
  • The XPath expression /ClinicalDocument/component/structuredBody/component[1]/section[1]/entry[1] within the XPointer is used to target the entry element in the CDA document.
  • Using XPath (1.0 or 2.0) allows us to address any element (or node) in an XML document. For example, this XPath //value[@code='66214007']/ancestor::entry will point to any entry element which contains a value element with an attribute code='66214007' (essentially targeting all entry elements which contain a Substance Abuse Observation). The combination of XPath, XPointer, and standard medical terminology codes gives the ability to attach any annotation or metadata to any element having interoperable semantics.
  • The body element contains the actual annotation: <a:body>Do Not Disclose</a:body>. However, the body of the annotation can also be located outside of the annotation (e.g., in a shared metadata registry) in which case the body element will be marked up as in the following example: <a:body r:resource=""/>
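The ancestor-axis expression above can be emulated with standard tooling. The sketch below uses Python's built-in xml.etree module (which supports attribute predicates but not the ancestor:: axis) against a drastically simplified, namespace-free stand-in for a CDA problem section; a production implementation would use a full XPath 1.0/2.0 engine and the proper HL7 v3 namespace:

```python
import xml.etree.ElementTree as ET

# Drastically simplified stand-in for a CDA problem section (no namespaces).
cda = ET.fromstring("""
<section>
  <entry><act><observation>
    <value code="66214007" displayName="Substance Abuse Disorder"/>
  </observation></act></entry>
  <entry><act><observation>
    <value code="38341003" displayName="Hypertension"/>
  </observation></act></entry>
</section>
""")

# Emulate //value[@code='66214007']/ancestor::entry: ElementTree has no
# ancestor axis, so iterate over entries and test each one for a match.
tagged = [e for e in cda.iter("entry")
          if e.find(".//value[@code='66214007']") is not None]
for entry in tagged:
    print(entry.find(".//value").get("displayName"))  # Substance Abuse Disorder
```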

Alternative Representations


As mentioned before, for those who for one reason or another don't want to use RDF and related Semantic Web technologies, the annotation can be easily converted to a pure XML (as opposed to RDF/XML), JSON, or Atom representation. The original Annotea Protocol describes a RESTful protocol which includes the following operations: posting, querying, downloading, updating, and deleting annotations. The Atom Publishing Protocol (APP) is a newer RESTful protocol that is well adapted to the Atom syndication format.

Processing Annotations with XPointer

How the annotations are processed and consumed is only limited by the requirements of a specific application and the imagination of the developers writing it. For example, an application can read both the annotation document and the target CDA document and overlay the annotations on top of the entries in the CDA document while displaying the latter in a web browser. Another example is the enforcement of privacy policies and preferences prior to exchanging the CDA document. One issue that arises is how to process the XPointer fragment identifiers. XPointer uses XPath, a well-established XML addressing mechanism supported by many XML processing APIs across programming languages. For those of you who use XSLT 2.0 to process CDA documents, there is an open source XPointer Framework for XSLT 2.0 for use with the Saxon engine.

Monday, February 6, 2012

Toward Intelligent Health IT (iHIT) Systems: Getting Out of the Box

In this post, I describe a new type of application that I refer to as iHIT. iHIT stands for Intelligent Health IT.

The Architecture of Traditional Health IT systems

Traditional software architectures for health IT systems typically include the following:

  • Dependency Injection (DI)

  • Object Relational Mapping (ORM)

  • An architectural pattern for the presentation layer such as the Model View Controller (MVC) pattern

  • HTML5, CSS3, and a JavaScript library like JQuery/Mobile

  • Other architectural patterns including GoF Design Patterns, SOLID Principles, and Domain Driven Design (DDD)

  • Structured Query Language (SQL)

  • Enterprise Integration Patterns (EIPs) implemented through an Enterprise Service Bus (ESB) using HL7 messages as the "Published Language"

  • REST or SOAP-based web services.

An entire generation of developers has been trained in these techniques. They represent proven best practices accumulated over several decades of object-oriented design and relational data management. Although pervasive in today's clinical systems, these applications lack basic intelligent features such as the ability to capture and execute expert knowledge, make inferences, or make predictions about the future based on the analysis of historical data. Some of these systems actually look like glorified data entry systems.

With the availability and explosion of medical knowledge and real world observational EHR data, these intelligent features will become important in assisting clinicians in the medical decision making process at the point of care by reducing their cognitive load.

Intelligent Health IT (iHIT) Systems

iHIT systems process huge quantities of both structured and unstructured data to provide clinicians with specific recommendations. iHIT systems play an important role in translating Comparative Effectiveness Research (CER) findings into clinical practice. CER, an emerging trend in Evidence Based Medicine (EBM), has been defined by the Federal Coordinating Council for CER as "the conduct and synthesis of research comparing the benefits and harms of different interventions and strategies to prevent, diagnose, treat and monitor health conditions in 'real world' settings." For example, based on the clinical profile of a patient, CER can help determine the best treatment option for breast cancer among the various options available, such as chemotherapy, radiation therapy, and surgery (mastectomy and lumpectomy).

The following are examples of key characteristics displayed by iHIT systems:

  • The ability to analyze patient data as well as very large historical observational data sets in order to make probability-based predictions about the future and recommend specific actions that can yield the best clinical outcomes given the clinical profile of a patient.

  • The ability to capture and execute expert knowledge such as the medical knowledge contained in Clinical Practice Guidelines (CPGs). This includes the ability to mediate between different CPGs to arrive at a specific recommendation by merging and reconciling the medical knowledge in multiple CPGs as is the case with patients with comorbidities.

  • The ability to perform automated reasoning by inferring new implicit clinical facts from existing explicit facts and by exploiting semantic relationships between concepts and entities.

  • The ability to retrieve knowledge from unstructured data sources such as the biomedical research literature from sources like PubMed in order to answer clinical questions sometimes posed in natural language.

  • The ability to learn over time (and hence become smarter) as the amount of processed data continues its exponential growth.

  • Very fast response time to queries over very large data sets.

Sounds like Artificial Intelligence (AI)? I believe we are indeed witnessing the resurgence of AI and even the ideas of the Semantic Web in the healthcare industry. As healthcare costs and quality become national priorities for many countries around the world, the boundaries of computing will continue to be pushed further. Actually, some of the underlying principles of intelligent systems were originally developed decades and even centuries ago in the field of biomedical research. William Osler (1849-1919) famously said:

Medicine is a science of uncertainty and an art of probability.

Technologically advanced and competitive industries like financial services (e.g., credit eligibility and fraud detection), online retail (e.g., recommendation engine), and logistics (e.g., delivery route optimization) have adopted some of these technologies. Health IT developers now need to embrace them as well. This will require thinking out of the box.

The Ingredients of iHIT Systems

iHIT systems represent not one, but the integration of many different technologies. Mathematical Models, Statistical Analysis, and Machine Learning algorithms play an important role in iHIT systems. Examples include:

  • Logistic Regression models

  • Decision Trees

  • Association Rules

  • Bayesian Networks

  • Neural Networks

  • Random Forests

  • Time Series for temporal reasoning

  • k-means Clustering

  • Support Vector Machines (SVM)

  • Probabilistic Graphical Models (PGMs) based on methods such as Bayesian networks and Markov Networks for making clinical decisions under uncertainty.

These algorithms can be used not only for making therapeutic predictions (e.g., the future hospitalization risk of a patient with asthma), but also for dividing a population into subgroups based on the clinical profile of patients in order to achieve the best treatment outcomes.
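As a concrete illustration of the subgroup-discovery use, here is a minimal pure-Python k-means sketch. The two-feature patient records are invented for illustration; a real analysis would use a vetted library (e.g., R or Apache Mahout, both mentioned in the first post) and far richer clinical profiles:

```python
import random

def kmeans(points, k, iterations=20, seed=7):
    """Naive k-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its cluster, repeatedly."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(d) / len(members)
                                     for d in zip(*members))
    return centroids, clusters

# Invented patient profiles: (age, number of comorbidities)
patients = [(25, 0), (30, 1), (28, 0), (70, 4), (75, 5), (68, 3)]
centroids, clusters = kmeans(patients, k=2)
print(sorted(len(c) for c in clusters))  # two well-separated subgroups of three
```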

Clinical Practice Guidelines (CPGs) are usually based on Systematic Reviews (SRs) of Randomized Controlled Trials (RCTs), which are essentially scientific experiments. According to a report titled "Clinical Practice Guidelines (CPGs) We Can Trust" which was published last year by the Institute of Medicine (IOM):

However, even when studies are considered to have high internal validity, they may not be generalizable to or valid for the patient population of guideline relevance. Randomized trials commonly have an under representation of important subgroups, including those with comorbidities, older persons, racial and ethnic minorities, and low-income, less educated, or low-literacy patients. Many RCTs and observational studies fail to include such "typical patients" in their samples; even when they do, there may not be sufficient numbers of such patients to assess them separately or the subgroups may not be properly analyzed for differences in outcomes.

On the other hand, observational studies using statistical analysis and machine learning algorithms operate on large real world observational data and can therefore provide feedback on the effectiveness of the actual use of different therapeutic interventions. Although very costly, RCTs are still considered the strongest form of evidence in EBM. Despite their inherent methodological challenges (lack of randomization leading to possible bias and confounding), observational studies are increasingly recognized as complementary to RCTs and an important tool in clinical decision making and health policy. iHIT systems play an important role in translating Comparative Effectiveness Research (CER) findings into clinical practice in the form of clinical decision support (CDS) interventions at the point of care.

iHIT systems also use business rules engines to capture and execute expert knowledge such as the medical knowledge contained in Clinical Practice Guidelines (CPGs). Examples include rules engines based on forward chaining inference, also known as production rule systems. These rules engines can be combined with Complex Event Processing (CEP) and Business Process Management (BPM) for intelligent decision making.
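The forward-chaining idea can be sketched in a few lines: facts are matched against rule conditions, firing rules assert new facts, and the cycle repeats until a fixpoint is reached. The guideline-style rules below are invented toy examples, not actual clinical logic:

```python
# Each rule pairs a set of required facts with a fact to assert when they all hold.
RULES = [
    ({"diagnosis:asthma", "symptoms:daily"}, "classification:persistent-asthma"),
    ({"classification:persistent-asthma"}, "recommend:inhaled-corticosteroid"),
]

def forward_chain(facts, rules):
    """Naive forward chaining: fire rules until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"diagnosis:asthma", "symptoms:daily"}, RULES)
print("recommend:inhaled-corticosteroid" in derived)  # True: chained through both rules
```

Production rule systems such as Drools avoid this naive re-scan of every rule on every cycle by matching facts incrementally with the Rete algorithm.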

iHIT systems support ontologies such as those represented by the web ontology language (OWL) providing reasoning capabilities as well as the ability to navigate semantic relationships between concepts and entities.

More advanced iHIT systems have Natural Language Processing (NLP) and Automatic Speech Recognition (ASR) capabilities in order to answer clinical questions posed in natural language. They rely on Information Retrieval techniques like probabilistic methods for scoring the relevance of a document given a query and the application of supervised machine learning classification methods such as decision trees, Naive Bayes, K-Nearest Neighbors (kNN), and Support Vector Machines (SVM).
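To make the classification step concrete, here is a minimal multinomial Naive Bayes sketch with add-one smoothing; the four labeled question snippets are invented stand-ins for a real training corpus:

```python
import math
from collections import Counter

# Invented training snippets labeled by clinical question type.
TRAIN = [
    ("treatment", "best treatment for asthma"),
    ("treatment", "first line treatment hypertension"),
    ("diagnosis", "differential diagnosis chest pain"),
    ("diagnosis", "diagnosis criteria diabetes"),
]

def train_nb(examples):
    """Count word and class frequencies for multinomial Naive Bayes."""
    word_counts, class_counts, vocab = {}, Counter(), set()
    for label, text in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Pick the label maximizing log P(label) + sum log P(word | label),
    with add-one (Laplace) smoothing over the vocabulary."""
    total = sum(class_counts.values())
    best_label, best_logp = None, float("-inf")
    for label in class_counts:
        logp = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            logp += math.log((word_counts[label][word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

model = train_nb(TRAIN)
print(classify("treatment options for asthma", *model))  # treatment
```

The same scoring idea (a probabilistic relevance score per candidate) carries over to ranking documents against a clinical query.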

In some cases, the responsibilities of an iHIT system are performed by Intelligent Agents which are autonomous entities capable of observing the clinical environment and acting upon those observations.

For scalability and performance, iHIT systems often sit on NoSQL databases and run on massively parallel computing platforms like Apache Hadoop while leveraging the elasticity of the cloud.

Integrating these technologies is the main challenge posed by iHIT systems. An example is the integration between statistical and machine learning models, business rules, ontologies, and more traditional forms of computing such as object-oriented programming. Various solutions to these challenges have been proposed and implemented.

Human-Centered Design

Finally, iHIT systems fully embrace a human-centered design approach. They provide a seamless integration between automated decision logic and clinical workflows. They provide the clinician with detailed explanations of the rationale behind the actions they recommend. In addition, they use techniques like Visual Analytics to enhance human cognitive abilities in order to facilitate analytical reasoning over very large data sets.