Showing posts with label EHR. Show all posts
Showing posts with label EHR. Show all posts

Sunday, August 17, 2014

Natural Language Processing (NLP) for Clinical Decision Support: A Practical Approach

A significant portion of the electronic documentation of clinical care is captured in the form of unstructured narrative text like psychotherapy and progress notes. Despite the big push to adopt structured data entry (as required by the Meaningful Use incentive program for example), many clinicians still like to document care using free narrative text. The advantage of using narrative text as opposed to coded entries is that narrative text can tell the story of the patient and the care provided particularly in complex cases. My opinion is that free narrative text should be used to complement coded entries when necessary to capture relevant information.

Furthermore, medical knowledge is expanding very rapidly. For example, PubMed has more than 24 millions citations for biomedical literature from MEDLINE, life science journals, and online books. It is impossible for the human brain to keep up with that amount of knowledge. These unstructured sources of knowledge contain the scientific evidence that is required for effective clinical decision making in what is referred to as Evidence-Based Medicine (EBM).

In this blog, I discuss two practical applications of Natural Language Processing (NLP). The first is the use of NLP tools and techniques to automatically extract clinical concepts and other insight from clinical notes for the purpose of providing treatment recommendations in Clinical Decision Support (CDS) systems. The second is the use of text analytics techniques like clustering and summarization for Clinical Question Answering (CQA).

The emphasis of this post is on a practical approach using freely available and mature open source tools as opposed to an academic or theoretical approach. For a theoretical treatment of the subject, please refer to the book Speech and Language Processing by Daniel Jurafsky and James Martin.


Clinical NLP with Apache cTAKES


Based on the Apache Unstructured Information Management Architecture (UIMA) framework and the Apache OpenNLP natural language processing toolkit, Apache cTAKES provides a modular architecture utilizing both rule-based and machine learning techniques for information extraction from clinical notes. cTAKES can extract named entities (clinical concepts) from clinical notes in plain text or HL7 CDA format and map these entities to various dictionaries including the following Unified Medical Language System (UMLS) semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, and medications.

cTAKES includes the following key components which can be assembled to create processing pipelines:

  • Sentence boundary detector based on the OpenNLP Maximum Entropy (ME) sentence detector.
  • Tokenizor
  • Normalizer using the National Library of Medicine's Lexical Variant Generation (LVG) tool
  • Part-of-speech (POS) tagger
  • Shallow parser
  • Named Entity Recognition (NER) annotator using dictionary look-up to UMLS concepts and semantic types. The Drug NER can extract drug entities and their attributes such as dosage, strength, route, etc.
  • Assertion module which determines the subject of the statement (e.g., is the subject of the statement the patient or a parent of the patient) and whether a named entity or event is negated (e.g., does the presence of the word "depression" in the text implies that the patient has depression).
Apache cTAKES 3.2 has added YTEX, a set of extensions developed at Yale University which provide integration with MetaMap, semantic similarity, export to Machine Learning packages like Weka and R, and feature engineering.

The following diagram from the Apache cTAKES Wiki provides an overview of these components and their dependencies (click to enlarge):


Massively Parallel Clinical Text Analytics in the Cloud with GATECloud


The General Architecture for Text Engineering (GATE) is a mature, comprehensive, and open source text analytics platform. GATE is a family of tools which includes:

  • GATE Developer: an integrated development environment (IDE) for language processing components with a comprehensive set of available plugins called CREOLE (Collection of REusable Objects for Language Engineering). 
  • GATE Embedded: an object library for embedding services developed with GATE Developer into third-party applications.
  • GATE Teamware: a collaborative semantic annotation environment based on a workflow engine for creating manually annotated corpora for applying machine learning algorithms. 
  • GATE Mímir: the "Multi-paradigm Information Management Index and Repository" which supports a multi-paradigm approach to index and search over text, ontologies, and semantic metadata.
  • GATE Cloud: a massively parallel clinical text analytics platform (Platform as a Service or PaaS) built on the Amazon AWS Cloud.
What makes GATE particularly attractive is the recent addition of GATECloud.net PaaS which can boost the productivity of people involved in large scale text analytics tasks.

 

Clustering, Classification, Text Summarization, and Clinical Question Answering (CQA)

 

An unsupervised machine learning approach called Clustering can be used to classify large volumes of medical literature into groups (clusters) based on some similarity measure (such as the Euclidean distance). Clustering can be applied at the document, search result, and word/topic levels. Carrot2 and Apache Mahout are open source projects that provide several methods for document clustering. For example, the Latent Dirichlet Allocation learning algorithm in Apache Mahout automatically clusters words into topics and documents into mixtures of topics. Other clustering algorithms in Apache Mahout include: Canopy, Mean-Shift, Spectral, K-Means and Fuzzy K-Means. Apache Mahout is part of the Hadoop ecosystem and can therefore scale to very large volumes of unstructured text.

Document classification essentially consists in assigning predefined set of labels to documents. This can be achieved through supervised machine learning algorithms. Apache Mahout implements the Naive Bayes classifier.

Text summarization techniques can be used to present succinct and clinically relevant evidence to clinicians at the point of care. MEAD (http://www.summarization.com/mead/) is an open source project that implements multiple summarization algorithms. In the biomedical domain, SemRep is a program that extracts semantic predications (subject-relation-object triples) from biomedical free text. Subject and object arguments of each predication are concepts from the UMLS Metathesaurus and the relation is from the UMLS Semantic Network (e.g., TREATS, Co-OCCURS_WITH). The SemRep summarization provides a short summary of these concepts and their semantic relations.

AskHermes (Help clinicians to Extract and aRrticulate Multimedia information for answering clinical quEstionS) is a project that attempts to implement these techniques in the clinical domain. It allows clinicians to enter questions in natural language and uses the following unstructured information sources: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines, and Wikipedia articles.

The processing pipeline in AskHermes includes the following: Question Analysis, Related Questions Extraction, Information Retrieval, Summarization and Answer Presentation. AskHermes performs question classification using MMTx (MetaMap Technology Transfer) to map keywords to UMLS concepts and semantic types. Classification is achieved through supervised machine learning algorithms such as Support Vector Machine (SVM) and conditional random fields (CFRs). Summarization and answer presentation are based on clustering techniques. AskHermes is powered by open source components including: JBoss Seam, Weka, Mallet , Carrot2 , Lucene/Solr, and WordNet (a lexical database for the English language).

Sunday, December 29, 2013

Improving the quality of mental health and substance use treatment: how can Informatics help?


According to the 2012 National Survey on Drug Use and Health, an estimated 43.7 million adults aged 18 or older in the United States had mental illness in the past year. This represents 18.6 percent of all adults in this country. Among those 43.7 million adults, 19.2 percent (8.4 million adults) met criteria for a substance use disorder (i.e., illicit drug or alcohol dependence or abuse). In 2012, an estimated 9.0 million adults (3.9 percent) aged 18 or older had serious thoughts of suicide in the past year.

Mental health and substance use are often associated with other issues such as:

  • Co-morbidity involving other chronic diseases like HIV, hepatitis, diabetes, and cardiovascular disease.

  • Overdose and emergency care utilization.

  • Social issues like incarceration, violence, homelessness, and unemployment.
It is now well established that addiction is a chronic disease of the brain and should be treated as such from a health and social policy standpoint.


The regulatory framework

  • The Affordable Care Act (ACA) requires non-grandfathered health plans in the individual and small group markets to provide essential health benefits (EHBs) including mental health and substance use disorder benefits.  

  • Starting in 2014, insurers can no longer deny coverage because of a pre-existing mental health condition.

  • The ACA requires health plans to cover recommended evidence-based prevention and screening services including depression screening for adults and adolescents and behavioral assessments for children.

  • On November 8, 2013, HHS and the Departments of Labor and Treasury released the final rules implementing the Paul Wellstone and Pete Domenici Mental Health Parity and Addiction Equity Act of 2008 (MHPAEA). 

  • Not all behavioral health specialists are eligible to the Meaningful Use EHR Incentive program created by the Health Information Technology for Economic and Clinical Health Act (HITECH) of 2009.

 

Implementing Clinical Practice Guidelines (CPGs) with Clinical Decision Support (CDS) systems

 

Clinical Decision Support (CDS) can help address key challenges in mental health and substance use treatment such as:

  • Shortages and high turnover in the addiction treatment workforce.

  • Insufficient or lack of adequate clinician education in mental health and addiction medicine.

  • Lack of implementation of available evidence-based clinical practice guideline (CPGs) in mental health and addiction medicine.
For example, there are a number of scientifically validated CPGs for the Medication Assisted Treatment (MAT) of opioid addiction using methadone or buprenorphine. These evidence-based CPGs can be translated into executable CDS rules using business rule engines. These executable clinical rules should also be seamlessly integrated with clinical workflows.

The complexity and costs inherent in capturing the medical knowledge in clinical guidelines and translating that knowledge into executable code remains an impediment to the widespread adoption of CDS software. Therefore, there is a need for standards that facilitate the sharing and interchange of CDS knowledge artifacts and executable clinical guidelines. The ONC Health eDecision Initiative has published specifications to support the interoperability of CDS knowledge artifacts and services.

Ontologies as knowledge representation formalism are well suited for modeling complex medical knowledge and can facilitate reasoning during the automated execution of clinical guidelines based on patient data at the point of care.

The typical Clinical Practice Guideline (CPG) is 50 to 150 pages long. Clinical Decision Support (CDS) should also include other forms of cognitive aid such as Electronic Checklists, Data Visualization, Order Sets, and Infobuttons.

The issues of human factors and usability of CDS systems as well as CDS integration with clinical workflows have been the subject of many research projects in healthcare informatics. The challenge is to be bring these research findings into the practice of developing clinical systems software.


Learning from Data


Learning what works and what does not work in clinical practice is important for building a learning health system. This can be achieved by incorporating the results of Comparative Effectiveness Research (CER) and Patient-Centered Outcome Research (PCOR) into CDS systems. Increasingly, outcomes research will be performed using observational studies (based on real world clinical data) which are recognized as complementary to randomized control trials (RCTs). For example, CER and PCOR can help answer questions about the comparative effectiveness of pharmacological and  psychotherapeutic interventions in mental health and substance abuse treatment. This is a form of Practice-Based Evidence (PBE) that is necessary to close the evidence loop.

Three factors are contributing to the availability of massive amounts of clinical data: the rising adoption of EHRs by providers (thanks in part to the Meaningful Use incentive program), medical devices (including those used by patients outside of healthcare facilities), and medical knowledge (for example in the form of medical research literature). Massively parallel  computing platforms such as Apache Hadoop or Apache Spark can process humongous amounts of data (including in real time) to obtain actionable insights for effective clinical decision making.

The use of predictive modeling for personalized medicine (based on statistical computing and machine learning techniques) is becoming a common practice in healthcare delivery as well. These models can predict the health risk of patients (for pro-active care) based on their individual health profiles and can also help predict which treatments are more likely to lead to positive outcomes.

Embedding Visual Analytics capabilities into CDS systems can help clinicians obtain deep insight for effective understanding, reasoning, and decision making through the visual exploration of massive, complex, and often ambiguous data. For example, Visual Analytics can help in comparing different interventions and care pathways and their respective clinical outcomes for a patient or population of patients over a certain period of time through the vivid showing of causes, variables, comparisons, and explanations.


Genomics of Addiction and Personalized Medicine


Advances in genomics and pharmacogenomics are helping researchers understand treatment response variability among patients in addiction treatment. Clinical Decision Support (CDS) systems can also be used to provide cognitive support to clinicians in providing genetically guided treatment interventions.


Quality Measurement for Mental Health and Substance Use Treatment


An important implication of the shift from a fee-for-service to a value-based healthcare delivery model is that existing process measures and the regulatory requirements to report them are no longer sufficient.

Patient-reported outcomes (PROs) and patient-centered measures include essential metrics such as mortality, functional status, time to recovery, severity of side effects, and remission (depression remission at six and twelve months). These measures should take into account the values, goals, and wishes of the patient. Therefore patient-centered outcomes should also include the patient's own evaluation of the care received.

Another issue to be addressed is the lack of data elements in Electronic Medical Record (EMR) systems for capturing, reporting, and analyzing PROs. This is the key to accountability and quality improvement in mental health and substance use treatment.


Using Natural Language Processing (NLP) for the automated processing of clinical narratives


Electronic documentation in mental health and substance use treatment is often captured in the form of narrative text such as psychotherapy notes. Natural Language Processing (NLP) and machine learning tools and techniques (such as named entity recognition) can be used to extract clinical concepts and other insight from clinical notes.

Another area of interest is Clinical Question Answering (CQA) that would allow clinicians to ask questions in natural language and extract clinical answers from very large amounts of unstructured sources of medical knowledge. PubMed has more than 23 millions citations for biomedical literature from MEDLINE, life science journals, and online books. It is impossible for the human brain to keep up with that amount of knowledge.



Computer-Based Cognitive Behavioral Therapy (CCBT) and mHealth


According to a report published last year by the California HealthCare Foundation and titled The Online Couch: Mental Health Care on the Web:

"Computer-based cognitive behavioral therapy (CCBT) cost-effectively leverages the Internet for coaching patterns in self-driven or provider-assisted programs. Technological advances have enabled computer systems designed to replicate aspects of cognitive behavior therapy for a growing range of mental health issues".
An example of a successful nationwide adoption of CCBT is the online behavioral therapy site Beating the Blues in the United Kingdom which has been proven to help patients suffering from anxiety and mild to moderate depression. Beating the Blues has been recommended for use in the NHS by the National Institute for Health and Clinical Excellence (NICE).

In addition, there is growing evidence to support the efficacy of mobile health (mHealth) technologies for supporting patent engagement and activation in health behavior change (e.g., smoking cessation).

 

Technologies in support of a Collaborative Care Model


There is sufficient evidence to support the efficacy of the collaborative care model (CCM) in the treatment of chronic mental health and substance use conditions.The CCM is based on the following principles:
  • Coordinated care involving a multi-disciplinary care team.

  • Longitudinal care plan as the backbone of care coordination.

  • Co-location of primary care and mental health and substance use specialists.

  • Case management by a Care Manager. 
Implementing an effective collaborative care model will require a new breed of advanced clinical collaboration tools and capabilities such as:
  • Conversations and knowledge sharing using tools like video conferencing for virtual two-way face-to-face communication between clinicians (see my previous post titled Health IT Innovations for Care Coordination).

  • Clinical content management and case management tools.

  • File sharing and syncing allowing the longitudinal care plan to be synchronized and shared among all members of the care team.

  • Light-weight and simple clinical data exchange standards and protocols for content, transport, security, and privacy. 

 

Patient Consent and Privacy


Because of the stigma associated with mental health and substance use, it is important to give patients control over the sharing of their medical record. Patients consent should be obtained about what type information is shared, with whom, and for what purpose. The patient should also have access to an audit trail of all data exchange-related events. Current paper-based consent processes are inefficient and lack accountability. Web-based consent management applications facilitate the capture and automated enforcement of patient consent directives (see my previous post titled Patient privacy at web scale).

Saturday, July 24, 2010

Toward An Enterprise Reference Architecture for the Meaningful Use Era

Meaningful Use is not just about financial incentives. In fact, the financial incentives provided under the HITECH Act will only cover a portion of the total costs associated with migrating the healthcare industry to health information technologies. Consider the following Meaningful Use criteria:

  • Submit electronic data to immunization registries
  • Incorporate clinical lab-test results into certified EHR technology as structured data
  • Submit electronic syndromic surveillance data to public health agencies
  • Generate and transmit permissible prescriptions electronically (eRx)
  • Provide patients with an electronic copy of their health information
  • Exchange key clinical information among providers of care and patient authorized entities electronically.

These criteria are clearly designed to bridge information silos within and across organizational boundaries in the healthcare industry. This integration is necessary to improve the coordination and quality of care, eliminate the costs associated with redundancies and other inefficiencies within the healthcare system, empower patients with their electronic health records, and enable effective public health surveillance. This integration requires a new health information network such as those provided by state and regional Health Information Exchanges (HIEs) and the Nationwide Health Information Network (NHIN) initiative. HIEs and NHIN Exchange are based on a decentralized, service-oriented architecture (SOA) whereby authorized health enterprises securely exchange health information.

Health CIOs who see Meaningful Use as part of a larger transformational effort will drive their organizations toward success. Creating a coherent and consistent Enterprise Architecture for tackling these new challenges should be a top priority. Not having a coherent Enterprise Architecture will lead to a chaotic environment with increased costs and complexity. The following are some steps that can be taken to create an Enterprise Reference Architecture that is aligned with with the organization's business context, goals, and drivers such as Meaningful Use:

  1. Adopt a proven architecture development methodology such as TOGAF.

  2. Create an inventory of existing systems such as pharmacy, laboratory, radiology, patient administration, electronic medical records (EMRs), order entry, clinical decision support, etc. This exercise is necessary to gain an understanding of current functions, redundancies, and gaps.

  3. Create a target enterprise service inventory to eliminate functional redundancies and maximize service reuse and composability. These services can be layered into utility, entity, and task service layers. For example, Meaningful Use criteria such as "Reconcile Medication Lists", "Exchange Patient Summary Record", or "Submit Electronic Data to Immunization Registries" can be decomposed into two or more fine-grained services.

  4. Task services based on the composition of other reusable services can support the orchestration of clinical workflows. The latter previously required clinicians to log into multiple systems and re-enter data, or relied on point-to-point integration via the use of interface engines. These task services can also invoke services across organizational boundaries from other providers participating in a HIE in order to compose a longitudinal electronic health record (EHR) of the patient. Other services such as clinical decision support services or terminology services can be shared by multiple healthcare organizations to reduce costs.

  5. Consider wrapper services to encapsulate existing legacy applications.

  6. The Enterprise Reference Architecture should support standards that have been mandated in the Meaningful Use Final Rule, but also those required by NHIN Exchange, NHIN Direct, and the HIE in which the organization participates. These standards are related to data exchange (HITSP C32, CCR, HL7 2.x), messaging (SOAP, WSDL, REST, SMTP, WS-I Basic), security and privacy (WS-I Security, SAML, XACML), IHE Profiles (XDS, PIX, and PDQ), and various NHIN Exchange and NHIN Direct Specifications. While standards are not always perfect, healthcare interoperability is simply not possible without them.

  7. Select a SOA infrastructure (such as an Enterprise Service Bus or ESB) that supports the standards listed above. Consider both open source and commercial offerings.

  8. Consider non-functional requirements such as performance, scalability, and availability.

  9. The Enterprise Reference Architecture should also incorporate industry best practices in the areas of SOA, Security, Privacy, Data Modeling, and Usability. These best practices are captured in various specifications such as the HL7 EHR System Functional Model, the HL7 Service Aware Interoperability Framework (SAIF), and the HL7 Decision Support Service (DSS) specification.

  10. Finally, create a Governance Framework to establish and enforce enterprise-wide technology standards, design patterns, Naming and Design Rules (NDRs), policies, service metadata and their versioning, quality assurance, and Service Level Agreements (SLAs) such as the NHIN Data Use and Reciprocal Support Agreement (DURSA).

Saturday, June 12, 2010

Putting XQuery to Work in Healthcare

The following are some of the challenges that healthcare organizations will be facing during the next few years:

  • Conversion from HIPAA 4010 to 5010
  • Conversion from ICD-9 to ICD-10
  • Efficiently storing, querying, processing, and exchanging Electronic Health Records (EHRs)
  • Mapping from HL7 2.x to HL7 v3 messages
  • Assembling EHRs by aggregating data from multiple organizations participating in Health Information Exchanges (HIEs).


XQuery is not just a query language for XML data sources. It is also a very powerful declarative, strongly typed, and side-effect free programming language for processing and manipulating XML documents. XQuery is a natural solution for querying and aggregating data coming from heterogeneous sources such as relational databases, native XML databases, file systems, and legacy data formats such as EDI. Some developers will find XQuery easier to use than XSLT because XQuery has a SQL-like syntax.


Migration to HIPAA 5010 and ICD 10

Conversion from HIPAA 4010 to 5010 and ICD-9 to ICD-10 will be a priority on the agenda in the next three years (details on final compliance dates can be found on this HHS web page).

The XQuery and XQuery Update Facility specifications provide a simple and elegant solution to this conversion challenge.


Health Information Exchanges (HIEs) and the Virtual Health Record

In a HIE with multiple participating organizations, EHR data must be assembled either through a centralized, federated, or hybrid data model. The data needed to assemble a longitudinal EHR (a virtual health record) for a patient could be coming from several providers, payers, lab companies, and medical devices. XQuery was designed to handle that type of XML processing use case.


Storing, Updating, and Querying EHRs

The HL7 CCD and ASTM CCR have been retained as Meaningful Use XML data exchange standards for EHRs. Mapping between an XML HL7 CCD representation (which is derived from the HL7 UML-based Reference Information Model or RIM) and an existing relational database structure is not trivial. IBM has been granted a patent entitled "Conversion of hierarchically-structured HL7 specifications to relational databases". The HL7 RIMBAA project provides some best practices on mapping RIM objects to a relational database structure.

With the emergence of native XML databases such as Oracle XML DB and IBM pureXML, XML is no longer just a messaging format. It can be used as a format for storing and querying data as well.

This article shows a sample code of updating an EHR stored in HL7 CDA format in an IBM DB2 pureXML native XML database.


Mapping from HL7 2.x to HL7 v3 messages

In countries like Canada where HL7 v3 has been adopted, a frequent challenge is to map from legacy HL7 2.x messages to HL7 v3 messages for example for lab results. An XQuery-based transform can be used to map from an HL7 v2.x XML structure to an HL7 v3 XML structure.


An Alternative to GELLO?

In a previous post entitled "Clinical Decision Support: Crossing the Chasm", I argued that Clinical Decision Support Systems (CDSS) implementers should be free to use any programming language of their choice. GELLO is an HL7 standard which specifies an expression and query language for CDSS. The following are the requirements for a CDSS expression and query language as specified in the GELLO specification:

  • vendor-independent
  • platform-independent
  • object-oriented and compatible with the vMR
  • easy to read/write
  • side-effect free
  • flexible
  • extensible


XQuery satisfies all these requirements except the third. XQuery is a functional programming language with no side effect as opposed to an object-oriented programming language. GELLO settled on the OMG Object Constraint Language (OCL). The following paragraph from the GELLO specification explains why XQuery (known as XQL at the time) wasn't selected:

XQL is a query language designed specifically for XML documents. XML documents are unordered, labeled trees, with nodes representing the document entity, elements, attributes, processing instructions and comments. The implied data model behind XML neither matches that of a relational data model nor that of an object-oriented data model. XQL is a query language for XML in the same sense as SQL is a query language for relational tables. Since the HL7 RIM data model and the vMR data model are both object-oriented, it is clear that XQL is not an appropriate approach for an object-oriented query and expression language.


That might have been true back in 2004 in an object-oriented world. Today, if the inputs to a CDSS are EHRs represented in HL7 CCD or ASTM CCR format, and those EHRs are stored in an XQuery compliant native XML database, then XQuery could be a strong candidate for an expression and query language for the CDSS.

Wednesday, June 9, 2010

Data Modeling for Electronic Health Records (EHR) Systems

Getting the data model right is of paramount importance for an Electronic Health Records (EHR) system. The factors that drive the data model include but are not limited to:

  • Patient safety
  • Support for clinical workflows
  • Different uses of the data such as input to clinical decision support systems
  • Reporting and analytics
  • Regulatory requirements such as Meaningful Use criteria.

Model First

Proven methodologies like contract-first web service design and model driven development (MDD) put the emphasis on deriving application code from the data model and not the other way around. Thousands of line of code can be auto-generated from the model, so it's important to get the model right.


Requirements Gathering

The objective here is to determine the entities, their attributes, and the relationships between those entities. For example, what are the attributes that are necessary to describe a patient's condition and how do you express the fact that a condition is a manifestation of an allergy? The data modeler should work closely with clinicians to gather those requirements. Industry standards should be leveraged as well. For example, HITSP C32 defines the data elements for each EHR data module such as conditions, medications, allergies, and lab results. These data elements are then mapped to the HL7 Continuity of Care Document (CCD) XML schema.

The HL7 CCD is itself derived from the HL7 Reference Information Model (RIM). The latter is expressed as a set of UML class diagrams and is the foundation model for health care and clinical data. A simpler alternative to the CCD is the ASTM Continuity of Care Records (CCR). Both the CCD and CCR provide an XML schema for data exchange and are Meaningful Use criteria. Another relevant data model is the HL7 vMR (Virtual Medical Record) which aims to define a data model for the input and output of Clinical Decision Support Systems (CDSS).

These standards can be cumbersome to use as such from a software development perspective. Nonetheless, they can inform the design of the data model for an EHR system. Alignment with the CCD and CCR will facilitate data exchange with other providers and organizations. The following are Meaningful Use criteria for data exchange:

  1. Electronically receive a patient summary record, from other providers and organizations including, at a minimum, diagnostic test results, problem list, medication list, medication allergy list, immunizations, and procedures and upon receipt of a patient summary record formatted in an alternative standard specified in Table 2A row 1, displaying it in human readable format.

  2. Enable a user to electronically transmit a patient summary record to other providers and organizations including, at a minimum, diagnostic test results, problem list, medication list, medication allergy list, immunizations, and procedures in accordance with the standards specified in Table 2A row 1.



Applying Data Modeling Patterns

Applying data modeling patterns allows model consistency and quality. Relational data modeling is a well established discipline. My favorite resource for relational data modeling patterns is: The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling.

Some XML Schema best practices can be found here.


Data Stores

Today, options for data store are no longer limited to relational databases. Alternatives include: native XML databases (e.g. DB2 pureXML), Entity-Attribute-Value with Classes and Relationships (EAV/CR), and Resource Description Framework (RDF) stores.

Native XML databases are more resilient to schema changes and do not require handling the impedance mismatch between XML documents, Java objects, and relational tables which can introduce design complexity, performance, and maintainability issues.

Storing EHRs in an RDF store can enable the inference of medical facts based on existing explicit medical facts. Such inferences can be driven by an ontology expressed in OWL or a set of rules expressed in a rule language such SWRL. Semantic Web technologies can also be helpful in checking the consistency of a model, data and knowledge integration across domains (e.g. the genomics and clinical domains), and for managing classification schemes like medical terminologies. RDF, OWL, and SWRL have been successfully implemented in Clinical Decision Support Systems (CDSS).

The data modeling notation used should be independent of the storage model or at least compatible with the latter. For example, if native XML storage is used, then a relational modeling notation might not be appropriate. In general, UML provides the right level of abstraction for implementation-agnostic modeling.


Due Diligence

When adopting a "noSQL" storage model, it is important to ensure that (a) the database can meet performance and scalability criteria and (b) the team has the skills to develop and maintain the database. Due diligence should be performed through benchmarking using a tool such as the IBM Transaction Processing over XML (TpoX). The team might need formal training in a new query language like XQuery or SPARQL.


A Longitudinal View of the Patient Health

Maintaining an up-to-date and truly longitudinal view of a patient's medical history requires merging and reconciling data from heterogeneous sources including providers' EMR systems, lab companies, medical devices, and payers' claim transaction repositories. The data model should facilitate the assembly of data from such diverse sources. XML tools based on XSLT, XQuery, or XQuery Update can be used to automate the merging.


The Importance of Data Validation

Data validation can be performed at the database layer, the application layer, and the UI layer. The data model should support the validation of the data. The following are examples of techniques that can be used for data validation:

  • XML Schema for structural validation of XML documents
  • ISO Schematron (based on XPath 2.0 and XSLT 2.0) for business rules validation of XML documents
  • A business rules engine like Drools
  • A data processing framework like Smooks
  • The validation features of a UI framework such as JSF2
  • The built-in validation features of the database.


The Future: Modeling with the NIEM IEPD


The HHS ONC issued an RFP for using the National Information Exchange Model (NIEM) Information Exchange Package Documentation (IEPD) process for healthcare data exchange. The ONC will release a NIEM Concept of Operations (ConOps). The NIEM IEPD process is explained here.