
Monday, September 15, 2014

Single Sign-On (SSO) for Cloud-based SaaS Applications

Single Sign-On (SSO) is a key capability for Software as a Service (SaaS) applications particularly when there is a need to integrate with existing enterprise applications. In the enterprise world dominated by SOAP-based web services, security has been traditionally achieved with standards like WS-Security, WS-SecurityPolicy, WS-SecureConversation, WS-Trust, XML Encryption, XML Signatures, the WS-Security SAML Token Profile, and XACML.

During the last few years, the popularity of Web APIs, mobile technology, and Cloud-based software services has led to the emergence of light-weight security standards in support of the new REST/JSON paradigm with specifications like OAuth2 and OpenID Connect.

In this post, I discuss the state of the art in standards for SSO.

SAML2 Web SSO Profile


SAML2 Web SSO Profile (not to be confused with the WS-Security SAML Token Profile mentioned earlier) is not a new standard; it was approved as an OASIS standard in 2005. Yet SAML2 Web SSO Profile is still today a force to be reckoned with when it comes to enabling SSO within the enterprise. In a post titled SAML vs OAuth: Which One Should I Use?, Anil Saldhana, former Lead Identity Management Architect at Red Hat, offered the following suggestions:

  • If your usecase involves SSO (when at least one actor or participant is an enterprise), then use SAML.
  • If your usecase involves providing access (temporary or permanent) to resources (such as accounts, pictures, files, etc.), then use OAuth.
  • If you need to provide access to a partner or customer application to your portal, then use SAML.
  • If your usecase requires a centralized identity source, then use SAML (Identity Provider).
  • If your usecase involves mobile devices, then OAuth2 with some form of Bearer Tokens is appropriate.

Salesforce.com, arguably the leader in cloud-based SaaS services, supports SAML2 Web SSO Profile as one of its main SSO mechanisms (see the Salesforce Single Sign-On Implementation Guide). The Google Apps platform supports SAML2 Web SSO Profile as well.

Federal Identity, Credential, and Access Management (FICAM), a US Federal Government initiative, has selected SAML2 Web SSO Profile for Levels of Assurance (LOA) 1 through 4 as defined by NIST Special Publication 800-63-2 (see ICAM SAML 2.0 Web Browser SSO Profile). This is significant given the challenges associated with identity federation at the scale of a large organization like the US federal government.

SAML bindings specify how SAML protocol messages are carried over underlying transport protocols. They include:

  • HTTP Redirect Binding
  • HTTP POST Binding
  • HTTP Artifact Binding
  • SAML SOAP Binding.

SAML profiles define how the SAML assertions, protocols, and bindings are combined to support particular usage scenarios. The Web Browser SSO Profile and the Single Logout Profile are the most commonly used profiles.

Identity Provider (IdP) initiated SSO with POST binding is one of the most popular implementations (see the OASIS SAML Technical Overview for a typical authentication flow diagram).
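To make the POST binding concrete, below is a minimal sketch, in Java servlet terms, of the last step of an IdP-initiated flow: the IdP returns an auto-submitting HTML form that posts the Base64-encoded SAML response to the Service Provider's Assertion Consumer Service (ACS). The ACS URL is illustrative, and the assertion building and XML signing (typically delegated to a library such as OpenSAML) are deliberately elided:

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class IdpInitiatedSsoServlet extends HttpServlet {
    // Hypothetical ACS URL of the Service Provider (e.g., a SaaS application)
    private static final String ACS_URL = "https://sp.example.com/saml/acs";

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // In a real IdP, this would be a signed saml2p:Response document
        String samlResponseXml = buildSignedSamlResponse(req.getRemoteUser());
        String encoded = Base64.getEncoder()
                .encodeToString(samlResponseXml.getBytes(StandardCharsets.UTF_8));
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        // The browser auto-submits this form to the SP, which validates the
        // XML signature and establishes the user's session (that is the SSO)
        out.println("<html><body onload='document.forms[0].submit()'>");
        out.println("<form method='POST' action='" + ACS_URL + "'>");
        out.println("<input type='hidden' name='SAMLResponse' value='" + encoded + "'/>");
        out.println("<input type='hidden' name='RelayState' value='/home'/>");
        out.println("</form></body></html>");
    }

    private String buildSignedSamlResponse(String subject) {
        // Assembling and signing the assertion is the hard part; see OpenSAML
        throw new UnsupportedOperationException("illustration only");
    }
}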



The SAML2 Web SSO ecosystem is very mature, cross-platform, and scalable. There are a number of open source implementations available as well. However, things are constantly changing in technology and identity federation is no exception. At the Cloud Identity Summit in 2012, Craig Burton, a well known analyst in the identity space declared:

"SAML is the Windows XP of Identity. No funding. No innovation. People still use it. But it has no future. There is no future for SAML. No one is putting money into SAML development. No one is writing new SAML code. SAML is dead."

Craig Burton further clarified his remarks by saying:

"SAML is dead does not mean SAML is bad. SAML is dead does not mean SAML isn't useful. SAML is dead means SAML is not the future."

At the time, this provoked a storm in the Twitterverse because of the significant investments that enterprise customers had made to implement SAML2 for SSO.


WS-Federation


There is an alternative to the SAML2 Web SSO Profile called WS-Federation. Microsoft has been a strong promoter of WS-Federation and supports it in products like Active Directory Federation Services (ADFS), Windows Identity Foundation (WIF), and Azure Active Directory. There is also a popular open source identity server on the .NET platform called Thinktecture IdentityServer v2 which supports WS-Federation.

For enterprise SSO scenarios between business partners exclusively using Microsoft products and development environments, WS-Federation could be a serious contender. However, SAML2 is more widely supported and implemented outside of the Microsoft world. For example, Salesforce.com and Google Apps do not support WS-Federation for SSO. Note that Microsoft ADFS implements the SAML2 Web SSO Profile in addition to WS-Federation.

OpenID Connect


OpenID Connect is a simple identity layer on top of OAuth2. It was ratified by the OpenID Foundation in February 2014 but had been in development for several years. Nat Sakimura's Dummy’s guide for the Difference between OAuth Authentication and OpenID is a good resource for understanding the differences between OpenID, OAuth2, and OpenID Connect. In particular, it explains why OAuth2 alone is not strictly an authentication standard. The following diagram from the OpenID Connect specification represents the components of the OpenID Connect stack.



Also note that OAuth2 tokens can be JSON Web Tokens (JWTs) or SAML assertions.

The following is the basic flow as defined in the OpenID Connect specification:

  1. The RP (Client) sends a request to the OpenID Provider (OP).
  2. The OP authenticates the End-User and obtains authorization.
  3. The OP responds with an ID Token and usually an Access Token.
  4. The RP can send a request with the Access Token to the UserInfo Endpoint.
  5. The UserInfo Endpoint returns Claims about the End-User.
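The following is a rough Java sketch of steps 1, 3, 4, and 5 from the RP's point of view. All endpoints, client credentials, and parameter values are placeholders; a real Relying Party would use a certified OpenID Connect client library rather than hand-rolled HTTP calls:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OidcCodeFlowSketch {
    static final String OP = "https://op.example.com";           // assumed OpenID Provider
    static final String CLIENT_ID = "my-client-id";              // assumed registration
    static final String REDIRECT_URI = "https://rp.example.com/cb";

    public static void main(String[] args) throws Exception {
        // Step 1: the RP redirects the browser to the OP's authorization endpoint
        String authorizationUrl = OP + "/authorize"
                + "?response_type=code&scope=openid%20profile"
                + "&client_id=" + CLIENT_ID
                + "&redirect_uri=" + REDIRECT_URI
                + "&state=af0ifjsldkj";
        System.out.println("Redirect the user to: " + authorizationUrl);

        // Step 2 happens at the OP (End-User authentication and consent);
        // the OP then redirects back to the RP with ?code=...
        String code = "AUTH_CODE_FROM_CALLBACK"; // placeholder

        // Step 3: exchange the authorization code for an ID Token and an Access Token
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest tokenRequest = HttpRequest.newBuilder()
                .uri(URI.create(OP + "/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "grant_type=authorization_code&code=" + code
                        + "&redirect_uri=" + REDIRECT_URI
                        + "&client_id=" + CLIENT_ID + "&client_secret=my-client-secret"))
                .build();
        // The JSON response carries the ID Token (a signed JWT) and the Access Token
        System.out.println(http.send(tokenRequest, HttpResponse.BodyHandlers.ofString()).body());

        // Steps 4 and 5: present the Access Token to the UserInfo Endpoint for Claims
        HttpRequest userInfoRequest = HttpRequest.newBuilder()
                .uri(URI.create(OP + "/userinfo"))
                .header("Authorization", "Bearer ACCESS_TOKEN_FROM_RESPONSE")
                .build();
        System.out.println(http.send(userInfoRequest, HttpResponse.BodyHandlers.ofString()).body());
    }
}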

There are two subsets of the Core functionality with corresponding implementer’s guides:

  • Basic Client Implementer’s Guide – for a web-based Relying Party (RP) using the OAuth code flow
  • Implicit Client Implementer’s Guide – for a web-based Relying Party using the OAuth implicit flow


OpenID Connect is particularly well-suited for modern applications which offer RESTful Web APIs,  support JSON payloads, run on mobile devices, and are deployed to the Cloud. Despite being a relatively new standard, OpenID Connect also boasts an impressive list of implementations across platforms. It is already supported by big players like Google, Microsoft, PayPal, and Salesforce.  In particular, Google is consolidating all federated sign-in support onto the OpenID Connect standard. Open Source OpenID Connect Identity Providers include the Java-based OpenAM and the .Net-based Thinktecture Identity Server v3.


From WS* to JW* and JOSE


As can be seen from the diagram above, a complete identity federation ecosystem based on OpenID Connect will also require standards for representing security assertions, digital signatures, encryption, and cryptographic keys. These standards include:

  • JSON Web Token (JWT)
  • JSON Web Signature (JWS)
  • JSON Web Encryption (JWE)
  • JSON Web Key (JWK)
  • JSON Web Algorithms (JWA).

There is a new acronym for these emerging JSON-based identity and security protocols: JOSE, which stands for Javascript Object Signing and Encryption. It is also the name of the IETF Working Group developing JWS, JWE, and JWK. A Java-based open source implementation called jose4j is available.
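As an illustration, here is a minimal jose4j sketch that signs a set of claims into a JWT (technically a JWS over a JSON claim set) and then verifies and validates it. The issuer and audience values are made up:

import org.jose4j.jwk.RsaJsonWebKey;
import org.jose4j.jwk.RsaJwkGenerator;
import org.jose4j.jws.AlgorithmIdentifiers;
import org.jose4j.jws.JsonWebSignature;
import org.jose4j.jwt.JwtClaims;
import org.jose4j.jwt.consumer.JwtConsumer;
import org.jose4j.jwt.consumer.JwtConsumerBuilder;

public class JwtRoundTrip {
    public static void main(String[] args) throws Exception {
        // Generate an RSA key pair; the public part would be published as a JWK
        RsaJsonWebKey jwk = RsaJwkGenerator.generateJwk(2048);
        jwk.setKeyId("k1");

        // Build the claim set (the JWT payload)
        JwtClaims claims = new JwtClaims();
        claims.setIssuer("https://idp.example.com");    // illustrative issuer
        claims.setAudience("https://api.example.com");  // illustrative audience
        claims.setSubject("alice");
        claims.setExpirationTimeMinutesInTheFuture(10);
        claims.setIssuedAtToNow();

        // Sign with RS256, producing the compact serialization (header.payload.signature)
        JsonWebSignature jws = new JsonWebSignature();
        jws.setPayload(claims.toJson());
        jws.setKey(jwk.getPrivateKey());
        jws.setKeyIdHeaderValue(jwk.getKeyId());
        jws.setAlgorithmHeaderValue(AlgorithmIdentifiers.RSA_USING_SHA256);
        String jwt = jws.getCompactSerialization();

        // Verify the signature and validate the standard claims
        JwtConsumer consumer = new JwtConsumerBuilder()
                .setRequireExpirationTime()
                .setExpectedIssuer("https://idp.example.com")
                .setExpectedAudience("https://api.example.com")
                .setVerificationKey(jwk.getKey())
                .build();
        System.out.println(consumer.processToClaims(jwt));
    }
}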


Access Control with User-Managed Access (UMA)


According to the UMA Core specification,

User-Managed Access (UMA) is a profile of OAuth 2.0. UMA defines how resource owners can control protected-resource access by clients operated by arbitrary requesting parties, where the resources reside on any number of resource servers, and where a centralized authorization server governs access based on resource owner policy.
In the UMA protocol, OpenID Connect provides federated SSO and is also used to convey user claims to the authorization server. In a previous post titled Patient Privacy at Web Scale, I discussed the application of UMA to the challenges of patient privacy.

Sunday, August 17, 2014

Natural Language Processing (NLP) for Clinical Decision Support: A Practical Approach

A significant portion of the electronic documentation of clinical care is captured in the form of unstructured narrative text like psychotherapy and progress notes. Despite the big push to adopt structured data entry (as required by the Meaningful Use incentive program for example), many clinicians still like to document care using free narrative text. The advantage of using narrative text as opposed to coded entries is that narrative text can tell the story of the patient and the care provided particularly in complex cases. My opinion is that free narrative text should be used to complement coded entries when necessary to capture relevant information.

Furthermore, medical knowledge is expanding very rapidly. For example, PubMed has more than 24 million citations for biomedical literature from MEDLINE, life science journals, and online books. It is impossible for the human brain to keep up with that amount of knowledge. These unstructured sources of knowledge contain the scientific evidence that is required for effective clinical decision making in what is referred to as Evidence-Based Medicine (EBM).

In this blog, I discuss two practical applications of Natural Language Processing (NLP). The first is the use of NLP tools and techniques to automatically extract clinical concepts and other insight from clinical notes for the purpose of providing treatment recommendations in Clinical Decision Support (CDS) systems. The second is the use of text analytics techniques like clustering and summarization for Clinical Question Answering (CQA).

The emphasis of this post is on a practical approach using freely available and mature open source tools as opposed to an academic or theoretical approach. For a theoretical treatment of the subject, please refer to the book Speech and Language Processing by Daniel Jurafsky and James Martin.


Clinical NLP with Apache cTAKES


Based on the Apache Unstructured Information Management Architecture (UIMA) framework and the Apache OpenNLP natural language processing toolkit, Apache cTAKES provides a modular architecture utilizing both rule-based and machine learning techniques for information extraction from clinical notes. cTAKES can extract named entities (clinical concepts) from clinical notes in plain text or HL7 CDA format and map these entities to various dictionaries including the following Unified Medical Language System (UMLS) semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, and medications.

cTAKES includes the following key components which can be assembled to create processing pipelines:

  • Sentence boundary detector based on the OpenNLP Maximum Entropy (ME) sentence detector.
  • Tokenizer
  • Normalizer using the National Library of Medicine's Lexical Variant Generation (LVG) tool
  • Part-of-speech (POS) tagger
  • Shallow parser
  • Named Entity Recognition (NER) annotator using dictionary look-up to UMLS concepts and semantic types. The Drug NER can extract drug entities and their attributes such as dosage, strength, route, etc.
  • Assertion module which determines the subject of the statement (e.g., is the subject of the statement the patient or a parent of the patient) and whether a named entity or event is negated (e.g., does the presence of the word "depression" in the text imply that the patient has depression).
Apache cTAKES 3.2 has added YTEX, a set of extensions developed at Yale University which provide integration with MetaMap, semantic similarity, export to Machine Learning packages like Weka and R, and feature engineering.
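To give a feel for the programming model, the following is a minimal sketch of running a cTAKES pipeline through the UIMA/uimaFIT APIs. It assumes the cTAKES modules and UMLS dictionaries are installed and on the classpath, and that the default clinical pipeline is exposed by ClinicalPipelineFactory as in the cTAKES 3.x line:

import org.apache.ctakes.clinicalpipeline.ClinicalPipelineFactory;
import org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;

public class CtakesSketch {
    public static void main(String[] args) throws Exception {
        // Assemble the default pipeline (sentence detector, tokenizer, POS tagger,
        // chunker, dictionary-lookup NER, assertion module, etc.)
        AnalysisEngineDescription pipeline = ClinicalPipelineFactory.getDefaultPipeline();

        JCas jcas = JCasFactory.createJCas();
        jcas.setDocumentText("Patient denies chest pain. Currently taking lisinopril 10 mg daily.");
        SimplePipeline.runPipeline(jcas, pipeline);

        // Walk the identified clinical concepts; polarity reflects the
        // assertion module's negation decision (e.g., "denies chest pain")
        for (IdentifiedAnnotation ann : JCasUtil.select(jcas, IdentifiedAnnotation.class)) {
            System.out.println(ann.getCoveredText() + " polarity=" + ann.getPolarity());
        }
    }
}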

An overview diagram of these components and their dependencies is available on the Apache cTAKES Wiki.


Massively Parallel Clinical Text Analytics in the Cloud with GATECloud


The General Architecture for Text Engineering (GATE) is a mature, comprehensive, and open source text analytics platform. GATE is a family of tools which includes:

  • GATE Developer: an integrated development environment (IDE) for language processing components with a comprehensive set of available plugins called CREOLE (Collection of REusable Objects for Language Engineering). 
  • GATE Embedded: an object library for embedding services developed with GATE Developer into third-party applications.
  • GATE Teamware: a collaborative semantic annotation environment based on a workflow engine for creating manually annotated corpora for applying machine learning algorithms. 
  • GATE Mímir: the "Multi-paradigm Information Management Index and Repository" which supports a multi-paradigm approach to index and search over text, ontologies, and semantic metadata.
  • GATE Cloud: a massively parallel clinical text analytics platform (Platform as a Service or PaaS) built on the Amazon AWS Cloud.
What makes GATE particularly attractive is the recent addition of the GATECloud.net PaaS, which can boost the productivity of people involved in large-scale text analytics tasks.

 

Clustering, Classification, Text Summarization, and Clinical Question Answering (CQA)

 

An unsupervised machine learning approach called Clustering can be used to classify large volumes of medical literature into groups (clusters) based on some similarity measure (such as the Euclidean distance). Clustering can be applied at the document, search result, and word/topic levels. Carrot2 and Apache Mahout are open source projects that provide several methods for document clustering. For example, the Latent Dirichlet Allocation learning algorithm in Apache Mahout automatically clusters words into topics and documents into mixtures of topics. Other clustering algorithms in Apache Mahout include: Canopy, Mean-Shift, Spectral, K-Means and Fuzzy K-Means. Apache Mahout is part of the Hadoop ecosystem and can therefore scale to very large volumes of unstructured text.
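To illustrate the core idea behind these libraries, here is a self-contained Java sketch of k-means clustering over toy term-frequency vectors using squared Euclidean distance. Real systems such as Carrot2 and Mahout add TF-IDF weighting, better initialization, and distributed execution:

import java.util.Arrays;

public class KMeansSketch {
    // Return the index of the centroid nearest to v (squared Euclidean distance)
    static int nearest(double[] v, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int i = 0; i < v.length; i++) {
                double diff = v[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy term-frequency vectors for 4 documents over 3 terms,
        // e.g., ["depression", "buprenorphine", "diabetes"]
        double[][] docs = { {3, 0, 0}, {2, 1, 0}, {0, 0, 4}, {0, 1, 3} };
        double[][] centroids = { docs[0].clone(), docs[2].clone() }; // k = 2
        int[] assignment = new int[docs.length];

        for (int iter = 0; iter < 10; iter++) {
            // Assignment step: attach each document to its nearest centroid
            for (int d = 0; d < docs.length; d++) assignment[d] = nearest(docs[d], centroids);
            // Update step: move each centroid to the mean of its documents
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[docs[0].length];
                int n = 0;
                for (int d = 0; d < docs.length; d++) {
                    if (assignment[d] != c) continue;
                    n++;
                    for (int i = 0; i < sum.length; i++) sum[i] += docs[d][i];
                }
                if (n > 0) for (int i = 0; i < sum.length; i++) centroids[c][i] = sum[i] / n;
            }
        }
        System.out.println(Arrays.toString(assignment)); // e.g., [0, 0, 1, 1]
    }
}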

Document classification essentially consists of assigning a predefined set of labels to documents. This can be achieved through supervised machine learning algorithms. Apache Mahout implements the Naive Bayes classifier.
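As a toy illustration of the technique (not Mahout's implementation), the sketch below classifies a short text with a multinomial Naive Bayes model using add-one (Laplace) smoothing over a tiny, invented corpus:

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NaiveBayesSketch {
    public static void main(String[] args) {
        // Tiny labeled corpus: label -> documents (bags of words)
        Map<String, List<String[]>> train = new HashMap<>();
        train.put("cardiology", Arrays.asList(
                "chest pain ecg".split(" "), "hypertension ecg".split(" ")));
        train.put("psychiatry", Arrays.asList(
                "depression anxiety".split(" "), "depression insomnia".split(" ")));

        // Per-class word counts, per-class totals, and the global vocabulary
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        Map<String, Integer> totals = new HashMap<>();
        Set<String> vocab = new HashSet<>();
        for (Map.Entry<String, List<String[]>> e : train.entrySet()) {
            counts.put(e.getKey(), new HashMap<>());
            for (String[] doc : e.getValue())
                for (String w : doc) {
                    counts.get(e.getKey()).merge(w, 1, Integer::sum);
                    totals.merge(e.getKey(), 1, Integer::sum);
                    vocab.add(w);
                }
        }

        // Classify: argmax over labels of log P(label) + sum of log P(word|label)
        String[] query = "chest pain".split(" ");
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : train.keySet()) {
            double score = Math.log(1.0 / train.size()); // uniform prior
            for (String w : query) {
                int c = counts.get(label).getOrDefault(w, 0);
                score += Math.log((c + 1.0) / (totals.get(label) + vocab.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        System.out.println(best); // prints "cardiology"
    }
}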

Text summarization techniques can be used to present succinct and clinically relevant evidence to clinicians at the point of care. MEAD (http://www.summarization.com/mead/) is an open source project that implements multiple summarization algorithms. In the biomedical domain, SemRep is a program that extracts semantic predications (subject-relation-object triples) from biomedical free text. Subject and object arguments of each predication are concepts from the UMLS Metathesaurus and the relation is from the UMLS Semantic Network (e.g., TREATS, Co-OCCURS_WITH). The SemRep summarization provides a short summary of these concepts and their semantic relations.

AskHermes (Help clinicians to Extract and aRticulate Multimedia information for answering clinical quEstionS) is a project that attempts to implement these techniques in the clinical domain. It allows clinicians to enter questions in natural language and uses the following unstructured information sources: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines, and Wikipedia articles.

The processing pipeline in AskHermes includes the following: Question Analysis, Related Questions Extraction, Information Retrieval, Summarization and Answer Presentation. AskHermes performs question classification using MMTx (MetaMap Technology Transfer) to map keywords to UMLS concepts and semantic types. Classification is achieved through supervised machine learning algorithms such as Support Vector Machines (SVMs) and conditional random fields (CRFs). Summarization and answer presentation are based on clustering techniques. AskHermes is powered by open source components including: JBoss Seam, Weka, Mallet, Carrot2, Lucene/Solr, and WordNet (a lexical database for the English language).

Sunday, December 29, 2013

Improving the quality of mental health and substance use treatment: how can Informatics help?


According to the 2012 National Survey on Drug Use and Health, an estimated 43.7 million adults aged 18 or older in the United States had mental illness in the past year. This represents 18.6 percent of all adults in this country. Among those 43.7 million adults, 19.2 percent (8.4 million adults) met criteria for a substance use disorder (i.e., illicit drug or alcohol dependence or abuse). In 2012, an estimated 9.0 million adults (3.9 percent) aged 18 or older had serious thoughts of suicide in the past year.

Mental health and substance use are often associated with other issues such as:

  • Co-morbidity involving other chronic diseases like HIV, hepatitis, diabetes, and cardiovascular disease.

  • Overdose and emergency care utilization.

  • Social issues like incarceration, violence, homelessness, and unemployment.
It is now well established that addiction is a chronic disease of the brain and should be treated as such from a health and social policy standpoint.


The regulatory framework

  • The Affordable Care Act (ACA) requires non-grandfathered health plans in the individual and small group markets to provide essential health benefits (EHBs) including mental health and substance use disorder benefits.  

  • Starting in 2014, insurers can no longer deny coverage because of a pre-existing mental health condition.

  • The ACA requires health plans to cover recommended evidence-based prevention and screening services including depression screening for adults and adolescents and behavioral assessments for children.

  • On November 8, 2013, HHS and the Departments of Labor and Treasury released the final rules implementing the Paul Wellstone and Pete Domenici Mental Health Parity and Addiction Equity Act of 2008 (MHPAEA). 

  • Not all behavioral health specialists are eligible for the Meaningful Use EHR Incentive program created by the Health Information Technology for Economic and Clinical Health Act (HITECH) of 2009.

 

Implementing Clinical Practice Guidelines (CPGs) with Clinical Decision Support (CDS) systems

 

Clinical Decision Support (CDS) can help address key challenges in mental health and substance use treatment such as:

  • Shortages and high turnover in the addiction treatment workforce.

  • Insufficient or lack of adequate clinician education in mental health and addiction medicine.

  • Lack of implementation of available evidence-based clinical practice guidelines (CPGs) in mental health and addiction medicine.
For example, there are a number of scientifically validated CPGs for the Medication Assisted Treatment (MAT) of opioid addiction using methadone or buprenorphine. These evidence-based CPGs can be translated into executable CDS rules using business rule engines. These executable clinical rules should also be seamlessly integrated with clinical workflows.
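As a deliberately simplified illustration of that translation (plain Java rather than a rule engine, with invented placeholder conditions that are not clinical guidance), a guideline statement reduces to condition-action rules evaluated over structured patient data:

public class MatCdsRuleSketch {
    // Invented, non-clinical placeholder rules: they only illustrate the shape
    // of executable condition-action logic, not actual guideline content.
    static String recommend(boolean opioidDependence, boolean prolongedQtInterval,
                            boolean severeHepaticImpairment) {
        if (!opioidDependence) {
            return "MAT guideline not applicable";
        }
        if (prolongedQtInterval) {
            // Placeholder: route around a contraindication flagged in the record
            return "Consider buprenorphine; flag methadone for specialist review";
        }
        if (severeHepaticImpairment) {
            return "Refer for specialist assessment before dosing";
        }
        return "Proceed with full guideline assessment (methadone or buprenorphine)";
    }

    public static void main(String[] args) {
        // The inputs would come from the patient's structured record
        System.out.println(recommend(true, true, false));
    }
}

In a production CDS system, rules like these would live in a rule engine's knowledge base (and ideally in a shareable knowledge artifact format) rather than in compiled application code, so that clinical experts can review and update them.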

The complexity and costs inherent in capturing the medical knowledge in clinical guidelines and translating that knowledge into executable code remain an impediment to the widespread adoption of CDS software. Therefore, there is a need for standards that facilitate the sharing and interchange of CDS knowledge artifacts and executable clinical guidelines. The ONC Health eDecision Initiative has published specifications to support the interoperability of CDS knowledge artifacts and services.

Ontologies as knowledge representation formalism are well suited for modeling complex medical knowledge and can facilitate reasoning during the automated execution of clinical guidelines based on patient data at the point of care.

The typical Clinical Practice Guideline (CPG) is 50 to 150 pages long. Clinical Decision Support (CDS) should also include other forms of cognitive aid such as Electronic Checklists, Data Visualization, Order Sets, and Infobuttons.

The issues of human factors and usability of CDS systems as well as CDS integration with clinical workflows have been the subject of many research projects in healthcare informatics. The challenge is to bring these research findings into the practice of developing clinical systems software.


Learning from Data


Learning what works and what does not work in clinical practice is important for building a learning health system. This can be achieved by incorporating the results of Comparative Effectiveness Research (CER) and Patient-Centered Outcome Research (PCOR) into CDS systems. Increasingly, outcomes research will be performed using observational studies (based on real world clinical data) which are recognized as complementary to randomized control trials (RCTs). For example, CER and PCOR can help answer questions about the comparative effectiveness of pharmacological and  psychotherapeutic interventions in mental health and substance abuse treatment. This is a form of Practice-Based Evidence (PBE) that is necessary to close the evidence loop.

Three factors are contributing to the availability of massive amounts of clinical data: the rising adoption of EHRs by providers (thanks in part to the Meaningful Use incentive program), medical devices (including those used by patients outside of healthcare facilities), and medical knowledge (for example in the form of medical research literature). Massively parallel  computing platforms such as Apache Hadoop or Apache Spark can process humongous amounts of data (including in real time) to obtain actionable insights for effective clinical decision making.

The use of predictive modeling for personalized medicine (based on statistical computing and machine learning techniques) is becoming a common practice in healthcare delivery as well. These models can predict the health risk of patients (for pro-active care) based on their individual health profiles and can also help predict which treatments are more likely to lead to positive outcomes.
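A toy sketch of such a model is shown below: a logistic-regression risk score computed from a patient profile. The coefficients are arbitrary placeholders; a real model would be trained on historical outcome data:

public class ReadmissionRiskSketch {
    // Placeholder coefficients: intercept plus three made-up feature weights
    static final double[] BETA = { -4.0, 0.03, 0.5, 0.8 };

    static double riskScore(double age, int priorAdmissions, boolean substanceUseDisorder) {
        double z = BETA[0] + BETA[1] * age + BETA[2] * priorAdmissions
                 + (substanceUseDisorder ? BETA[3] : 0.0);
        return 1.0 / (1.0 + Math.exp(-z)); // logistic function maps z to a probability
    }

    public static void main(String[] args) {
        // A hypothetical 62-year-old with two prior admissions and a SUD diagnosis
        System.out.printf("Predicted readmission risk: %.2f%n", riskScore(62, 2, true));
    }
}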

Embedding Visual Analytics capabilities into CDS systems can help clinicians obtain deep insight for effective understanding, reasoning, and decision making through the visual exploration of massive, complex, and often ambiguous data. For example, Visual Analytics can help in comparing different interventions and care pathways and their respective clinical outcomes for a patient or population of patients over a certain period of time through vivid displays of causes, variables, comparisons, and explanations.


Genomics of Addiction and Personalized Medicine


Advances in genomics and pharmacogenomics are helping researchers understand treatment response variability among patients in addiction treatment. Clinical Decision Support (CDS) systems can also be used to provide cognitive support to clinicians in providing genetically guided treatment interventions.


Quality Measurement for Mental Health and Substance Use Treatment


An important implication of the shift from a fee-for-service to a value-based healthcare delivery model is that existing process measures and the regulatory requirements to report them are no longer sufficient.

Patient-reported outcomes (PROs) and patient-centered measures include essential metrics such as mortality, functional status, time to recovery, severity of side effects, and remission (depression remission at six and twelve months). These measures should take into account the values, goals, and wishes of the patient. Therefore patient-centered outcomes should also include the patient's own evaluation of the care received.

Another issue to be addressed is the lack of data elements in Electronic Medical Record (EMR) systems for capturing, reporting, and analyzing PROs. This is the key to accountability and quality improvement in mental health and substance use treatment.


Using Natural Language Processing (NLP) for the automated processing of clinical narratives


Electronic documentation in mental health and substance use treatment is often captured in the form of narrative text such as psychotherapy notes. Natural Language Processing (NLP) and machine learning tools and techniques (such as named entity recognition) can be used to extract clinical concepts and other insight from clinical notes.

Another area of interest is Clinical Question Answering (CQA), which would allow clinicians to ask questions in natural language and extract clinical answers from very large amounts of unstructured sources of medical knowledge. PubMed has more than 23 million citations for biomedical literature from MEDLINE, life science journals, and online books. It is impossible for the human brain to keep up with that amount of knowledge.



Computer-Based Cognitive Behavioral Therapy (CCBT) and mHealth


According to a report published last year by the California HealthCare Foundation and titled The Online Couch: Mental Health Care on the Web:

"Computer-based cognitive behavioral therapy (CCBT) cost-effectively leverages the Internet for coaching patterns in self-driven or provider-assisted programs. Technological advances have enabled computer systems designed to replicate aspects of cognitive behavior therapy for a growing range of mental health issues".
An example of a successful nationwide adoption of CCBT is the online behavioral therapy site Beating the Blues in the United Kingdom which has been proven to help patients suffering from anxiety and mild to moderate depression. Beating the Blues has been recommended for use in the NHS by the National Institute for Health and Clinical Excellence (NICE).

In addition, there is growing evidence to support the efficacy of mobile health (mHealth) technologies for supporting patient engagement and activation in health behavior change (e.g., smoking cessation).

 

Technologies in support of a Collaborative Care Model


There is sufficient evidence to support the efficacy of the collaborative care model (CCM) in the treatment of chronic mental health and substance use conditions. The CCM is based on the following principles:
  • Coordinated care involving a multi-disciplinary care team.

  • Longitudinal care plan as the backbone of care coordination.

  • Co-location of primary care and mental health and substance use specialists.

  • Case management by a Care Manager. 
Implementing an effective collaborative care model will require a new breed of advanced clinical collaboration tools and capabilities such as:
  • Conversations and knowledge sharing using tools like video conferencing for virtual two-way face-to-face communication between clinicians (see my previous post titled Health IT Innovations for Care Coordination).

  • Clinical content management and case management tools.

  • File sharing and syncing allowing the longitudinal care plan to be synchronized and shared among all members of the care team.

  • Light-weight and simple clinical data exchange standards and protocols for content, transport, security, and privacy. 

 

Patient Consent and Privacy


Because of the stigma associated with mental health and substance use, it is important to give patients control over the sharing of their medical records. Patient consent should be obtained about what type of information is shared, with whom, and for what purpose. The patient should also have access to an audit trail of all data exchange-related events. Current paper-based consent processes are inefficient and lack accountability. Web-based consent management applications facilitate the capture and automated enforcement of patient consent directives (see my previous post titled Patient privacy at web scale).

Sunday, November 10, 2013

Toward Polyglot Programming on the JVM

In my previous post titled Treating Javascript as a first class language, I wrote about how the Java Virtual Machine (JVM) is evolving with new languages and frameworks like Groovy, Grails, Scala, Akka, and the Play Framework. In this post, I report on my experience in learning and evaluating these emerging technologies and their roles in the Java ecosystem.

A KangaRoo on the JVM


On a previous project, I used Spring Roo to jumpstart the software development process. Spring Roo was created by Ben Alex, an Australian engineer who is also the creator of Spring Security. Spring Roo was a big productivity boost and generated a significant amount of code and configuration based on the specification of the domain model. Spring Roo automatically generated the following:

  • The domain entities with support for JPA annotations.
  • Repository and service layers. In addition to JPA, Spring Roo also supports NoSQL persistence for MongoDB based on the Spring Data repository abstraction.
  • A web layer with Spring MVC controllers and JSP views with support for Tiles-based layout, theming, and localization. The JSP views were subsequently replaced with a combination of Thymeleaf (a next generation server-side HTML5 template engine) and Twitter Bootstrap to support a Responsive Web Design (RWD) approach. Roo also supports GWT and JSF.
  • REST and JSON remoting for all domain types.
  • Basic configuration for Spring Security, Spring Web Flow, Spring Integration, JMS, Email, and Apache Solr.
  • Entity mocking, automatic generation of test data ("Data on Demand"),  in-container integration testing, and end-to-end Selenium integration tests.
  • A Maven build file for the project and full integration with Spring STS.
  • Deployment to Cloud Foundry.
Roo also supports other features such as database reverse engineering and Ajax. Another benefit of using Roo is that it helped enforce Spring best practices and other architectural concerns such as proper application layering.

For my future projects, I am looking forward to taking developer productivity and innovation to the next level. I have several criteria in mind:

  • Being able to do more with less. This means being able to write code that is concise, expressive, requires less configuration and boilerplate coding, and is easier to understand and maintain (particularly for difficult concerns like concurrency which is a key factor in scalability).
  • Interoperability with the Java language and being able to run on the JVM, so that I can take advantage of the larger and rich Java ecosystem of tools and frameworks.
  • Lastly, my interest in responsive, massively scalable, and fault-tolerant systems has picked up recently.


Getting Groovy


Maven has been a very powerful build system for several projects that I have worked on. My goal now is to support continuous delivery pipelines as a pattern for achieving high quality software. Large open source projects like Hibernate, Spring, and Android have already moved to Gradle. Gradle builds are written in a Groovy DSL and are more concise than Maven POM files which are based on a more verbose XML syntax. Gradle supports Java, Groovy, and Scala out-of-the box. It also has other benefits like incremental builds, multi-project builds, and plugins for other essential development tools like Eclipse, Jenkins, SonarQube, Ivy, and Artifactory.

Grails is a full-stack framework based on Groovy, leveraging its concise syntax (which includes Closures), dynamic language programming, metaprogramming, and DSL support. The core principle of Grails is "convention over configuration". Grails also integrates well with existing and popular Java projects like Spring Security, Hibernate, and Sitemesh. Roo generates code at development time and makes use of AOP. Grails on the other hand generates code at run-time, allowing the developer to do more with less code. The scaffolding mechanism is very similar in Roo and Grails.

Grails has its own view technology called Groovy Server Pages (GSP) and its own ORM implementation called Grails Object Relational Mapping (GORM) which uses Hibernate under the hood. There is also decent support for REST/JSON and URL routing to controller actions. This makes it easy to use Grails together with Javascript MVC frameworks like AngularJS in creating more responsive user experiences based on the Single Page Application (SPA) architectural pattern.

There are many factors that can influence the decision to use Roo vs. Grails (e.g., the learning curve associated with Groovy and Grails for a traditional Java team). There is also a new high-productivity framework called Spring Boot that is emerging as part of the soon to be released Spring Framework 4.0.


Becoming Reactive


I am also interested in massively scalable and fault-tolerant systems. This is no longer a requirement solely for big internet players like Google, Twitter, Yahoo, and LinkedIn that need to scale to millions of users. These requirements (including response time and up time) are also essential in mission-critical applications such as healthcare.

The recently published "Reactive Manifesto" makes the case for a new breed of applications called "Reactive Applications". According to the manifesto, the Reactive Application architecture allows developers to build "systems that are event-driven, scalable, resilient, and responsive." That is the premise of the other two prominent languages on the JVM: Scala and Clojure. They are based on a different programming paradigm (than traditional OOP) called Functional Programming that is becoming very popular in the multi-core era.

Twitter uses Scala and has open-sourced some of their internal Scala resources like "Effective Scala" and "Scala School". One interesting framework based on Scala is Akka, a concurrency framework built on the Actor Model.

The Play Framework 2 is a full-stack web application framework based on Scala which is currently used by LinkedIn (which has over 225 million registered users worldwide). In addition to its elegant design, Play's unique benefits include:

  • An embedded Java NIO (New I/O) non-blocking server based on JBoss Netty, providing the ability to call collaborating services asynchronously without relying on thread pools to handle I/O. This new breed of servers is called "Evented Servers" (NodeJS is another implementation) as opposed to the old "Threaded Servers". Older frameworks like Spring MVC use a threaded and synchronous approach which is more difficult to scale.
  • The ability to make changes to the source code and just refresh the browser page to see the changes (this is called hot reload).
  • Type-safe Scala templates (errors are displayed in the browser during development).
  • Integrated support for Akka which provides (among other benefits) fault tolerance: the ability to quickly recover from failure.
  • Asynchronous responses (based on the concepts of "Future" and "Promise" also found in AngularJS), caching, iteratees (for processing large streams of data), and support for real-time push-based technologies like WebSockets and Server-Sent Events.
The biggest challenge in moving to Scala is that Functional Programming can be a significant learning curve for developers with a traditional OOP background in Java. Functional Programming is not new. Languages like Lisp and Haskell are functional programming languages. More recently, XML processing languages like XSLT and XQuery have adopted functional programming ideas.
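For Java developers who want a taste of the Future/Promise style mentioned above without leaving the language, Java 8's CompletableFuture supports a similar kind of non-blocking composition (the service calls and payloads below are invented):

import java.util.concurrent.CompletableFuture;

public class AsyncCompositionSketch {
    // Each call returns immediately with a promise of a future result
    static CompletableFuture<String> fetchProfile(String userId) {
        return CompletableFuture.supplyAsync(() -> "profile-of-" + userId);
    }

    static CompletableFuture<String> fetchRecommendations(String userId) {
        return CompletableFuture.supplyAsync(() -> "recommendations-for-" + userId);
    }

    public static void main(String[] args) {
        // Compose both results when they complete; no request thread is parked waiting
        CompletableFuture<String> page = fetchProfile("42")
                .thenCombine(fetchRecommendations("42"),
                        (profile, recs) -> profile + " + " + recs);
        System.out.println(page.join()); // blocking join is for the demo only
    }
}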


Bringing Clojure to the JVM


Clojure is a dialect of Lisp and a dynamically-typed functional programming language which compiles to JVM bytecode. Clojure supports multithreaded programming and immutable data structures. One interesting application of Clojure is Incanter, a statistical computing and data visualization environment enabling big data analysis on the JVM.

Sunday, June 9, 2013

Essential IT Capabilities Of An Accountable Care Organization (ACO)

The Certification Commission for Health Information Technology (CCHIT) recently published a document entitled A Health IT Framework for Accountable Care. The document identifies the following key processes and functions necessary to meet the objectives of an ACO:

  • Care Coordination
  • Cohort Management
  • Patient and Caregiver Relationship Management
  • Clinician Engagement
  • Financial Management
  • Reporting
  • Knowledge Management.

The key to success is a shift to data-driven healthcare delivery. The following is my assessment of the most critical IT capabilities for ACO success:

  • Comprehensive and standardized care documentation in the form of electronic health records including as a minimum: patients' signs and symptoms, diagnostic tests, diagnoses, allergies, social and family history, medications, lab results, care plans, interventions, and actual outcomes. Disease-specific Documentation Templates can support the effective use of automated Clinical Decision Support (CDS). Comprehensive electronic documentation is the foundation of accountability and quality improvement.

  • Care coordination through the secure electronic exchange and the collaborative authoring of the patient's medical record and care plan (this is referred to as clinical information reconciliation in the CCHIT Framework). This also requires health IT interoperability standards that are easy to use and designed following rigorous and well-defined software engineering practices. Unfortunately, this has not always been the case, resulting in standards that are actually obstacles to interoperability as opposed to enablers of interoperability. Case Management tools used by Medical Homes (a concept popularized by the Patient-Centered Medical Home model) can greatly facilitate collaboration and Care Coordination.

  • Patients' access to and ownership of their electronic health records including the ability to edit, correct, and update their records. Patient portals can be used to increase patients' health literacy with health education resources. Decision aids comparing the benefits and harms of various interventions (Comparative Effectiveness Research) should be available to patients. Patients' health behavior change remains one of the greatest challenges in Healthcare Transformation. mHealth tools have demonstrated their ability to support Patient Activation.

  • Secure communication between patients and their providers. Patients should have the ability to specify with whom, for what purpose, and the kind of medical information they want to share. Patients should have access to an audit trail of all access events to their medical records just as consumers of financial services can obtain their credit record and determine who has inquired about their credit score.

  • Clinical Decision Support (CDS) as well as other forms of cognitive aids such as Electronic Checklists, Data Visualization, Order Sets, Infobuttons, and more advanced Clinical Question Answering (CQA) capabilities (see my previous post entitled Automated Clinical Question Answering: The Next Frontier in Healthcare Informatics). The unaided mind (as Dr. Lawrence Weed, the father of the Problem-Oriented Medical Record calls it) is no longer able to cope with the large amounts of data and knowledge required in clinical decision making today. CDS should be used to implement clinical practice guidelines (CPGs) and other forms of Evidence-Based Medicine (EBM).

    However, the delivery of care should also take into account the unique clinical characteristics of individual patients (e.g., co-morbidities and social history) as well as their preferences, wishes, and values. Standardized Clinical Assessment And Management Plans (SCAMPs) promote care standardization while taking into account patient preferences and the professional judgment of the clinician. CDS should be well integrated with clinical workflows (see my previous post entitled Addressing Challenges to the Adoption of Clinical Decision Support (CDS) Systems).

  • Predictive risk modeling to identify at-risk populations and provide them with pro-active care including early screening and prevention. For example, predictive risk modeling can help identify patients at risk of hospital re-admission, an important ACO quality measure.

  • Outcomes measurement with an emphasis on patient outcomes in addition to existing process measures. Examples of patient outcome measures include: mortality, functional status, and time to recovery.

  • Clinical Knowledge Management (CKM) to disseminate knowledge throughout the system in order to support a learning health system. The Institute of Medicine (IOM) released a report titled  Digital Infrastructure for the Learning Health System: The Foundation for Continuous Improvement in Health and Health Care. The report describes the learning health system as:

    "delivery of best practice guidance at the point of choice, continuous learning and feedback in both health and health care, and seamless, ongoing communication among participants, all facilitated through the application of IT."

  • Applications of Human Factors research to enable the effective use of technology in clinical settings. Examples include: implementation of usability guidelines to reduce Alert Fatigue in Clinical Decision Support (CDS), Checklists, and Visual Analytics. There are many lessons to be learned from other mission-critical industries that have adopted automation. Following several incidents and accidents related to the introduction of the Glass Cockpit about 25 years ago, Human Factors training known as Cockpit Resource Management (CRM) is now standard practice in the aviation industry.

Sunday, February 17, 2013

Automated Clinical Question Answering: The Next Frontier in Healthcare Informatics

In a previous post, I predicted that 2013 will be the year Intelligent Health IT Systems (iHIT) go mainstream.  I based my prediction on a number of factors, notably the transformation of healthcare to a value-based delivery system driven by the latest scientific evidence (evidence-based practice and practice-based evidence).

Last week, IBM, together with health insurer WellPoint Inc. and New York’s Memorial Sloan-Kettering Cancer Center, announced the commercialization of Watson (the supercomputer which beat human champions in "Jeopardy!" on February 16, 2011) for question answering (QA) in the clinical domain. The following are some interesting facts released by IBM as part of this announcement:

  • The supercomputer has ingested 1,500 lung cancer cases from Sloan-Kettering records, plus 2 million pages of text from journals, textbooks, and treatment guidelines. This is what I call Big Data in medicine.
  • In 2012, Watson became 240 percent faster and 75 percent smaller so it can run on a single server. No surprise here and I expect this trend to continue.

The following YouTube video entitled Oncology Diagnosis and Treatment explains how IBM envisions using Watson for Clinical Question Answering (CQA):



The User Experience in the Watson Demo

 

  • Clinical questions can be posed in natural language (spoken or typed in by the clinician using a keyboard).
  • The sources used for answering clinical questions include both structured (EMR databases) and unstructured information (journal articles, clinical guidelines, etc.).
  • Personalized medicine: the proposed interventions are driven by the data in the patient's medical record and the system can prompt the clinician for additional information on the patient if necessary. The displayed evidence and recommendations are updated to reflect changes in the patient's clinical data.
  • Human Factors: the clinician is always in the loop. She can ask Watson how it arrives at a specific care recommendation and can even remove a specific piece of evidence (if deemed irrelevant or not appropriate).
  • The use of confidence scoring and evidence highlighting.
  • Patient-centeredness and shared decision making: the treatment plans take into account the values, goals, and wishes of the patient (patient preferences). Treatment options are discussed with the patient.
  • Comparative effectiveness is used to compare the benefits and harms of different interventions.
  • Information is displayed using data visualization (dashboard) to help meet key performance indicators in the context of a value-based payment model.


The Science Behind Watson


The real question is how we make intelligent health IT systems like Watson widely available to all patients. A landmark report published by the Institute of Medicine in 2001 and titled Crossing the Quality Chasm - A New Health System for the 21st Century contained the following recommendation:

Patients should receive care based on the best available scientific knowledge. Care should not vary illogically from clinician to clinician or from place to place.

For the scientifically (and Artificial Intelligence) inclined, the following are some pointers on the science behind Watson:


The picture below represents a high-level architecture of Watson.


DeepQA



AskHermes and MiPACQ


IBM Watson is not the only effort to develop automated CQA capabilities.  Some earlier CQA efforts used the PICO framework (Problem/Population, Intervention, Comparison, Outcome) to facilitate processing. More recent efforts have focused on the use of clinical questions posed in natural language.

AskHermes (Help clinicians to Extract and aRticulate Multimedia information for answering clinical quEstionS) allows clinicians to enter questions in natural language and uses the following unstructured information sources: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines, and Wikipedia articles.

The processing pipeline in AskHermes includes the following: Question Analysis, Related Questions Extraction, Information Retrieval, Summarization and Answer Presentation. AskHermes performs question classification using MMTx (MetaMap Technology Transfer) to map keywords to UMLS concepts and semantic types. Classification is also achieved through supervised machine learning algorithms such as Support Vector Machines (SVMs) and conditional random fields (CRFs). Summarization and answer presentation are based on clustering techniques.

MiPACQ (Multi-source Integrated Platform for Answering Clinical Questions) is based on Natural Language Processing (NLP) and Information Retrieval (IR) and utilizes data sources such as Electronic Medical Record (EMR) databases and online medical encyclopedia like Medpedia. MiPACQ uses a processing pipeline based on UIMA (Unstructured Information Management Architecture) and machine learning-based as well as rule-based scoring. NLP capabilities are provided by ClearTK and cTAKES (clinical Text Analysis and Knowledge Extraction System).



The Road Ahead


Automated Clinical Question Answering (CQA) is really hard. However, that is the future of computing: intelligent machines we can have meaningful conversations with. CQA is a multidisciplinary field which combines disciplines like statistical computing, information retrieval, natural language processing, machine learning, rule engines, semantic web technologies, knowledge representation and reasoning, visual analytics, and massively parallel computing. There are several open source projects that provide the building blocks. Many EHR systems today are glorified data entry systems. We need to move to the next level, and that will require technical leadership.

Monday, December 13, 2010

Toward a Universal Exchange Language for Healthcare

The US President's Council of Advisors on Science and Technology (PCAST) published a report last week entitled: "Realizing the Full Potential of Health Information Technology to Improve Healthcare for Americans: The Path Forward". The report calls for a universal exchange language for healthcare (abbreviated as UELH in this post). Specifically, the report says:

"We believe that the natural syntax for such a universal exchange language will be some kind of exten­sible markup language (an XML variant, for example) capable of exchanging data from an unspecified number of (not necessarily harmonized) semantic realms. Such languages are structured as individual data elements, together with metadata that provide an annotation for each data element."

First, let me say that I fully support the idea of a UELH. I've written in the past about the future of healthcare data exchange standards. The ASTM CCR and the HL7 CCD have been adopted for Meaningful Use Stage 1 and that was the right choice. In my opinion, the UELH proposed by PCAST is about the next generation healthcare data exchange standard that is yet to be built. It's part of the natural evolution and innovation that are inherent to the information technology industry. It is also a very challenging task that should be informed by the important work that has been done previously in this field including:

  • The ASTM CCR
  • The HL7 RIM, CDA, CCD, and greenCDA
  • Archetype-based EN 13606 from OpenEHR
  • The National Information Exchange Model (NIEM)
  • HITSP C32
  • Biomedical Ontologies using semantic web technologies such as OWL2, SKOS, and RDF.
  • Medical Terminologies such as SNOMED and RxNorm.

This new language should focus on identifying, addressing, and solving the issues with the use of the current set of healthcare data exchange standards. This will require a public discourse that is cordial and focused on solutions and innovative ideas. Most importantly, it will require listening to the concerns of implementers. This proposal should not be about reinventing the wheel. It should be about creating a better future by learning lessons from the past while being open-minded about new ideas and approaches to solving problems.

Note that the report talks about the syntax of this new language as some kind of an "XML variant". It also mentioned that the language must be extensible. This is important in order to enable innovation in this field. For example, we've recently seen a serious challenge to XML coming from JSON in the web APIs space (Twitter and Foursquare removed support for XML in their APIs and now only provide a JSON API). Similarly, in the Semantic Web space, alternatives to the RDF/XML serialization syntax have emerged such as the N-Triples notation. This is not to say that XML is the wrong representation for healthcare data. It simply means that we should be open to innovation in this area.
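Purely to illustrate that point about syntax neutrality, the HGB lab result used in the greenCDA example later in this post could just as well be rendered as JSON. This is an invented rendering, not a standard:

{
  "result": {
    "id": "107c2dc0-67a5-11db-bd13-0800200c9a66",
    "dateTime": "2000-03-23T14:30",
    "type": { "codeSystem": "2.16.840.1.113883.6.1", "code": "30313-1", "displayName": "HGB" },
    "status": "completed",
    "value": { "quantity": 13.2, "unit": "g/dl" },
    "interpretation": { "codeSystem": "2.16.840.1.113883.5.83", "code": "N" },
    "referenceRange": "M 13-18 g/dl; F 12-16 g/dl"
  }
}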

Metadata and the Semantic Web in Healthcare

Closely related to the notion of metadata is the idea of the Semantic Web. Although semantic web technologies are not widely used in healthcare today, they could help address some of the issues with current healthcare standard information models including: model consistency, reasoning, and knowledge integration across domains (e.g. the genomics and clinical domains). In a report entitled "Semantic Interoperability Deployment and Research Roadmap", Alan Rector, an authority in the field of biomedical ontologies, explains the difference between ontologies and data structures:

A second closely related notion is that of an "information model" or "model of data structures". Both Archetypes and HL7 V3 Messages are examples of data structures. Formalisms for data structures bear many resemblances to formalisms for ontologies. The confusion is made worse because the description logics are often used for both. However, there is a clear difference.

  • Ontologies are about the things being represented – patients, their diseases. They are about what is always true, whether or not it is known to the clinician. For example, all patients have a body temperature (possibly ambient if they are dead); however, the body temperature may not be known or recorded. It makes no sense to talk about a patient with a "missing" body temperature.
  • Data structures are about the artefacts in which information is recorded. Not every data structure about a patient need include a field for body temperature, and even if it does, that field may be missing for any given patient. It makes perfect sense to speak about a patient record with missing data for body temperature.

A key point is that "epistemological issues" – issues of what a given physician or the healthcare system knows – should be represented in the data structures rather than the ontology. This causes serious problems for terminologies and coding systems, which often include notions such as "unspecified" or even "missing". This practice is now widely deprecated but remains common.

One of the Common Terminology Services (CTS 2) submissions to the OMG is based on Semantic Web technologies such as OWL2, SKOS, and SPARQL. The UELH proposed by the PCAST should leverage the work that has been done by the biomedical ontology community.

The NIEM Approach to Metadata-Tagged Data Elements

The report goes on to say that the metadata attached to each of these data elements

"...would include (i) enough identifying information about the patient to allow the data to be located (not necessarily a universal patient identifier), (ii) privacy protection information—who may access the mammograms, either identified or de-identified, and for what purposes, (iii) the provenance of the data—the date, time, type of equipment used, personnel (physician, nurse, or technician), and so forth."

The report does not explain exactly how this should be done. So let's combine the wisdom of the NIEM, HL7 greenCDA, and OASIS XSPA (Cross-Enterprise Security and Privacy Authorization Profile of XACML for healthcare) to propose a solution. Let's assume that we need to add metadata about the equipment used for the lab result as well as patient consent directives to the following lab result entry, which is marked up in greenCDA format:

<result>
<resultID root="107c2dc0-67a5-11db-bd13-0800200c9a66" />
<resultDateTime value="200003231430" />
<resultType codeSystem="2.16.840.1.113883.6.1" code="30313-1"
displayName="HGB" />
<resultStatus code="completed" />
<resultValue>
<physicalQuantity value="13.2" unit="g/dl" />
</resultValue>
<resultInterpretation codeSystem="2.16.840.1.113883.5.83"
code="N" />
<resultReferenceRange>M 13-18 g/dl; F 12-16
g/dl</resultReferenceRange>
</result>

In the following, an s:metadata attribute is added to the root element (s:metadata is of type IDREFS and for brevity, I am not showing the namespace declarations):

<result s:metadata="equipment consent">
<resultID root="107c2dc0-67a5-11db-bd13-0800200c9a66" />
<resultDateTime value="200003231430" />
<resultType codeSystem="2.16.840.1.113883.6.1" code="30313-1"
displayName="HGB" />
<resultStatus code="completed" />
<resultValue>
<physicalQuantity value="13.2" unit="g/dl" />
</resultValue>
<resultInterpretation codeSystem="2.16.840.1.113883.5.83"
code="N" />
<resultReferenceRange>M 13-18 g/dl; F 12-16
g/dl</resultReferenceRange>
</result>

The following is the lab test equipment metadata:

<LabTestEquipmentMetadata s:id="equipment">
<SerialNumber>93638494749</SerialNumber>
<Manufacturer>MedLabEquipCo.</Manufacturer>
</LabTestEquipmentMetadata>

And here are the patient consent directives marked up in XACML XSPA format (this snippet is taken from the NHIN Access Consent Policies Specification):

<ConsentMetadata s:id="consent">
<Policy xmlns="urn:oasis:names:tc:xacml:2.0:policy:schema:os"
PolicyId="12345678-1234-1234-1234-123456781234"
RuleCombiningAlgId="urn:oasis:names:tc:xacml:1.0:rule-combining-algorithm:first-applicable">
<Description>Sample XACML policy for NHIN</Description>
<!-- The Target element at the Policy level identifies the subject to whom the Policy applies -->
<Target>
<Resources>
<Resource>
<ResourceMatch MatchId="http://www.hhs.gov/healthit/nhin/function#instance-identifier-equal">

<AttributeValue DataType="urn:hl7-org:v3#II"
xmlns:hl7="urn:hl7-org:v3">
<hl7:PatientId root="2.16.840.1.113883.3.18.103"
extension="00375" />
</AttributeValue>
<ResourceAttributeDesignator AttributeId="http://www.hhs.gov/healthit/nhin#subject-id"
DataType="urn:hl7-org:v3#II" />
</ResourceMatch>
</Resource>
<Actions>
<!-- This policy applies to all document query and document retrieve transactions -->
<Action>
<ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#anyURI">
urn:ihe:iti:2007:CrossGatewayRetrieve</AttributeValue>
<ActionAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:action"
DataType="http://www.w3.org/2001/XMLSchema#anyURI" />
</ActionMatch>
</Action>
<Action>
<ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#anyURI">
urn:ihe:iti:2007:CrossGatewayQuery</AttributeValue>
<ActionAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:action"
DataType="http://www.w3.org/2001/XMLSchema#anyURI" />
</ActionMatch>
</Action>
</Actions>
</Target>
<Rule RuleId="133" Effect="Permit">
<Description>Permit access to all documents to all
physicians and nurses</Description>
<Target>
<Subjects>
<Subject>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for physicians -->
<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">
112247003</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="http://www.w3.org/2001/XMLSchema#string" />
</SubjectMatch>
</Subject>
<Subject>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for nurses -->
<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">
106292003</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="http://www.w3.org/2001/XMLSchema#string" />
</SubjectMatch>
</Subject>
</Subjects>
<!-- since there is no Resource element, this rule applies to all resources -->
</Target>
</Rule>
<Rule RuleId="134" Effect="Permit">
<Description>Allow Dentists and Dental Hygienists
from the Happy Tooth dental practice to access documents
with "Normal" confidentiality during a defined time
period.</Description>
<Target>
<Subjects>
<Subject>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for dentists -->
<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">
106289002</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="http://www.w3.org/2001/XMLSchema#string" />
</SubjectMatch>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#anyURI">
http://www.happytoothdental.com</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xspa:1.0:subject:organization-id"
DataType="http://www.w3.org/2001/XMLSchema#anyURI" />
</SubjectMatch>
</Subject>
<Subject>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<!-- coded value for dental hygienists -->
<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">
26042002</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xacml:2.0:subject:role"
DataType="http://www.w3.org/2001/XMLSchema#string" />
</SubjectMatch>
<SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:anyURI-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#anyURI">
http://www.happytoothdental.com</AttributeValue>
<SubjectAttributeDesignator AttributeId="urn:oasis:names:tc:xspa:1.0:subject:organization-id"
DataType="http://www.w3.org/2001/XMLSchema#anyURI" />
</SubjectMatch>
</Subject>
</Subjects>
<Resources>
<Resource>
<ResourceMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">
N</AttributeValue>
<ResourceAttributeDesignator AttributeId="urn:oasis:names:tc:xspa:1.0:resource:patient:hl7:confidentiality-code"
DataType="http://www.w3.org/2001/XMLSchema#string" />
</ResourceMatch>
</Resource>
</Resources>
<Environments>
<Environment>
<EnvironmentMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:date-greater-than-or-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#date">
2009-07-01</AttributeValue>
<EnvironmentAttributeDesignator AttributeId="http://www.hhs.gov/healthit/nhin#rule-start-date"
DataType="http://www.w3.org/2001/XMLSchema#date" />
</EnvironmentMatch>
</Environment>
<Environment>
<EnvironmentMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:date-less-than-or-equal">

<AttributeValue DataType="http://www.w3.org/2001/XMLSchema#date">
2009-12-31</AttributeValue>
<EnvironmentAttributeDesignator AttributeId="http://www.hhs.gov/healthit/nhin#rule-end-date"
DataType="http://www.w3.org/2001/XMLSchema#date" />
</EnvironmentMatch>
</Environment>
</Environments>
</Target>
</Rule>
<Rule RuleId="135" Effect="Deny">
<Description>Deny all access to documents. Since this
rule is last, it will be selected if no other rule
applies, under the first-applicable rule combining
algorithm.</Description>
<Target />
</Rule>
</Policy>
</ConsentMetadata>

Please note the following:

  • Metadata "LabTestEquipmentMetadata" asserts the equipment used for the lab test.
  • Metadata "ConsentMetadata" asserts the patient consent directives, leveraging the XSPA XACML format.
  • Metadata can be declared once and reused by multiple elements.
  • An element can refer to zero or more metadata objects (a sketch of how an application might resolve these references follows).
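
As a minimal sketch of how an application could resolve these references (the document name and the local:metadata-for function are mine, not part of NIEM), the following XQuery function dereferences an element's s:metadata IDREFS attribute and returns the corresponding metadata elements:

declare namespace s = "http://niem.gov/niem/structures/2.0";

(: Return the metadata elements referenced by an element's s:metadata attribute.
   Assumes metadata elements carry s:id attributes, as in the examples above. :)
declare function local:metadata-for($e as element()) as element()*
{
  for $ref in tokenize(string($e/@s:metadata), "\s+")
  return root($e)//*[@s:id = $ref]
};

(: For the result element above, this returns LabTestEquipmentMetadata and ConsentMetadata :)
local:metadata-for(doc("lab-report.xml")//result)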

In NIEM, an appinfo:AppliesTo element in a metadata type declaration indicates the type to which the metadata applies, as in the following example (note that this constraint is not enforced by a validating XML schema parser, but it can be enforced at the application level):

<xsd:complexType name="LabTestEquipmentMetadataType">
<xsd:annotation>
<xsd:appinfo>
<i:AppliesTo i:name="LabResultType" />
</xsd:appinfo>
</xsd:annotation>
<xsd:complexContent>
<xsd:extension base="s:MetadataType">
...
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>

<xsd:element name="LabTestEquipmentMetadata" type="LabTestEquipmentMetadataType" nillable="true"/>

NIEM defines a common metadata type that can be extended by any type definition that requires metadata:

<schema
targetNamespace="http://niem.gov/niem/structures/2.0"
version="alpha2"
xmlns:i="http://niem.gov/niem/appinfo/2.0"
xmlns:s="http://niem.gov/niem/structures/2.0"
xmlns="http://www.w3.org/2001/XMLSchema">


<attribute name="id" type="ID"/>
<attribute name="linkMetadata" type="IDREFS"/>
<attribute name="metadata" type="IDREFS"/>
<attribute name="ref" type="IDREF"/>
<attribute name="sequenceID" type="integer"/>

<attributeGroup name="SimpleObjectAttributeGroup">
<attribute ref="s:id"/>
<attribute ref="s:metadata"/>
<attribute ref="s:linkMetadata"/>
</attributeGroup>

<element name="Metadata" type="s:MetadataType" abstract="true"/>

<complexType name="ComplexObjectType" abstract="true">
<attribute ref="s:id"/>
<attribute ref="s:metadata"/>
<attribute ref="s:linkMetadata"/>
</complexType>

<complexType name="MetadataType" abstract="true">
<attribute ref="s:id"/>
</complexType>

</schema>

Any type definition that needs metadata can simply extend ComplexObjectType, as the following lab result type does:

<xsd:complexType name="LabResultType">
<xsd:complexContent>
<xsd:extension base="s:ComplexObjectType">
<xsd:sequence>...</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>

Tuesday, June 22, 2010

Health Information Exchanges (HIEs): Emerging Architectural Patterns

The following are some emerging architectural patterns in the nascent field of HIEs:

  • Decentralized and Hybrid Models utilizing a service-oriented architecture (SOA). A centralized registry stores metadata about the type and location of clinical data available in edge systems connected to the HIE. For privacy and security reasons, the clinical data itself is kept at its source as opposed to a centralized repository. Upon request, a Record Locator Service (RLS) finds the data in edge systems and securely routes the data to authorized requestors.

  • A Data Use and Reciprocal Support Agreement (DURSA) provides the legal framework for participation in the HIE.

  • Use of a master patient index (MPI) as a core infrastructure for patient matching. The MPI stores identifying information on all patients with records in participating systems.

  • Use of Integrating the Healthcare Enterprise (IHE) profiles such as Patient Identifier Cross-Referencing (PIX) and Cross-Enterprise Document Sharing (XDS.b) to facilitate patient discovery and the query and retrieval of clinical documents.

  • Health Information Event Messaging to provide the ability to subscribe to health information.

  • Interoperability with NHIN Exchange to enable exchange of data with other state and federal agencies such as the Department of Veterans Affairs (VA) and the Department of Defense (DoD). This requires support for NHIN messaging standards such as the Web Services Interoperability (WS-I) profiles WS-I Basic v2.0 and WS-I Security v1.1. WS-I Basic specifies the use of SOAP 1.2, WSDL 1.1, WS-Addressing 1.0, WS-Policy 1.5, MTOM 1.0, and UDDI v3.0.2 for the NHIN Web Services Registry. WS-I Security defines the security standards for NHIN Exchange including TLS, AES 128, X.509, XML D-Sig, and Attachment Security. NHIN has adopted an authentication and authorization framework based on SAML 2.0.

  • Use of the HL7 Continuity of Care Document (CCD) as the data exchange standard for clinical documents. Meaningful Use criteria allow the ASTM CCR specification as well.

  • Health Record Banks (HRBs) containing Personal Health Records (PHRs) as participating nodes in the HIE. The HRBs allow patients to exercise control over their health records by granting permissions to specific providers to view those health records.

  • Ability to connect to the HIE through a local EMR or a web-based portal (for example to allow access for physicians without an EMR).

  • For simple and secure interoperability, the NHIN Direct draft proposal at the time of this writing is to use:


    • SMTP as a backbone protocol
    • S/MIME-signed and encrypted messages for security
    • IHE XDM for content and metadata packaging
    • IHE XDR, REST, and Email (POP/IMAP) as edge protocols
    • TLS (with a server certificate only) for on-the-wire security
    • XDR as the backbone for NHIN Exchange.

Saturday, June 12, 2010

Putting XQuery to Work in Healthcare

The following are some of the challenges that healthcare organizations will be facing during the next few years:

  • Conversion from HIPAA 4010 to 5010
  • Conversion from ICD-9 to ICD-10
  • Efficiently storing, querying, processing, and exchanging Electronic Health Records (EHRs)
  • Mapping from HL7 2.x to HL7 v3 messages
  • Assembling EHRs by aggregating data from multiple organizations participating in Health Information Exchanges (HIEs).


XQuery is not just a query language for XML data sources. It is also a very powerful declarative, strongly typed, and side-effect free programming language for processing and manipulating XML documents. XQuery is a natural solution for querying and aggregating data coming from heterogeneous sources such as relational databases, native XML databases, file systems, and legacy data formats such as EDI. Some developers will find XQuery easier to use than XSLT because XQuery has a SQL-like syntax.


Migration to HIPAA 5010 and ICD-10

Conversion from HIPAA 4010 to 5010 and from ICD-9 to ICD-10 will be a priority for healthcare organizations in the next three years (details on final compliance dates can be found on this HHS web page).

The XQuery and XQuery Update Facility specifications provide a simple and elegant solution to this conversion challenge.
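
As a rough illustration, the following XQuery Update query remaps diagnosis codes in place. The claim structure and the code-map document are hypothetical stand-ins, not part of any HIPAA specification, and since the ICD-9 to ICD-10 General Equivalence Mappings are not one-to-one, a real conversion would need additional review rules rather than a simple lookup:

(: Minimal sketch: rewrite ICD-9-CM diagnosis codes as ICD-10-CM using a lookup document. :)
declare variable $map := doc("icd9-to-icd10-map.xml");

for $dx in doc("claim.xml")//Diagnosis[@codeSystem = "ICD-9-CM"]
let $icd10 := $map//entry[@icd9 = $dx/@code]/@icd10
where exists($icd10)
return (
  replace value of node $dx/@code with string($icd10),
  replace value of node $dx/@codeSystem with "ICD-10-CM"
)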


Health Information Exchanges (HIEs) and the Virtual Health Record

In an HIE with multiple participating organizations, EHR data must be assembled through a centralized, federated, or hybrid data model. The data needed to assemble a longitudinal EHR (a virtual health record) for a patient could come from several providers, payers, lab companies, and medical devices. XQuery was designed to handle precisely this type of XML processing use case.
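
A minimal sketch of such an assembly in XQuery might look like this (the source document names and element structures are invented for illustration):

(: Assemble a virtual health record for one patient from three hypothetical sources. :)
declare variable $patient-id external;

<VirtualHealthRecord patientId="{$patient-id}">
  <Problems>{ doc("emr-problems.xml")//problem[@patientId = $patient-id] }</Problems>
  <Medications>{ doc("pharmacy-feed.xml")//medication[@patientId = $patient-id] }</Medications>
  <LabResults>{ doc("lab-feed.xml")//result[@patientId = $patient-id] }</LabResults>
</VirtualHealthRecord>

In a federated model, the doc() calls would be replaced by service calls to the participating edge systems.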


Storing, Updating, and Querying EHRs

The HL7 CCD and ASTM CCR have been retained as Meaningful Use XML data exchange standards for EHRs. Mapping between an XML HL7 CCD representation (which is derived from the HL7 UML-based Reference Information Model or RIM) and an existing relational database structure is not trivial. IBM has been granted a patent entitled "Conversion of hierarchically-structured HL7 specifications to relational databases". The HL7 RIMBAA project provides some best practices on mapping RIM objects to a relational database structure.

With the emergence of native XML databases such as Oracle XML DB and IBM pureXML, XML is no longer just a messaging format. It can be used as a format for storing and querying data as well.

This article shows sample code for updating an EHR stored in HL7 CDA format in an IBM DB2 pureXML native XML database.


Mapping from HL7 2.x to HL7 v3 messages

In countries like Canada where HL7 v3 has been adopted, a frequent challenge is mapping legacy HL7 2.x messages to HL7 v3 messages, for example for lab results. An XQuery-based transform can be used to map from an HL7 v2.x XML structure to an HL7 v3 XML structure.
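
Here is a hedged sketch of such a transform over the XML encoding of an HL7 v2.x ORU message. The target element names are deliberately simplified; a real HL7 v3 lab message is considerably richer:

(: Map each OBX observation segment of an HL7 v2.x XML message
   to a simplified HL7 v3-style observation element. :)
for $obx in doc("oru-r01.xml")//OBX
return
  <observationEvent>
    <code code="{$obx/OBX.3/CE.1}" displayName="{$obx/OBX.3/CE.2}"/>
    <value value="{$obx/OBX.5}" unit="{$obx/OBX.6/CE.1}"/>
    <interpretationCode code="{$obx/OBX.8}"/>
  </observationEvent>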


An Alternative to GELLO?

In a previous post entitled "Clinical Decision Support: Crossing the Chasm", I argued that Clinical Decision Support Systems (CDSS) implementers should be free to use any programming language of their choice. GELLO is an HL7 standard which specifies an expression and query language for CDSS. The following are the requirements for a CDSS expression and query language as specified in the GELLO specification:

  • vendor-independent
  • platform-independent
  • object-oriented and compatible with the vMR
  • easy to read/write
  • side-effect free
  • flexible
  • extensible


XQuery satisfies all these requirements except the third: it is a side-effect free functional programming language rather than an object-oriented one. GELLO settled on the OMG Object Constraint Language (OCL). The following paragraph from the GELLO specification explains why XQuery (known as XQL at the time) wasn't selected:

XQL is a query language designed specifically for XML documents. XML documents are unordered, labeled trees, with nodes representing the document entity, elements, attributes, processing instructions and comments. The implied data model behind XML neither matches that of a relational data model nor that of an object-oriented data model. XQL is a query language for XML in the same sense as SQL is a query language for relational tables. Since the HL7 RIM data model and the vMR data model are both object-oriented, it is clear that XQL is not an appropriate approach for an object-oriented query and expression language.


That might have been true back in 2004 in an object-oriented world. Today, if the inputs to a CDSS are EHRs represented in HL7 CCD or ASTM CCR format, and those EHRs are stored in an XQuery compliant native XML database, then XQuery could be a strong candidate for an expression and query language for the CDSS.
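
To make that concrete, here is a minimal sketch of a CDSS-style rule in XQuery. It reuses the simplified greenCDA-style result markup shown earlier on this blog, and the threshold and function name are illustrative only:

(: Flag hemoglobin results (LOINC 30313-1) whose value falls below a threshold. :)
declare function local:low-hgb($ehr as document-node(), $threshold as xs:decimal) as element()*
{
  for $r in $ehr//result[resultType/@code = "30313-1"]
  where xs:decimal($r/resultValue/physicalQuantity/@value) lt $threshold
  return $r
};

local:low-hgb(doc("ehr.xml"), 12.0)

Like GELLO, the rule is declarative and side-effect free; unlike GELLO, it queries the XML representation directly rather than an object model.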

Wednesday, June 9, 2010

Data Modeling for Electronic Health Records (EHR) Systems

Getting the data model right is of paramount importance for an Electronic Health Records (EHR) system. The factors that drive the data model include but are not limited to:

  • Patient safety
  • Support for clinical workflows
  • Different uses of the data such as input to clinical decision support systems
  • Reporting and analytics
  • Regulatory requirements such as Meaningful Use criteria.

Model First

Proven methodologies like contract-first web service design and model-driven development (MDD) put the emphasis on deriving application code from the data model, not the other way around. Thousands of lines of code can be auto-generated from the model, so it's important to get the model right.


Requirements Gathering

The objective here is to determine the entities, their attributes, and the relationships between those entities. For example, what are the attributes that are necessary to describe a patient's condition and how do you express the fact that a condition is a manifestation of an allergy? The data modeler should work closely with clinicians to gather those requirements. Industry standards should be leveraged as well. For example, HITSP C32 defines the data elements for each EHR data module such as conditions, medications, allergies, and lab results. These data elements are then mapped to the HL7 Continuity of Care Document (CCD) XML schema.

The HL7 CCD is itself derived from the HL7 Reference Information Model (RIM). The latter is expressed as a set of UML class diagrams and is the foundation model for health care and clinical data. A simpler alternative to the CCD is the ASTM Continuity of Care Records (CCR). Both the CCD and CCR provide an XML schema for data exchange and are Meaningful Use criteria. Another relevant data model is the HL7 vMR (Virtual Medical Record) which aims to define a data model for the input and output of Clinical Decision Support Systems (CDSS).

These standards can be cumbersome to use as such from a software development perspective. Nonetheless, they can inform the design of the data model for an EHR system. Alignment with the CCD and CCR will facilitate data exchange with other providers and organizations. The following are Meaningful Use criteria for data exchange:

  1. Electronically receive a patient summary record, from other providers and organizations including, at a minimum, diagnostic test results, problem list, medication list, medication allergy list, immunizations, and procedures and upon receipt of a patient summary record formatted in an alternative standard specified in Table 2A row 1, displaying it in human readable format.

  2. Enable a user to electronically transmit a patient summary record to other providers and organizations including, at a minimum, diagnostic test results, problem list, medication list, medication allergy list, immunizations, and procedures in accordance with the standards specified in Table 2A row 1.



Applying Data Modeling Patterns

Applying data modeling patterns helps ensure model consistency and quality. Relational data modeling is a well-established discipline. My favorite resource for relational data modeling patterns is The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling.

Some XML Schema best practices can be found here.


Data Stores

Today, options for data store are no longer limited to relational databases. Alternatives include: native XML databases (e.g. DB2 pureXML), Entity-Attribute-Value with Classes and Relationships (EAV/CR), and Resource Description Framework (RDF) stores.

Native XML databases are more resilient to schema changes and avoid the impedance mismatch between XML documents, Java objects, and relational tables, a mismatch that can introduce design complexity as well as performance and maintainability issues.

Storing EHRs in an RDF store can enable the inference of new medical facts from existing explicit ones. Such inferences can be driven by an ontology expressed in OWL or a set of rules expressed in a rule language such as SWRL. Semantic Web technologies can also help in checking the consistency of a model, in integrating data and knowledge across domains (e.g., the genomics and clinical domains), and in managing classification schemes like medical terminologies. RDF, OWL, and SWRL have been successfully implemented in Clinical Decision Support Systems (CDSS).

The data modeling notation used should be independent of the storage model or at least compatible with the latter. For example, if native XML storage is used, then a relational modeling notation might not be appropriate. In general, UML provides the right level of abstraction for implementation-agnostic modeling.


Due Diligence

When adopting a "noSQL" storage model, it is important to ensure that (a) the database can meet performance and scalability criteria and (b) the team has the skills to develop and maintain the database. Due diligence should be performed through benchmarking with a tool such as IBM's Transaction Processing over XML (TPoX) benchmark. The team might also need formal training in a new query language such as XQuery or SPARQL.


A Longitudinal View of the Patient Health

Maintaining an up-to-date and truly longitudinal view of a patient's medical history requires merging and reconciling data from heterogeneous sources including providers' EMR systems, lab companies, medical devices, and payers' claim transaction repositories. The data model should facilitate the assembly of data from such diverse sources. XML tools based on XSLT, XQuery, or XQuery Update can be used to automate the merging.
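
As a small sketch of the reconciliation step (the source documents and the lastUpdated attribute are assumptions for illustration), XQuery can deduplicate a merged medication list by keeping the most recently updated entry for each drug code:

(: Merge medication lists from two hypothetical sources and keep,
   for each drug code, the most recently updated entry. :)
let $meds := (doc("emr.xml")//medication, doc("claims.xml")//medication)
for $code in distinct-values($meds/@code)
let $latest :=
  for $m in $meds[@code = $code]
  order by xs:date($m/@lastUpdated) descending
  return $m
return $latest[1]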


The Importance of Data Validation

Data validation can be performed at the database layer, the application layer, and the UI layer. The data model should support the validation of the data. The following are examples of techniques that can be used for data validation:

  • XML Schema for structural validation of XML documents
  • ISO Schematron (based on XPath 2.0 and XSLT 2.0) for business rules validation of XML documents (a similar rule is sketched in XQuery after this list)
  • A business rules engine like Drools
  • A data processing framework like Smooks
  • The validation features of a UI framework such as JSF2
  • The built-in validation features of the database.
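
Since Schematron rules are essentially XPath assertions, the same kind of business rule can also be expressed in XQuery. In this hypothetical example (again using the simplified greenCDA-style result markup shown earlier), every completed lab result is required to carry a value:

(: Report completed results that are missing a result value. :)
declare function local:check-completed-results($doc as document-node()) as xs:string*
{
  for $r in $doc//result[resultStatus/@code = "completed"]
  where empty($r/resultValue)
  return concat("Result ", $r/resultID/@root, " is completed but has no value")
};

local:check-completed-results(doc("ehr.xml"))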


The Future: Modeling with the NIEM IEPD


The HHS ONC issued an RFP for using the National Information Exchange Model (NIEM) Information Exchange Package Documentation (IEPD) process for healthcare data exchange. The ONC will release a NIEM Concept of Operations (ConOps). The NIEM IEPD process is explained here.

Tuesday, May 25, 2010

Architecting the Health Enterprise with TOGAF 9

Several factors are currently driving the increased complexity of health information technology (HIT). These factors include: a new regulatory framework, innovations in the practice of healthcare delivery, standardization, cross-enterprise integration, usability, mobility, security, privacy, and the imperative to improve care quality and reduce costs.

A methodology and governance framework is needed for creating a coherent and consistent enterprise architecture (EA). The latter should not be driven by vendors and their offerings. Instead, health enterprises should develop an EA that is aligned with their unique overarching business context, drivers, and vision. Developing an architecture capability that is based on a proven framework should be a top priority for health IT leaders.

TOGAF 9 is an Open Group standard that defines a methodology, standardized semantics, and processes that can be used by Enterprise Architects to align IT with the strategic goals of their organization. TOGAF 9 covers the following four architecture domains:

  • Business Architecture
  • Data Architecture
  • Application Architecture
  • Technology Architecture.

The diagram below from the TOGAF 9 documentation provides an overview.



The Architecture Development Method (ADM) is the core of TOGAF and describes a method for developing an enterprise architecture. TOGAF 9 includes specific guidance on how the ADM can be applied to service-oriented architecture (SOA) and enterprise security (two areas of interest in health IT). The different phases of the ADM are depicted in the following diagram.



The Architecture Capability Framework provides guidelines and resources for establishing an architecture capability within the enterprise. This capability operates the ADM. The Content Framework specifies the artifacts and deliverables for each Architecture Building Block (ABB). These artifacts are stored in a repository and classified according to the Enterprise Continuum.

The Open Group has been working on the adoption of an open EA modeling standard called ArchiMate. ArchiMate provides a higher level view of EA when compared to modeling standards such as BPMN and UML. It can be used to depict different layers of EA including business processes, applications, and technology in a way that can be consumed by non-technical business stakeholders. A sample of an ArchiMate enterprise view of a hospital can be found here.

HL7 has published the Services-Aware Interoperability Framework (SAIF), an architectural framework for facilitating interoperability between healthcare systems. SAIF includes the following four components: the Enterprise Conformance and Compliance Framework (ECCF), the Governance Framework (GF), the Behavioral Framework (BF), and the Information Framework (IF).

For guidance on using SOA in healthcare, the Healthcare Services Specification Project (HSSP) has published the Practical Guide for SOA in Healthcare, which is based on the TOGAF Architecture Development Method (ADM) and the SAIF ECCF and contains a sample Reference Enterprise Architecture. Volume II of the guide describes an immunization case study.

Also noteworthy are the HL7 EHR System Functional Model (EHR-S FM) and the HSSP Electronic Health Record (EHR) System Design Reference Model (EHR SD RM).