Sunday, February 24, 2013

State of the Semantic Web in the Clinical Domain

In a previous post entitled Why Do We Need Ontologies in Healthcare Applications, I explained the important difference between ontologies, coding systems, and information models of data structures. I also outlined the benefits of using Semantic Web technologies like RDF, RDFS, OWL, SWRL, R2RML, SPARQL, SKOS, and Linked Open Data (LOD). These benefits include:

  • Reasoning and inferencing which are essential characteristics of intelligent Health IT Systems (iHIT)
  • Model consistency checking
  • Open World Assumption (OWA) and Non-Unique Naming Assumption enabling the integration of heterogeneous data sources and knowledge bases using Linked Open Data (LOD) principles. This integration can be accomplished by providing an RDF view over existing relational databases using R2RML (RDB to RDF Mapping Language) and by performing SPARQL federated queries. Intelligent queries can retrieve inferred facts using SPARQL 1.1 Entailment Regimes.
  • Linking to other biomedical ontologies like SNOMED and the Translational Medicine Ontology
  • Clinical Knowledge Management (CKM) using OWL to model and execute Clinical Practice Guidelines (CPGs) and Care Pathways (CPs).

Semantic Web in Clinical and Translational Research

The following are papers on how Semantic Web technologies are being used to realize these benefits in the healthcare domain:

Apache Stanbol

I recently came across Apache Stanbol, a new Apache project which is described as "a set of reusable components for semantic content management". What I really like about Apache Stanbol is that it not only works on unstructured data sources, but also integrates a number of other popular Apache open source software which can be used to add a semantic layer to modern RESTful content-oriented applications. These components include:

  • Apache Tika for text and metadata extraction from a variety of commonly used document formats
  • Apache OpenNLP for natural language processing and named entity recognition (NER)
  • Apache Solr for document store and semantic search
  • Apache Jena as the RDF and Semantic Web framework.
Other open source components like Apache Mahout (a scalable Machine Learning library) can be integrated to provide document recommendation and clustering services.

The Content Enhancers in Stanbol can perform named entity recognition (NER) and link text annotations to external datasets such as DBPedia.  In the clinical domain, these enhancers can be used to extract entities from medical records, journal articles, and clinical guidelines. These entities can then be linked to other clinical data sources such as drug and disease databases using Linked Data techniques.

Apache Stanbol also provides Reasoners based on Jena RDFS, OWL, and OWLMini Reasoners as well as the HermiT OWL Reasoner. These reasoners can perform consistency checking and classification. Stanbol supports Inference Rules in the following formats: SWRL, Jena Rules, and SPARQL (by converting Stanbol Rules into SPARQL CONSTRUCTs).

Sunday, February 17, 2013

Automated Clinical Question Answering: The Next Frontier in Healthcare Informatics

In a previous post, I predicted that 2013 will be the year Intelligent Health IT Systems (iHIT) go mainstream.  I based my prediction on a number of factors, notably the transformation of healthcare to a value-based delivery system driven by the latest scientific evidence (evidence-based practice and practice-based evidence).

Last week, IBM together with health insurer WellPoint Inc., and New York’s Memorial Sloan-Kettering Cancer Center announced the commercialization of Watson (the supercomputer which beat human champions in "Jeopardy!" on February 16, 2011) for question answering (QA) in the clinical domain. The following are some interesting facts released by IBM as part of this announcement:

  • The supercomputer has ingested 1,500 lung cancer cases from Sloan-Kettering records, plus 2 million pages of text from journals, textbooks and treatment guidelines. This is what I called Big Data in medicine.
  • In 2012, Watson became 240 percent faster and 75 percent smaller so it can run on a single server. No surprise here and I expect this trend to continue.

The following YouTube video entitled Oncology Diagnosis and Treatment explains how IBM envisions using Watson for Clinical Question Answering (CQA):

The User Experience in the Watson Demo


  • Clinical questions can be posed in natural language (spoken or typed in by the clinician using a keyboard).
  • The sources used for answering clinical questions include both structured (EMR databases) and unstructured information (journal articles, clinical guidelines, etc.).
  • Personalized medicine: the proposed interventions are driven by the data in the patient's medical record and the system can prompt the clinician for additional information on the patient if necessary. The displayed evidence and recommendations are updated to reflect changes in the patient's clinical data.
  • Human Factors: the clinician is always in the loop. She can ask Watson how it arrives at a specific care recommendation and can even remove a specific evidence (if deemed irrelevant or not appropriate).
  • The use of confidence scoring and evidence highlighting.
  • Patient-centeredness and shared decision making: the treatment plans take into account the values, goals, and wishes of the patient (patient preferences). Treatment options are discussed with the patient.
  • Comparative effectiveness is used to compare the benefits and harms of different interventions.
  • Information is displayed using data visualization (dashboard) to help meet key performance indicators in the context of a value-based payment model.

The Science Behind Watson

The real question is how do we make intelligent health IT systems like Watson widely available to all patients. A landmark report published by the Institute of Medicine in 2001 and titled Crossing the Quality Chasm - A New Health System for the 21st Century contained the following recommendation:

Patients should receive care based on the best available scientific knowledge. Care should not vary illogically from clinician to clinician or from place to place.

For the scientifically (and Artificial Intelligence) inclined, the following are some pointers on the science behind Watson:

The picture below represents a high level architecture of Watson (click on the image to enlarge it).


AskHermes and MiPACQ

IBM Watson is not the only effort to develop automated CQA capabilities.  Some earlier CQA efforts used the PICO framework (Problem/Population, Intervention, Comparison, Outcome) to facilitate processing. More recent efforts have focused on the use of clinical questions posed in natural language.

AskHermes (Help clinicians to Extract and aRrticulate Multimedia information for answering clinical quEstionS) allows clinicians to enter questions in natural language and uses the following unstructured information sources: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines, and Wikipedia articles.

The processing pipeline in AskHermes includes the following: Question Analysis, Related Questions Extraction, Information Retrieval, Summarization and Answer Presentation. AskHermes performs question classification using MMTx (MetaMap Technology Transfer) to map keywords to UMLS concepts  and semantic types. Classification is also achieved through supervised machine learning algorithms such as Support Vector Machine (SVM) and conditional random fields (CFRs). Summarization and answer presentation are based on clustering techniques.

MiPACQ (Multi-source Integrated Platform for Answering Clinical Questions) is based on Natural Language Processing (NLP) and Information Retrieval (IR) and utilizes data sources such as Electronic Medical Record (EMR) databases and online medical encyclopedia like Medpedia. MiPACQ uses a processing pipeline based on UIMA (Unstructured Information Management Architecture) and machine learning-based as well as rule-based scoring. NLP capabilities are provided by ClearTK and cTakes (clinical Text Analysis and Knowledge Extraction System).

The Road Ahead

Automated Clinical Question Answering (CQA) is really hard. However, that is the future of computing: intelligent machines we can have meaningful conversations with. CQA is a multidisciplinary field which combines disciplines like statistical computing, information retrieval, natural language processing, machine learning, rule engines, semantic web technologies, knowledge representation and reasoning, visual analytics, and massively parallel computing. There are several open source projects that provide the building blocks. Many EHR software today are glorified data entry systems. We need to move to the next level and that will require technical leadership.

Sunday, February 3, 2013

Patient Privacy At Web Scale

A study entitled Patients want granular privacy control over health information in electronic medical records by Kelly Caine and Rima Hanania in the current issue of the Journal of the American Medical Informatics Association (JAMIA) clearly indicates that patients want a granular level of control over the sharing of their medical information. Patients also want to control with whom their health information is shared and for what purpose. The study looks at how the presence of sensitive health information in a medical record affects patient privacy preferences. In this post, I discuss how current and emerging standards can be used to enforce patient privacy preferences at web scale.

First, I think the key to achieving patient privacy at web scale is to adopt proven light-weight protocols and standards such as REST, JSON, OAuth2, and OpenID Connect. The RESTful Health Exchange (RHEx) project funded by the Federal Health Archicture (FHA) was a step in the right direction. These protocols have also been embraced by large internet identity providers like Google, Facebook, and Microsoft. To increase the strength of authentication when using these existing online identities in patient-facing healthcare applications, techniques like multi-factor authentication (e.g., two-factor authentication using the user's phone) and adaptive risk authentication can be used. These light-weight standards and protocols contrast with enterprise-centric alternatives like SOAP and SAML which are the foundation for Integrating the Health Enterprise (IHE) standards including XDS.b, XDR, and XUA.

An emerging approach that could really help put patients in control of the privacy of their electronic medical record is the User-Managed Access (UMA) Protocol of the Kantara Initiative. According to the UMA Core specification:
User-Managed Access (UMA) is a profile of OAuth 2.0. UMA defines how resource owners can control protected-resource access by clients operated by arbitrary requesting parties, where the resources reside on any number of resource servers, and where a centralized authorization server governs access based on resource owner policy.
That sounds a lot like a healthcare environment where a typical patient has her health information residing in the Electronic Health Record (EHR) systems of multiple healthcare providers. A frequent use case is when the patient's health information is shared among providers during primary care physicians' referrals to specialist outpatient clinics. The following are the benefits for the patient privacy of a centralized authorization server as defined in UMA:

  • The ability to manage her consent directives (scope of access in UMA parlance) from a central location (ideally in the cloud) as opposed to the current paper-based environment where the patient signs a consent form for each provider and has no visibility into how the consent is being used and enforced.
  • It facilitates the update and revocation of the consent directives by the patient. 
  • It would give the patient a full audit trail of requests and access events related to her health information.
  • The patient user experience of managing their privacy preferences online can be significantly enhanced by data visualization. A study titled Exploring Visualization Techniques to Enhance Privacy Control UX for User-Managed Access introduced the notion of a "UMA Connection" for helping users visualize the context of a data sharing policy (e.g., contacts, allowed actions, access restrictions, and trusted claims).

In UMA, trusted claims (e.g., information about a requesting healthcare provider such as email, name, role, organization, and NPI) can be conveyed using OpenID Connect. The Google OpenID Connect Demo provides a step by step guide to OpenID Connect and Nat Sakimara's Dummy’s guide for the Difference between OAuth Authentication and OpenID is a good explanation of how OpenID Connect complements OAuth2. A separate specification entitled Binding Obligations on User-Managed Access (UMA) Participants proposes a legal framework that defines the obligations of parties that operate and use UMA-conforming software programs and services.

A recent post by Domenico Catalono entitled UMA Approach to Protect and Control Online Reputation describes a UMA-based approach for supporting privacy based on reputation and trust.  An example in the post is a "global reputation ranking" in the context of an online e-commerce site. In the context of healthcare privacy, when deciding to share their sensitive medical information with a specific healthcare provider, the same concept could be used to display the number and severity of security breaches experienced by the healthcare provider in the past. Section 13402(e)(4) of the HITECH Act actually requires posting a list of breaches of unsecured protected health information affecting 500 or more individuals. The list is available here.

The recently approved XACML 3.0 standard is a powerful mechanism for expressing and evaluating privacy policies. It provides capabilities such as obligation and advice expressions as well as delegation of authorization. In this presentation, Eve Maler discusses possible integration points between UMA and XACML.  The REST Profile of XACML 3.0 and the Request/Response Interface based on JSON and HTTP for XACML 3.0 proposals introduce the notion of "RESTful Authorization-as-a-Service (AZaaS)" which can facilitate the use of XACML in a UMA-based access control environment.