tag:blogger.com,1999:blog-23743884552872927952024-03-14T03:20:45.671-04:00Adventures in ComputingVidjinnagni Amoussouhttp://www.blogger.com/profile/06437681488603449364noreply@blogger.comBlogger94125tag:blogger.com,1999:blog-2374388455287292795.post-80817646743348627032014-11-02T19:48:00.000-05:002015-05-07T19:42:04.191-04:00Toward a Reference Architecture for Intelligent Systems in Clinical Care<h3>
A Software Architecture for Precision Medicine </h3>
<br />
Intelligent systems in clinical care leverage the latest innovations in machine learning, real-time data stream mining, visual analytics, natural language processing, ontologies, production rule systems, and cloud computing to provide clinicians with the best knowledge and information at the point of care for effective clinical decision making. In this post, I propose a unified open reference architecture that combines all of these technologies into a hybrid cognitive system for clinical decision support. Indeed, truly intelligent systems are capable of reasoning. The goal is not to replace clinicians, but to provide them with cognitive support during clinical decision making. Furthermore, Intelligent Personal Assistants (IPAs) such as Apple's Siri, Google's Google Now, and Microsoft's Cortana have raised our expectations of how intelligent systems interact with users through voice and natural language.<br />
<br />
In the strict sense of the term, a reference architecture should be abstracted away from concrete technology implementations. However, to enable a better understanding of the proposed approach, I take the liberty of explaining how available open source software can be used to realize the intent of the architecture. There is an urgent need for an open and interoperable architecture that can be deployed across devices and platforms. Unfortunately, this is not the case today with solutions like Apple's HealthKit and ResearchKit.<br />
<br />
The specific open source software mentioned in this post can be substituted with other tools which provide similar capabilities. The following diagram is a depiction of the architecture (click to enlarge).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdbJxxCiFYdBQYtiNFKlB8BGGW8AmP9whrdQHHSAaJNLEInho3K5bebVv02IXHhn1FpwKao5Pu6g5669YRcPdWo_c8TWZOKHkRg_Y_SyXx0Lwf1c3rdH-A_IxjRilap5kqLgVQSo7t1Riv/s1600/arch.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="250" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdbJxxCiFYdBQYtiNFKlB8BGGW8AmP9whrdQHHSAaJNLEInho3K5bebVv02IXHhn1FpwKao5Pu6g5669YRcPdWo_c8TWZOKHkRg_Y_SyXx0Lwf1c3rdH-A_IxjRilap5kqLgVQSo7t1Riv/s1600/arch.png" width="400" /></a></div>
<h3>
Clinical Data Sources </h3>
<br />
Clinical data sources are represented on the left of the architecture diagram. Examples include electronic medical record systems (EMR) commonly used in routine clinical care, clinical genome databases, genome variant knowledge bases, medical imaging databases, data from medical devices and wearable sensors, and unstructured data sources such as biomedical literature databases. The approach implements the <i>Lambda Architecture</i>, enabling both batch and real-time data stream processing and mining.<br />
<br />
<br />
<h3>
Predictive Modeling, Real-Time Data Stream Mining, and Big Data Genomics </h3>
<br />
The back-end provides various tools and frameworks for advanced analytics and decision management. The analytics workbench includes tools for creating predictive models and for real-time data stream mining. The decision management workbench includes a production rule system (providing seamless integration with clinical events and processes) and an ontology editor.<br />
<br />
The incoming clinical data likely meet the Big Data criteria of volume, velocity, and variety (this is particularly true for physiological time series from wearable sensors). Therefore, specialized frameworks for large-scale cluster computing like Apache Spark are used to analyze and process the data. Statistical computing and Machine Learning tools like R are used here as well. The goal is knowledge discovery and pattern recognition using Machine Learning algorithms such as Decision Trees, k-Means Clustering, Logistic Regression, Support Vector Machines (SVMs), Bayesian Networks, Neural Networks, and the more recent Deep Learning techniques. The latter hold great promise in applications such as Natural Language Processing (NLP), medical image analysis, and speech recognition.<br />
<br />
These Machine Learning algorithms can support diagnosis, prognosis, simulation, anomaly detection, care alerting, and care planning. For example, anomaly detection can be performed at scale using the k-means clustering machine learning algorithm in Apache Spark. In addition, Apache Spark allows the implementation of the <i>Lambda Architecture</i> and can also be used for genome Big Data analysis at scale.<br />
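To make the idea concrete, here is a minimal, single-machine sketch of k-means-based anomaly detection in Python (the data points and distance threshold are illustrative; a production system would use Spark MLlib's distributed implementation). Centroids are fit on historical data, and new observations far from every centroid are flagged as anomalous.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(cluster):
    """Component-wise mean of a non-empty cluster of points."""
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))

def kmeans(points, k, iters=50, seed=1):
    """Plain k-means; Spark MLlib distributes the same idea across a cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def anomalies(points, centroids, threshold):
    """Flag points whose distance to the nearest centroid exceeds a threshold."""
    return [p for p in points
            if min(dist(p, c) for c in centroids) > threshold]
```

For example, after fitting centroids on two well-separated groups of normal readings, a far-away observation such as `(50, 50)` is the only point flagged.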
<br />
In another post titled <a href="http://efasoft.blogspot.com/2015/01/how-good-is-your-crystal-ball-utility.html" target="_blank"><i>How Good is Your Crystal Ball?: Utility, Methodology, and Validity of Clinical Prediction Models</i></a>, I discuss quantitative measures of performance for clinical prediction models. <br />
<br />
<br />
<h3>
Visual Analytics </h3>
<br />
Visual Analytics tools like D3.js, rCharts, plotly, googleVis, ggplot2, and ggvis can help obtain deep insight for effective understanding, reasoning, and decision making through the visual exploration of massive, complex, and often ambiguous data. Of particular interest is Visual Analytics of real-time data streams like physiological time series. As a multidisciplinary field, Visual Analytics combines several disciplines such as human perception and cognition, interactive graphic design, statistical computing, data mining, spatio-temporal data analysis, and even Art. For example, similar to Minard's map of the Russian Campaign of 1812-1813 (see graphic below), Visual Analytics can help in comparing different interventions and care pathways and their respective clinical outcomes over a period of time by displaying causes, variables, comparisons, and explanations.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj24BeVN2qYoVemn_RL7xUUnox7Hg1gNk3q1Wt7I3jJoWYKFpSOopPFdB3yGpeQb3ITnQuHYL18WImeX6HXC4eRVKRDSOazkN3-JXq1fke9EL7GJ28BkYrOnX6NiCJstAdRQ5TRepoDMOj4/s1600/900px-Minard.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj24BeVN2qYoVemn_RL7xUUnox7Hg1gNk3q1Wt7I3jJoWYKFpSOopPFdB3yGpeQb3ITnQuHYL18WImeX6HXC4eRVKRDSOazkN3-JXq1fke9EL7GJ28BkYrOnX6NiCJstAdRQ5TRepoDMOj4/s1600/900px-Minard.png" width="400" /></a></div>
<br />
<br />
<br />
<h3>
Production Rule System, Ontology Reasoning, and NLP</h3>
<br />
The architecture also includes a production rule engine and an ontology editor (Drools and Protégé respectively). This is done in order to leverage existing clinical domain knowledge available from clinical practice guidelines (CPGs) and biomedical ontologies like SNOMED CT. This approach complements machine learning algorithms' probabilistic approach to clinical decision making under uncertainty. The production rule system can translate CPGs into executable rules which are fully integrated with clinical processes (workflows) and events. The ontologies can provide automated reasoning capabilities for decision support.<br />
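As an illustration of how a guideline recommendation becomes an executable rule, here is a small Python sketch of condition/action rules evaluated against patient facts. The rule names and thresholds are hypothetical; a real deployment would author such rules in Drools' DRL and fire them from clinical workflows.

```python
# Illustrative rule-engine sketch; rule names and clinical thresholds
# are hypothetical placeholders, not actual guideline content.
RULES = [
    {
        "name": "bp-review",
        "when": lambda pt: pt["systolic_bp"] >= 140 or pt["diastolic_bp"] >= 90,
        "then": lambda pt: f"Elevated blood pressure: review therapy for {pt['id']}",
    },
    {
        "name": "a1c-followup",
        "when": lambda pt: pt.get("hba1c", 0) >= 9.0,
        "then": lambda pt: f"Poor glycemic control: schedule follow-up for {pt['id']}",
    },
]

def evaluate(patient, rules=RULES):
    """Evaluate all rules against one patient's facts, collecting fired actions."""
    return [rule["then"](patient) for rule in rules if rule["when"](patient)]
```

A production rule system like Drools adds what this sketch omits: efficient pattern matching over many facts (Rete), rule salience, and integration with events and processes.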
<br />
NLP includes capabilities such as:<br />
<ul>
<li>Text classification, text clustering, document and passage retrieval, text summarization, and more advanced clinical question answering (CQA) capabilities which can be useful for satisfying clinicians' information needs at the point of care; and</li>
<li>Named entity recognition (NER) for extracting concepts from clinical notes.</li>
</ul>
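A minimal dictionary-based NER sketch illustrates concept extraction from a clinical note. The tiny lexicon below is illustrative only; production systems use full terminologies like SNOMED CT and RxNorm behind a clinical NLP pipeline.

```python
import re

# Hypothetical mini-lexicon mapping surface forms to coded concepts;
# a real system would use complete SNOMED CT / RxNorm content.
LEXICON = {
    "myocardial infarction": "SNOMED:22298006",
    "aspirin": "RxNorm:1191",
    "hypertension": "SNOMED:38341003",
}

def extract_concepts(note):
    """Longest-match dictionary lookup over a free-text clinical note.

    Returns (start_offset, matched_term, concept_code) tuples sorted by
    position in the note.
    """
    found = []
    lowered = note.lower()
    for term in sorted(LEXICON, key=len, reverse=True):
        for m in re.finditer(re.escape(term), lowered):
            found.append((m.start(), term, LEXICON[term]))
    return sorted(found)
```

Real clinical NER must additionally handle negation ("denies chest pain"), abbreviations, and word-order variation, which is why dedicated pipelines exist.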
The data tier supports the efficient storage of large amounts of time series data and is implemented with tools like Cassandra and HBase. The system can run in the cloud, for example using the Amazon Elastic Compute Cloud (EC2). For real-time processing of distributed data streams, cloud-based solutions like Amazon Kinesis and Lambda can be used.<br />
<br />
<h3>
Clinical Decision Services</h3>
<br />
The clinical decision services provide intelligence at the point of care, typically using deployed predictive models, clinical rules, text mining outputs, and ontology reasoners. For example, predictive models can be exported in the Predictive Model Markup Language (PMML) format for run-time scoring based on the clinical data of individual patients, enabling what is referred to as <i>Personalized Medicine</i>. Clinical decision services include:<br />
<br />
<ul>
<li>Diagnosis and prognosis</li>
<li>Simulation</li>
<li>Anomaly detection </li>
<li>Data visualization</li>
<li>Information retrieval (e.g., clinical question answering)</li>
<li>Alerts and reminders </li>
<li>Support for care planning processes.</li>
</ul>
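The run-time scoring step can be sketched as follows, assuming a logistic regression model whose coefficients were exported by the analytics workbench. The feature names and weights below are hypothetical; a real scoring engine would parse the PMML document itself.

```python
import math

# Hypothetical coefficients, standing in for a model exported to PMML
# by the analytics workbench (feature names and weights are illustrative).
MODEL = {
    "intercept": -4.0,
    "coefficients": {"age": 0.04, "systolic_bp": 0.02, "hba1c": 0.3},
}

def score(patient, model=MODEL):
    """Return the predicted probability for one patient record
    (logistic regression: sigmoid of the linear predictor)."""
    z = model["intercept"] + sum(
        w * patient[feature] for feature, w in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))
```

Because the model is data, not code, the same scoring engine can evaluate any exported model against an individual patient's record at the point of care.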
The clinical decision services can be deployed in the cloud as well. Other clinical systems can consume these services through a SOAP or REST-based web service interface (using the HL7 vMR and DSS specifications for interoperability) and single sign-on (SSO) standards like SAML2 and OpenID Connect. <br />
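The service interface can be sketched with an in-process dispatcher standing in for the deployed endpoints. The service names and payload shapes are illustrative; the HL7 DSS specification defines the actual contract.

```python
import json

# In-process stand-ins for deployed decision services; a real system
# exposes these over SOAP/REST per the HL7 vMR and DSS specifications.
def anomaly_service(data):
    return {"anomalous": data["heart_rate"] > 180 or data["heart_rate"] < 30}

def reminder_service(data):
    return {"reminders": ["flu vaccination"] if data.get("flu_shot") is False else []}

SERVICES = {"anomaly-detection": anomaly_service, "reminders": reminder_service}

def handle(request_json):
    """Dispatch a JSON request to the named decision service."""
    req = json.loads(request_json)
    result = SERVICES[req["service"]](req["clinicalData"])
    return json.dumps(result)
```

In a deployed system, `handle` would sit behind an authenticated HTTPS endpoint, with SSO handled via SAML2 or OpenID Connect as noted above.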
<br />
<br />
<h3>
Intelligent Personal Assistants (IPAs)</h3>
<br />
Clinical decision services can also be delivered to patients and clinicians through IPAs. IPAs can accept inputs in the form of voice, images, and user's context and respond in natural language. IPAs are also expanding to wearable technologies such as smart watches and glasses. The precision of speech recognition, natural language processing, and computer vision is improving rapidly with the adoption of Deep Learning techniques and tools. Accelerated hardware technologies like GPUs and FPGAs are improving the performance and reducing the cost of deploying these systems at scale.<br />
<br />
<br />
<h3>
Hexagonal, Reactive, and Secure Architecture</h3>
<br />
Intelligent Health IT systems are not just capable of discovering knowledge and patterns in data. They are also scalable, resilient, responsive, and secure. To achieve these objectives, several architectural patterns have emerged during the last few years:<br />
<br />
<ul>
<li><b>Domain Driven Design (DDD)</b> puts the emphasis on the core domain and domain logic and recommends a layered architecture (typically user interface, application, domain, and infrastructure) with each layer having well defined responsibilities and interfaces for interacting with other layers. Models exist within "bounded contexts". These "bounded contexts" communicate with each other typically through messaging and web services using HL7 standards for interoperability. </li>
<br />
<li>The <b>Hexagonal Architecture</b> defines "ports and adapters" as a way to design, develop, and test an application in a way that is independent of the various clients, devices, transport protocols (HTTP, REST, SOAP, MQTT, etc.), and even databases that could be used to consume its services in the future. This is particularly important in the era of the Internet of Things in healthcare.</li>
<br />
<li><b>Microservices </b>consist of decomposing large monolithic applications into smaller services, following the long-established principles of service-oriented design and single responsibility to achieve modularity, maintainability, scalability, and ease of deployment (for example, using Docker).</li>
<br />
<li><b>CQRS/ES: </b>Command Query Responsibility Segregation (CQRS) and Event Sourcing (ES) are two architectural patterns that use event-driven messaging and an<b> Event Store</b> to separate commands (the write side) from queries (the read side), relying on the principle of <b>Eventual Consistency</b>. CQRS/ES can be implemented in combination with microservices to deliver new capabilities such as temporal queries, behavioral analysis, complex audit logs, and real-time notifications and alerts. </li>
<br />
<li><b>Functional Programming</b>: Functional Programming languages like Scala have several benefits that are particularly important for applying Machine Learning algorithms on large data sets. Like functions in mathematics, pure functions in Scala have no side effects, which provides referential transparency. Machine Learning algorithms are in fact based on Linear Algebra and Calculus. Scala supports higher-order functions as well, and its emphasis on immutable values greatly simplifies concurrency. For all these reasons, Machine Learning libraries like Apache Mahout have embraced Scala, moving away from the Java MapReduce paradigm.</li>
<br />
<li><b>Reactive Architecture</b>: The Reactive Manifesto makes the case for a new breed of applications called "Reactive Applications". According to the manifesto, the Reactive Application architecture allows developers to build <i>"systems that are event-driven, scalable, resilient, and responsive."</i> Leading frameworks that support Reactive Programming include Akka and RxJava. The latter is a library for composing asynchronous and event-based programs using observable sequences. <b>RxJava </b>is a Java port (with a Scala adaptor) of the original Rx (Reactive Extensions) for .NET created by Erik Meijer.<br /> <br />Based on the Actor Model and built in Scala, <b>Akka </b>is a framework for building highly concurrent, asynchronous, distributed, and fault tolerant event-driven applications on the JVM. Akka offers location transparency, fault tolerance, asynchronous message passing, and a non-deterministic share-nothing architecture. <b>Akka Cluster</b> provides a fault-tolerant decentralized peer-to-peer based cluster membership service with no single point of failure or single point of bottleneck.<br /> <br />Also built with Scala, <b>Apache Kafka</b> is a scalable message broker which provides high-throughput, fault-tolerance, built-in partitioning, and replication for processing real-time data streams. In the reference architecture, the ingestion layer is implemented with Akka and Apache Kafka.</li>
</ul>
<ul>
<br />
<li><b>Web Application Security</b>: special attention is given to security across all layers, notably the proper implementation of authentication, authorization, encryption, and audit logging. The implementation of security is also driven by deep knowledge of application security patterns, threat modeling, and enforcing security best practices (e.g., OWASP Top Ten and CWE/SANS Top 25 Most Dangerous Software Errors) as part of the continuous delivery process.</li>
</ul>
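The CQRS/ES pattern in the list above can be sketched in a few lines of Python (event and field names are illustrative): commands append immutable events to an event store, and queries fold the event stream into a read model, which is also what enables temporal queries and complete audit trails.

```python
# Minimal event-sourcing sketch: current state is never stored directly;
# the read model is rebuilt by replaying the event store.
EVENT_STORE = []

def command_record_vital(patient_id, vital, value):
    """Write side: validate the command and append an event; never mutate state."""
    if value < 0:
        raise ValueError("vital sign cannot be negative")
    EVENT_STORE.append({"type": "VitalRecorded", "patient": patient_id,
                        "vital": vital, "value": value})

def query_latest_vitals(patient_id):
    """Read side: fold the event stream into the current view for one patient."""
    view = {}
    for event in EVENT_STORE:
        if event["type"] == "VitalRecorded" and event["patient"] == patient_id:
            view[event["vital"]] = event["value"]
    return view
```

Because every event is retained, the same store supports "what did we know at time T?" queries and audit logging for free, at the cost of eventual consistency between the write and read sides.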
<br />
<h3>
An Interface that Works across Devices and Platforms</h3>
<br />
The front-end uses a <b>Mobile First</b> approach and a <b>Single Page Application (SPA)</b> architecture with JavaScript-based frameworks like AngularJS to create very responsive user experiences. It also allows us to bring the following software engineering best practices to the front-end:<br />
<br />
<ul>
<li>Dependency Injection </li>
<li>Test-Driven Development (Jasmine, Karma, PhantomJS)</li>
<li>Package Management (Bower or npm)</li>
<li>Build system and Continuous Integration (Grunt or Gulp.js)</li>
<li>Static Code Analysis (JSLint and JSHint), and </li>
<li>End-to-End Testing (Protractor). </li>
</ul>
For mobile devices, Apache Cordova can be used to access native functions when desired. The main goal is to provide a user interface that works across devices and platforms such as iOS, Android, and Windows Phone.<br />
<br />
<h3>
Interoperability</h3>
<br />
Interoperability will always be a key requirement in clinical systems. Interoperability is needed between all players in the healthcare ecosystem including providers, payers, labs, knowledge artifact developers, quality measure developers, and public health agencies like the CDC. These standards exist today and are implementation-ready. However, only health IT buyers have the leverage to demand interoperability from their vendors.<br />
<br />
Standards related to clinical decision support (CDS) include:<br />
<br />
<ul>
<li>The HL7 Fast Healthcare Interoperability Resources (FHIR) </li>
<li>The HL7 virtual Medical Record (vMR)</li>
<li>The HL7 Decision Support Services (DSS) specification</li>
<li>The HL7 CDS Knowledge Artifact specification</li>
<li>The DMG Predictive Model Markup Language (PMML) specification.</li>
</ul>
<br />
<h3>
Overcoming Barriers to Adoption</h3>
<br />
In a previous post, I discussed a practical approach to <a href="http://efasoft.blogspot.com/2013/04/addressing-challenges-to-adoption-of.html" target="_blank">addressing challenges to the adoption of clinical decision support (CDS) systems</a>.<br />
<br />
<br />
Vidjinnagni Amoussouhttp://www.blogger.com/profile/06437681488603449364noreply@blogger.com0tag:blogger.com,1999:blog-2374388455287292795.post-39469811150333235122014-09-15T07:17:00.002-04:002015-03-07T09:03:03.837-05:00Single Sign-On (SSO) for Cloud-based SaaS ApplicationsSingle Sign-On (SSO) is a key capability for Software as a Service (SaaS) applications particularly when there is a need to integrate with existing enterprise applications. In the enterprise world dominated by SOAP-based web services, security has been traditionally achieved with standards like WS-Security, WS-SecurityPolicy, WS-SecureConversation, WS-Trust, XML Encryption, XML Signatures, the WS-Security SAML Token Profile, and XACML.<br />
<br />
During the last few years, the popularity of Web APIs, mobile technology, and Cloud-based software services has led to the emergence of light-weight security standards in support of the new REST/JSON paradigm with specifications like OAuth2 and OpenID Connect.<br />
<br />
In this post, I discuss the state of the art in standards for SSO. <br />
<br />
<h3>
SAML2 Web SSO Profile</h3>
<br />
SAML2 Web SSO Profile (not to be confused with the WS-Security SAML Token Profile mentioned earlier) is not a new standard. It was approved as an OASIS standard in 2005, and it remains a force to be reckoned with when it comes to enabling SSO within the enterprise. In a post titled <a href="http://architects.dzone.com/articles/saml-versus-oauth-which-one" target="_blank"><i>SAML vs OAuth: Which One Should I Use?</i></a>, Anil Saldhana, former Lead Identity Management Architect at Red Hat, offered the following suggestions:<br />
<br />
<blockquote class="tr_bq">
<ul>
<li><i>If your usecase involves SSO (when at least one actor or participant is an enterprise), then use SAML.</i></li>
<li><i>If your usecase involves providing access (temporarily or permanent) to resources (such as accounts, pictures, files etc), then use OAuth.</i></li>
<li><i>If you need to provide access to a partner or customer application to your portal, then use SAML.</i></li>
<li><i>If your usecase requires a centralized identity source, then use SAML (Identity provider).</i></li>
<li><i>If your usecase involves mobile devices, then OAuth2 with some form of Bearer Tokens is appropriate.</i></li>
</ul>
</blockquote>
<br />
Salesforce.com, arguably the leader in cloud-based SaaS services, supports SAML2 Web SSO Profile as one of its main SSO mechanisms (see the <a href="http://help.salesforce.com/help/pdfs/en/salesforce_single_sign_on.pdf" target="_blank">Salesforce Single Sign-On Implementation Guide</a>). The Google Apps platform supports <a href="https://developers.google.com/google-apps/sso/saml_reference_implementation" target="_blank">SAML2 Web SSO Profile</a> as well. <br />
<br />
Federal Identity, Credential, and Access Management (FICAM), a US Federal Government initiative, has selected SAML2 Web SSO Profile for Levels of Assurance (LOA) 1 through 4 as defined by NIST Special Publication 800-63-2 (see <a href="http://www.idmanagement.gov/sites/default/files/documents/SAML20_Web_SSO_Profile.pdf" target="_blank"><i>ICAM SAML 2.0 Web Browser SSO Profile</i></a>). This is significant given the challenges associated with identity federation at the scale of a large organization like the US federal government.<br />
<br />
SAML bindings specify underlying transport protocols including:<br />
<br />
<ul>
<li>HTTP Redirect Binding</li>
<li>HTTP POST Binding</li>
<li>HTTP Artifact Binding</li>
<li>SAML SOAP Binding.</li>
</ul>
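For example, the HTTP Redirect binding requires the AuthnRequest XML to be compressed with raw DEFLATE, base64 encoded, and URL-encoded into the query string. A minimal Python sketch of both directions (the endpoint URL and request XML below are placeholders):

```python
import base64
import urllib.parse
import zlib

def redirect_binding_url(idp_sso_url, authn_request_xml):
    """Encode an AuthnRequest for the SAML HTTP Redirect binding:
    raw DEFLATE, then base64, then URL-encode as a query parameter."""
    # Strip the 2-byte zlib header and 4-byte Adler-32 trailer to get raw DEFLATE.
    raw_deflate = zlib.compress(authn_request_xml.encode("utf-8"))[2:-4]
    b64 = base64.b64encode(raw_deflate).decode("ascii")
    return idp_sso_url + "?" + urllib.parse.urlencode({"SAMLRequest": b64})

def decode_saml_request(url):
    """Inverse operation, as an identity provider would perform it."""
    query = urllib.parse.urlparse(url).query
    b64 = urllib.parse.parse_qs(query)["SAMLRequest"][0]
    return zlib.decompress(base64.b64decode(b64), -15).decode("utf-8")
```

A production deployment would additionally sign the request (the `SigAlg` and `Signature` query parameters) and validate `Destination` and timestamps on the IdP side.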
<br />
SAML profiles define how the SAML assertions, protocols, and bindings are combined to support particular usage scenarios. The Web Browser SSO Profile and the Single Logout Profile are the most commonly used profiles.<br />
<br />
Identity Provider (IdP) initiated SSO with the POST binding is one of the most popular implementations (see the diagram below from the OASIS SAML Technical Overview for a typical authentication flow).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuvMkOhpB51bO0MATESEAgX9BYH4rCHyL7UeAwh7r6Rwq3VwBQpjIStmqZFpMvKFNosLD-3T9JhKrpUjjNJNHV8a_eaGMdRUIIXRxJWzFGmmTP9tAOmSIGG2_Kc3STrAX7Y1SS0c7Gkra2/s1600/idpsso.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuvMkOhpB51bO0MATESEAgX9BYH4rCHyL7UeAwh7r6Rwq3VwBQpjIStmqZFpMvKFNosLD-3T9JhKrpUjjNJNHV8a_eaGMdRUIIXRxJWzFGmmTP9tAOmSIGG2_Kc3STrAX7Y1SS0c7Gkra2/s1600/idpsso.png" height="311" width="400" /></a></div>
<br />
<br />
The SAML2 Web SSO ecosystem is very mature, cross-platform, and scalable. There are a number of open source implementations available as well. However, things are constantly changing in technology and identity federation is no exception. At the Cloud Identity Summit in 2012, Craig Burton, a well known analyst in the identity space declared:<br />
<br />
<blockquote class="tr_bq">
<i> SAML is the Windows XP of Identity. No funding. No innovation. People still use it. But it has no future. There is no future for SAML. No one is putting money into SAML development. No one is writing new SAML code. SAML is dead.</i></blockquote>
Craig Burton further clarified his remarks by saying:<br />
<br />
<blockquote class="tr_bq">
<i>SAML is dead does not mean SAML is bad. SAML is dead does not mean SAML isn’t useful. SAML is dead means SAML is not the future</i>. </blockquote>
At the time, this provoked a storm in the Twitterverse because of the significant investments that have been made by enterprise customers to implement SAML2 for SSO. <br />
<br />
<br />
<h3>
WS-Federation</h3>
<br />
There is an alternative to SAML2 Web SSO Profile called WS-Federation which is supported in Microsoft products like Active Directory Federation Services (ADFS), Windows Identity Foundation (WIF), and Azure Active Directory. Microsoft has been a strong promoter of WS-Federation and has implemented WS-Federation in several products. There is also a popular open source identity server on the .NET platform called<a href="https://github.com/thinktecture/Thinktecture.IdentityServer.v2/wiki" target="_blank"> Thinktecture IdentityServer v2</a> which also supports WS-Federation.<br />
<br />
For enterprise SSO scenarios between business partners exclusively using Microsoft products and development environments, WS-Federation could be a serious contender. However, SAML2 is more widely supported and implemented outside of the Microsoft world. For example, Salesforce.com and Google Apps do not support WS-Federation for SSO. Note that Microsoft ADFS implements the SAML2 Web SSO Profile in addition to WS-Federation.<br />
<br />
<h3>
OpenID Connect </h3>
<br />
OpenID Connect is a simple identity layer on top of OAuth2. It was ratified by the OpenID Foundation in February 2014 but had been in development for several years. Nat Sakimura's <a href="http://nat.sakimura.org/2011/05/15/dummys-guide-for-the-difference-between-oauth-authentication-and-openid/" target="_blank"><i>Dummy’s guide for the Difference between OAuth Authentication and OpenID</i></a> is a good resource for understanding the difference between OpenID, OAuth2, and OpenID Connect. In particular, it explains why OAuth2 alone is not strictly an authentication standard. The following diagram from the OpenID Connect specification represents
the components of the OpenID Connect stack (click to enlarge).<br />
<br />
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirXSTBfXLX1hknveKNZb60QNEJQI5lLFelCo-NuFjxbMb-3pjFKfDsbGjlhZZpVBpZr8Dm9i2azzGuTUeAWDw2s4aStn1F4hX6rOhqmIGfas4f2RVS_YIDtClBnwttghPzWMDoyy_vzB9j/s1600/OpenIDConnect-Map-4Feb2014.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirXSTBfXLX1hknveKNZb60QNEJQI5lLFelCo-NuFjxbMb-3pjFKfDsbGjlhZZpVBpZr8Dm9i2azzGuTUeAWDw2s4aStn1F4hX6rOhqmIGfas4f2RVS_YIDtClBnwttghPzWMDoyy_vzB9j/s1600/OpenIDConnect-Map-4Feb2014.png" height="340" width="400" /></a> </div>
<br />
<br />
Also note that OAuth2 tokens can be <a href="https://tools.ietf.org/html/draft-ietf-oauth-jwt-bearer-10" target="_blank">JSON Web Token (JWT)</a> or <a href="https://tools.ietf.org/html/draft-ietf-oauth-saml2-bearer-21" target="_blank">SAML assertions</a>. <br />
<br />
The following is the basic flow as defined in the OpenID Connect specification:<br />
<br />
<blockquote class="tr_bq">
<ol>
<li><i>The RP (Client) sends a request to the OpenID Provider (OP).</i></li>
<li><i>The OP authenticates the End-User and obtains authorization.</i></li>
<li><i>The OP responds with an ID Token and usually an Access Token.</i></li>
<li><i>The RP can send a request with the Access Token to the UserInfo Endpoint.</i></li>
<li><i>The UserInfo Endpoint returns Claims about the End-User. </i></li>
</ol>
</blockquote>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj92Yz73GtVbbNzLKBwr7KjmoTY2vlEnuBLlZmk0JPuRK3GxI6GJ7ETSYrk6DP8iI_awnouMXMs9JVLYu9O2_z27aWuzU2WRk7H-MAyk1-pV_zgYM5q1bdvQxOm6QMnONSzVOodF_GxyjZ2/s1600/opienidconnect.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj92Yz73GtVbbNzLKBwr7KjmoTY2vlEnuBLlZmk0JPuRK3GxI6GJ7ETSYrk6DP8iI_awnouMXMs9JVLYu9O2_z27aWuzU2WRk7H-MAyk1-pV_zgYM5q1bdvQxOm6QMnONSzVOodF_GxyjZ2/s1600/opienidconnect.png" height="241" width="400" /></a></div>
<br />
There are two subsets of the Core functionality with corresponding implementer’s guides:<br />
<br />
<ul>
<li><i>Basic Client Implementer’s Guide</i> – for a web-based Relying Party (RP) using the OAuth code flow</li>
<li><i>Implicit Client Implementer’s Guide</i> – for a web-based Relying Party using the OAuth implicit flow</li>
</ul>
<br />
<br />
OpenID Connect is particularly well-suited for modern applications which offer RESTful Web APIs, support JSON payloads, run on mobile devices, and are deployed to the Cloud. Despite being a relatively new standard, OpenID Connect also boasts an impressive list of implementations across platforms. It is already supported by big players like Google, Microsoft, PayPal, and Salesforce. In particular, Google is consolidating all federated sign-in support onto the OpenID Connect standard. Open Source OpenID Connect Identity Providers include the Java-based OpenAM and the .Net-based <a href="https://identityserver.github.io/Documentation/" target="_blank">Thinktecture Identity Server v3</a>.<br />
<br />
<br />
<h3>
From WS* to JW* and JOSE</h3>
<br />
As can be seen from the diagram above, a complete identity federation ecosystem based on OpenID Connect will also require standards for representing security assertions, digital signatures, encryption, and cryptographic keys. These standards include:<br />
<br />
<ul>
<li>JSON Web Token (JWT)</li>
<li>JSON Web Signature (JWS)</li>
<li>JSON Web Encryption (JWE)</li>
<li>JSON Web Key (JWK) </li>
<li>JSON Web Algorithms (JWA). </li>
</ul>
<br />
There is a new acronym for these emerging JSON-based identity and security protocols: JOSE, which stands for <i>JavaScript Object Signing and Encryption</i>. It is also the name of the IETF Working Group developing JWS, JWE, and JWK. A Java-based open source implementation called <a href="https://bitbucket.org/b_c/jose4j/wiki/Home" target="_blank">jose4j</a> is available.<br />
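To illustrate what JWS provides, here is a self-contained Python sketch of HS256 signing and verification in the JWS compact serialization (the shared key below is a placeholder; real deployments manage and rotate keys, typically distributed as JWKs):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """JOSE uses base64url encoding without padding (RFC 7515)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def jws_sign(claims: dict, key: bytes) -> str:
    """Produce a JWS compact serialization using the HS256 algorithm."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

def jws_verify(token: str, key: bytes) -> dict:
    """Verify the signature and return the claims, or raise ValueError."""
    header, payload, signature = token.split(".")
    signing_input = f"{header}.{payload}".encode("ascii")
    expected = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid signature")
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

This symmetric scheme suits a single trust domain; federated scenarios like OpenID Connect typically use asymmetric algorithms (e.g., RS256) so relying parties can verify tokens with the issuer's public JWK.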
<br />
<br />
<h3>
Access Control with the User Managed Access (UMA)</h3>
<br />
According to the UMA Core specification,<br />
<br />
<blockquote class="tr_bq">
<i>User-Managed Access (UMA) is a profile of OAuth 2.0. UMA defines how resource owners can control protected-resource access by clients operated by arbitrary requesting parties, where the resources reside on any number of resource servers, and where a centralized authorization server governs access based on resource owner policy. </i></blockquote>
In the UMA protocol, OpenID Connect provides federated SSO and is also used to convey user claims to the authorization server. In a previous post titled <a href="http://efasoft.blogspot.com/2013/02/patient-privacy-at-web-scale.html" target="_blank"><i>Patient Privacy at Web Scale</i></a>, I discussed the application of UMA to the challenges of patient privacy.Vidjinnagni Amoussouhttp://www.blogger.com/profile/06437681488603449364noreply@blogger.com0tag:blogger.com,1999:blog-2374388455287292795.post-40460871976126469122014-08-25T18:24:00.000-04:002015-02-22T17:06:28.484-05:00Ontologies for Addiction and Mental Disease: Enabling Translational Research and Clinical Decision SupportIn a previous post titled <a href="http://efasoft.blogspot.com/2011/08/why-do-we-need-ontologies-in-healthcare.html" target="_blank"><i>Why do we need ontologies in healthcare applications</i></a>, I elaborated on what ontologies are and why they are different from information models of data structures like relational database schemas and XML schemas commonly used in healthcare informatics applications. In this post, I discuss two interesting applications of ontology engineering related to addiction and mental disease treatment. The first is the use of ontologies for achieving semantic interoperability in translational research. The second is the use of ontologies for modeling complex medical knowledge in clinical practice guidelines (CPGs) for the purpose of automated reasoning during execution in clinical decision support systems (CDS) at the point of care.<br />
<br />
<h3>
Why is Semantic Interoperability Needed in Biomedical Translational Research?</h3>
<br />
In order to accelerate the discovery of new effective therapeutics for mental health and addiction treatment, there is a need to integrate data across disciplines spanning biomedical research and clinical care delivery [1]. For example, linking data across disciplines can facilitate a better understanding of treatment response variability among patients in addiction treatment. These disciplines include:<br />
<br />
<ul>
<li>Genetics, the study of genes.</li>
<li>Chemistry, the study of chemical compounds including substances of abuse like heroin.</li>
<li>Neuroscience, the study of the nervous system and the brain (addiction is a chronic disease of the brain).</li>
<li>Psychiatry, which focuses on the diagnosis, treatment, and prevention of addiction and mental disorders. </li>
</ul>
<br />
Each of these disciplines has its own terminology or controlled vocabularies. In the clinical domain for example, DSM-5 and RxNorm are used for documenting clinical care. In biomedical research, several ontologies have been developed over the last few years including:<br />
<ul>
<li>The Gene Ontology (GO)</li>
<li>The Chemical Entities of Biological Interest Ontology (CHEBI)</li>
<li>NeuroLex, an OWL ontology covering major domains of neuroscience: anatomy, cell, subcellular, molecule, function, and dysfunction.</li>
</ul>
<br />
To facilitate semantic interoperability between these ontologies, there are best practices established by the Open Biological and Biomedical Ontologies (OBO) community. An example of such a best practice is the use of an upper-level ontology called the Basic Formal Ontology (BFO), which acts as a common foundational ontology upon which new ontologies can be created. OBO ontologies and principles are available on the <a href="http://www.obofoundry.org/" target="_blank">OBO Foundry</a> web site.<br />
<br />
Among the ontologies available on the OBO Foundry is the <a href="https://code.google.com/p/mental-functioning-ontology/" target="_blank">Mental Functioning Ontology (MF)</a> [2, 3]. The MF is being developed as a collaboration between the University of Geneva in Switzerland and the University at Buffalo in the United States. The project also includes a Mental Disease Ontology (MD) which extends the MF and the Ontology for General Medical Science (OGMS). The Basic Formal Ontology (BFO) is the upper-level ontology for both the MF and the OGMS. The picture below is a view of the class hierarchy of the MD, showing details of the class "Paranoid Schizophrenia" in the right pane of the window of the beta release of Protege 5, an open source ontology development environment (click on the image to enlarge it).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlPwSkrG7A0UnaQGloQcQ-Fl5VViAGJlC-8GLx1KEUhyvWFdnnRrxEvSm6iWUv76T1g_IwbGTMDFG9VEBxKnTts3x7BLKWMH1xsHG6onmqti2vHcdzTLm5fWwI8kYa5MDdbHE7lRKjRALg/s1600/protegeMDOwl.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhlPwSkrG7A0UnaQGloQcQ-Fl5VViAGJlC-8GLx1KEUhyvWFdnnRrxEvSm6iWUv76T1g_IwbGTMDFG9VEBxKnTts3x7BLKWMH1xsHG6onmqti2vHcdzTLm5fWwI8kYa5MDdbHE7lRKjRALg/s1600/protegeMDOwl.png" height="320" width="400" /></a></div>
The following is a tree view of the "Mental Disease Course" class (click on the image to enlarge it):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgU0eS6ZGUskGJgUy9swaCbdIB2zQehSuOBrDi9ix6nCumEyMqaBu1TzH8FiZlMIZ5huPceFuiVObE_9t0c1ryDihX4VZHZ59qKt_wYqjQeCZ26qtUY4V-WcQOK4Fm-_Rn9__JzAVvHWYU/s1600/mdowl2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgU0eS6ZGUskGJgUy9swaCbdIB2zQehSuOBrDi9ix6nCumEyMqaBu1TzH8FiZlMIZ5huPceFuiVObE_9t0c1ryDihX4VZHZ59qKt_wYqjQeCZ26qtUY4V-WcQOK4Fm-_Rn9__JzAVvHWYU/s1600/mdowl2.png" height="228" width="400" /></a></div>
<br />
<br />
Ontology constructs defined by the OWL2 language can help establish common semantics (meaning) and relationships between entities across domains. These constructs enable automated inferencing over relationships between entities, such as equivalence (e.g., <i>owl:sameAs</i> and <i>owl:equivalentClass</i>) and subsumption (e.g., <i>rdfs:subClassOf</i>).<br />
<br />
In addition, publishing data sources following Linked Open Data (LOD) principles and semantic search using federated SPARQL queries can help answer new research questions. Another application is semantic annotation for natural language processing (NLP) applications.<br />
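The intuition behind such a query can be sketched without any RDF tooling. The following plain-Python sketch (the triples and predicate names are invented, standing in for real URIs) merges statements from a clinical source and a research source into one graph and matches a pattern against it, which is essentially what a federated SPARQL query does across published linked data sets:

```python
# Toy triple store: merge statements from two "sources" and query a pattern.
# Triples and predicate names are invented for illustration; real linked data
# would use URIs and a SPARQL engine over published datasets.

clinical_source = [
    ("patient42", "hasDiagnosis", "OpioidUseDisorder"),
]
research_source = [
    ("OpioidUseDisorder", "associatedWithGene", "OPRM1"),
]

graph = clinical_source + research_source   # linked data: one merged graph

def query(graph, pattern):
    """Match a (subject, predicate, object) pattern; None acts as a variable."""
    return [t for t in graph
            if all(p is None or p == v for p, v in zip(pattern, t))]

# "Which genes are associated with the diagnoses of patient42?"
for _, _, dx in query(graph, ("patient42", "hasDiagnosis", None)):
    for _, _, gene in query(graph, (dx, "associatedWithGene", None)):
        print(gene)  # OPRM1
```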
<h3>
Ontologies as a knowledge representation formalism for clinical decision support (CDS)</h3>
<br />
As a knowledge representation formalism, ontologies are well suited for modeling complex medical knowledge and can facilitate reasoning during the automated execution of clinical practice guidelines (CPGs) and Care Pathways (CPs) based on patient data at the point of care. Several approaches to modeling CPGs and CPs have been proposed in the past, including PROforma, HELEN, EON, GLIF, PRODIGY, and SAGE. However, the lack of free and open source tooling has been a major impediment to the wide adoption of these knowledge representation formalisms. OWL has the advantage of being a widely implemented W3C Recommendation with mature open source tools available.<br />
<br />
In practice, the medical knowledge contained in CPGs can be manually translated into IF-THEN statements in most programming languages. Executable CDS rules (like other complex types of business rules) can be implemented with a production rule engine using forward chaining. This is the approach taken by OpenCDS and some large-scale CDS implementations in real-world healthcare delivery settings. It allows CDS software developers to externalize the medical knowledge contained in clinical guidelines in the form of declarative rules, as opposed to embedding that knowledge in procedural code. Many viable open source business rule management systems (BRMS) are available today, providing capabilities such as a rule authoring user interface, a rules repository, and a testing environment.<br />
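The forward-chaining idea can be sketched in a few lines of Python (a toy engine with invented facts and rules, not OpenCDS or a production BRMS): a rule fires when all of its conditions are present in working memory, and the facts it asserts may trigger further rules until a fixed point is reached.

```python
# Toy forward-chaining rule engine (illustrative only; a production BRMS
# such as Drools uses the Rete algorithm and a full rule language).

facts = {"diagnosis:depression", "phq9:severe"}   # working memory

# Each rule: (conditions that must all be present, fact to assert)
rules = [
    ({"diagnosis:depression", "phq9:severe"}, "severity:high"),
    ({"severity:high"}, "recommend:psychiatry-referral"),
]

changed = True
while changed:                                    # fire until a fixed point
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)                 # assert a new fact
            changed = True

print(sorted(f for f in facts if f.startswith("recommend:")))
# ['recommend:psychiatry-referral']
```

Note how the second rule only fires because the first rule asserted a new fact, which is the chaining behavior that lets guideline logic stay declarative.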
<br />
However, production rule systems have a limitation: they do not scale well because they require writing a rule for each clinical concept code (there are more than 311,000 active concepts in SNOMED CT alone). An alternative is to exploit the class hierarchy in an ontology so that the subclasses of a given superclass inherit the clinical rules that are applicable to the superclass (this is called subsumption). In addition to subsumption, an OWL ontology also supports reasoning with description logic (DL) axioms [4].<br />
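The following sketch (with a hypothetical mini-hierarchy standing in for SNOMED CT) illustrates why subsumption helps: a single rule attached to a superclass is inherited by every descendant concept, instead of being rewritten for each concept code.

```python
# One rule on a superclass covers every descendant concept.
# The hierarchy and rules below are invented; a real system would walk
# the SNOMED CT "is a" hierarchy via an OWL reasoner.

parent = {  # child -> parent ("is a" relationships)
    "paranoid_schizophrenia": "schizophrenia",
    "schizophrenia": "psychotic_disorder",
}

rules = {  # rules attached to superclasses, not to each leaf code
    "psychotic_disorder": ["assess suicide risk", "consider antipsychotic"],
}

def applicable_rules(concept):
    found = []
    while concept is not None:
        found += rules.get(concept, [])
        concept = parent.get(concept)   # walk up the hierarchy
    return found

print(applicable_rules("paranoid_schizophrenia"))
# ['assess suicide risk', 'consider antipsychotic']
```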
<br />
An ontology designed for a clinical decision support (CDS) system can integrate the clinical rules from a CPG, a domain ontology like the Mental Disease (MD) ontology, and the patient medical record from an EHR database in order to provide inferences in the form of treatment recommendations at the point of care. The OWL API [5] facilitates the integration of ontologies into software applications. It supports inferencing using reasoners like Pellet and HermiT. OWL2 reasoning capabilities can be enhanced with rules represented in SWRL (Semantic Web Rule Language), which is implemented by reasoners like Pellet as well as the Protege OWL development environment. In addition to inferencing, another benefit of an OWL-based approach is transparency: the CDS system can provide an explanation or justification of how it arrived at its treatment recommendations.<br />
<br />
Nonetheless, these approaches are not mutually exclusive: a production rule system can be integrated with business processes, ontologies, and predictive analytics models. Predictive analytics models provide a probabilistic approach to treatment recommendations to assist in the clinical decision making process.<br />
<br />
<h3>
References</h3>
<br />
[1] Janna Hastings, Werner Ceusters, Mark Jensen, Kevin Mulligan and Barry Smith. Representing mental functioning: Ontologies for mental health and disease. Proceedings of the Mental Functioning Ontologies workshop of ICBO 2012, Graz, Austria.<br />
<br />
[2] Ceusters, W. and Smith, B. (2010). Foundations for a realist ontology of mental disease. Journal of Biomedical Semantics, 1(1), 10.<br />
<br />
[3] Hastings, J., Smith, B., Ceusters, W., and Mulligan, K. (2012). The mental functioning ontology. http://code.google.com/p/mental-functioning-ontology/, last accessed August 24, 2014<br />
<br />
[4] Sesen MB, Peake MD, Banares-Alcantara R, Tse D, Kadir T, Stanley R, Gleeson F, Brady M. (2014). Lung Cancer Assistant: a hybrid clinical decision support application for lung cancer care. J. R. Soc. Interface 11: 20140534.<br />
<br />
[5] Matthew Horridge, Sean Bechhofer. The OWL API: A Java API for OWL Ontologies. Semantic Web Journal, 2(1), Special Issue on Semantic Web Tools and Systems, pp. 11-21, 2011.<br />
<br />
<h3>
Natural Language Processing (NLP) for Clinical Decision Support: A Practical Approach</h3>
<br />
A significant portion of the electronic documentation of clinical care is captured in the form of unstructured narrative text, like psychotherapy and progress notes. Despite the big push to adopt structured data entry (as required by the Meaningful Use incentive program, for example), many clinicians still prefer to document care using free narrative text. The advantage of narrative text over coded entries is that narrative text can tell the story of the patient and the care provided, particularly in complex cases. My opinion is that free narrative text should be used to complement coded entries when necessary to capture relevant information.<br />
<br />
Furthermore, medical knowledge is expanding very rapidly. For example, PubMed has more than 24 million citations for biomedical literature from MEDLINE, life science journals, and online books. It is impossible for the human brain to keep up with that amount of knowledge. These unstructured sources of knowledge contain the scientific evidence that is required for effective clinical decision making in what is referred to as Evidence-Based Medicine (EBM).<br />
<br />
In this post, I discuss two practical applications of Natural Language Processing (NLP). The first is the use of NLP tools and techniques to automatically extract clinical concepts and other insights from clinical notes for the purpose of providing treatment recommendations in Clinical Decision Support (CDS) systems. The second is the use of text analytics techniques like clustering and summarization for Clinical Question Answering (CQA).<br />
<br />
The emphasis of this post is on a practical approach using freely available and mature open source tools as opposed to an academic or theoretical approach. For a theoretical treatment of the subject, please refer to the book <i>Speech and Language Processing</i> by Daniel Jurafsky and James Martin.<br />
<br />
<br />
<h3>
Clinical NLP with Apache cTAKES </h3>
<br />
Based on the Apache Unstructured Information Management Architecture (UIMA) framework and the Apache OpenNLP natural language processing toolkit, Apache cTAKES provides a modular architecture utilizing both rule-based and machine learning techniques for information extraction from clinical notes. cTAKES can extract named entities (clinical concepts) from clinical notes in plain text or HL7 CDA format and map these entities to various dictionaries including the following Unified Medical Language System (UMLS) semantic types: diseases/disorders, signs/symptoms, anatomical sites, procedures, and medications.<br />
<br />
cTAKES includes the following key components which can be assembled to create processing pipelines:<br />
<br />
<ul>
<li>Sentence boundary detector based on the OpenNLP Maximum Entropy (ME) sentence detector.</li>
<li>Tokenizer</li>
<li>Normalizer using the National Library of Medicine's Lexical Variant Generation (LVG) tool</li>
<li>Part-of-speech (POS) tagger</li>
<li>Shallow parser</li>
<li>Named Entity Recognition (NER) annotator using dictionary look-up to UMLS concepts and semantic types. The Drug NER can extract drug entities and their attributes such as dosage, strength, route, etc.</li>
<li>Assertion module which determines the subject of the statement (e.g., is the subject of the statement the patient or a parent of the patient) and whether a named entity or event is negated (e.g., does the presence of the word "depression" in the text imply that the patient has depression?).</li>
</ul>
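The dictionary look-up and negation steps above can be illustrated with a small sketch. The dictionary entries, the concept codes shown, and the fixed three-token negation window are simplifications invented for illustration, not the actual cTAKES implementation:

```python
# Toy dictionary-based NER with naive negation detection. The dictionary,
# the concept codes, and the fixed 3-token negation window are illustrative;
# cTAKES uses full UMLS dictionaries and a NegEx-style negation algorithm.

dictionary = {
    "depression": ("C0011570", "sign/symptom"),
    "sertraline": ("C0074393", "medication"),
}
negation_cues = {"no", "denies", "without"}

def annotate(text):
    annotations = []
    for sentence in text.lower().split("."):      # crude sentence splitting
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok in dictionary:                 # dictionary look-up
                cui, semtype = dictionary[tok]
                window = tokens[max(0, i - 3):i]  # look back up to 3 tokens
                negated = any(t in negation_cues for t in window)
                annotations.append((tok, cui, semtype, negated))
    return annotations

print(annotate("Patient denies depression. Continue sertraline."))
# [('depression', 'C0011570', 'sign/symptom', True),
#  ('sertraline', 'C0074393', 'medication', False)]
```

Restricting the negation window to the current sentence is what keeps "denies" from incorrectly negating "sertraline" in the example above.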
Apache cTAKES 3.2 has added YTEX, a set of extensions developed at Yale University which provides integration with MetaMap, semantic similarity measures, export to machine learning packages like Weka and R, and feature engineering.<br />
<br />
The following diagram from the <a href="https://cwiki.apache.org/confluence/display/CTAKES/cTAKES" target="_blank">Apache cTAKES Wiki</a> provides an overview of these components and their dependencies (click to enlarge):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvTY5bRNHlMW4xwZZRoF-xH6BnwKYq9fQQXVFrMKaFnzSD2KG-SV2lnHWyS_4VblF20pOZ7AcrpWeRljVPZDK-REvIVGczVNf06_ynZpmtKPk9N2EHZPNO0-vsdin_UXZPmitjS0K-Z8mh/s1600/ctakes-3.1-dependencies.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvTY5bRNHlMW4xwZZRoF-xH6BnwKYq9fQQXVFrMKaFnzSD2KG-SV2lnHWyS_4VblF20pOZ7AcrpWeRljVPZDK-REvIVGczVNf06_ynZpmtKPk9N2EHZPNO0-vsdin_UXZPmitjS0K-Z8mh/s1600/ctakes-3.1-dependencies.png" height="320" width="240" /></a></div>
<br />
<div style="text-align: right;">
Source: <a href="https://cwiki.apache.org/confluence/display/CTAKES/cTAKES" target="_blank">Apache cTAKES wiki</a> </div>
<h3>
Massively Parallel Clinical Text Analytics in the Cloud with GATECloud</h3>
<br />
The General Architecture for Text Engineering (GATE) is a mature, comprehensive, and open source text analytics platform. GATE is a family of tools which includes:<br />
<br />
<ul>
<li><i>GATE Developer</i>: an integrated development environment (IDE) for language processing components with a comprehensive set of available plugins called CREOLE (Collection of REusable Objects for Language Engineering).<i> </i></li>
<li><i>GATE Embedded</i>: an object library for embedding services developed with GATE Developer into third-party applications.<i> </i></li>
<li><i>GATE Teamware</i>: a collaborative semantic annotation environment based on a workflow engine for creating manually annotated corpora for applying machine learning algorithms.<i> </i></li>
<li><i>GATE Mímir</i>: the "Multi-paradigm Information Management Index and Repository" which supports a multi-paradigm approach to index and search over text, ontologies, and semantic metadata.</li>
<li><i>GATE Cloud</i>: a massively parallel text analytics platform (Platform as a Service or PaaS) built on the Amazon AWS cloud. </li>
</ul>
What makes GATE particularly attractive is the recent addition of the GATECloud.net PaaS, which can boost the productivity of people involved in large-scale text analytics tasks.<br />
<h3>
Clustering, Classification, Text Summarization, and Clinical Question Answering (CQA)</h3>
An unsupervised machine learning approach called clustering can be used to organize large volumes of medical literature into groups (clusters) based on some similarity measure (such as the Euclidean distance). Clustering can be applied at the document, search result, and word/topic levels. Carrot2 and Apache Mahout are open source projects that provide several methods for document clustering. For example, the Latent Dirichlet Allocation learning algorithm in Apache Mahout automatically clusters words into topics and documents into mixtures of topics. Other clustering algorithms in Apache Mahout include: Canopy, Mean-Shift, Spectral, K-Means, and Fuzzy K-Means. Apache Mahout is part of the Hadoop ecosystem and can therefore scale to very large volumes of unstructured text.<br />
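The mechanics of k-means can be shown on toy two-dimensional feature vectors (Apache Mahout distributes the same assign/update loop over a Hadoop cluster; the data points below are invented):

```python
# Minimal k-means (k=2) over toy 2-D feature vectors (illustration only;
# Apache Mahout runs the same assign/update loop distributed over Hadoop).
import math

points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),   # group A
          (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]   # group B

def kmeans(points, k, iters=10):
    # Evenly spaced deterministic seeding for this toy example; real
    # implementations use random or k-means++ initialization.
    centroids = points[::len(points) // k][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                  # assignment: nearest centroid
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, c in enumerate(clusters):  # update: mean of each cluster
            if c:
                centroids[i] = tuple(sum(xs) / len(c) for xs in zip(*c))
    return centroids, clusters

centroids, clusters = kmeans(points, 2)
print([len(c) for c in clusters])  # [3, 3]
```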
<br />
Document classification essentially consists of assigning labels from a predefined set to documents. This can be achieved through supervised machine learning algorithms. Apache Mahout implements the Naive Bayes classifier.<br />
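A multinomial Naive Bayes classifier can be sketched on a toy training set (the labels and documents are invented; Mahout's implementation adds distributed training on much larger corpora):

```python
# Toy multinomial Naive Bayes text classifier with Laplace smoothing
# (illustration only; labels and training documents are invented).
import math
from collections import Counter, defaultdict

train = [
    ("cardiology", "heart attack chest pain ecg"),
    ("psychiatry", "depression anxiety mood ssri"),
    ("cardiology", "blood pressure heart failure"),
]

class_docs = defaultdict(list)
for label, doc in train:
    class_docs[label] += doc.split()

vocab = {w for words in class_docs.values() for w in words}

def predict(doc):
    scores = {}
    for label, words in class_docs.items():
        counts = Counter(words)
        prior = sum(1 for l, _ in train if l == label) / len(train)
        score = math.log(prior)
        for w in doc.split():  # Laplace-smoothed log-likelihoods
            score += math.log((counts[w] + 1) / (len(words) + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("patient reports depression and anxiety"))  # psychiatry
```

Working in log space avoids numerical underflow when documents contain many words, and the add-one (Laplace) smoothing keeps unseen words from zeroing out a class.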
<br />
Text summarization techniques can be used to present succinct and clinically relevant evidence to clinicians at the point of care. MEAD (http://www.summarization.com/mead/) is an open source project that implements multiple summarization algorithms. In the biomedical domain, SemRep is a program that extracts semantic predications (subject-relation-object triples) from biomedical free text. Subject and object arguments of each predication are concepts from the UMLS Metathesaurus, and the relation is from the UMLS Semantic Network (e.g., TREATS, CO-OCCURS_WITH). The SemRep summarization provides a short summary of these concepts and their semantic relations.<br />
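The simplest form of extractive summarization, scoring sentences by the frequency of the words they contain, can be sketched as follows (MEAD combines centroid, position, and length features; this toy version uses raw frequency only, on invented text):

```python
# Toy extractive summarization: score each sentence by the average corpus
# frequency of its words and keep the top sentence. (MEAD combines centroid,
# position, and length features; this sketch uses raw frequency only.)
from collections import Counter

text = ("Depression is common. Depression responds to therapy. "
        "The weather was mild.")

sentences = [s.strip() for s in text.split(".") if s.strip()]
freq = Counter(w.lower() for s in sentences for w in s.split())

def score(sentence):
    words = sentence.lower().split()
    return sum(freq[w] for w in words) / len(words)  # average word frequency

summary = max(sentences, key=score)
print(summary)  # Depression is common
```

A real summarizer would also remove stop words and select several sentences up to a length budget; the point here is only the sentence-scoring mechanism.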
<br />
AskHermes (Help clinicians to <i>Extract and aRticulate Multimedia information for answering clinical quEstionS</i>) is a project that attempts to implement these techniques in the clinical domain. It allows clinicians to enter questions in natural language and uses the following unstructured information sources: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines, and Wikipedia articles.<br />
<br />
The processing pipeline in AskHermes includes the following steps: <i>Question Analysis, Related Questions Extraction, Information Retrieval, Summarization and Answer Presentation</i>. AskHermes performs question classification using MMTx (MetaMap Technology Transfer) to map keywords to UMLS concepts and semantic types. Classification is achieved through supervised machine learning algorithms such as Support Vector Machines (SVMs) and conditional random fields (CRFs). Summarization and answer presentation are based on clustering techniques. AskHermes is powered by open source components including JBoss Seam, Weka, Mallet, Carrot2, Lucene/Solr, and WordNet (a lexical database for the English language).<br />
<br />
<h3>
Enabling Scalable Realtime Healthcare Analytics with Apache Spark</h3>
<br />
Modern and massively parallel computing platforms can process humongous amounts of data in real time to obtain actionable insights for effective clinical decision making. In this post, I discuss an emerging Big Data platform called Apache Spark and its application to remote real-time healthcare monitoring using data from medical devices and wearable sensors. The goal is to provide effective remote care for an increasingly aging population as well as public health surveillance.<br />
<br />
<br />
<h3>
The Apache Spark Framework</h3>
<br />
Apache Spark has emerged during the last couple of years as an innovative platform for Big Data and in-memory cluster computing, capable of running programs up to 100x faster than traditional Hadoop MapReduce. Apache Spark is written in Scala, a functional programming language (see my previous post titled <a href="http://efasoft.blogspot.com/2014/01/navigating-in-scala-land.html" target="_blank"><i>Navigating in Scala land</i></a>). Spark also offers Java and Python APIs. The Scala API allows developers to interact with Spark using very concise and expressive Scala code. <br />
<br />
<br />
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUGAO4NEQPKikbqTH3SD8G6FqDnE406zTnE2rkGg7TvuMYf_RyMwwuCz6JKmGE801WnCZxdzj5dc8cW3zpaWafkyze1OgXB5H89jo8HC5PcfeBQtf7gpNr2u0M-eBk8xwSB4BD7wPRLE4v/s1600/spark-stack.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUGAO4NEQPKikbqTH3SD8G6FqDnE406zTnE2rkGg7TvuMYf_RyMwwuCz6JKmGE801WnCZxdzj5dc8cW3zpaWafkyze1OgXB5H89jo8HC5PcfeBQtf7gpNr2u0M-eBk8xwSB4BD7wPRLE4v/s1600/spark-stack.png" height="150" width="320" /></a></div>
<br />
The Spark stack also includes the following integrated tools:<br />
<br />
<ul>
<li><i>Spark SQL</i> which allows relational queries expressed in SQL, HiveQL, or Scala to be executed using Spark through a data abstraction called SchemaRDD. Supported data sources include Parquet files (a columnar storage format for Hadoop), JSON datasets, or data stored in Apache Hive.</li>
<br />
<li><i>Spark Streaming</i> which enables fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ or plain old TCP sockets. The ingested data can be directly processed with Spark built-in Machine Learning algorithms.</li>
<br />
<li><i>MLlib </i>(Machine Learning Library) provides a library of practical Machine Learning algorithms including support vector machines (SVM), logistic regression, decision trees, naive Bayes, and k-means clustering. </li>
<br />
<li><i>GraphX </i>which provides graph-parallel computation for graph-analytics applications like social networks.</li>
</ul>
<br />
<br />
Apache Spark can also play nicely with other frameworks within the Hadoop ecosystem. For example, it can run standalone, on a Hadoop 2 YARN cluster manager, on Amazon EC2, or on a Mesos cluster manager. Spark can also read data from HDFS, HBase, Cassandra, or any other Hadoop data source. Other noteworthy integrations include:<br />
<br />
<ul>
<li><i>SparkR</i>, an R package allowing the use of Spark from R, a very popular open source software environment for statistical computing with more than 5,800 packages including machine learning packages; and</li>
<br />
<li><i>H2O-Sparkling</i> which provides an integration with the H2O platform through in-memory sharing with Tachyon, a memory-centric distributed file system for data sharing across cluster frameworks. This allows Spark applications to leverage advanced distributed Machine Learning algorithms supported by the H2O platform like emerging Deep Learning algorithms. </li>
</ul>
<br />
<h3>
Wearable Sensors for Remote Healthcare Monitoring </h3>
<br />
Three factors are contributing to the availability of massive amounts of clinical data: the rising adoption of EHRs by providers thanks in part to the Meaningful Use incentive program; the increasing use of medical devices including wearable sensors used by patients outside of healthcare facilities; and medical knowledge (for example in the form of medical research literature).<br />
<br />
One promising area in Healthcare Informatics where Big Data architectures like the one provided by Apache Spark can make a difference is in applications using data from wearable health monitoring sensors for anomaly detection, care alerting, diagnosis, care planning, and prediction. For example, anomaly detection can be performed at scale using the k-means clustering machine learning algorithm in Spark.<br />
<br />
These sensors and devices are part of a larger trend called the "<i>Internet of Things</i>". They enable new capabilities such as remote health monitoring for personalized medicine and chronic care management for an increasingly aging population as well as public health surveillance for outbreaks and epidemics.<br />
<br />
Wearable sensors can collect vital signs data like weight, temperature, blood pressure (BP), heart rate (HR), blood glucose (BG), respiratory rate (RR), electrocardiogram (ECG), oxygen saturation (SpO2), and photoplethysmography (PPG). Spark Streaming can be used to perform real-time stream processing of sensor data, and the data can be processed and analyzed using the machine learning algorithms available in MLlib and the other integrated frameworks like R and H2O. What makes Spark particularly suitable for this type of application is that sensor data meet the Big Data criteria of volume, velocity, and variety.<br />
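The per-window logic of such an anomaly detector can be sketched in plain Python (the readings and thresholds below are invented; in production, the same computation would run inside a Spark Streaming job over micro-batches of sensor data):

```python
# Sliding-window z-score anomaly detection on a toy heart-rate stream.
# Readings and thresholds are invented; in Spark Streaming the same
# per-window logic would be applied to micro-batches of sensor readings.
import statistics
from collections import deque

def detect_anomalies(stream, window_size=10, z_threshold=3.0):
    window = deque(maxlen=window_size)
    anomalies = []
    for t, hr in stream:
        if len(window) == window.maxlen:
            mean = statistics.mean(window)
            stdev = statistics.stdev(window) or 1e-9  # avoid division by zero
            if abs(hr - mean) / stdev > z_threshold:
                anomalies.append((t, hr))
        window.append(hr)
    return anomalies

# Steady resting heart rate, then a sudden spike.
readings = [(t, 70 + (t % 3)) for t in range(20)] + [(20, 160)]
print(detect_anomalies(readings))  # [(20, 160)]
```

A z-score rule is the simplest baseline; the k-means approach mentioned above generalizes it to multivariate sensor data by flagging readings that are far from every cluster centroid.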
<br />
Researchers predict that internet use on mobile phones will increase 20-fold in Africa in the next five years. The number of mobile subscriptions in sub-Saharan Africa is expected to reach 635 million by the end of this year. This unprecedented level of connectivity (fueled in part by the historical lack of land line infrastructure) provides opportunities for effective public health surveillance and disease management in the developing world.<br />
<br />
Apache Spark is the type of open source computing infrastructure that is needed for distributed, scalable, and real-time healthcare analytics aimed at reducing healthcare costs and improving outcomes.<br />
<br />
<h3>
Building business sustainability: why Agile needs Lean Startup principles</h3>
<br />
In his book on leadership titled <i>On Becoming a Leader</i>, Warren Bennis wrote: <i>"Managers do things right while leaders do the right thing."</i> This quote can help explain why Agile needs Lean Startup principles.<br />
<br />
<h3>
Toward Product Leadership</h3>
<br />
Lean Startup is about Product Leadership. In business, the ultimate goal of the enterprise is to eventually generate revenue and profit and that requires having enough customers who are willing to pay for a product or service. This is the key to sustainability in a free market system. The concept of the Lean Startup was first introduced by Eric Ries in his book titled: <i>The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses</i>. Steve Blank wrote an article in the Harvard Business Review last year titled: <i>Why the Lean Start-Up Changes Everything</i>.<br />
<br />
Agile is about the management of Product Development. When applied properly using techniques such as Test Automation and Continuous Delivery, Agile (and its various flavors like XP, Scrum, or Kanban) is a proven methodology for successful software delivery. However, a purely ceremonial approach (daily standup, sprint planning, review, retrospective, story pointing, etc.) may not yield best results.<br />
<br />
Having the best software developers in town and building a well-designed and well-tested software on time and under budget will not necessarily translate into market success and business growth and sustainability. So what is the missing piece? How do we ensure that Agile is delivering a product that users are willing to buy? How do we know that the software itself and the many features that we work hard to implement every sprint are actually going to be needed, paid for, and used by our customers?<br />
<br />
<h3>
How Agile projects can become big failed experiments</h3>
<br />
Agile promotes the use of cross-functional teams that include business users, subject matter experts (SMEs), software developers, and testers. There is also the role of Product Owner in Agile teams. The Product Owner and the business users help the team define and prioritize backlog items. The issue is that most of the time, neither the Product Owner nor the business users are the people who will actually sign a check or use their credit card to buy the product. This is the case when the product will be marketed and sold to the market at large. Implementing the wrong design and building the wrong product can be very costly to the enterprise. The result is that the design and the features being implemented by the development team are just assumptions and hypotheses (made by so-called experienced experts and product visionaries) that have never been validated. Not surprisingly, many Agile projects have become big failed experiments.<br />
<br />
<br />
<h3>
Untested assumptions and hypotheses </h3>
<br />
What we call vision and strategy are often just untested assumptions and hypotheses. Yet, we invest significant time and resources pursuing those ideas. A deep understanding (acquired through lengthy industry experience) of the business, customers' pain points, regulatory environment, and the competitive landscape will not always produce the correct assumptions about a product. This is because the pace of change has accelerated dramatically over the last two decades.<br />
<br />
Traditional management processes and tools like strategic planning and the <i>Balanced Scorecard</i> do not always provide a framework for validating those assumptions. Even newer management techniques like the <i>Blue Ocean Strategy</i> taught by business schools to MBA candidates contain significant elements of risk and uncertainty when confronted with the brutal reality of the marketplace.<br />
<br />
This reminds me of my days in aviation training. Aviation operations are characterized by a high level of planning. The Flight Plan contains details about the departure time, route, estimated time en route, fuel on board, cruising altitude, airspeed, destination, alternate airports, etc. However, pilots are also trained to respond effectively to uncertainty. Examples of these critical decision points are the well-known "Go/No Go" decision during takeoff and the "Go-Around" during the final approach to landing. According to the Flight Safety Foundation, a lack of go-arounds is the number one risk factor in approach and landing accidents and the number one cause of "runway excursions".<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEieGlKRK9uBBKIMQJ8JyIXxmiDAq_INivX1koCYG1gRFCL6jhUShKrphZuIO4OxAGrwQM6Tr208QCvxeH7IUv11qhug9vpDjek5-jSJnKKKVpeZ32BjgXKDOaLT43MtdR54-tfCqx4aOtBq/s1600/goaround.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEieGlKRK9uBBKIMQJ8JyIXxmiDAq_INivX1koCYG1gRFCL6jhUShKrphZuIO4OxAGrwQM6Tr208QCvxeH7IUv11qhug9vpDjek5-jSJnKKKVpeZ32BjgXKDOaLT43MtdR54-tfCqx4aOtBq/s1600/goaround.png" height="145" width="400" /></a></div>
<br />
The following is how the FAA Airplane Flying Handbook describes the go-around:<br />
<br />
<blockquote class="tr_bq">
<i>The assumption that an aborted landing is invariably the consequence of a poor approach, which in turn is due to insufficient experience or skill, is a fallacy. The go-around is not strictly an emergency procedure. It is a normal maneuver that may at times be used in an emergency situation. Like any other normal maneuver, the go-around must be practiced and perfected.</i></blockquote>
<h3>
Applying Lean Startup Engineering Principles</h3>
<br />
A software development culture that tolerates risk-taking and failure as a source of rapid learning and growth is actually a good thing. The question is: how do we perform fail-safe experiments early and often to validate our assumptions and hypotheses about customers' pain points and the business model (how money is made), quickly learn from those experiments, and pivot to new ideas and new experiments until we get the product right? The traditional retrospective in Agile usually involves discussion about what went wrong and how we can improve, with a focus on the activities performed by team members. The concept of pivot in Lean Startup engineering is different. The pivot is about being responsive to customer feedback and demands in order to build a sustainable and resilient product and business. The pivot has significant implications on the architecture, design, and development of a product. As Peter Senge wrote in his book titled <i>The Fifth Discipline: The Art and Practice of the Learning Organization</i>:<br />
<br />
<blockquote class="tr_bq">
<i>The only sustainable competitive advantage is an organization's ability to learn faster than the competition.</i></blockquote>
<br />
The Lean Startup recipe is to create Minimum Viable Products (MVPs) that are tested early and often with future customers of the product. An MVP can be created through rapid prototyping or by incrementally delivering new product features through Continuous Delivery, while leveraging cloud-based capabilities such as a Platform as a Service (PaaS) to remain lean. Testing MVPs requires the team (including software developers) to get out of their cubicles or workstations and meet with customers face-to-face whenever possible to obtain direct feedback and validation. MVPs can also be tested through analytics, A/B testing, or end-user usability testing. Actionable metrics like the System Usability Scale (SUS) should be collected during these fail-safe experiments and subsequently analyzed.<br />
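SUS scoring itself is simple enough to show directly (the ten sample responses below are invented): odd-numbered items contribute their response minus 1, even-numbered items contribute 5 minus their response, and the sum is multiplied by 2.5 to yield a score between 0 and 100.

```python
# System Usability Scale (SUS) scoring. Responses are 1-5 Likert ratings
# for the 10 standard SUS items; the sample responses below are invented.

def sus_score(responses):
    assert len(responses) == 10, "SUS uses exactly 10 items"
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to 0-100

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```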
<br />
These fail-safe experiments allow the Product Owner and the team to refine the product vision and business model through validated learning. Lean Startup principles are not just for startups. They can also make a big difference in established enterprises where resource constraints combined with market competition and uncertainty can render the traditional strategic planning exercise completely useless.<br />
<h3>
Navigating in Scala Land</h3>
In a previous post titled <i><a href="http://efasoft.blogspot.com/2013/11/toward-polyglot-programming-on-jvm.html" target="_blank">Toward Polyglot Programming on the Java Virtual Machine (JVM)</a></i>, I described my preliminary exploration of other languages and frameworks on the JVM including Groovy, Gradle, Grails, Scala, Akka, Clojure, and the Play Framework. I made the switch from Maven to Gradle, a Groovy-based build language that combines the best of Ant and Maven. I was seduced by the scaffolding capabilities of Grails, but decided to make the jump to Scala and functional programming. So I have been navigating in Scala land recently. In this post, I describe my journey.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQZRcbiHaTBqguVCKGVptrfbKwADDvM3VbkdJffl_YMJ_6WWcvoP8J1qr2Bny0rITdQzd78XSOHw0E0S_WpejLpM0Z5gVpWBTQDWc3tA9pg8IxL9XkXR1Ulzas5dID7B1WoQ0VcgRoBW-9/s1600/scala.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQZRcbiHaTBqguVCKGVptrfbKwADDvM3VbkdJffl_YMJ_6WWcvoP8J1qr2Bny0rITdQzd78XSOHw0E0S_WpejLpM0Z5gVpWBTQDWc3tA9pg8IxL9XkXR1Ulzas5dID7B1WoQ0VcgRoBW-9/s1600/scala.jpg" /></a></div>
<div style="text-align: center;">
<br /></div>
<br />
I am still learning, but I can tell you that I have never had so much fun learning a new programming language. It probably has something to do with the use of pure mathematical functions in Scala. I did spend a year studying pure mathematics at the <a href="http://www.uac.bj/public/index.php/fr/" target="_blank">University of Abomey-Calavi</a> after graduating from high school. My first exposure to functional programming was with XSLT and XQuery, and I very much enjoyed programming without side effects when using those languages. XSLT 3.0 is a fully-fledged functional programming language with support for functions as first-class values and higher-order functions. XQuery 3.0 is a typed functional language for processing and querying XML data.<br />
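Higher-order functions treat functions as ordinary values that can be stored in variables, passed as arguments, and composed. As a minimal illustration of the idea on the JVM (sketched here with Java 8's java.util.function package rather than Scala or XSLT), consider:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class HigherOrder {
    // A higher-order function: it takes another function as an argument
    // and applies it to every element of a list, with no side effects.
    static <A, B> List<B> map(List<A> xs, Function<A, B> f) {
        return xs.stream().map(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Functions as first-class values: assigned to variables and composed.
        Function<Integer, Integer> square = x -> x * x;
        Function<Integer, Integer> inc = x -> x + 1;
        Function<Integer, Integer> squareThenInc = square.andThen(inc);

        System.out.println(map(Arrays.asList(1, 2, 3), squareThenInc)); // [2, 5, 10]
    }
}
```

Scala expresses the same style far more concisely (`List(1, 2, 3).map(x => x * x + 1)`), but the underlying concept is identical.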
<br />
<br />
Scala is a complex and ambitious language, as it supports both object-oriented and functional programming. When I started to learn Scala, I took the wrong direction several times and had to make several U-turns. The following steps have proven effective for me:<br />
<br />
<ul>
<li>The Coursera class <i><a href="https://www.coursera.org/course/progfun" target="_blank">Functional Programming Principles in Scala</a> </i>taught by Martin Odersky (the designer of Scala) is a good place to start. It explains the motivations behind Scala and emphasizes its mathematical and functional nature without trying to map pre-existing knowledge (of Java or Python, or any other language) to Scala. This is a refreshing approach because Scala is a different language although it can interoperate with Java. A good companion to this course is the book <i>Programming in Scala: A Comprehensive Step-by-Step Guide, 2nd Edition</i> by Martin Odersky, Lex Spoon and Bill Venners.</li>
<br />
<li>If you're a Java programmer moving to Scala, then the book <i>Scala for the Impatient </i>by Cay S. Horstmann would be a good reference. </li>
<br />
<li>If you're interested in building a web application in Scala, then I would recommend the book <i>Play for Scala </i>by Peter Hilton, Erik Bakker and Francisco Canedo. Scala and Play come with their own ecosystem of tools. This includes a Scala-based build system called sbt (simple build tool), testing tools (like specs2 and ScalaTest), IDEs (Eclipse-based Scala IDE and IntelliJ), database drivers (like ReactiveMongo and Slick), and authentication/authorization (SecureSocial and Deadbolt). Play has first-class support for JSON and REST, supports asynchronous responses (based on the concepts of "Future" and "Promise"), reactive programming with Akka, caching, iteratees (for processing large streams of data), and real-time push-based technologies like WebSockets and Server-Sent Events.</li>
<br />
<li>At this point, if you decide to dive into the deep waters of Scala, you might want to consider learning Reactive Programming and purely functional data structures. </li>
<br />
<li>There is a second Scala course at Coursera titled <a href="https://www.coursera.org/course/reactive" target="_blank"><i>Principles of Reactive Programming</i></a> taught by Martin Odersky, Erik Meijer, and Roland Kuhn. The <a href="http://www.reactivemanifesto.org/" target="_blank"><i>Reactive Manifesto</i></a> makes the case for a new breed of applications called "Reactive Applications". According to the manifesto, the Reactive Application architecture allows developers to build <i>"systems that are event-driven, scalable, resilient, and responsive."</i> In Scala land, there are currently two leading frameworks that support Reactive Programming: Akka and <a href="https://github.com/Netflix/RxJava" target="_blank">RxJava</a>. The latter is a library for composing asynchronous and event-based programs using observable sequences. RxJava is a Java port (with a Scala adaptor) of the original Rx (Reactive Extensions) for .NET created by Erik Meijer. Based on the Actor Model, Akka is a framework for building highly concurrent, asynchronous, distributed, and fault tolerant event-driven applications on the JVM (it supports both Java and Scala).</li>
<br />
<li>For purely functional data structures, there is a Scala-based library called <a href="https://github.com/scalaz/scalaz" target="_blank">Scalaz</a>. The book <i>Functional Programming in Scala</i> by Paul Chiusano and Rúnar Bjarnason is a good resource for exploring Scalaz.</li>
<br />
<li>Readers of this blog probably know that I am a proponent of Domain Driven Design (DDD) in building complex software systems. So I have been investigating how DDD principles can be implemented with a functional and reactive approach. Vaughn Vernon recently presented a podcast on <a href="http://skillsmatter.com/podcast/design-architecture/reactive-ddd-with-scala-and-akka" target="_blank"><i>Reactive DDD with Scala and Akka</i></a>. In a post titled <a href="http://debasishg.blogspot.com/2014/05/functional-patterns-in-domain-modeling.html" target="_blank"><i>Functional Patterns in Domain Modeling - Anemic Models and Compositional Domain Behaviors</i></a>, Debasish Ghosh provides an interesting perspective on the subject of <i>anemic domain models</i> in DDD done within a functional programming language as opposed to an object-oriented one.</li>
<br />
<li>For me, Big Data is real, not just a buzzword. I believe in analyzing humongous amounts of data to find hidden patterns and obtain insight for solving complex problems. Dean Wampler called <a href="http://polyglotprogramming.com/papers/CopiousData_TheKillerAppForFP.pdf" target="_blank"><i>copious data, the killer app for functional programming</i></a>. <a href="https://github.com/twitter/scalding" target="_blank">Scalding </a>by Twitter can be used for writing MapReduce jobs in Scala. Apache Spark which is written in Scala can run Machine Learning programs up to 100x faster than Hadoop MapReduce in memory. A talk titled <i><a href="http://polyglotprogramming.com/papers/Spark-TheNextTopComputeModel.pdf" target="_blank">Why Spark is the Next Top Compute Model</a> </i>by Dean Wampler explains why Spark has emerged as the most likely replacement for MapReduce in Hadoop applications.</li>
</ul>
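To make the idea of purely functional data structures mentioned above concrete, here is a minimal sketch in plain Java (a toy for illustration, not Scalaz): an immutable singly linked list in which prepending an element allocates a single new node and shares the entire existing tail, so older versions of the list remain valid after "updates".

```java
public class PersistentList<T> {
    // An immutable cons cell: prepending never mutates existing nodes,
    // so every older version of the list stays usable (structural sharing).
    final T head;
    final PersistentList<T> tail;

    private PersistentList(T head, PersistentList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // The empty list is represented by null to keep the sketch short.
    static <T> PersistentList<T> empty() { return null; }

    static <T> PersistentList<T> cons(T head, PersistentList<T> tail) {
        return new PersistentList<>(head, tail);
    }

    static <T> int size(PersistentList<T> xs) {
        return xs == null ? 0 : 1 + size(xs.tail);
    }

    public static void main(String[] args) {
        PersistentList<String> base = cons("b", cons("c", PersistentList.empty()));
        PersistentList<String> extended = cons("a", base); // shares base as its tail
        System.out.println(size(base));            // 2 -- base is unchanged
        System.out.println(size(extended));        // 3
        System.out.println(extended.tail == base); // true: structural sharing
    }
}
```

Scala's built-in `List` and the Scalaz library provide production-quality versions of this pattern, with many more operations and without the null-as-empty shortcut.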
<br />
<h3>
Improving the quality of mental health and substance use treatment: how can Informatics help?</h3>
<div id="p-3">
</div>
<br />
According to the 2012 National Survey on Drug Use and Health, an estimated 43.7 million adults aged 18 or older in the United States had mental illness in the past year. This represents 18.6 percent of all adults in this country. Among those 43.7 million adults, 19.2 percent (8.4 million adults) met criteria for a substance use disorder (i.e., illicit drug or alcohol dependence or abuse). In 2012, an estimated 9.0 million adults (3.9 percent) aged 18 or older had serious thoughts of suicide in the past year.<br />
<br />
Mental health and substance use are often associated with other issues such as:<br />
<br />
<ul>
<li>Co-morbidity involving other chronic diseases like HIV, hepatitis, diabetes, and cardiovascular disease. </li>
<br />
<li>Overdose and emergency care utilization.</li>
<br />
<li>Social issues like incarceration, violence, homelessness, and unemployment.</li>
</ul>
It is now well established that addiction is a chronic disease
of the brain and should be treated as such from a health and social policy standpoint.<br />
<br />
<br />
<h3>
The regulatory framework</h3>
<ul>
<li>The Affordable Care Act (ACA) requires non-grandfathered health plans in the individual and small group markets to provide essential health benefits (EHBs) including mental health and substance use disorder benefits. </li>
<br />
<li>Starting in 2014, insurers can no longer deny coverage because of a pre-existing mental health condition.</li>
<br />
<li>The ACA requires health plans to cover recommended evidence-based prevention and screening services including depression screening for adults and adolescents and behavioral assessments for children.</li>
<br />
<li>On November 8, 2013, HHS and the Departments of Labor and Treasury released the final rules implementing the Paul Wellstone and Pete Domenici Mental Health Parity and Addiction Equity Act of 2008 (MHPAEA). </li>
<br />
<li>Not all behavioral health specialists are eligible for the Meaningful Use EHR Incentive program created by the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009. </li>
</ul>
<br />
<h3>
Implementing Clinical Practice Guidelines (CPGs) with Clinical Decision Support (CDS) systems</h3>
Clinical Decision Support (CDS) can help address key challenges in mental health and substance use treatment such as:<br />
<br />
<ul>
<li>Shortages and high turnover in the addiction treatment workforce.</li>
<br />
<li>Insufficient or lack of adequate clinician education in mental health and addiction medicine.</li>
<br />
<li>Lack of implementation of available evidence-based clinical practice guidelines (CPGs) in mental health and addiction medicine.</li>
</ul>
For example, there are a number of scientifically validated CPGs for the Medication Assisted Treatment (MAT) of opioid addiction using methadone or buprenorphine. These evidence-based CPGs can be translated into executable CDS rules using business rule engines. These executable clinical rules should also be seamlessly integrated with clinical workflows.<br />
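As a toy illustration of what translating a guideline into an "executable CDS rule" might look like (a hypothetical sketch, not a real guideline and not a real rule engine such as Drools), a rule can be modeled as a condition over patient data paired with a recommendation, mirroring the "when/then" structure of production rules. The patient fields and the threshold below are invented for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CdsDemo {
    // Hypothetical patient attributes -- for illustration only.
    static class Patient {
        final boolean onMethadone;
        final double qtcMs; // corrected QT interval, in milliseconds
        Patient(boolean onMethadone, double qtcMs) {
            this.onMethadone = onMethadone;
            this.qtcMs = qtcMs;
        }
    }

    // A rule pairs a "when" condition with a "then" recommendation,
    // the way a production rule engine pairs conditions with actions.
    static class Rule {
        final Predicate<Patient> when;
        final String recommendation;
        Rule(Predicate<Patient> when, String recommendation) {
            this.when = when;
            this.recommendation = recommendation;
        }
    }

    // Fire every rule whose condition matches the patient's data.
    static List<String> evaluate(Patient p, List<Rule> rules) {
        List<String> advice = new ArrayList<>();
        for (Rule r : rules) {
            if (r.when.test(p)) advice.add(r.recommendation);
        }
        return advice;
    }

    public static void main(String[] args) {
        // Illustrative threshold only -- not clinical guidance.
        List<Rule> rules = List.of(
            new Rule(p -> p.onMethadone && p.qtcMs > 500,
                     "QTc prolonged: reassess methadone dosing"));
        System.out.println(evaluate(new Patient(true, 520), rules));
    }
}
```

A real rule engine adds what this sketch lacks: declarative authoring, efficient matching over many rules (e.g., the Rete algorithm), and integration with clinical workflows.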
<br />
The complexity and costs inherent in capturing the medical knowledge in clinical guidelines and translating that knowledge into executable code remain an impediment to the widespread adoption of CDS software. Therefore, there is a need for standards that facilitate the sharing and interchange of CDS knowledge artifacts and executable clinical guidelines. The ONC Health eDecision Initiative has published specifications to support the interoperability of CDS knowledge artifacts and services.<br />
<br />
Ontologies, as a knowledge representation formalism, are well suited for modeling complex medical knowledge and can facilitate reasoning during the automated execution of clinical guidelines based on patient data at the point of care.<br />
<br />
The typical Clinical Practice Guideline (CPG) is 50 to 150 pages long. Clinical Decision Support (CDS) should therefore also include other forms of cognitive aid such as Electronic Checklists, Data Visualization, Order Sets, and Infobuttons.<br />
<br />
The issues of human factors and usability of CDS systems, as well as CDS integration with clinical workflows, have been the subject of many research projects in healthcare informatics. The challenge is to bring these research findings into the practice of developing clinical systems software.<br />
<br />
<br />
<h3>
Learning from Data</h3>
<br />
Learning what works and what does not work in clinical practice is important for building a learning health system. This can be achieved by incorporating the results of Comparative Effectiveness Research (CER) and Patient-Centered Outcome Research (PCOR) into CDS systems. Increasingly, outcomes research will be performed using observational studies (based on real world clinical data) which are recognized as complementary to randomized control trials (RCTs). For example, CER and PCOR can help answer questions about the comparative effectiveness of pharmacological and psychotherapeutic interventions in mental health and substance abuse treatment. This is a form of Practice-Based Evidence (PBE) that is necessary to close the evidence loop.<br />
<br />
Three factors are contributing to the availability of massive amounts of clinical data: the rising adoption of EHRs by providers (thanks in part to the Meaningful Use incentive program), medical devices (including those used by patients outside of healthcare facilities), and medical knowledge (for example in the form of medical research literature). Massively parallel computing platforms such as Apache Hadoop or Apache Spark can process humongous amounts of data (including in real time) to obtain actionable insights for effective clinical decision making.<br />
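The map/reduce computation model behind these platforms can be sketched in miniature with a Java parallel stream: map records to key-value pairs, then aggregate by key. A real Hadoop or Spark job distributes this same shape of computation across a cluster rather than across local CPU cores, but the programming model is the point of this sketch:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MiniMapReduce {
    // Word count, the canonical map/reduce example:
    // "map" each word to a (word, 1) pair, "reduce" by summing per word.
    // The parallel stream splits the work across local cores, where a
    // cluster framework would split it across machines.
    static Map<String, Long> wordCount(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .parallel()
                .collect(Collectors.groupingBy(Function.identity(),
                         Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount("to be or not to be"));
    }
}
```

What the cluster frameworks add beyond this toy is data partitioning, fault tolerance, and (in Spark's case) in-memory caching of intermediate results, which is where the large speedups over disk-based MapReduce come from.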
<br />
The use of predictive modeling for personalized medicine (based on statistical computing and machine learning techniques) is becoming a common practice in healthcare delivery as well. These models can predict the health risk of patients (for pro-active care) based on their individual health profiles and can also help predict which treatments are more likely to lead to positive outcomes.<br />
<br />
Embedding Visual Analytics capabilities into CDS systems can help clinicians obtain deep insight for effective understanding, reasoning, and decision making through the visual exploration of massive, complex, and often ambiguous data. For example, Visual Analytics can help in comparing different interventions and care pathways and their respective clinical outcomes for a patient or population of patients over a certain period of time through the vivid showing of causes, variables, comparisons, and explanations.<br />
<br />
<br />
<h3>
Genomics of Addiction and Personalized Medicine</h3>
<br />
Advances in genomics and pharmacogenomics are helping researchers understand treatment response variability among patients in addiction treatment. Clinical Decision Support (CDS) systems can also be used to provide cognitive support to clinicians in providing genetically guided treatment interventions.<br />
<br />
<br />
<h3>
Quality Measurement for Mental Health and Substance Use Treatment</h3>
<br />
An important implication of the shift from a fee-for-service to a value-based healthcare delivery model is that existing process measures and the regulatory requirements to report them are no longer sufficient. <br />
<br />
Patient-reported outcomes (PROs) and patient-centered measures include essential metrics such as mortality, functional status, time to recovery, severity of side effects, and remission (depression remission at six and twelve months). These measures should take into account the values, goals, and wishes of the patient. Therefore patient-centered outcomes should also include the patient's own evaluation of the care received. <br />
<br />
Another issue to be addressed is the lack of data elements in Electronic Medical Record (EMR) systems for capturing, reporting, and analyzing PROs. This is the key to accountability and quality improvement in mental health and substance use treatment.<br />
<br />
<br />
<h3>
Using Natural Language Processing (NLP) for the automated processing of clinical narratives</h3>
<br />
Electronic documentation in mental health and substance use treatment is often captured in the form of narrative text such as psychotherapy notes. Natural Language Processing (NLP) and machine learning tools and techniques (such as named entity recognition) can be used to extract clinical concepts and other insight from clinical notes.<br />
<br />
Another area of interest is Clinical Question Answering (CQA), which would allow clinicians to ask questions in natural language and extract clinical answers from very large amounts of unstructured sources of medical knowledge. PubMed has more than 23 million citations for biomedical literature from MEDLINE, life science journals, and online books. It is impossible for the human brain to keep up with that amount of knowledge.<br />
<br />
<br />
<h3>
Computer-Based Cognitive Behavioral Therapy (CCBT) and mHealth</h3>
<br />
According to a report published last year by the California HealthCare Foundation and titled <a href="http://www.chcf.org/publications/2012/06/online-couch-mental-health" target="_blank"><i>The Online Couch: Mental Health Care on the Web</i></a>:<br />
<br />
<blockquote class="tr_bq">
<i>"Computer-based cognitive behavioral therapy (CCBT) cost-effectively leverages the Internet for coaching patterns in self-driven or provider-assisted programs. Technological advances have enabled computer systems designed to replicate aspects of cognitive behavior therapy for a growing range of mental health issues"</i>. </blockquote>
An example of a successful nationwide adoption of CCBT is the online behavioral therapy site <a href="http://www.beatingtheblues.co.uk/" target="_blank">Beating the Blues</a> in the United Kingdom which has been proven to help patients suffering from anxiety and mild to moderate depression. Beating the Blues has been recommended for use in the NHS by the National Institute for Health and Clinical Excellence (NICE).<br />
<br />
In addition, there is growing evidence to support the efficacy of mobile health (mHealth) technologies for supporting patient engagement and activation in health behavior change (e.g., smoking cessation).<br />
<br />
<h3>
Technologies in support of a Collaborative Care Model </h3>
<br />
There is sufficient evidence to support the efficacy of the collaborative care model (CCM) in the treatment of chronic mental health and substance use conditions. The CCM is based on the following principles:<br />
<ul>
<li>Coordinated care involving a multi-disciplinary care team.</li>
<br />
<li>Longitudinal care plan as the backbone of care coordination.</li>
<br />
<li>Co-location of primary care and mental health and substance use specialists.</li>
<br />
<li>Case management by a Care Manager. </li>
</ul>
Implementing an effective collaborative care model will require a
new breed of advanced clinical collaboration tools and capabilities
such as: <br />
<ul>
<li>Conversations and knowledge sharing using tools like video conferencing for virtual two-way
face-to-face communication between clinicians (see my previous post titled <a href="http://efasoft.blogspot.com/2013/08/health-it-innovations-for-care.html" target="_blank"><i>Health IT Innovations for Care Coordination</i></a>).</li>
<br />
<li>Clinical content management and case management tools.</li>
<br />
<li>File sharing and syncing allowing
the longitudinal care plan to be synchronized and shared among all
members of the care team.</li>
<br />
<li>Light-weight and simple clinical data exchange standards and protocols for content, transport, security, and privacy. </li>
</ul>
<h3>
Patient Consent and Privacy</h3>
<br />
Because of the stigma associated with mental health and substance use, it is important to give patients control over the sharing of their medical records. Patient consent should be obtained about what types of information are shared, with whom, and for what purpose. The patient should also have access to an audit trail of all data exchange-related events. Current paper-based consent processes are inefficient and lack accountability. Web-based consent management applications facilitate the capture and automated enforcement of patient consent directives (see my previous post titled <a href="http://efasoft.blogspot.com/2013/02/patient-privacy-at-web-scale.html" target="_blank"><i>Patient privacy at web scale</i></a>).<br />
<h3>
Toward Polyglot Programming on the JVM</h3>
In my previous post titled <a href="http://efasoft.blogspot.com/2013/10/treating-javascript-as-first-class.html" target="_blank"><i>Treating Javascript as a first-class language</i></a>, I wrote about how the Java Virtual Machine (JVM) is evolving with new languages and frameworks like Groovy, Grails, Scala, Akka, and the Play Framework. In this post, I report on my experience in learning and evaluating these emerging technologies and their roles in the Java ecosystem.<br />
<br />
<h3>
A KangaRoo on the JVM </h3>
<br />
On a previous project, I used Spring Roo to jumpstart the software development process. Spring Roo was created by Ben Alex, an Australian engineer who is also the creator of Spring Security. Spring Roo was a big productivity boost and generated a significant amount of code and configuration based on the specification of the domain model. Spring Roo automatically generated the following:<br />
<br />
<ul>
<li>The domain entities with support for JPA annotations. </li>
<li>Repository and service layers. In addition to JPA, Spring Roo also supports NoSQL persistence for MongoDB based on the Spring Data repository abstraction.</li>
<li>A web layer with Spring MVC controllers and JSP views with support for Tiles-based layout, theming, and localization. The JSP views were subsequently replaced with a combination of Thymeleaf (a next generation server-side HTML5 template engine) and Twitter Bootstrap to support a Responsive Web Design (RWD) approach. Roo also supports GWT and JSF.</li>
<li>REST and JSON remoting for all domain types.</li>
<li>Basic configuration for Spring Security, Spring Web Flow, Spring Integration, JMS, Email, and Apache Solr.</li>
<li> Entity mocking, automatic generation of test data (<i>"Data on Demand"</i>), in-container integration testing, and end-to-end Selenium integration tests.</li>
<li>A Maven build file for the project and full integration with Spring STS. </li>
<li>Deployment to Cloud Foundry.</li>
</ul>
Roo also supports other features such as database reverse engineering and Ajax. Another benefit of using Roo is that it helps enforce Spring best practices and other architectural concerns such as proper application layering.<br />
<br />
For my future projects, I am looking forward to taking developer's productivity and innovation to the next level. There are several criteria in my mind:<br />
<br />
<ul>
<li>Being able to do more with less. This means being able to write code that is concise, expressive, requires less configuration and boilerplate coding, and is easier to understand and maintain (particularly for difficult concerns like concurrency which is a key factor in scalability). </li>
<li>Interoperability with the Java language and being able to run on the JVM, so that I can take advantage of the larger and rich Java ecosystem of tools and frameworks.</li>
<li>Lastly, my interest in responsive, massively scalable, and fault-tolerant systems has picked up recently.</li>
</ul>
<br />
<br />
<h3>
<b>Getting Groovy </b></h3>
<br />
Maven has been a very powerful build system for several projects that I have worked on. My goal now is to support continuous delivery pipelines as a pattern for achieving high quality software. Large open source projects like Hibernate, Spring, and Android have already moved to Gradle. Gradle builds are written in a Groovy DSL and are more concise than Maven POM files which are based on a more verbose XML syntax. Gradle supports Java, Groovy, and Scala out-of-the box. It also has other benefits like incremental builds, multi-project builds, and plugins for other essential development tools like Eclipse, Jenkins, SonarQube, Ivy, and Artifactory.<br />
<br />
Grails is a full-stack framework based on Groovy, leveraging its concise syntax (which includes Closures), dynamic language programming, metaprogramming, and DSL support. The core principle of Grails is <i>"convention over configuration"</i>. Grails also integrates well with existing and popular Java projects like Spring Security, Hibernate, and Sitemesh. Roo generates code at development time and makes use of AOP. Grails on the other hand generates code at run-time, allowing the developer to do more with less code. The scaffolding mechanism is very similar in Roo and Grails.<br />
<br />
Grails has its own view technology called Groovy Server Pages (GSP) and its own ORM implementation called Grails Object Relational Mapping (GORM) which uses Hibernate under the hood. There is also decent support for REST/JSON and URL routing to controller actions. This makes it easy to use Grails together with Javascript MVC frameworks like AngularJS in creating more responsive user experiences based on the Single Page Application (SPA) architectural pattern.<br />
<br />
There are many factors that can influence the decision to use Roo vs. Grails (e.g., the learning curve associated with Groovy and Grails for a traditional Java team). There is also a new high-productivity framework called Spring Boot that is emerging as part of the soon to be released Spring Framework 4.0.<br />
<br />
<br />
<h3>
<b>Becoming Reactive</b></h3>
<br />
I am also interested in massively scalable and fault-tolerant systems. This is no longer a requirement solely for big internet players like Google, Twitter, Yahoo, and LinkedIn that need to scale to millions of users. These requirements (including response time and up time) are also essential in mission-critical applications such as healthcare.<br />
<br />
The recently published "<a href="http://www.reactivemanifesto.org/" target="_blank"><i>Reactive Manifesto</i></a>" makes the case for a new breed of applications called <i>"Reactive Applications"</i>. According to the manifesto, the Reactive Application architecture allows developers to build <i>"systems that are event-driven, scalable, resilient, and responsive." </i>That is the premise of the other two prominent languages on the JVM: Scala and Clojure. They are based on a different programming paradigm (than traditional OOP) called Functional Programming that is becoming very popular in the multi-core era.<br />
<br />
Twitter uses Scala and has open-sourced some of their internal Scala resources like <a href="http://twitter.github.io/effectivescala/" target="_blank"><i>"Effective Scala"</i></a> and <a href="http://twitter.github.io/scala_school/" target="_blank"><i>"Scala School"</i></a>. One interesting framework based on Scala is Akka, a concurrency framework built on the Actor Model.<br />
<br />
The Play Framework 2 is a full-stack web application framework based on Scala which is currently used by LinkedIn (which has over 225 million registered users worldwide). In addition to its elegant design, Play's unique benefits include:<br />
<br />
<ul>
<li>An embedded Java NIO (New I/O) non-blocking server based on JBoss Netty, providing the ability to call collaborating services asynchronously without relying on thread pools to handle I/O. This new breed of servers is called <i>"Evented Servers"</i> (NodeJS is another implementation) as opposed to the old <i>"Threaded Servers"</i>. Older frameworks like Spring MVC use a threaded and synchronous approach which is more difficult to scale.</li>
<li>The ability to make changes to the source code and just refresh the browser page to see the changes (this is called hot reload).</li>
<li>Type-safe Scala templates (errors are displayed in the browser during development).</li>
<li>Integrated support for Akka which provides (among other benefits) fault-tolerance, the ability to quickly recover from failure. </li>
<li>Asynchronous responses (based on the concepts of <i>"Future"</i> and <i>"Promise" </i>also found in AngularJS), caching, iteratees (for processing large streams of data), and support for real-time push-based technologies like WebSockets and Server-Sent Events.</li>
</ul>
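The "Future" and "Promise" style of asynchronous composition mentioned above has a direct JVM analog in Java 8's CompletableFuture, which can serve as a minimal sketch of the idea (the slow I/O call is simulated here):

```java
import java.util.concurrent.CompletableFuture;

public class AsyncDemo {
    // A future/promise pipeline in miniature: the caller registers
    // transformations to run when the result arrives, instead of
    // blocking a thread while waiting for it.
    static CompletableFuture<String> fetchGreeting(String name) {
        return CompletableFuture
                .supplyAsync(() -> "hello")        // simulates a slow I/O call
                .thenApply(s -> s + ", " + name);  // runs once the result is ready
    }

    public static void main(String[] args) {
        // join() blocks only here, at the very edge of the program.
        System.out.println(fetchGreeting("world").join()); // hello, world
    }
}
```

Play and Akka build on the same idea: request-handling threads are released while I/O is in flight, which is what lets an evented server handle many concurrent connections with a small thread pool.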
The biggest challenge in moving to Scala is that the move to Functional Programming can be a significant learning curve for developers with a traditional OOP background in Java. Functional Programming is not new. Languages like Lisp and Haskell are functional programming languages. More recently, XML processing languages like XSLT and XQuery have adopted functional programming ideas.<br />
<br />
<br />
<h3>
<b>Bringing Clojure to the JVM</b></h3>
<br />
Clojure is a dialect of LISP and a dynamically-typed functional programming language which compiles to JVM bytecode. Clojure supports multithreaded programming and immutable data structures. One interesting application of Clojure is <a href="http://incanter.org/" target="_blank"><i>Incanter</i></a>, a statistical computing and data visualization environment enabling big data analysis on the JVM.<br />
<h3>
Treating Javascript as a first-class language</h3>
With the emergence of the Single Page Application (SPA) architecture as an approach to creating more fluid and responsive user experiences in the browser, Javascript is gaining prominence as a platform for modern application development. Paypal, a large online payment service, recently announced that it has achieved significant performance and productivity gains by shifting its server-side development from Java to Javascript. From a software architecture and development perspective, what do expressions like <i>"Javascript as a first-class language"</i> or <i>"Javascript as a platform"</i> actually mean?<br />
<br />
Let's consider a well-established first-class language and platform like Java. By the way, I still consider Java a strong and safe bet for developing applications. What makes Java strong is not just the language, but the rich ecosystem of free and open source tools and frameworks built around it (e.g., Eclipse, Tomcat, JBoss Application Server, Drools, Maven, Jenkins, Solr, Hibernate, Spring, Hadoop to name just a few). The JVM is evolving with new languages and frameworks like Groovy, Grails, Clojure, Scala, Akka, and the Play Framework which aim to enhance developer's productivity. It is also well-known that big internet companies like Twitter have achieved significant gains in performance, scalability, and other architectural concerns by shifting a lot of back-end code from Ruby on Rails to the JVM. There are a number of architectural patterns and software development practices that have been adopted over the years in successfully building quality Java applications. These include:<br />
<br />
<ul>
<li>Design patterns such as the Gang of Four (GoF), Dependency Injection, Model View Controller (MVC), Enterprise Integration Patterns (EIP), Domain Driven Design (DDD), and modularity patterns like those based on OSGi.</li>
<li>Test-Driven Development (TDD) using tools like JUnit, TestNG, Mockito (mocking), Cucumber-JVM (for behavior-driven development or BDD), and Selenium (for automated end-to-end testing).</li>
<li>Build tools like Maven and Gradle.</li>
<li>Static analysis with tools like FindBugs, Checkstyle, PMD, and Sonar.</li>
<li>Continuous integration and delivery with tools like Jenkins.</li>
<li>Performance testing with JMeter.</li>
<li>Web application vulnerability testing with Burp.</li>
</ul>
<br />
As we move to a rich client application paradigm based on Javascript and the Single Page Application (SPA) architecture, it is clear that Javascript can no longer be considered a toy language for front-end developers, so we need to bring the same engineering discipline to Javascript. As I said previously, the JVM remains my platform of choice for back-end development. For example, I find that AngularJS (a client-side Javascript MVC framework) works well with Spring back-end capabilities (like Spring Security and REST support in Spring MVC, Spring HATEOAS, or Grails). However, I also keep an eye on server-side Javascript frameworks like Node.js.<br />
<br />
The good news is that the community is coming up with patterns, tools, and practices that are helping elevate Javascript to the status of first-class language. The following is a list of patterns and tools that I find interesting and promising so far:<br />
<ul>
<li>Javascript design patterns, including the application of the GoF patterns to Javascript. The MVC and Dependency Injection patterns are both implemented in AngularJS, my favorite Javascript MVC framework. There are also modularity patterns like <i>Asynchronous Module Definition (AMD)</i> supported by RequireJS.</li>
<li>Functional programming support in Javascript (e.g., higher-order functions and closures) is emerging as a best practice in writing quality Javascript code. </li>
<li>Behavior-Driven Development (BDD) testing with Jasmine.</li>
<li>Static analysis with Javascript code quality tools like JSLint and JSHint.</li>
<li>Build with Grunt, a Javascript task runner.</li>
<li>Karma, a test runner for Javascript.</li>
<li>Protractor, an end-to-end test framework built on top of Selenium WebDriverJS.</li>
<li>Single Page Applications are subject to common web application vulnerabilities like Cookie Snooping, Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), and JSON Injection. Security is mainly the responsibility of the server, although client-side frameworks like AngularJS also provide some features to enhance the security of Single Page Applications.</li>
</ul>
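To make the functional programming bullet above concrete, here is a small sketch in plain ES5 Javascript (not tied to any framework; the shopping-cart example is mine) showing a closure holding private state and the higher-order array functions that replace explicit loops:

```javascript
// A higher-order function: returns a new function specialized by `tag`.
function makeLogger(tag) {
  var count = 0; // private state captured by the closure
  return function (message) {
    count += 1;
    return "[" + tag + " #" + count + "] " + message;
  };
}

// Array higher-order functions (map/filter/reduce) replace explicit loops.
var prices = [19.99, 5.49, 3.5];
var total = prices
  .map(function (p) { return Math.round(p * 100); }) // work in integer cents
  .filter(function (cents) { return cents > 400; })  // keep items over $4
  .reduce(function (sum, cents) { return sum + cents; }, 0);

var log = makeLogger("cart");
console.log(log("total in cents: " + total)); // "[cart #1] total in cents: 2548"
```

Because `makeLogger` closes over `count`, each logger keeps its own call counter without any global state, which is exactly the kind of discipline that distinguishes quality Javascript from "toy" scripting.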
Vidjinnagni Amoussouhttp://www.blogger.com/profile/06437681488603449364noreply@blogger.com0tag:blogger.com,1999:blog-2374388455287292795.post-74305018033303739912013-08-15T19:14:00.001-04:002013-12-08T10:38:53.604-05:00Health IT Innovations for Care Coordination<h4>
The Business Case</h4>
<br />
According to an article by Bodenheimer et al. published in the January/February 2009 issue of Health Affairs and titled <a href="http://content.healthaffairs.org/content/28/1/64.full" target="_blank"><i>Confronting The Growing Burden Of Chronic Disease: Can The U.S. Health Care Workforce Do The Job?</i></a>:<br />
<br />
<blockquote class="tr_bq">
<i>In 2005, 133 million Americans were living with at least one chronic condition. In 2020, this number is expected to grow to 157 million. In 2005, sixty-three million people had multiple chronic illnesses, and that number will reach eighty-one million in 2020. </i></blockquote>
<br />
Patients with co-morbidities are typically treated by multiple clinicians working for different healthcare organizations. Care Coordination is necessary for the effective treatment of these patients and for reducing costs. Effective Care Coordination can reduce redundant tests and procedures, hospital admissions and readmissions, medical errors, and patient safety issues related to the lack of medication reconciliation. <br />
<br />
According to a paper by Dennison and Hughes published in the Journal of Cardiovascular Nursing and titled<i> <a href="http://www.ncbi.nlm.nih.gov/pubmed/19390343" target="_blank">Progress in Prevention Imperative to Improve Care Transitions for Cardiovascular Patients</a></i>, direct communication between the hospital and primary care setting occurred only 3 percent of the time. According to the same paper, at discharge, a summary was provided only 12 percent of the time, and availability remained poor at 4 weeks post-discharge, when only 51 percent of practitioners had provided one. The paper concluded that this deficit affected quality of care in 25 percent of follow-up visits.<br />
<br />
Health Information Exchanges (HIEs) and emerging delivery models like
the Accountable Care Organization (ACO) and the Patient-Centered Medical
Home (PCMH) were designed to promote care coordination. However, according to an article by Furukawa et al. published in the August 2013 issue of Health Affairs and titled <a href="http://content.healthaffairs.org/content/32/8/1346.abstract" target="_blank"><i>Hospital Electronic Health Information Exchange Grew Substantially In 2008–12</i></a>:<br />
<br />
<blockquote class="tr_bq">
<i>In 2012, 51 percent of hospitals exchanged clinical
information with unaffiliated ambulatory care providers, but only
36 percent exchanged information with other hospitals outside
the organization. . . . In 2012 more than half of hospitals exchanged
laboratory results or radiology reports, but only about one-third of
them exchanged clinical care summaries or medication lists with
outside providers</i>. </blockquote>
<br />
<br />
Furthermore, the financial sustainability of many HIEs remains an issue. According to another article by Adler-Milstein et al. published in the same issue of Health Affairs and titled <a href="http://content.healthaffairs.org/content/32/8/1486.abstract" target="_blank"><i>Operational Health Information Exchanges Show Substantial Growth, But Long-Term Funding Remains A Concern</i></a>, "74 percent of health information exchange efforts report struggling to develop a sustainable business model".<br />
<br />
There are other obstacles to care coordination including the existing fee-for-service healthcare delivery model (as opposed to a value-based model), the lack of interoperability between healthcare information systems, and the lack of adoption of effective collaboration tools.<br />
<br />
According to a report by the Institute of Medicine (IOM) titled<i> <a href="http://www.nap.edu/catalog.php?record_id=12750" target="_blank">The Healthcare Imperative: Lowering Costs and Improving Outcomes</a>,</i> a program designed to improve care coordination could result in national annual savings of $240.1 billion.<br />
<br />
<h4>
What Can We Learn From High Risk Operations in Other Industries? </h4>
<i><br /></i>
Similar breakdowns in communication during shift handovers have also been observed in risky operating environments, sometimes with devastating consequences. In the aerospace industry, human factors research and training have played an important role in successfully addressing the issue. A research paper by Parke and Mishkin titled <a href="http://human-factors.arc.nasa.gov/publications/Parke_MER_SurfaceOps_Handovers_05.pdf" target="_blank"><i>Best Practices in Shift Handover Communication: Mars Exploration Rover Surface Operations</i></a><i> </i>included the following recommendations:<br />
<br />
<blockquote class="tr_bq">
<ul>
<li><i>Two-way Communication, Preferably Face-to-Face. . . . Two-way communication enables the incoming worker to ask questions and rephrase the material to be handed over, so as to expose these differences [in mental model].</i></li>
<br />
<br />
<li><i>Face-to-Face Handovers with Written Support. Face-to-face handovers are improved if they are supported by structured written material—e.g., a checklist of items to convey, and/or a position log to review. </i></li>
<br />
<br />
<li><i>Content of Handover Captures Intent. Handover communication works best if it captures problems, hypotheses, and intent, rather than simply lists what occurred.</i></li>
</ul>
</blockquote>
While the logistics of healthcare delivery do not always permit physical face-to-face communication between clinicians during transitions of care, the web has seen an explosion in online collaboration tools. Innovative organizations have embraced these technologies, giving rise to a new breed of enterprise software known as Enterprise 2.0 or Social Enterprise Software. This new breed of software is not only social, but also mobile and cloud-based. <br />
<br />
<h4>
Care Coordination in the Health Enterprise 2.0</h4>
<br />
<ul>
<li><b>Collaborative Authoring of a Longitudinal Care Plan</b>. From a content
perspective, the Care Plan is the backbone of Care Coordination. The Care Plan should be comprehensive and standardized (similar to the checklist in aerospace operations). It should include problems, medications, orders, results, care goals (taking into consideration the patient's wishes and values), care team members and their responsibilities, and actual patient outcomes (e.g., functional status).
Clinical Decision Support (CDS) tools can be used to dynamically
generate a basic Care Plan based on the patient's specific clinical
data. This basic Care Plan can be used by members of the care team to
build a more elaborate Longitudinal Care Plan. CDS tools can also automatically generate
alerts and reminders for the care team.</li>
<br />
<br />
<li><b>Communication and Collaboration using Enterprise 2.0 Software</b>. These tools should be used to enable collaboration between all members of the care team, which includes not only clinicians, but also non-clinician caregivers and the patient herself. Beyond email, these tools allow conversations and knowledge sharing through instant messaging, video conferencing (for virtual two-way face-to-face communication), content management, file syncing (allowing the longitudinal care plan to be synchronized and shared among all members of the care team), search, and enterprise social networking (because clinical work, like most human activities, is a social activity). A provider directory should make it easy for users to find a specific provider and all their contact information based on search criteria such as location, specialty, knowledge, experience, and telephone number.</li>
<br />
<br />
<li><b>Light Weight Standards and Protocols for Content, Transport, Security, and Privacy</b>. The foundation standards are: REST, JSON,
OAuth2, and OpenID Connect. An emerging approach that could really help put patients in control of the privacy of their electronic medical record is the OAuth2.0-based <a href="http://kantarainitiative.org/confluence/display/uma/Home" target="_blank">User-Managed Access (UMA)</a> Protocol of the Kantara Initiative (see my previous post titled <a href="http://efasoft.blogspot.com/2013/02/patient-privacy-at-web-scale.html" target="_blank"><i>Patient Privacy at Web Scale</i></a>). Initiatives like the ONC-sponsored RESTful Health Exchange (RHEX) project and the HL7 <span class="st">Fast Healthcare Interoperability Resources (</span>FHIR) hold great promise.</li>
<br />
<br />
<li><b>Case Management Tools.</b> These are typically used by Nurse Practitioners (Case Managers) to coordinate care in Medical Homes, a concept popularized by the Patient-Centered Medical Home healthcare delivery model. These tools integrate various capabilities such as risk stratification (using predictive modeling) to identify at-risk patients, content management (check-in, check-out, versioning), workflows (human tasks), communication, business rule engines, and case reporting/analytics capabilities. </li>
</ul>
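As a concrete illustration of the foundation standards listed above (REST, JSON, OAuth2), the sketch below shows how a client might attach an OAuth2 bearer token to a RESTful JSON request. The endpoint and resource path are made-up placeholders, and the access token is the sample value from RFC 6750; this is a sketch of the pattern, not a prescription for any particular HIE's API:

```javascript
// Sketch: building an OAuth2-protected RESTful JSON request descriptor.
// The base URL, resource path, and token below are hypothetical examples.
function buildAuthorizedRequest(baseUrl, resourcePath, accessToken) {
  return {
    method: "GET",
    url: baseUrl + "/" + resourcePath,
    headers: {
      "Accept": "application/json",
      // Per RFC 6750, the access token travels in the Authorization header.
      "Authorization": "Bearer " + accessToken
    }
  };
}

var request = buildAuthorizedRequest(
  "https://ehr.example.org/api",  // hypothetical HIE endpoint
  "patients/123/medications",     // hypothetical resource path
  "2YotnFZFEjr1zCsicMWpAA"        // sample token from RFC 6750
);
console.log(request.headers.Authorization); // "Bearer 2YotnFZFEjr1zCsicMWpAA"
```

The point is the lightweight shape of the exchange: a plain HTTPS request, a JSON media type, and a bearer token, with authorization decisions (e.g., UMA policies) enforced on the server side.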
<br />Vidjinnagni Amoussouhttp://www.blogger.com/profile/06437681488603449364noreply@blogger.com0tag:blogger.com,1999:blog-2374388455287292795.post-67667665933093738912013-06-09T21:24:00.000-04:002013-08-06T22:42:35.732-04:00Essential IT Capabilities Of An Accountable Care Organization (ACO)The Certification Commission for Health Information Technology (CCHIT) recently published a document entitled <i>A Health IT Framework for Accountable Care</i>. The document identifies the following key processes and functions necessary to meet the objectives of an ACO:<br />
<br />
<ul>
<li>Care Coordination</li>
<li>Cohort Management</li>
<li>Patient and Caregiver Relationship Management</li>
<li>Clinician Engagement</li>
<li>Financial Management</li>
<li>Reporting</li>
<li>Knowledge Management.</li>
</ul>
<br />
The key to success is a shift to data-driven healthcare delivery. The following is my assessment of the most critical IT capabilities for ACO success:<br />
<br />
<ul>
<li>Comprehensive and standardized care documentation in the form of electronic health records including, at a minimum: patients' signs and symptoms, diagnostic tests, diagnoses, allergies, social and family history, medications, lab results, care plans, interventions, and actual outcomes. Disease-specific <i>Documentation Templates</i> can support the effective use of automated Clinical Decision Support (CDS). Comprehensive electronic documentation is the foundation of accountability and quality improvement.</li>
<br />
<li>Care coordination through the secure electronic exchange and the collaborative authoring of the patient's medical record and care plan (this is referred to as <i>clinical information reconciliation</i> in the CCHIT Framework). This also requires health IT interoperability standards that are easy to use and designed following rigorous and well-defined software engineering practices. Unfortunately, this has not always been the case, resulting in standards that are actually obstacles to interoperability as opposed to enablers of interoperability. Case Management tools used by <i>Medical Homes</i> (a concept popularized by the Patient-Centered Medical Home model) can greatly facilitate collaboration and Care Coordination.</li>
<br />
<li>Patients' access to and ownership of their electronic health records including the ability to edit, correct, and update their records. Patient portals can be used to increase patients' health literacy with health education resources. Decision aids comparing the benefits and harms of various interventions (<i>Comparative Effectiveness Research</i>) should be available to patients. Patients' health behavior change remains one of the greatest challenges in Healthcare Transformation. mHealth tools have demonstrated their ability to support <i>Patient Activation</i>. </li>
<br />
<li> Secure communication between patients and their providers. Patients should have the ability to specify with whom, for what purpose, and the kind of medical information they want to share. Patients should have access to an audit trail of all access events to their medical records just as consumers of financial services can obtain their credit record and determine who has inquired about their credit score.</li>
<br />
<li>Clinical Decision Support (CDS) as well as other forms of cognitive aids such as Electronic Checklists, Data Visualization, Order Sets, Infobuttons, and more advanced Clinical Question Answering (CQA) capabilities (see my previous post entitled <a href="http://efasoft.blogspot.com/2013/02/automated-clinical-question-answering.html" target="_blank"><i>Automated Clinical Question Answering: The Next Frontier in Healthcare Informatics</i></a>). The unaided mind (as Dr. Lawrence Weed, the father of the Problem-Oriented Medical Record, calls it) is no longer able to cope with the large amounts of data and knowledge required in clinical decision making today. CDS should be used to implement clinical practice guidelines (CPGs) and other forms of Evidence-Based Medicine (EBM). <br /><br />However, the delivery of care should also take into account the unique clinical characteristics of individual patients (e.g., co-morbidities and social history) as well as their preferences, wishes, and values.<i> Standardized Clinical Assessment And Management Plans (SCAMPs)</i> promote care standardization while taking into account patient preferences and the professional judgment of the clinician. CDS should be well integrated with clinical workflows (see my previous post entitled <a href="http://efasoft.blogspot.com/2013/04/addressing-challenges-to-adoption-of.html" target="_blank"><i>Addressing Challenges to the Adoption of Clinical Decision Support (CDS) Systems</i></a>).</li>
<br />
<li>Predictive risk modeling to identify at-risk populations and provide them with pro-active care including early screening and prevention. For example, predictive risk modeling can help identify patients at risk of hospital re-admission, an important ACO quality measure.</li>
<br />
<li>Outcomes measurement with an emphasis on patient outcomes in addition to existing process measures. Examples of patient outcome measures include: mortality, functional status, and time to recovery.</li>
<br />
<li>Clinical Knowledge Management (CKM) to disseminate knowledge throughout the system in order to support a learning health system.
The Institute of Medicine (IOM) released a report titled <a href="http://www.iom.edu/Reports/2011/Digital-Infrastructure-for-a-Learning-Health-System.aspx"><span style="font-style: italic;">Digital Infrastructure for the Learning Health System: The Foundation for Continuous Improvement in Health and Health Care</span></a>. The report describes the learning health system as:<br /><br /><blockquote>
<span style="font-style: italic;">"delivery
of best practice guidance at the point of choice, continuous learning
and feedback in both health and health care, and seamless, ongoing
communication among participants, all facilitated through the
application of IT."</span></blockquote>
</li>
<br />
<li>Applications of Human Factors research to enable the effective use of technology in clinical settings. Examples include: implementation of usability guidelines to reduce Alert Fatigue in Clinical Decision Support (CDS), Checklists, and Visual Analytics.
There are many lessons to be learned from other mission-critical industries that have adopted automation. Following several incidents and accidents related to the introduction of the <i>Glass Cockpit</i> about 25 years ago, Human Factors training known as <i>Cockpit Resource Management (CRM)</i> is now standard practice in the aviation industry.</li>
</ul>
Vidjinnagni Amoussouhttp://www.blogger.com/profile/06437681488603449364noreply@blogger.com0tag:blogger.com,1999:blog-2374388455287292795.post-33734647176793654182013-04-28T19:35:00.001-04:002013-05-10T15:55:36.833-04:00How I Make Technology DecisionsThe open source community has responded to the increasing complexity of software systems by creating many frameworks intended to facilitate the work of developing software. Software developers spend a considerable amount of time researching, learning, and integrating these frameworks to build new software products. Selecting the wrong technology can cost an organization millions of dollars. In this post, I describe my approach to selecting these frameworks. I also discuss the frameworks that have made it to my software development toolbox.<br />
<br />
<h3>
Understanding the Business </h3>
<br />
The first step is to build a strong understanding of the following:<br />
<br />
<ul>
<li>The business goals and challenges of the organization. For example, the healthcare industry is currently shifting to a value-based payment model in an increasingly tightening regulatory environment. Healthcare organizations are looking for a computing infrastructure that supports new demands such as the Accountable Care Organization (ACO) model, patient-centered outcomes, patient engagement, care coordination, quality measures, bundled payments, and Patient-Centered Medical Homes (PCMH).</li>
<br />
<li>The intended buyers and users of the system and their concerns. For example: What are their pain points? Which devices are they using? What are their security and privacy concerns?</li>
<br />
<li>The standards and regulations of the industry.</li>
<br />
<li>The competitive landscape in the industry. To build a system that is relevant, it is important to know the following: Who are the competitors? What are the current capabilities of their systems? What is on their road map? And what are customers saying about their products? This knowledge can help shape a <i><a href="http://en.wikipedia.org/wiki/Blue_Ocean_Strategy" target="_blank">Blue Ocean Strategy</a></i>.</li>
<br />
<li>Emerging trends in technologies.</li>
</ul>
<br />
This type of knowledge comes with industry experience and a habit of continuously paying attention to these issues. For example, on a daily basis, I read industry news as well as scientific and technical publications. As a member of the American Medical Informatics Association (AMIA), I receive the latest issue of the Journal of the American Medical Informatics Association (JAMIA), which allows me to access cutting-edge research in medical informatics. I <a href="http://efasoft.blogspot.com/2011/07/service-oriented-clinical-decision.html" target="_blank">speak at industry conferences</a> when possible, which allows me not only to hone my presentation skills, but also to attend all sessions for free or at a discounted price. For the latest in software development, I turn to publications like InfoQ, DZone, and TechCrunch.<br />
<br />
To better understand the users and their needs and concerns, I perform early usability testing (using sketches, wireframes, or mockups) to test design ideas and obtain feedback before actual development starts. For generating innovative design ideas, I recommend the following book: <i>Universal Methods of Design: 100 Ways to Research Complex Problems, Develop Innovative Ideas, and Design Effective Solutions</i> by Bruce Hanington and Bella Martin. <br />
<br />
<h3>
Architecting the Solution</h3>
<br />
Armed with a solid
understanding of the business and technological landscape as well as the
domain, I can start creating a solution architecture. Software development projects can be chaotic. Based on my
experience working on many software development projects across
industries, I found that Domain Driven Design (DDD) can help foster a
disciplined approach to software development. For more on my experience with DDD, see my previous post
entitled<a href="http://efasoft.blogspot.com/2013/03/how-not-to-build-big-ball-of-mud-part-2.html" target="_blank"><i> How Not to Build A Big Ball of Mud, Part 2</i></a>.<br />
<br />
Frameworks evolve over time. So, I make sure that the architecture is framework-agnostic and focused on supporting the domain. This allows me to retrofit the system in the future with new frameworks as they emerge.<br />
<br />
<br />
<h3>
Due Diligence </h3>
<br />
Software development is a rapidly evolving field. I keep my eyes on the radar and try not to drink the vendors' Kool-Aid. For example, not all vendors have a good track record in supporting standards, interoperability, and cross-platform solutions.<br />
<br />
The <a href="http://www.thoughtworks.com/insights" target="_blank"><i>ThoughtWorks Technology Radar</i></a> is an excellent source of information and analysis on emerging trends in software. Its contributors include software thought leaders like Martin Fowler and Rebecca Parsons. I also look at surveys of the developer community to determine the popularity, community size, and usage statistics of competing frameworks and tools. Sites like InfoQ often conduct these types of surveys, like the recent InfoQ survey on <a href="http://www.infoq.com/research/top-javascript-mvc-frameworks?utm_source=infoq&utm_medium=popular_links_homepage" target="_blank"><i>Top JavaScript MVC Frameworks</i></a>. I also like Matt Raible's <a href="http://www.slideshare.net/mraible/comparing-jvm-web-frameworks-devoxx-france-2013" target="_blank"><i>Comparing JVM Web Frameworks</i></a>.<br />
<br />
I value the opinion of recognized experts in the field of interest. I read their books, blogs, and watch their presentations. Before formulating my own position, I make sure that I read expert opinions on opposing sides of the argument. For example, in deciding on a pure Java EE vs. Spring Framework approach, I read arguments by experts on both sides (experts like Arun Gupta, Java EE Evangelist at Oracle and Adrian Colyer, CTO at SpringSource).<br />
<br />
Finally, consider a peer review of the architecture using a methodology like the <a href="https://en.wikipedia.org/wiki/Architecture_tradeoff_analysis_method" target="_blank"><i>Architecture Tradeoff Analysis Method (ATAM)</i></a>. Simply going through the exercise of explaining the architecture to stakeholders and receiving feedback can significantly help in improving it.<br />
<br />
<br />
<h3>
Rapid Prototyping </h3>
It's generally a good idea to create a rapid prototype to quickly learn and demonstrate the capabilities and value of the framework to the business. This can also generate excitement in the development team, particularly if the framework can enhance the productivity of developers and make their life easier.<br />
<br />
<h3>
The Frameworks I've Selected</h3>
<br />
<h4>
The Spring Framework</h4>
I am a big fan of the Spring Framework. I believe it is really designed to support the needs of developers from a productivity standpoint. In addition to dependency injection (DI), Aspect Oriented Programming (AOP), and Spring MVC, I like the Spring Data repository abstraction for JPA, MongoDB, Neo4J, and Hadoop. Spring supports Polyglot Persistence and Big Data today. I use Spring Roo for rapid application development, and this allows me to focus on modeling the domain. I use the Roo scaffolding feature to generate a lot of Spring configuration and Java code for the domain, repository (Roo supports JPA and MongoDB), service, and web layers (Roo supports Spring MVC, JSF, and GWT). Spring also supports unit and integration testing with the recent release of Spring MVC Test.<br />
<br />
I use Spring Security, which allows me to use AOP and annotations to secure methods and supports advanced features like <i>Remember Me</i> and regular expressions for URLs. I think that JAAS is too low-level. Spring Security allows me to meet all OWASP Top Ten requirements (see my previous post entitled <a href="http://efasoft.blogspot.com/2013/01/application-level-security-in-health-it.html" target="_blank"><i>Application-Level Security in Health IT Systems: A Roadmap</i></a>).<br />
<br />
Spring Social makes it easy to connect a Spring application to social network sites like Facebook, Twitter, and LinkedIn using the OAuth2 protocol. From a tooling standpoint, Spring STS supports many Spring features and I can deploy directly to Cloud Foundry from Spring STS. I look forward to evaluating Grails and the Play Framework which use convention over configuration and are built on Groovy and Scala respectively.<br />
<br />
<h4>
Thymeleaf, Twitter Bootstrap, and jQuery</h4>
I use Twitter Bootstrap because it is based on HTML5, CSS3, jQuery, LESS, and also supports a Responsive Web Design (RWD) approach. The size of the component library and the community is quite impressive.<br />
<br />
Thymeleaf is an HTML5 templating engine and a replacement for traditional JSP. It is well integrated with Spring MVC and supports a clear division of labor between back-end and front-end developers. Twitter Bootstrap and Thymeleaf work well together.<br />
<br />
<br />
<h4>
AngularJS</h4>
For Single Page Applications (SPA), my definitive choice is AngularJS. It provides everything I need including a clean MVC pattern implementation, directives, view routing, Deep Linking (for bookmarking), dependency injection, two-way data binding, and BDD-style unit testing with Jasmine. AngularJS has its own dedicated debugging tool called <a href="https://github.com/angular/angularjs-batarang" target="_blank"><i>Batarang</i></a>. There are also several learning resources (including books) on AngularJS.<br />
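To illustrate why dependency injection matters, here is a toy injector in plain Javascript. It only mimics the general idea that AngularJS builds on (components declare their dependencies by name and an injector wires them up); it is a sketch of the pattern, not Angular's actual implementation, and all the names are mine:

```javascript
// A toy illustration of the dependency-injection pattern: components ask
// for collaborators by name instead of constructing them directly, which
// makes it trivial to swap in mocks for unit testing.
function Injector() {
  this.providers = {};
}
Injector.prototype.register = function (name, factory) {
  this.providers[name] = factory;
};
Injector.prototype.get = function (name) {
  return this.providers[name](this); // build the dependency on demand
};

var injector = new Injector();
injector.register("greetingService", function () {
  return { greet: function (who) { return "Hello, " + who + "!"; } };
});
injector.register("controller", function (inj) {
  var greetings = inj.get("greetingService"); // resolved by name, not hard-wired
  return { message: greetings.greet("Angular") };
});

console.log(injector.get("controller").message); // "Hello, Angular!"
```

In a test, registering a fake `greetingService` before building the controller is all it takes to isolate the controller's logic, which is why DI pairs so naturally with Jasmine-style unit testing.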
<br />
Check this page comparing the <a href="http://jsperf.com/angularjs-vs-knockoutjs/8" target="_blank"><i>performance of AngularJS vs. KnockoutJS</i></a>. Also see this survey of the popularity of <a href="http://www.infoq.com/research/top-javascript-mvc-frameworks?utm_source=infoq&utm_medium=popular_links_homepage" target="_blank"><i>Top JavaScript MVC Frameworks</i></a>.<br />
<h4>
D3.js </h4>
D3.js is my favorite for data visualization in data-intensive applications. It is based on HTML5, SVG, and Javascript. For simple charting and plotting, I use <a href="http://www.jqplot.com/" target="_blank">jqPlot</a>, which is based on jQuery. See my previous post entitled <i><a href="http://efasoft.blogspot.com/2013/01/visual-analytics-for-clinical-decision_13.html" target="_blank">Visual Analytics for Clinical Decision Making</a></i>. <br />
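At the heart of most D3.js visualizations is a scale: a function that maps a data domain onto a pixel range. The hand-rolled sketch below mimics the idea behind D3's linear scale; the function name and the blood-pressure example are mine, not D3's API:

```javascript
// A minimal linear scale: maps a data domain [d0, d1] onto a pixel
// range [r0, r1] by normalizing and then interpolating.
function linearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1];
  var r0 = range[0], r1 = range[1];
  return function (value) {
    var t = (value - d0) / (d1 - d0); // normalize into [0, 1]
    return r0 + t * (r1 - r0);        // interpolate into the pixel range
  };
}

// Map systolic blood pressure readings (90-180 mmHg) onto a 400px-tall chart.
var y = linearScale([90, 180], [400, 0]); // inverted: SVG's y-axis grows downward
console.log(y(90));  // 400 (bottom of the chart)
console.log(y(180)); // 0   (top of the chart)
console.log(y(135)); // 200 (midpoint)
```

D3 layers much more on top of this (tick generation, clamping, non-linear scales), but the domain-to-range mapping above is the core mental model for reading and writing D3 code.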
<h4>
R </h4>
I use R for statistical computing, data analysis, and predictive analytics. See my previous post entitled <a href="http://efasoft.blogspot.com/2013/03/statistical-computing-and-data-mining.html" target="_blank"><i>Statistical Computing and Data Mining with R</i></a>.<br />
<br />
<br />
<h4>
Development Tools</h4>
<br />
My development tools include: Git (Distributed Version Control), Maven or Gradle (build), Jenkins (Continuous Integration), Artifactory (Repository Manager), and Sonar (source code quality management). My testing toolkit includes Mockito, DBUnit, Cucumber JVM, JMeter, and Selenium. <br />
<br />Vidjinnagni Amoussouhttp://www.blogger.com/profile/06437681488603449364noreply@blogger.com0tag:blogger.com,1999:blog-2374388455287292795.post-83712447808360722542013-04-14T23:29:00.000-04:002015-03-13T18:19:45.560-04:00Addressing Challenges to the Adoption of Clinical Decision Support (CDS) Systems: A Practical Approach<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEho0Xh2zs1nlMwkFBK_yHNRvD47Kv1g9_dQoCmfHoDSoa2l030uE2MNfF05gK5lGQjs-Om-jrJJAPLO6kps6ejmMk3xEh0rjDrz-zRqtH40m0crwsLX6JDiFJQvjXfN0fik5NhsWEi1KXY0/s1600/cds.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEho0Xh2zs1nlMwkFBK_yHNRvD47Kv1g9_dQoCmfHoDSoa2l030uE2MNfF05gK5lGQjs-Om-jrJJAPLO6kps6ejmMk3xEh0rjDrz-zRqtH40m0crwsLX6JDiFJQvjXfN0fik5NhsWEi1KXY0/s1600/cds.jpg" height="265" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Laptop and stethoscope by <a href="https://www.flickr.com/photos/67272961@N03/" target="_blank">jfcherry </a>is licensed under <a href="https://creativecommons.org/licenses/by-sa/2.0/" target="_blank">CC BY-SA 2.0</a></td></tr>
</tbody></table>
<b>This post has been updated on February 15, 2015.</b><br />
<br />
Despite its potential to improve the quality of care, CDS is not widely used in health care delivery today. In technology marketing parlance, CDS has not crossed the chasm. There are several issues that need to be addressed including:<br />
<br />
<ul>
<li>Clinicians' acceptance of the concept of automated execution of evidence-based clinical practice guidelines. </li>
<br />
<li>Integration into clinical workflows and care protocols.</li>
<br />
<li>Usability and human factors issues including alert fatigue and the human factors that influence alert acceptance.</li>
<br />
<li>With the expanding use of clinical prediction models for diagnosis and prognosis, there is a need for a better understanding of the probabilistic approach to clinical decision making under uncertainty.</li>
<br />
<li>Standardization to enable the interoperability and reuse of CDS knowledge artifacts and executable clinical guidelines.</li>
<br />
<li>The challenges associated with the automatic concurrent execution of multiple clinical practice guidelines for patients with co-morbidities. </li>
<br />
<li>Integration with modern information retrieval capabilities which allow clinicians to access up-to-date biomedical literature. These capabilities include text mining, Natural Language Processing (NLP), and more advanced Clinical Question Answering (CQA) tools. CQA allows clinicians to ask clinical questions in natural language and extracts answers from very large amounts of unstructured sources of medical knowledge. PubMed has more than 23 million citations for biomedical literature from MEDLINE, life science journals, and online books. The typical Clinical Practice Guideline (CPG) is 50 to 150 pages long. It is impossible for the human brain to keep up with that amount of knowledge. </li>
<br />
<li>The use of mathematical simulations in CDS to explore and compare the outcomes of various treatment alternatives.</li>
<br />
<li>Integration of genomics to enable personalized medicine as the cost of whole-genome sequencing (WGS) continues to fall.<br /> </li>
<br />
<li>Integration of outcomes research in the context of a shift to a value-based healthcare delivery model. This can be achieved by incorporating the results of Comparative Effectiveness Research (CER) and Patient-Centered Outcome Research (PCOR) into CDS systems. Increasingly, outcomes research will be performed using observational studies (based on real world clinical data) which are recognized as complementary to randomized controlled trials (RCTs) for discovering what works and what doesn't work in practice. This is a form of Practice-Based Evidence (PBE) that is necessary to close the evidence loop.</li>
<br />
<li>Support for a shared decision making process which takes into account the values, goals, and wishes of the patient.</li>
<br />
<li>The use of Visual Analytics in CDS to facilitate analytical reasoning over very large amounts of structured and unstructured data sources.</li>
<br />
<li>Finally, the challenges associated with developing hybrid decision support systems which seamlessly integrate all the technologies mentioned above including: machine learning predictive algorithms, real-time data stream mining, visual analytics, ontology reasoning, and text mining.</li>
<br />
</ul>
In response to a paper titled <i>Grand challenges in clinical decision support</i> by Sittig et al. [1], Fox et al. [2] outlined four theoretical foundations for the design and implementation of CDS systems: <i>decision theory, theories of knowledge representation, process design, and organizational modeling</i>. The practical approach discussed in this post is grounded in those four theoretical foundations.<br />
<br />
<br />
<h3>
CDS Interoperability</h3>
<br />
The complexity and cost inherent in capturing the medical knowledge in clinical guidelines and translating that knowledge into executable code remain a major impediment to the widespread adoption of CDS software. Therefore, there is a need for standards for the interchange and reuse of CDS knowledge artifacts and executable clinical guidelines.<br />
<br />
Different formalisms, methodologies, and architectures have been proposed over the years for representing the medical knowledge in clinical guidelines. Examples include but are not limited to the following:<br />
<br />
<ul>
<li>The Arden Syntax</li>
<li>GLIF (Guideline Interchange Format)</li>
<li>GELLO (Guideline Expression Language Object-Oriented)</li>
<li>GEM (Guidelines Element Model)</li>
<li>The Web Ontology Language (OWL)</li>
<li>PROforma</li>
<li>EON</li>
<li>PRODIGY</li>
<li>Asbru</li>
<li>GUIDE</li>
<li>SAGE</li>
</ul>
More recently, HL7 has published the Clinical Decision Support (CDS) Knowledge Artifact Specification which provides guidance on shareable CDS knowledge artifacts including event-condition-action rules, order sets, and documentation templates.<br />
<br />
The HL7 Context-Aware Knowledge Retrieval (Infobutton) specifications provide a standard mechanism for clinical information systems to request context-specific clinical knowledge to satisfy clinicians' and patients' information needs at the point of care.<br />
<br />
Enabling the interoperability of executable clinical guidelines requires a standardized domain model for representing the medical information of patients and other contextual clinical information. The HL7 Virtual Medical Record (vMR) is a standardized domain model for representing the inputs and outputs of CDS systems. The ability to transform an HL7 CCDA document into an HL7 vMR document means that EHR systems that are Meaningful Use Stage 2 certified can consume these standard-compliant decision support services.<br />
<br />
Because of the complexity and cost of developing CDS software, CDS software capabilities can be exposed as a set of services (part of a service-oriented architecture [16]) which can be consumed by other clinical systems such as EHR and Computerized Physician Order Entry (CPOE) systems. When deployed in the cloud, these CDS software services can be shared by several health care providers to reduce costs. The HL7 Decision Support Service (DSS) specification defines REST and SOAP web service interfaces using the vMR as message payload for accessing interoperable decision support services.<br />
<br />
In practice, executable CDS rules (like other complex types of business rules) can be implemented with a production rule system using forward chaining. This is the approach taken by OpenCDS and some other large scale CDS implementations in real-world healthcare delivery settings. This allows CDS software developers to externalize the medical knowledge contained in clinical practice guidelines in the form of declarative rules as opposed to embedding that knowledge in procedural code. Many viable open source business rule management systems (BRMS) are available today and provide capabilities such as a rule authoring user interface, a rules repository, and a testing environment. Furthermore, a rule execution environment can be integrated with business processes, ontologies, and predictive analytics models (more on that later).<br />
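In concrete terms, forward chaining repeatedly matches rule conditions against the facts in working memory and asserts new facts until no more rules fire. The following minimal sketch illustrates the idea; the rule content and fact names are invented for the example and are not taken from any actual guideline:

```python
# Minimal forward-chaining sketch: a rule fires when its conditions are a
# subset of the facts in working memory, asserting its conclusion as a new
# fact; the loop repeats until a fixed point is reached.
# The clinical rule content here is illustrative only, not real guidance.

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition <= facts and conclusion not in facts:
                facts.add(conclusion)   # rule fires: assert the conclusion
                changed = True
    return facts

rules = [
    ({"diabetes", "hba1c_over_9"}, "poor_glycemic_control"),
    ({"poor_glycemic_control"}, "recommend_regimen_review"),
]

result = forward_chain({"diabetes", "hba1c_over_9"}, rules)
```

Note how the second rule fires only because the first one asserted a new fact, which is exactly the chaining behavior that lets declarative rules stay small and composable.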
<br />
The W3C Rule Interchange Format (RIF) specification is a possible solution to the interchange of executable CDS rules. The RIF Production Rule Dialect (RIF PRD) is designed as a common XML serialization syntax for multiple rule languages to enable rule interchange between different BRMS. For example, RIF-PRD would allow the exchange of executable rules between existing BRMS like JBoss Drools, IBM ILOG JRules, and Jess. RIF is currently a W3C Recommendation and is backed by several BRMS vendors. Cosentino et al. [3] described a model-driven approach for interoperability between IBM's proprietary ILOG rule language and RIF.<br />
<br />
<br />
<h3>
Seamless Integration into Clinical Workflows and Care Pathways</h3>
<br />
One of the main complaints against CDS systems is that they are not well integrated into clinical workflows and care protocols. Existing business process management standards like the Business Process Model and Notation (BPMN) can provide a proven, practical, and adaptable approach to the integration of CDS rules and clinical pathways and protocols. Some existing open source and commercial BRMS already provide an integration of business rules and business processes out-of-the box and there are well-known patterns for integrating rules and processes [4, 5, 6] in business applications.<br />
<br />
In 2014, the Object Management Group (OMG) released the Decision Model and Notation (DMN) specification which defines various constructs for modeling decision logic. The combination of BPMN and DMN [7, 8] provides a practical approach for modeling the decisions in clinical practice guidelines while integrating these decisions with clinical workflows. BPMN and DMN also support the modeling of decisions and processes that span functional and organizational boundaries.<br />
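A DMN decision table is essentially data: rows of input conditions paired with outputs, evaluated under a hit policy. A minimal sketch of that evaluation model in Python follows; the thresholds and actions are placeholders invented for illustration, not clinical guidance:

```python
# A DMN-style decision table reduced to data: each row pairs a predicate
# over the inputs with a decision output. First-match-wins evaluation
# approximates a "priority"-style hit policy with an explicit default row.
# Thresholds and actions are placeholders, not clinical guidance.

decision_table = [
    (lambda age, bmi: age >= 45 and bmi >= 30, "order_screening_now"),
    (lambda age, bmi: age >= 45,               "screen_at_next_visit"),
    (lambda age, bmi: True,                    "no_action"),  # default rule
]

def decide(age, bmi):
    for predicate, output in decision_table:
        if predicate(age, bmi):
            return output

print(decide(50, 32))  # -> order_screening_now
```

Keeping the table as data rather than nested if-statements is what makes the decision logic reviewable by clinicians and swappable without touching the workflow code.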
<br />
<br />
<h3>
Human Factors in the Use of Clinical Decision Support Systems</h3>
<br />
We need to do a better job in understanding the human factors that influence alert acceptance by clinicians in CDS. We also need clear and proven usability guidelines (backed by scientific research) that can be implemented by CDS software developers. Several research projects have sought to understand why clinicians accept or ignore alerts in medication-related CDS [9, 10]. Zachariah et al. [11] developed and validated an <i>Instrument for Evaluating Human-Factors Principles in Medication-Related Decision Support Alerts</i> (I-MeDeSA). I-MeDeSA measures CDS alerts on the following nine human factors principles:<i> alarm philosophy, placement, visibility, prioritization, color learnability and confusability, text-based information, proximity of task components being displayed, </i>and<i> corrective actions</i>.<br />
<br />
The British National Health Service (NHS) Common User Interface (CUI) Program has created standards and guidance in support of the usability of clinical applications with inputs from user interface design specialists, usability experts, and hundreds of clinicians with a diversity of background in using health information technology. The program is based on a rigorous development process which includes: research, design, prototyping, review, usability testing, and patient safety assessment by clinicians. In the US, the National Institute of Standards and Technology (NIST) has also published some guidance on the usability of clinical applications.<br />
<br />
Studies have also shown that, as in aviation, checklists can provide cognitive support to clinicians in the decision making process. <br />
<br />
<br />
<h3>
Integrating Genomic Data with CDS</h3>
<br />
The costs of whole-genome sequencing (WGS) and whole-exome sequencing (WES) continue to fall. Increasingly, both WGS and WES will be used in clinical practice for inherited disease risk assessment and pharmacogenomic findings [21]. There is a need for a modern CDS architecture that can support and facilitate the introduction and use of WGS and WES in clinical practice.<br />
<br />
In a paper titled <i>Technical desiderata for the integration of genomic data with clinical decision support</i> [14], Welch et al. proposed technical requirements for integrating genomic data with CDS. In a follow-up paper titled<i> A proposed clinical decision support architecture capable of supporting whole genome sequence information</i> [15], they proposed the following architecture (click to enlarge):<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjK-O_dKIkdgwyTlAEgRvlI5CjelzPYCsqYPbaNBzH5CO4HP2NH4KPmBkTKp3hOmY2Nr_2d8Weg8WcwwwPlCk7p63KXuwZElGaFw1MkOHm_9mwtqkwxdcAEGvmxFOt1kGeCBxPwLO_OUHmV/s1600/wgs.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjK-O_dKIkdgwyTlAEgRvlI5CjelzPYCsqYPbaNBzH5CO4HP2NH4KPmBkTKp3hOmY2Nr_2d8Weg8WcwwwPlCk7p63KXuwZElGaFw1MkOHm_9mwtqkwxdcAEGvmxFOt1kGeCBxPwLO_OUHmV/s1600/wgs.png" height="220" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><a href="http://www.mdpi.com/2075-4426/4/2/176" target="_blank">Proposed service-oriented architecture (SOA) for whole genome sequence (WGS)-enabled CDS</a> by Brandon M. Welch, Salvador Rodriguez Loya, Karen Eilbeck, and Kensaku Kawamoto is licensed under <a href="http://creativecommons.org/licenses/by/3.0/" target="_blank">CC BY 3.0</a></td></tr>
</tbody></table>
<br />
The proposed architecture includes the following components:<i> the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR)</i>. The authors suggest separating the genome data from the EHR data.<br />
<br />
<br />
<h3>
Practice-Based Evidence (PBE) needed for closing the evidence loop</h3>
<br />
Prospective randomized controlled trials (RCTs) are still considered the gold standard in evidence-based medicine. Although they can control for biases, RCTs are costly, time consuming, and must be performed under carefully controlled conditions.<br />
<br />
The retrospective analysis of existing clinical data sources is increasingly recognized as complementary to RCTs for discovering what works and what doesn't work in real world clinical practice [23]. These retrospective studies will allow the creation of clinical prediction models which can provide personalized absolute risk and treatment outcome predictions for patients. They also facilitate what has been referred to as <i>"rapid learning health care.</i>" [24]<br />
<br />
<h3>
Toward Data-Driven Clinical Decision Support (CDS)</h3>
<br />
William Osler (1849-1919) [19] famously said that <i>"Medicine is a science of uncertainty and an art of probability." </i><br />
<br />
The use of clinical prediction models for diagnosis and prognosis is becoming common practice in clinical care. These models can predict the health risks of patients based on their individual health data. Clinical Prediction Models provide absolute risk and treatment outcome prediction for conditions such as diabetes, kidney disease, cancer, cardiovascular disease, and depression. These models are built with statistical learning techniques and introduce new challenges related to their probabilistic approach to clinical decision making under uncertainty [20]. In his book titled <i>Super Crunchers: Why Thinking-By-Numbers Is The New Way To be Smart</i>, Ian Ayres wrote:<br />
<br />
<blockquote class="tr_bq">
<i>Traditional experts make better decisions when they are provided with the results of statistical prediction. Those who cling to the authority of traditional experts tend to embrace the idea of combining the two forms of knowledge by giving the experts 'statistical support'. The purveyors of diagnostic software are careful to underscore that its purpose is only to provide support and suggestions. They want the ultimate decision and discretion to lie with the doctor. [12]</i></blockquote>
<br />
Furthermore, in order to leverage existing clinical domain knowledge from clinical practice guidelines and biomedical ontologies [22], machine learning algorithms' probabilistic approach to decision making under uncertainty must be complemented by technologies like production rule systems and ontology reasoners. Sesen et al. [18] designed a hybrid CDS for lung cancer care based on probabilistic reasoning with a Bayesian Network model and guideline-based recommendations using a domain ontology and an ontology reasoner.<br />
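At its simplest, the probabilistic side of such a hybrid system is an application of Bayes' theorem: a pre-test probability is updated using a test's sensitivity and specificity to yield a post-test probability. A minimal sketch with illustrative numbers (not drawn from any real test):

```python
# Post-test probability of disease given a positive test result, via
# Bayes' theorem. The numbers used below are illustrative only.

def post_test_probability(pretest, sensitivity, specificity):
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity          # false positive rate
    numerator = p_pos_given_disease * pretest
    denominator = numerator + p_pos_given_healthy * (1 - pretest)
    return numerator / denominator

# Pre-test probability 10%, test with 90% sensitivity and 95% specificity:
p = post_test_probability(0.10, 0.90, 0.95)
print(round(p, 3))  # -> 0.667
```

A Bayesian network generalizes exactly this computation to many interdependent variables, which is why it pairs naturally with rule- and ontology-based components that contribute the qualitative guideline knowledge.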
<br />
Fox et al. [2] proposed an argumentation approach based on the construction, summarization, and prioritization of arguments for and against each generated candidate decision. These arguments can be qualitative or quantitative in nature. On the importance of presenting evidence-based rationale in CDS systems, Fox et al. wrote:<br />
<br />
<blockquote class="tr_bq">
<i>In short, to improve usability of clinical user interfaces we advocate basing design around a firm theoretical understanding of the clinician’s perspective on the medical logic in a decision, the qualitative as well as quantitative aspects of the decision, and providing an evidence-based rationale for all recommendations offered by a CDS. [2]</i></blockquote>
In a paper titled <i>A canonical theory of dynamic decision-making</i> [13], Fox et al. proposed a canonical theory of dynamic
decision-making and presented the PROforma clinical guideline modeling
language as an instance of the canonical theory.<br />
<br />
Clinical predictive model presentation techniques include traditional score charts, nomograms, and clinical rules [17]. However, clinical prediction models are easier to use and maintain when deployed as scoring services (part of a service-oriented software architecture) and integrated into CDS systems. The scoring service can be deployed in the cloud to allow integration with multiple client clinical systems [20]. The Predictive Model Markup Language (PMML) specification published by the Data Mining Group (DMG) supports the interoperable deployment of predictive models in heterogeneous software environments.<br />
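Whatever the deployment mechanism, scoring ultimately reduces to evaluating the fitted model on a patient's covariates. The sketch below scores a logistic regression model in Python; the coefficients and feature names are invented for illustration, and a real scoring service would load a validated model, for example from a PMML document:

```python
import math

# Score a logistic regression model: risk = 1 / (1 + exp(-(b0 + sum(bi*xi)))).
# Coefficients are invented for illustration; a real scoring service would
# load a validated, externally supplied model (e.g. from a PMML file).

COEFFICIENTS = {"intercept": -5.0, "age": 0.04, "systolic_bp": 0.02}

def risk_score(patient):
    z = COEFFICIENTS["intercept"]
    for feature, beta in COEFFICIENTS.items():
        if feature != "intercept":
            z += beta * patient[feature]
    return 1.0 / (1.0 + math.exp(-z))   # logistic link maps z to (0, 1)

score = risk_score({"age": 65, "systolic_bp": 140})
```

Wrapping such a function behind a web service endpoint is what turns the model into a shareable scoring service that multiple clinical systems can call.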
<br />
Visual Analytics or data visualization techniques can also play an important role in the effective presentation of Clinical Prediction Models to nonstatisticians particularly in the context of shared decision making.<br />
<br />
<br />
<h3>
Concurrent execution of multiple guidelines for patients with co-morbidities</h3>
<br />
According to the Medicare 2012 chart book, over two-thirds of Medicare beneficiaries have two or more chronic conditions and 14% have six or more chronic conditions [25]. <br />
<br />
In <i>Grand Challenges in Clinical Decision Support </i>[1], Sittig et al. wrote:<br />
<blockquote class="tr_bq">
<i>The challenge is to create mechanisms to identify and eliminate redundant, contraindicated, potentially discordant, or mutually exclusive guideline-based recommendations for patients presenting with co-morbid conditions.</i></blockquote>
Wilk, Michalowski et al. [26] proposed a mitigation framework based on a first-order logic (FOL) approach.<br />
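A drastically simplified flavor of such mitigation (a sketch of the general idea, not the authors' actual framework) is to pool the recommendations produced by each guideline and flag any drug that one guideline recommends while another contraindicates. The guideline and drug names below are illustrative:

```python
# Toy mitigation check across concurrently executed guidelines: collect
# each guideline's recommendations and flag drugs that are simultaneously
# recommended and contraindicated. Names are illustrative only.

def find_conflicts(recommendations):
    """recommendations: list of (guideline, action, drug) triples."""
    recommended = {drug for g, action, drug in recommendations
                   if action == "recommend"}
    contraindicated = {drug for g, action, drug in recommendations
                       if action == "avoid"}
    return recommended & contraindicated   # drugs needing mitigation

recs = [
    ("osteoarthritis_guideline", "recommend", "NSAID"),
    ("peptic_ulcer_guideline",   "avoid",     "NSAID"),
    ("peptic_ulcer_guideline",   "recommend", "PPI"),
]

conflicts = find_conflicts(recs)   # -> {"NSAID"}
```

A real FOL-based framework reasons over far richer representations (dosing, timing, interactions, patient state), but the output is the same in spirit: a set of discordant recommendations to revise before presenting them to the clinician.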
<br />
<br />
<h3>
A CDS Architecture for the era of Precision Medicine</h3>
<br />
I proposed a scalable CDS architecture for Precision Medicine in another post titled <i><a href="http://efasoft.blogspot.com/2014/11/toward-reference-architecture-for.html" target="_blank">Toward a Reference Architecture for Intelligent Systems in Clinical Care</a></i>.<br />
<br />
<h3>
</h3>
<h2>
References</h2>
<br />
[1] Sittig D, Wright A, Osheroff JA, Middleton B, Teich JM, Ash J, et al. Grand challenges in clinical decision support. J Biomed Inform 2008;41(2):387–92.<br />
<br />
[2] Fox, J., Glasspool, D.W., Patkar, V., Austin, M., Black, L., South, M., et al. (2010). Delivering clinical decision support services: there is nothing as practical as a good theory. J. Biomed. Inform. 43, 831–843<br />
<br />
[3] Valerio Cosentino, Marcos Didonet del Fabro, Adil El Ghali. A model driven approach for bridging ILOG Rule Language and RIF. RuleML, Aug 2012, Montpellier, France.<br />
<br />
[4] Mauricio Salatino. (Processes & Rules) OR (Rules & Processes) 1/X. http://salaboy.com/2012/07/19/processes-rules-or-rules-processes-1x/. Retrieved February 15, 2015.<br />
<br />
[5] Mauricio Salatino. (Processes & Rules) OR (Rules & Processes) 2/X. http://salaboy.com/2012/07/28/processes-rules-or-rules-processes-2x/. Retrieved February 15, 2015.<br />
<br />
[6] Mauricio Salatino. (Processes & Rules) OR (Rules & Processes) 3/X. http://salaboy.com/2012/07/29/processes-rules-or-rules-processes-3x/. Retrieved February 15, 2015.<br />
<br />
[7] Sylvie Dan. Modeling Clinical Rules with the Decision Modeling and Notation (DMN) Specification. http://sylviedanba.blogspot.com/2014/05/modeling-clinical-rules-with-decision.html. Retrieved February 15, 2015.<br />
<br />
[8] Dennis Andrzejewski, Eberhard Beck, Laura Tetzlaff. The transparent representation of medical decision structures based on the example of breast cancer treatment. 9th International Conference on Health Informatics.<br />
<br />
[9] Phansalkar S, Zachariah M, Seidling HM, Mendes C, Volk L, Bates DW. Evaluation of medication alerts in electronic health records for compliance with human factors principles. J Am Med Inform Assoc. 2014 Oct;21(e2):e332-40. doi: 10.1136/amiajnl-2013-002279. <br />
<br />
[10] Seidling HM, Phansalkar S, Seger DL, et al. Factors influencing alert acceptance: a novel approach for predicting the success of clinical decision support. J Am Med Inform Assoc 2011;18:479–84.<br />
<br />
[11] Zachariah M, Phansalkar S, Seidling HM, et al. Development and preliminary evidence for the validity of an instrument assessing implementation of human-factors principles in medication-related decision-support systems--I-MeDeSA. J Am Med Inform Assoc 2011;18(Suppl 1):i62–72.<br />
<br />
[12] Ayres I. Super Crunchers: Why Thinking-By-Numbers Is The New Way To be Smart. Bantam.<br />
<br />
[13] Fox J., Cooper R. P., Glasspool D. W. (2013). A canonical theory of dynamic decision-making. Front. Psychol. 4:150 10.3389/fpsyg.2013.00150.<br />
<br />
[14] Welch, B.M.; Eilbeck, K.; Del Fiol, G.; Meyer, L.; Kawamoto, K. Technical desiderata for the integration of genomic data with clinical decision support. 2014.<br />
<br />
[15] Welch BM, Loya SR, Eilbeck K, Kawamoto K. A proposed clinical decision support architecture capable of supporting whole genome sequence information. J Pers Med. 2014 Apr 4;4(2):176-99. doi: 10.3390/jpm4020176.<br />
<br />
[16] Loya SR, Kawamoto K, Chatwin C, Huser V. Service oriented architecture for clinical decision support: a systematic review and future directions. J Med Syst. 2014 Dec;38(12):140. doi: 10.1007/s10916-014-0140-z.<br />
<br />
[17] Ewout W. Steyerberg. Clinical Prediction Models. A Practical Approach to Development, Validation, and Updating. New York: Springer, 2010.<br />
<br />
[18] Sesen MB, Peake MD, Banares-Alcantara R, Tse D, Kadir T, Stanley R, Gleeson F, Brady M. 2014 Lung Cancer Assistant: a hybrid clinical decision support application for lung cancer care. J. R. Soc. Interface 11: 20140534. http://dx.doi.org/10.1098/rsif.2014.0534<br />
<br />
[19] Bean RB, Bean, BW. Sir William Osler: aphorisms from his bedside teachings and writings. New York; 1950.<br />
<br />
[20] Joel Amoussou. How good is your crystal ball?: Utility, Methodology, and Validity of Clinical Prediction Models. http://efasoft.blogspot.com/2015/01/how-good-is-your-crystal-ball-utility.html. Retrieved February 15, 2015.<br />
<br />
[21] Dewey FE, Grove ME, Pan C, et al. Clinical Interpretation and Implications of Whole-Genome Sequencing. JAMA. 2014;311(10):1035-1045. doi:10.1001/jama.2014.1717.<br />
<br />
[22] Joel Amoussou. Ontologies for Addiction and Mental Disease: Enabling Translational Research and Clinical Decision Support. http://efasoft.blogspot.com/2014/08/ontologies-for-addiction-and-mental.html. Retrieved February 2015.<br />
<br />
[23] Dekker ALAJ, Gulliford SL, Ebert MA, Orton CG. Future radiotherapy practice will be based on evidence from retrospective interrogation of linked clinical data sources rather than prospective randomized controlled clinical trials. Medical Physics. 2014;41:030601. doi:10.1118/1.4832139<br />
<br />
[24] Lambin, Philippe et al. 'Rapid Learning health care in oncology' – An approach towards decision support systems enabling customised radiotherapy. Radiotherapy and Oncology , Volume 109 , Issue 1 , 159 - 164.<br />
<br />
[25] Centers for Medicare & Medicaid Services. Chronic Conditions Among Medicare Beneficiaries, Chartbook: 2012 Edition. http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Chronic-Conditions/Downloads/2012Chartbook.pdf. Accessed Feb. 15, 2015.<br />
<br />
[26] Szymon Wilk, Martin Michalowski, Xing Tan, Wojtek Michalowski: Using First-Order Logic to Represent Clinical Practice Guidelines and to Mitigate Adverse Interactions. KR4HC@VSL 2014: 45-61.<br />
<br />
<br />
<br />Statistical Computing and Machine Learning with R (March 24, 2013)<br /><br />The use of predictive risk models for personalized medicine is becoming a common practice in healthcare delivery. These models can predict the health risk of patients based on their individual health profiles. Examples include models for predicting breast cancer, stroke, cardiovascular disease, Alzheimer's disease, chronic kidney disease, diabetes, hypertension, and operative mortality for patients undergoing cardiac surgery. These predictive models are created through data analysis using statistical computing.<br />
<br />
Predictive risk modeling can be used to identify at-risk populations and provide them with proactive care including early screening and prevention. For example, predictive risk modeling can help identify patients at risk of hospital re-admission, an important Accountable Care Organization (ACO) quality measure.<br />
<br />
Another important challenge in healthcare is to discover what works and what does not work in clinical practice. Comparative Effectiveness Research (CER), an emerging trend in Evidence Based Practice (EBP), has been defined by the Federal Coordinating Council for CER as "<i>the conduct and synthesis of research comparing the benefits and harms of different interventions and strategies to prevent, diagnose, treat, and monitor health conditions in 'real world' settings.</i>"<br />
<br />
Despite their inherent methodological challenges (lack of randomization leading to possible bias and confounding), observational studies (using real world clinical data) are increasingly recognized as complementary to Randomized Controlled Trials (RCTs) and an important tool in clinical decision making and health policy. <br />
<br />
Statistical Computing and Machine Learning are essential components of intelligent health IT systems. Over the last few years, the free and open source R Project for Statistical Computing has emerged as one of the most popular tools for data analysis. This poll by kdnuggets.com shows the <a href="http://www.kdnuggets.com/polls/2011/tools-analytics-data-mining.html" target="_blank">breakdown </a>in popularity of various data mining and analytic tools.<br />
<br />
R supports several Machine Learning algorithms including:<br />
<br />
<ul>
<li>Nearest Neighbor</li>
<li>Naive Bayes</li>
<li>Decision Trees</li>
<li>Logistic Regression</li>
<li>Neural Networks</li>
<li>Support Vector Machines</li>
<li>Association Rules</li>
<li>k-Means Clustering </li>
</ul>
A technique called "ensemble methods," which consists of combining multiple models into one, can be used to achieve a higher level of accuracy than any of its component models. There are also R packages for niche methods like the <a href="http://methodology.psu.edu/downloads/lcca" target="_blank">Latent Class Causal Analysis (LCCA) Package for R</a>. LCCA is used in behavioral health research. <br />
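Although the tooling discussed in this post is R-based, the ensemble idea itself is language-agnostic. A minimal majority-vote combiner, sketched here in Python with trivial stand-in classifiers (real component models would of course be trained, not hand-written rules):

```python
from collections import Counter

# Majority-vote ensemble: ask each component model for a label and return
# the most common answer. The "models" below are trivial stand-ins for
# trained classifiers, used purely to illustrate the combination step.

def majority_vote(models, x):
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]

models = [
    lambda x: "high_risk" if x["age"] > 60 else "low_risk",
    lambda x: "high_risk" if x["bmi"] > 35 else "low_risk",
    lambda x: "high_risk" if x["smoker"] else "low_risk",
]

label = majority_vote(models, {"age": 70, "bmi": 30, "smoker": True})
```

Because the component models err in different ways, the combined vote can be more accurate than any single model, which is the intuition behind bagging- and voting-style ensembles.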
<br />
The following are very useful resources for doing statistical computing and data mining with R:<br />
<br />
<ul>
<li><a href="http://www.rstudio.com/" target="_blank">RStudio</a>: an Integrated Development Environment (IDE) for R</li>
<br />
<li><a href="http://ggplot2.org/" target="_blank">ggplot2</a>: statistical graphics and plotting system for R</li>
<br />
<li><a href="http://cran.r-project.org/web/packages/sqldf/index.html" target="_blank">sqldf</a>: a package for manipulating R data frames using SQL</li>
<br />
<li><a href="http://cran.r-project.org/web/packages/RMySQL/index.html" target="_blank">RMySQL</a>: R interface to the MySQL database</li>
<br />
<li><a href="http://cran.r-project.org/web/packages/RMongo/index.html" target="_blank">RMongo</a>: MongoDB Database interface for R<br /> </li>
<br />
<li><a href="http://www.datadr.org/" target="_blank">RHIPE</a>: Big Data analysis using R and Hadoop. RHIPE stands for R and Hadoop Integrated Programming Environment. This approach is referred to as D&R (Divide and Recombine) Analysis of Large Complex Data (see this <a href="http://ml.stat.purdue.edu/rhipe.org/documents/dr.rhipe.techreport.2012.pdf" target="_blank">tech report</a> on D&R from the RHIPE team)</li>
<br />
<li><a href="https://github.com/RevolutionAnalytics/RHadoop/wiki" target="_blank">RHadoop</a>: Big Data analysis using R and Hadoop. This tool provides Hadoop MapReduce functionality in R</li>
<br />
<li><a href="http://rattle.togaware.com/" target="_blank">Rattle</a>: A Graphical User Interface for Data Mining using R. This tool can export predictive models in Predictive Model Markup Language (PMML) format.</li>
</ul>
<br />How Not to Build A Big Ball of Mud, Part 2 (March 10, 2013)<br /><br />In a previous post entitled <a href="http://efasoft.blogspot.com/2010/12/how-not-to-build-big-ball-of-mud.html" target="_blank"><i>How not to build a big ball of mud</i></a>, I described the complexity of modern software systems and the challenges faced today by software developers and architects. Domain Driven Design (DDD) is a proven pattern language that can foster a disciplined approach to software development. DDD was first introduced by Eric Evans nine years ago in a seminal book entitled <i>Domain-Driven Design: Tackling Complexity in the Heart of Software</i>. Over the last nine years, a community of practice has emerged around DDD and many lessons have been learned in applying DDD to real-world complex software development projects. During that time, software complexity has also increased significantly. Changes in the field of software development during the last few years include:<br />
<br />
<ul>
<li>The proliferation of client devices which requires a Responsive Web Design (RWD) approach. RWD is made possible by open web standards like HTML5, CSS3, and JavaScript which have displaced proprietary user interface technologies like Flex and Silverlight. RWD frameworks like Twitter Bootstrap and JavaScript libraries like jQuery have become very popular with developers. The demands put on JavaScript on the client side have created the need for JavaScript MVC frameworks like AngularJS and EmberJS.</li>
<br />
<li>The importance of the user experience in a competitive online marketplace. Performing usability testing early in the software development life cycle (using wireframes or mockups) to test design ideas and obtain early feedback from future users is extremely valuable for creating the right solution. Metrics such as the System Usability Scale (SUS) can be used to assess the results of usability testing.</li>
<br />
<li>The prevalence of REST, JSON, OAuth2, and Web APIs for achieving web scale.</li>
<br />
<li>The emergence of Polyglot Persistence or the use of different persistence mechanisms such as relational, document, and graph databases within the same application. Developers are discovering that modeling data for NoSQL databases has many benefits, but also its own peculiarities.</li><br />
<li>The demands for quality and faster time-to-market have led to new techniques like test automation and continuous delivery. </li>
</ul>
<br />
The open source community has responded to these challenges by creating
many frameworks which are supposed to facilitate
the work of developing software. Software developers spend a
considerable amount of time researching, learning, and integrating these
various frameworks to build a system. Some of these frameworks can indeed be very helpful when used properly. However, DDD puts a big emphasis on understanding the domain. Here is what I learned from applying DDD over the last few years:<br />
<br />
<br />
<ul>
<li>DDD is a significant intellectual investment, but with a potential for big rewards. To be successful in applying DDD, one must take the time to understand and digest the underlying principles, from the building blocks (entities, aggregates, value objects, modules, domain events, services, repositories, and factories) to the strategic aspects of applying DDD. For example, understanding the difference between an aggregate, a value object, and an entity is essential. Learning the right approach to designing aggregates is also very important as this can significantly impact transactions and performance. I highly recommend reading the recently published <i>Implementing Domain Driven Design</i> by Vaughn Vernon. The book provides a contemporary approach to applying DDD. For example, it covers important topics in applying DDD to modern software systems such as: sub-domains, domain events, event stores and event sourcing, rules for aggregate design, transactions, eventual consistency, REST, NoSQL, and enterprise application integration with concrete examples.</li>
<br />
<li>Proper application layering (user interface, application, domain, and infrastructure), understanding the responsibility of each layer (for example, an anemic domain model and a fat application layer are anti-patterns), and coding to interfaces. DDD is object-oriented (OO) design done right. The <a href="http://en.wikipedia.org/wiki/SOLID_%28object-oriented_design%29" target="_blank">SOLID Principles</a> of OO design are still applicable.</li>
<br />
<li>Determine if DDD is right for your project. Most of my work during the last few years has been in the healthcare domain. The HL7 CCDA and the Virtual Medical Record (vMR) define information models for Electronic Health Records (EHR) and Clinical Decision Support (CDS) systems respectively. Interoperability is an important and challenging issue in healthcare. DDD concepts such as "<i>Strategic Design</i>", "<i>Context Map</i>", "<i>Bounded Context</i>", and "<i>Published Language</i>" are very helpful in addressing and navigating this type of complexity.</li>
<br />
<li>As I mentioned earlier, DDD puts a big emphasis on understanding the domain. Developers applying DDD should be prepared to dedicate a considerable amount of time to learning about the domain, for example by collaborating and carefully listening to domain experts and by reading as much as they can about the domain. This is also the key to creating a rich domain model with behavior (as opposed to an anemic one). I found that simply reading industry standards and regulations is a great way to understand a domain. So understanding the domain is not just the responsibility of the Business Analyst. The code is the expression of the domain, so the coder needs to understand the domain in order to express it with code.</li>
<br />
<li>Some developers blame popular frameworks for encouraging anemic domain models. I found that a lack of understanding of the domain and its business rules is a major contributing factor to anemia in the domain model. A rule engine like Drools can help externalize these business rules in the form of declarative rules that can be maintained by domain experts through a DSL, spreadsheet, or web-based user interface.</li>
<br />
<li>There are opportunities in using recent ideas like Event Sourcing and Command Query Responsibility Segregation (CQRS). These opportunities include scalability, true audit trails, data mining, temporal queries, and application integration. However, being pragmatic can help avoid unnecessary complexity. </li>
<br />
<li>I recommend exploring tools that are specifically designed to support a DDD or Model-Driven Development (MDD) approach. <a href="http://isis.apache.org/" target="_blank">Apache Isis</a>, <a href="http://www.romaframework.org/" target="_blank">Roma Meta Framework</a>, <a href="http://tynamo.org/" target="_blank">Tynamo</a>, and <a href="http://nakedobjects.codeplex.com/" target="_blank">Naked Objects</a> are examples of such tools. These tools can automatically generate all the layers of an application based on the specification of a domain model. By doing so, these tools allow you to really focus your time and attention on exploring and understanding the domain as opposed to framework and infrastructure concerns. For architects, these tools can serve as design pattern automation, constraining the development process to conform to DDD principles and patterns. I believe this is part of a larger trend in automating software development which also includes the essential practice of test automation. We software developers like to automate the job of other people. However, many tasks that we perform (including coding itself) are still very manual. Aspect-Oriented Programming (AOP) (using AspectJ for example) can also be used to enable this type of design pattern automation through compile-time weaving.</li>
<br />
<li>Check my previous post for <a href="http://efasoft.blogspot.com/2012/12/a-journey-into-software-excellence.html" target="_blank">20 techniques for achieving software excellence</a>.</li>
</ul>
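To make the distinction between an entity and a value object concrete, here is a minimal sketch (the class names and clinical domain are illustrative only, not taken from any particular model): a value object is immutable and compared by its attribute values, while an entity has a lifecycle and is compared by its identity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BloodPressure:
    """A value object: immutable, equal when its attribute values are equal."""
    systolic: int
    diastolic: int

class Patient:
    """An entity: compared by identity, not by attributes."""
    def __init__(self, patient_id, name):
        self.patient_id = patient_id
        self.name = name  # may change over time without changing identity
    def __eq__(self, other):
        return isinstance(other, Patient) and self.patient_id == other.patient_id
    def __hash__(self):
        return hash(self.patient_id)

# Two readings with the same numbers are interchangeable...
assert BloodPressure(120, 80) == BloodPressure(120, 80)
# ...but two Patient objects are the same only if they share an identity,
# even when other attributes differ.
assert Patient("p-1", "Ada") == Patient("p-1", "Ada L.")
assert Patient("p-1", "Ada") != Patient("p-2", "Ada")
```

Getting this distinction right matters in practice: value objects can be freely copied and shared, while entities must be tracked through repositories and aggregate boundaries.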
Vidjinnagni Amoussou
<br />
<h3>
State of the Semantic Web in the Clinical Domain (2013-02-24)</h3>
<br />
In a previous post entitled <a href="http://efasoft.blogspot.com/2011/08/why-do-we-need-ontologies-in-healthcare.html" target="_blank"><i>Why Do We Need Ontologies in Healthcare Applications</i></a>, I explained the important difference between ontologies, coding systems, and information models. I also outlined the benefits of using Semantic Web technologies like RDF, RDFS, OWL, SWRL, R2RML, SPARQL, SKOS, and Linked Open Data (LOD). These benefits include:<br />
<br />
<ul>
<li>Reasoning and inferencing which are essential characteristics of <a href="http://efasoft.blogspot.com/2012/12/prediction-for-2013-intelligent-health.html" target="_blank">intelligent Health IT Systems</a> (iHIT) </li>
<li>Model consistency checking</li>
<li>Open World Assumption (OWA) and Non-Unique Naming Assumption enabling the integration of heterogeneous data sources and knowledge bases using Linked Open Data (LOD) principles. This integration can be accomplished by providing an RDF view over existing relational databases using R2RML (RDB to RDF Mapping Language) and by performing <a href="http://www.w3.org/TR/sparql11-federated-query/" target="_blank">SPARQL federated queries</a>. Intelligent queries can retrieve inferred facts using <a href="http://www.w3.org/TR/sparql11-entailment/" target="_blank">SPARQL 1.1 Entailment Regimes</a>.</li>
<li>Linking to other biomedical ontologies like SNOMED and the <a href="http://code.google.com/p/translationalmedicineontology/" target="_blank">Translational Medicine Ontology</a></li>
<li>Clinical Knowledge Management (CKM) using OWL to model and execute Clinical Practice Guidelines (CPGs) and Care Pathways (CPs). </li>
</ul>
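As a toy illustration of the kind of inference listed above, the following sketch computes a naive forward-chaining closure over <i>rdfs:subClassOf</i> and <i>rdf:type</i>, so that a query for a general class also retrieves instances asserted only under its subclasses. A real system would delegate this to a reasoner such as Jena's; the triples below are invented for illustration.

```python
def infer_types(triples):
    """Naive forward chaining over rdfs:subClassOf and rdf:type.

    Repeatedly applies two RDFS-style rules until fixpoint:
      (C subClassOf D) and (D subClassOf E)  =>  (C subClassOf E)
      (x type C)       and (C subClassOf D)  =>  (x type D)
    """
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p == "rdfs:subClassOf":
                for (s2, p2, o2) in facts:
                    if p2 == "rdfs:subClassOf" and o2 == s:
                        new.add((s2, "rdfs:subClassOf", o))  # transitivity
                    if p2 == "rdf:type" and o2 == s:
                        new.add((s2, "rdf:type", o))         # type propagation
        if new <= facts:
            return facts
        facts |= new

kb = {
    (":MyocardialInfarction", "rdfs:subClassOf", ":HeartDisease"),
    (":HeartDisease", "rdfs:subClassOf", ":Disease"),
    (":dx1", "rdf:type", ":MyocardialInfarction"),
}
closed = infer_types(kb)
# The diagnosis is now retrievable as a :Disease, although that triple
# was never asserted explicitly.
assert (":dx1", "rdf:type", ":Disease") in closed
```

This is exactly the behavior exposed declaratively through SPARQL 1.1 entailment regimes: the query engine answers over the inferred graph rather than only the asserted triples.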
<br />
<h3>
Semantic Web in Clinical and Translational Research </h3>
<br />
The following are papers on how Semantic Web technologies are being used to realize these benefits in the healthcare domain: <br />
<ul>
<li><a href="http://jamia.bmj.com/content/early/2012/12/24/amiajnl-2012-001326.abstract?sid=fa4d93dd-c59a-4f56-8b41-b03ed75ea9be" target="_blank"><i>A semantic-web oriented representation of the clinical element model for secondary use of electronic health records data</i></a> in the Journal of the American Medical Informatics Association (JAMIA)<i> </i></li>
<li><a href="http://www.jbiomedsem.com/content/3/1/10" target="_blank"><i>Applying semantic web technologies for phenome-wide scan using an electronic health record linked Biobank</i></a> in the Journal of Biomedical Semantics</li>
<li><a href="http://ppr.cs.dal.ca/sraza/papers/MEDINFO07b.pdf" target="_blank"><i>Ontology-based Modeling of Clinical Practice Guidelines: A Clinical Decision Support System for Breast Cancer Follow-up Interventions at Primary Care Settings</i></a></li>
</ul>
<br />
<h3>
Apache Stanbol</h3>
<br />
I recently came across <a href="http://stanbol.apache.org/" target="_blank">Apache Stanbol</a>, a new Apache project which is described as <i>"a set of reusable components for semantic content management"</i>. What I really like about Apache Stanbol is that it not only works on unstructured data sources, but also integrates a number of other popular Apache open source projects which can be used to add a semantic layer to modern RESTful content-oriented applications. These components include:<br />
<br />
<ul>
<li>Apache Tika for text and metadata extraction from a variety of commonly used document formats</li>
<li>Apache OpenNLP for natural language processing and named entity recognition (NER)</li>
<li>Apache Solr as the document store and for semantic search</li>
<li>Apache Jena as the RDF and Semantic Web framework.</li>
</ul>
Other open source components like Apache Mahout (a scalable Machine Learning library) can be integrated to provide document recommendation and clustering services.<br />
<br />
The <i>Content Enhancers</i> in Stanbol can perform named entity recognition (NER) and link text annotations to external datasets such as DBPedia. In the clinical domain, these enhancers can be used to extract entities from medical records, journal articles, and clinical guidelines. These entities can then be linked to other clinical data sources such as drug and disease databases using Linked Data techniques.<br />
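To make the enhancement step tangible, here is a deliberately tiny dictionary-based sketch of what a content enhancer produces: text annotations (surface form, offset) linked to external URIs. This is not Stanbol's API; a real enhancer uses trained OpenNLP models and links against datasets like DBpedia, and the entity dictionary and URIs below are made up for illustration.

```python
# Hypothetical lookup table mapping surface forms to linked-data URIs.
ENTITY_LINKS = {
    "warfarin": "http://example.org/drug/warfarin",
    "atrial fibrillation": "http://example.org/condition/atrial-fibrillation",
}

def enhance(text):
    """Return (surface form, start offset, linked URI) annotations."""
    annotations = []
    lowered = text.lower()
    for surface, uri in ENTITY_LINKS.items():
        start = lowered.find(surface)
        if start != -1:
            annotations.append((text[start:start + len(surface)], start, uri))
    return sorted(annotations, key=lambda a: a[1])

note = "Patient with atrial fibrillation was started on Warfarin."
for surface, offset, uri in enhance(note):
    print(f"{surface!r} @ {offset} -> {uri}")
```

The value of the real pipeline is precisely what this sketch glosses over: statistical NER finds entities that are not in any dictionary, and disambiguation picks the right URI among candidates.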
<br />
Apache Stanbol also provides <i>Reasoners</i> based on the Jena RDFS, OWL, and OWLMini reasoners as well as the <a href="http://www.hermit-reasoner.com/" target="_blank">HermiT OWL Reasoner</a>. These reasoners can perform consistency checking and classification. Stanbol supports <i>Inference Rules</i> in the following formats: SWRL, Jena Rules, and SPARQL (by converting Stanbol Rules into SPARQL CONSTRUCTs).
<br />
Vidjinnagni Amoussou
<br />
<h3>
Automated Clinical Question Answering: The Next Frontier in Healthcare Informatics (2013-02-17)</h3>
<br />
In a previous post, I predicted that 2013 will be the year <i><a href="http://efasoft.blogspot.com/2012/12/prediction-for-2013-intelligent-health.html" target="_blank">Intelligent Health IT Systems (iHIT) go mainstream</a></i>. I based my prediction on a number of factors, notably the transformation of healthcare to a value-based delivery system driven by the latest scientific evidence (evidence-based practice and practice-based evidence).<br />
<br />
Last week, IBM, together with health insurer WellPoint Inc. and New York’s Memorial Sloan-Kettering Cancer Center, announced the commercialization of Watson (the supercomputer which beat human champions in "Jeopardy!" on February 16, 2011) for question answering (QA) in the clinical domain. The following are some interesting facts released by IBM as part of this <a href="http://wraltechwire.com/ibm-s-dr-watson-supercomputer-ready-to-fight-cancer/12083522/" target="_blank">announcement</a>:<br />
<br />
<ul>
<li>The supercomputer has ingested 1,500 lung cancer cases from Sloan-Kettering records, plus 2 million pages of text from journals, textbooks, and treatment guidelines. This is what I call Big Data in medicine.</li>
<li>In 2012, Watson became 240 percent faster and 75 percent smaller so it can run on a single server. No surprise here and I expect this trend to continue.</li>
</ul>
<br />
The following YouTube video entitled <i>Oncology Diagnosis and Treatment</i> explains how IBM envisions using Watson for Clinical Question Answering (CQA):<br />
<br />
<div style="text-align: center;">
<iframe allowfullscreen="" frameborder="0" height="315" src="http://www.youtube.com/embed/HZsPc0h_mtM" width="420"></iframe><br /></div>
<br />
<h3>
The User Experience in the Watson Demo</h3>
<ul>
<li>Clinical questions can be posed in natural language (spoken or typed in by the clinician using a keyboard). </li>
<li>The sources used for answering clinical questions include both structured (EMR databases) and unstructured information (journal articles, clinical guidelines, etc.).</li>
<li>Personalized medicine: the proposed interventions are driven by the data in the patient's medical record and the system can prompt the clinician for additional information on the patient if necessary. The displayed evidence and recommendations are updated to reflect changes in the patient's clinical data.</li>
<li>Human Factors: the clinician is always in the loop. She can ask Watson how it arrived at a specific care recommendation and can even remove a specific piece of evidence (if deemed irrelevant or inappropriate). </li>
<li>The use of confidence scoring and evidence highlighting. </li>
<li>Patient-centeredness and shared decision making: the treatment plans take into account the values, goals, and wishes of the patient (<i>patient preferences</i>). Treatment options are discussed with the patient.</li>
<li>Comparative effectiveness is used to compare the benefits and harms of different interventions.</li>
<li>Information is displayed using data visualization (dashboard) to help meet key performance indicators in the context of a value-based payment model.</li>
</ul>
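The confidence scoring mentioned above can be sketched at a very high level: each candidate answer accumulates scores from its supporting evidence, and the totals are normalized into confidences for ranking. This mirrors, only very loosely, the merging and ranking stage of a QA pipeline; the candidates and scores below are invented.

```python
def rank_with_confidence(candidates):
    """Combine per-evidence scores into a normalized confidence per candidate.

    Each candidate answer maps to a list of evidence scores; totals are
    normalized so confidences across candidates sum to 1, then sorted.
    """
    totals = {answer: sum(scores) for answer, scores in candidates.items()}
    grand = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(answer, total / grand) for answer, total in ranked]

candidates = {
    "Treatment A": [0.9, 0.7, 0.8],  # strong, consistent evidence
    "Treatment B": [0.6, 0.2],       # weaker support
}
# Treatment A ranks first with roughly 75% of the normalized confidence.
for answer, confidence in rank_with_confidence(candidates):
    print(f"{answer}: {confidence:.0%}")
```

Presenting the confidence alongside the highlighted evidence, rather than a single "answer", is what keeps the clinician in the loop.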
<br />
<br />
<h3>
The Science Behind Watson</h3>
<br />
The real question is how do we make intelligent health IT systems like Watson widely available to all patients. A landmark report published by the Institute of Medicine in 2001 and titled <i><a href="http://www.iom.edu/Reports/2001/Crossing-the-Quality-Chasm-A-New-Health-System-for-the-21st-Century.aspx" target="_blank">Crossing the Quality Chasm - A New Health System for the 21st Century</a> </i>contained the following recommendation:<br />
<br />
<blockquote class="tr_bq">
<i>Patients should receive care based on the best available scientific knowledge. Care should not vary illogically from clinician to clinician or from place to place.</i></blockquote>
<br />
For the scientifically (and Artificial Intelligence) inclined, the following are some pointers on the science behind Watson:<br />
<br />
<ul>
<li><a href="http://www.aaai.org/Magazine/Watson/watson.php" target="_blank">The AI Behind Watson - The Technical Article</a></li>
<li><a href="http://www.aaai.org/ojs/index.php/aimagazine/article/view/2303" target="_blank">Building Watson: An Overview of the DeepQA Project</a></li>
<li><a href="http://researchweb.watson.ibm.com/journal/sj43-3.html" target="_blank">Unstructured Information Management (The IBM Journal of Research and Development)</a></li>
<li><a href="http://iswc2011.semanticweb.org/tutorials/semantic-web-technology-in-watson/" target="_blank">Semantic Web Technology in Watson.</a></li>
</ul>
<br />
The picture below represents a high level architecture of Watson (click on the image to enlarge it).<br />
<br />
<br />
<a href="http://commons.wikimedia.org/wiki/File%3ADeepQA.svg" title="By Pgr94 (Own work) [CC0], via Wikimedia Commons"><img alt="DeepQA" src="//upload.wikimedia.org/wikipedia/commons/thumb/4/41/DeepQA.svg/512px-DeepQA.svg.png" width="512" /></a>
<br />
<br />
<br />
<br />
<h3>
AskHermes and MiPACQ</h3>
<br />
IBM Watson is not the only effort to develop automated CQA capabilities. Some earlier CQA efforts used the <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839740/" target="_blank">PICO framework </a>(Problem/Population, Intervention, Comparison, Outcome) to facilitate processing. More recent efforts have focused on the use of clinical questions posed in natural language. <br />
<br />
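The PICO decomposition can be illustrated with a small sketch (the example question and field values are invented for illustration): a free-text clinical question is broken into the four PICO facets, which then drive structured retrieval.

```python
from dataclasses import dataclass

@dataclass
class PICOQuestion:
    """A clinical question decomposed with the PICO framework."""
    population: str    # Problem/Population
    intervention: str
    comparison: str
    outcome: str

    def as_query_terms(self):
        """Facets usable as structured search terms for retrieval."""
        return [self.population, self.intervention, self.comparison, self.outcome]

# "In adults with type 2 diabetes, does metformin, compared with
#  sulfonylureas, reduce cardiovascular events?"
q = PICOQuestion(
    population="adults with type 2 diabetes",
    intervention="metformin",
    comparison="sulfonylureas",
    outcome="cardiovascular events",
)
print(q.as_query_terms())
```

The hard part, of course, is the step this sketch skips: automatically extracting the PICO facets from the clinician's natural-language question, which is where NLP pipelines come in.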
<a href="http://www.askhermes.org/index2.html" target="_blank">AskHermes</a> (<i>Help clinicians to Extract and aRticulate Multimedia information for answering clinical quEstionS</i>) allows clinicians to enter questions in natural language and uses the following unstructured information sources: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines, and Wikipedia articles.<br />
<br />
The processing pipeline in <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3433744/" target="_blank">AskHermes</a> includes the following: <i>Question Analysis, Related Questions Extraction, Information Retrieval, Summarization and Answer Presentation</i>. AskHermes performs question classification using <a href="http://ii.nlm.nih.gov/MMTx.shtml" target="_blank">MMTx</a> (MetaMap Technology Transfer) to map keywords to UMLS concepts and semantic types. Classification is also achieved through supervised machine learning algorithms such as Support Vector Machines (SVMs) and conditional random fields (CRFs). Summarization and answer presentation are based on clustering techniques.<br />
<br />
<a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243235/" target="_blank">MiPACQ</a> (<i>Multi-source Integrated Platform for Answering Clinical Questions</i>) is based on Natural Language Processing (NLP) and Information Retrieval (IR) and utilizes data sources such as Electronic Medical Record (EMR) databases and online medical encyclopedias like Medpedia. MiPACQ uses a processing pipeline based on UIMA (<i>Unstructured Information Management Architecture</i>) and machine learning-based as well as rule-based scoring. NLP capabilities are provided by ClearTK and cTAKES (<i>clinical Text Analysis and Knowledge Extraction System</i>).<br />
<br />
<br />
<br />
<h3>
The Road Ahead</h3>
<br />
Automated Clinical Question Answering (CQA) is really hard. However, that is the future of computing: intelligent machines we can have meaningful conversations with. CQA is a multidisciplinary field which combines disciplines like statistical computing, information retrieval, natural language processing, machine learning, rule engines, semantic web technologies, knowledge representation and reasoning, visual analytics, and massively parallel computing. There are several open source projects that provide the building blocks. Many EHR systems today are glorified data-entry systems. We need to move to the next level, and that will require technical leadership.<br />
<br />
Vidjinnagni Amoussou
<br />
<h3>
Patient Privacy At Web Scale (2013-02-03)</h3>
<br />
A study entitled <i><a href="http://jamia.bmj.com/content/20/1/7.full" target="_blank">Patients want granular privacy control over health information in electronic medical records</a></i> by Kelly Caine and Rima Hanania in the current issue of the Journal of the American Medical Informatics Association (JAMIA) clearly indicates that patients want a granular level of control over the sharing of their medical information. Patients also want to control with whom their health information is shared and for what purpose. The study looks at how the presence of sensitive health information in a medical record affects patient privacy preferences. In this post, I discuss how current and emerging standards can be used to enforce patient privacy preferences at web scale.<br />
<br />
First, I think the key to achieving patient privacy at web scale is to adopt proven lightweight protocols and standards such as REST, JSON, OAuth2, and OpenID Connect. The <a href="http://wiki.siframework.org/RHEx" target="_blank">RESTful Health Exchange (RHEx)</a> project funded by the Federal Health Architecture (FHA) was a step in the right direction. These protocols have also been embraced by large internet identity providers like <a href="https://developers.google.com/accounts/docs/OAuth2" target="_blank">Google</a>, Facebook, and Microsoft. To increase the strength of authentication when using these existing online identities in patient-facing healthcare applications, techniques like multi-factor authentication (e.g., two-factor authentication using the user's phone) and adaptive risk authentication can be used. These lightweight standards and protocols contrast with enterprise-centric alternatives like SOAP and SAML, which are the foundation for Integrating the Healthcare Enterprise (IHE) standards including XDS.b, XDR, and XUA.<br />
<br />
An emerging approach that could really help put patients in control of the privacy of their electronic medical record is the <a href="http://kantarainitiative.org/confluence/display/uma/Home" target="_blank">User-Managed Access (UMA)</a> Protocol of the Kantara Initiative. According to the UMA Core specification: <br />
<blockquote class="tr_bq">
<i>User-Managed Access (UMA) is a profile of OAuth 2.0. UMA defines how resource owners can control protected-resource access by clients operated by arbitrary requesting parties, where the resources reside on any number of resource servers, and where a centralized authorization server governs access based on resource owner policy.</i></blockquote>
<div class="separator" style="clear: both; text-align: center;">
</div>
That sounds a lot like a healthcare environment where a typical patient has her health information residing in the Electronic Health Record (EHR) systems of multiple healthcare providers. A frequent use case is when the patient's health information is shared among providers during primary care physicians' referrals to specialist outpatient clinics. The following are the benefits for the patient privacy of a centralized authorization server as defined in UMA:<br />
<br />
<ul>
<li>The ability to manage her consent directives (<i>scope of access</i> in UMA parlance) from a central location (ideally in the cloud) as opposed to the current paper-based environment where the patient signs a consent form for each provider and has no visibility into how the consent is being used and enforced.</li>
<li>It facilitates the update and revocation of the consent directives by the patient. </li>
<li>It would give the patient a full audit trail of requests and access events related to her health information. </li>
<li>The patient's user experience of managing privacy preferences online can be significantly enhanced by data visualization. A study titled <a href="http://www.slideshare.net/domcat/exploring-visualization-techniques-to-enhance-privacy-control-ux-for-usermanaged-access-8673854" target="_blank"><i>Exploring Visualization Techniques to Enhance Privacy Control UX for User-Managed Access</i></a> introduced the notion of a <i>"UMA Connection"</i> for helping users visualize the context of a data sharing policy (e.g., contacts, allowed actions, access restrictions, and trusted claims). </li>
</ul>
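The kind of granular, patient-authored consent directive a centralized authorization server could evaluate can be sketched as follows. The directive fields (grantee role, purpose of use, data category) are illustrative only and are not taken from the UMA specification:

```python
# Patient-authored consent directives: deny by default, permit only on
# an explicit match. The attribute names and values are made up.
CONSENT_DIRECTIVES = [
    {"role": "primary-care-physician", "purpose": "treatment", "category": "*"},
    {"role": "specialist", "purpose": "treatment", "category": "cardiology"},
]

def is_access_permitted(role, purpose, category):
    """Return True only if some directive matches the access request."""
    for d in CONSENT_DIRECTIVES:
        if (d["role"] == role and d["purpose"] == purpose
                and d["category"] in ("*", category)):
            return True
    return False

# A cardiologist referral is covered; use for research is not.
assert is_access_permitted("specialist", "treatment", "cardiology")
assert not is_access_permitted("specialist", "research", "cardiology")
```

Centralizing this evaluation is what makes update, revocation, and auditing of the directives tractable, as opposed to per-provider paper consent forms.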
<br />
In UMA, trusted claims (e.g., information about a requesting healthcare provider such as email, name, role, organization, and NPI) can be conveyed using OpenID Connect. The <a href="http://oauthssodemo.appspot.com/step/1" target="_blank">Google OpenID Connect Demo</a> provides a step-by-step guide to OpenID Connect, and Nat Sakimura's <a href="http://nat.sakimura.org/2011/05/15/dummys-guide-for-the-difference-between-oauth-authentication-and-openid/" target="_blank"><i>Dummy’s guide for the Difference between OAuth Authentication and OpenID</i></a> is a good explanation of how OpenID Connect complements OAuth2. A separate specification entitled <a href="http://docs.kantarainitiative.org/uma/draft-uma-trust.html" target="_blank"><i>Binding Obligations on User-Managed Access (UMA) Participants</i></a> proposes a legal framework that defines the obligations of parties that operate and use UMA-conforming software programs and services.<br />
<br />
A recent post by Domenico Catalano entitled <a href="http://identitycube.blogspot.co.uk/2013/01/uma-approach-to-protect-and-control.html" target="_blank"><i>UMA Approach to Protect and Control Online Reputation</i></a> describes a UMA-based approach for supporting privacy based on reputation and trust. An example in the post is a <i>"global reputation ranking"</i> in the context of an online e-commerce site. In the context of healthcare privacy, when deciding whether to share their sensitive medical information with a specific healthcare provider, patients could use the same concept to see the number and severity of security breaches experienced by that provider in the past. Section 13402(e)(4) of the HITECH Act actually requires posting a list of breaches of unsecured protected health information affecting 500 or more individuals. The list is available <a href="http://www.hhs.gov/ocr/privacy/hipaa/administrative/breachnotificationrule/breachtool.html" target="_blank">here</a>.<br />
<br />
The recently approved XACML 3.0 standard is a powerful mechanism for expressing and evaluating privacy policies. It provides capabilities such as obligation and advice expressions as well as delegation of authorization. In this <a href="http://kantarainitiative.org/confluence/download/attachments/17760302/UMA+and+XACML+2012-10-18.pdf" target="_blank">presentation</a>, Eve Maler discusses possible integration points between UMA and XACML. The <a href="https://www.oasis-open.org/committees/download.php/48018/xacml-rest-v1.0-wd07.doc" target="_blank"><i>REST Profile of XACML 3.0</i></a> and the <a href="https://www.oasis-open.org/committees/download.php/47775/xacml-json-http-v1.0-wd09.doc" target="_blank"><i>Request/Response Interface based on JSON and HTTP for XACML 3.0</i></a> proposals introduce the notion of <i>"RESTful Authorization-as-a-Service (AZaaS)"</i> which can facilitate the use of XACML in a UMA-based access control environment.<br />
<br />
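A drastically simplified sketch in the spirit of XACML 3.0 attribute-based access control: the decision is derived from subject, resource, and action attributes, and a Permit can carry an obligation (e.g., "log this disclosure"). The attribute names and the policy below are invented; a real deployment would delegate this to a XACML Policy Decision Point, possibly exposed as a RESTful authorization service as described above.

```python
def evaluate(request):
    """Evaluate one hard-coded, XACML-flavored policy: a physician may
    read a medical record custodied by her own organization, with an
    obligation to log the disclosure. Everything else is denied."""
    subject = request["subject"]
    resource = request["resource"]
    if (subject.get("role") == "physician"
            and resource.get("type") == "medical-record"
            and subject.get("organization") == resource.get("custodian")
            and request["action"] == "read"):
        return {"decision": "Permit",
                "obligations": ["log-disclosure-to-audit-trail"]}
    return {"decision": "Deny", "obligations": []}

request = {
    "subject": {"role": "physician", "organization": "clinic-a"},
    "resource": {"type": "medical-record", "custodian": "clinic-a"},
    "action": "read",
}
print(evaluate(request)["decision"])  # → Permit
```

Obligations are the piece worth noting: unlike a bare Permit/Deny, they let the policy require accompanying actions (audit logging, notification) that map directly onto HIPAA accounting-of-disclosures needs.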
<br />
Vidjinnagni Amoussou
<br />
<h3>
Application-Level Security in Health IT Systems: A Roadmap (2013-01-20)</h3>
<br />
An investigative report on the state of cybersecurity titled "<i>Health-care sector vulnerable to hackers, researchers say</i>", published last month in the Washington Post, reveals that:<br />
<br />
<blockquote class="tr_bq">
<i>"...health care is among the most vulnerable industries in the country, in part because it lags behind in addressing known problems."</i></blockquote>
<br />
When it comes to application-level security, the healthcare industry is indeed lagging when compared to other industries that handle consumer sensitive information. The Payment Card Industry Data Security Standard (PCI DSS) is an information security standard for organizations that handle cardholder information. The PCI DSS certification includes requirements for security code reviews, penetration testing, and compliance validation by an external Qualified Security Assessor (QSA). <br />
<br />
This week, the Department of Health and Human Services (HHS) issued a final omnibus rule on the HIPAA Privacy, Security, Enforcement, and Breach Notification Rules. The rules impose the following:<br />
<br />
<ul>
<li>Increased and tiered civil money penalty structure for security breaches depending on "<i>reasonable diligence</i>", "<i>willful neglect</i>", and "<i>timely correction</i>". The penalty amount varies from $100 to $50,000 per violation with a maximum penalty of $1.5 million annually for all violations of an identical provision.</li>
<li>Expansion of accountability and liability for Business Associates (BAs) and subcontractors.</li>
<li>Increased privacy protections under the Genetic Information Nondiscrimination Act (GINA).</li>
</ul>
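The tiered penalty structure above can be sketched as simple arithmetic (the culpability tiers themselves are reduced here to a per-violation dollar amount between $100 and $50,000, which is a simplification of the rule):

```python
def capped_annual_penalty(violations, per_violation_amount,
                          annual_cap=1_500_000):
    """Total annual penalty for violations of an identical provision.

    The per-violation amount depends on the culpability tier (reasonable
    diligence vs. willful neglect, timely correction), ranging from $100
    to $50,000; the annual total is capped at $1.5 million.
    """
    if not 100 <= per_violation_amount <= 50_000:
        raise ValueError("per-violation amount is tiered between $100 and $50,000")
    return min(violations * per_violation_amount, annual_cap)

# 40 willful-neglect violations at the $50,000 maximum hit the annual cap:
print(capped_annual_penalty(40, 50_000))  # → 1500000
```

The point of the cap is that exposure saturates quickly at the top tier, which is exactly why "willful neglect" findings are so costly for covered entities and their business associates.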
<br />
Furthermore, the Security and Privacy Tiger Team of the US Office of the National Coordinator (ONC) for Health IT released a set of recommendations related to the Meaningful Use (MU) Stage 2 requirements for patient access to health record portals. The need for patient engagement as a prerequisite to a successful transformation of healthcare means that particular attention should be paid to the security needs of consumer-facing web applications.<br />
<br />
<br />
<h3>
<b>Security in the Software Development Life Cycle (SDLC) </b></h3>
<br />
Unfortunately, security, as a non-functional requirement, is often relegated to an afterthought in the software development life cycle (SDLC). As an afterthought, security is added to the software late or at the end of the development cycle. At that point, adding adequate security is difficult and costly, requiring significant rework. In some cases, penetration testing is not performed at all before the application is deployed into production.<br />
<br />
This situation can be exacerbated by an interpretation of the Agile methodology that puts the emphasis on the early and frequent demonstrations to the customer of functional (as opposed to non-functional) features of the system under development.<br />
<br />
Another issue is that developers and architects often over-rely on 3rd-party security infrastructure, as opposed to (1) developing a Threat Model for the application they are building and (2) creating a security implementation approach to address the Threat Model. 3rd-party security infrastructure can be helpful, but should serve the security implementation strategy as opposed to driving it. As Bruce Schneier, a well-known cryptographer and computer security specialist said in an article titled "<i>Computer Security: Will We Ever Learn?</i>": <br />
<blockquote class="tr_bq">
<i>"Security
is a process, not a product. Products provide some protection, but the
only way to effectively do business in an insecure world is to put
processes in place that recognize the inherent insecurity in the
products. The trick is to reduce your risk of exposure regardless of the
products or patches."</i></blockquote>
<br />
<br />
<h3>
<b>Understanding Potential Security Vulnerabilities</b></h3>
<br />
Application Security is a mature discipline. Developers and architects should build a deep understanding of web application security vulnerabilities as opposed to completely relying on 3rd-party security infrastructure for addressing security concerns. The following are well documented bodies of knowledge on security vulnerabilities:<br />
<br />
<ol>
<li>The <i>OWASP Top 10 Web Application Security Risks</i> (cheat sheets explaining each of those vulnerabilities and how to address them are available on the OWASP web site):<br /><br />
<i>A1: Injection</i><br /><i>A2: Cross-Site Scripting (XSS)</i><br /><i>A3: Broken Authentication and Session Management</i><br /><i>A4: Insecure Direct Object References</i><br /><i>A5: Cross-Site Request Forgery (CSRF)</i><br /><i>A6: Security Misconfiguration</i><br /><i>A7: Insecure Cryptographic Storage</i><br /><i>A8: Failure to Restrict URL Access</i><br /><i>A9: Insufficient Transport Layer Protection</i><br /><i>A10: Unvalidated Redirects and Forwards. </i>
<br /><br />
</li>
<li>The <i>CWE/SANS Top 25 Most Dangerous Software Errors</i>, the result of collaboration between the SANS Institute, MITRE, and many top software security experts in the US and Europe.</li>
<li>Programming language-specific vulnerabilities such as those listed in the <i>Cert Oracle Secure Coding Standard for Java</i>.</li>
<li>Well-documented security vulnerabilities introduced by the use of 3rd-party open source application development frameworks.</li>
<li>The <i>National Vulnerability Database</i> </li>
<li>The <i>Common Weakness Enumeration (CWE)</i> which is currently maintained by the MITRE Corporation with support from the National Cyber Security Division (DHS). The diagram below from the CWE web site shows a portion of the CWE hierarchical structure. Click on the image below to enlarge it. </li>
<li>Obviously, developers should be on the lookout for new and emerging threats to web application security.</li>
</ol>
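To make A1 (Injection) from the OWASP Top 10 above concrete, the following minimal sketch contrasts an injectable query with a parameterized one. The in-memory SQLite table and its data are invented purely for illustration:

```python
import sqlite3

# In-memory database standing in for a real application datastore.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_unsafe(username):
    # VULNERABLE (A1: Injection): attacker-controlled input is concatenated
    # directly into the SQL statement.
    query = "SELECT role FROM users WHERE username = '%s'" % username
    return conn.execute(query).fetchall()

def find_user_safe(username):
    # SAFE: the parameter is bound by the driver and never parsed as SQL.
    return conn.execute(
        "SELECT role FROM users WHERE username = ?", (username,)
    ).fetchall()

# A classic injection payload subverts the concatenated query...
payload = "' OR '1'='1"
assert len(find_user_unsafe(payload)) == 2  # returns every row
# ...but matches nothing when the query is properly parameterized:
assert find_user_safe(payload) == []
```

The same principle (never build executable statements by string concatenation of untrusted input) applies to LDAP, XPath, and OS command injection as well.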
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFR5ocVBJb2A3-r6PxBq9iPwlM1WFVhgLXotJ16SOadiDKwWTb8ZNBsoF2R-nFe8CoamKVPMHIMVdBDDCnhgn7Ps4iPVKpODej11URTeCVdPgVC2iw5c3RegIhJSo3kfPKNPlMHtwPk0N2/s1600/cwe_cross_section_large.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFR5ocVBJb2A3-r6PxBq9iPwlM1WFVhgLXotJ16SOadiDKwWTb8ZNBsoF2R-nFe8CoamKVPMHIMVdBDDCnhgn7Ps4iPVKpODej11URTeCVdPgVC2iw5c3RegIhJSo3kfPKNPlMHtwPk0N2/s320/cwe_cross_section_large.jpg" height="254" width="320" /></a></div>
<br />
<br />
<h3>
<b>Application Threat Modelling</b></h3>
<br />
Armed with a deep understanding of potential vulnerabilities, developers and architects can build a Security Policy (who has what type of access to which resource in the system) and a Threat Model including:<br />
<br />
<ul>
<li>An analysis of the attack surface of the application.</li>
<li>Identification of potential threats and attackers (both inside and outside the organization and its business associates and subcontractors) and their characteristics, tactics, and motivations. A threat categorization methodology such as STRIDE can be used. STRIDE defines the following threat categories: <i>Spoofing of user identity, Tampering, Repudiation, Information disclosure (privacy breach or Data leak), Denial of Service (D.o.S.), and Elevation of privilege</i>. </li>
<li>The consequences of those attacks for patients and the healthcare organization serving them.</li>
<li>Countermeasures and a risk mitigation strategy. The Application Security Frame (ASF) defines the following categories of countermeasures: <i>Authentication, Authorization, Configuration Management, Data Protection in Storage and Transit, Data Validation/Parameter Validation, Error Handling and Exception Management, User and Session Management, Auditing and Logging</i>.</li>
<li>How the deployment environment will impact privacy and security. NIST and the Cloud Security Alliance (CSA) provide specific security guidance for cloud deployment.</li>
<li>New software architectures like the Single Page Application (SPA) approach present new challenges in securing web applications. Single Page Applications are subject to common web application vulnerabilities like Cookie Snooping, Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), and JSON Injection. Security is mainly the responsibility of the server, although client-side frameworks like AngularJS also provide some features to enhance the security of Single Page Applications.</li>
</ul>
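As one concrete countermeasure in the ASF "Data Protection in Storage" category above, password credentials should be stored as salted, deliberately slow hashes rather than plaintext or fast unsalted digests. Here is a minimal sketch using Python's standard-library PBKDF2; the iteration count and salt size are illustrative choices, not a recommendation for any specific system:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a slow, salted hash suitable for storing credentials."""
    salt = salt if salt is not None else os.urandom(16)  # unique salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored_digest):
    _, candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, stored_digest)

salt, stored = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, stored)
assert not verify_password("wrong password", salt, stored)
```

Storing the per-user salt alongside the digest is expected; the salt defeats precomputed (rainbow table) attacks, while the high iteration count slows brute force.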
<h3>
<b>Developing a Security Implementation Strategy</b></h3>
<br />
To address the issues of secure software development in the context of
Agile, the Software Assurance Forum for Excellence in Code (SAFECode)
published a guide titled "<i>Practical Security Stories and Security Tasks for Agile Development Environments</i>".<br />
<br />
<h3>
<b>Secure Coding Standards, Static Analysis, and Security Code Review </b></h3>
<br />
Many developers are aware of coding conventions (such as the Code Conventions for the Java Programming Language), and the benefits of peer code reviews and static code analysis (using tools like Checkstyle, PMD, FindBugs, and Sonar). These practices should be expanded to cover secure coding as well. The following resources can help:<br />
<br />
<ul>
<li>The <i>Cert Oracle Secure Coding Standard for Java</i>. </li>
<li>The <i>OWASP Code Review Guide.</i></li>
<li>The "<i>Fundamental Practices for Secure Software Development: A Guide to the Most Effective Secure Development Practices in Use Today</i>" published by the Software Assurance Forum for Excellence in Code (SAFECode)</li>
<li>The Payment Card Industry Data Security Standard (PCI DSS) "<i>Information Supplement: Requirement 6.6 Code Reviews and Application Firewalls Clarified</i>" is an example of secure code review requirements in an industry vertical. </li>
</ul>
<br />
There are static analysis tools focused on security that can be particularly useful when combined with a secure code review process. If possible, static code analysis should be integrated into the build and continuous integration process to provide specific secure code metrics as well as the evolution of those metrics over time.<br />
<br />
<br />
<h3>
<b>Penetration Testing</b></h3>
<br />
Finally, the application should go through penetration testing before it is deployed into production. Application-level penetration testing should be done in addition to network-level penetration testing. OWASP provides a detailed Testing Guide, and a number of open source and commercial penetration testing tools are available as well.<br />
<br />
<h3>
Visual Analytics for Clinical Decision Making</h3>
<br />
In my last post, I talked about the era of Big Data in medicine, Evidence-Based Practice (EBP), Practice-Based Evidence (PBE), and the need for a human-centered approach to building intelligent health IT (iHIT) systems. In this post, I discuss Visual Analytics, an emerging discipline in Data Science. In a report titled "<i>Illuminating the Path: The R&D Agenda for Visual Analytics</i>" published in 2004 by the National Visualization and Analytics Center (NVAC), Visual Analytics is defined as "<i>the science of analytical reasoning facilitated by visual interactive interfaces.</i>"<br />
<br />
The goal of Visual Analytics is to obtain deep insight for effective understanding, reasoning, and decision making through the visual exploration of massive, complex, and often ambiguous data. As a multidisciplinary field, Visual Analytics combines several disciplines such as human perception and cognition, interactive graphic design, statistical computing, data mining, spatio-temporal data analysis, and even art.<br />
<br />
In his book titled "<i>Beautiful Evidence</i>", Edward Tufte illustrates the
fundamental principles of analytical design by using Charles Minard's famous map known as "<i>Carte figurative des pertes successives en hommes de l'Armée Française dans la campagne de Russie 1812-1813</i>" (Figurative
Map of the successive losses in men of the French Army in the Russian Campaign 1812-1813). The map is a dramatic account of the heavy losses of the French army during Napoleon's Russian campaign of 1812. Edward Tufte calls the map the "<i>best statistical graphics ever</i>". Click on the image below to enlarge it.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhykRoe1kd3P6UURfSa4fMTQp_4JDTBo0ufM0Hve2D-mZZIFTikc5uyzyV75drDbEoh6DPQim0tmBWEdz-STwTeZW_WeYVs1rrz5wfWv_hSDamtvpqgVwswqO-guJjmj4jqzENRt-owTMEW/s1600/Minard1024.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="152" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhykRoe1kd3P6UURfSa4fMTQp_4JDTBo0ufM0Hve2D-mZZIFTikc5uyzyV75drDbEoh6DPQim0tmBWEdz-STwTeZW_WeYVs1rrz5wfWv_hSDamtvpqgVwswqO-guJjmj4jqzENRt-owTMEW/s320/Minard1024.png" width="320" /></a></div>
<br />
<br />
Visual Analytics is also an emerging discipline in healthcare informatics. For example, similar to Minard's map of the Russian Campaign of
1812-1813, Visual Analytics can help in comparing different interventions and care pathways and their respective clinical outcomes over time through the vivid display of causes, variables, comparisons, and explanations. This approach contrasts with the traditional display of clinical data in table rows that is so common in electronic health record (EHR) system interfaces.<br />
<br />
Another Visual Analytics technique called Visual Cluster Analysis can be particularly helpful in Comparative Effectiveness in clinical care settings where the goal is to compare the benefits and harms of different interventions for different subgroups (groups of patients sharing similar clinical characteristics such as age, gender, race, genetic profile, and comorbidities). Given a specific patient, Visual Cluster Analysis can help the clinician visually explore what works and what doesn't work for "similar patients".<br />
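To give a feel for the clustering step underlying Visual Cluster Analysis (the visual layer itself would be rendered with a toolkit such as D3.js), here is a deliberately tiny k-means sketch over invented, pre-normalized patient features. Real subgroup analysis would involve many more variables and a vetted algorithm; this only illustrates the idea of grouping "similar patients":

```python
def kmeans(points, k, iters=10):
    """Minimal k-means over tuples of floats, with naive deterministic init."""
    pts = sorted(points)
    # Spread the initial centroids across the sorted data.
    centroids = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Toy patient profiles: (age scaled to 0-1, comorbidity count scaled to 0-1).
patients = [(0.20, 0.1), (0.25, 0.2), (0.30, 0.1),   # younger, few comorbidities
            (0.80, 0.6), (0.85, 0.7), (0.90, 0.65)]  # older, multimorbid
centroids, clusters = kmeans(patients, k=2)
assert sorted(len(c) for c in clusters) == [3, 3]  # both subgroups recovered
```

Once such clusters exist, the visual analytics layer lets the clinician explore outcomes within the cluster that a given patient falls into.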
<br />
You can find interesting examples of research projects and implementations in the proceedings of the Visual Analytics in Healthcare Workshop, which has been held in conjunction with IEEE VisWeek for the past three years. The 2013 Visual Analytics in Healthcare Summit (VAHC 2013) will be held in conjunction with the AMIA 2013 conference in Washington DC. There are a number of open source toolkits that can be used to implement Visual Analytics. Some of them are based on open web standards such as HTML5, CSS3, SVG, and Javascript. My favorite is D3.js. DC.js and Crossfilter are built on top of D3.js and facilitate the creation of interactive visualizations of multivariate datasets in the browser.<br />
<br />
<h3>
Prediction for 2013: Intelligent Health IT Systems (iHIT) Go Mainstream</h3>
<br />
iHIT systems represent an evolution of clinical decision support (CDS) systems. Traditionally, CDS systems have provided functionalities such as Alerts and Reminders, Order Sets, Infobuttons, and Documentation Templates. iHIT systems go beyond these basic functionalities and are poised to go mainstream in 2013. This evolution is enabled by recent developments in both computing and healthcare. Notably in computing:<br />
<br />
<ul>
<li>The emergence of Big Data and massively parallel computing platforms like Hadoop.
</li>
<li>The entrance of the following disciplines into the mainstream of computing: Machine Learning (a branch of Artificial Intelligence), Statistical Computing, Visual Analytics, Natural Language Processing, Information Retrieval, Rule engines, and Semantic Web Technologies (like RDF, OWL, SPARQL, and SWRL). These disciplines have been around for many years, but have been largely confined to academia, very large organizations, and niche markets.</li>
<li>The availability of open source tools, platforms, and resources to support the technologies mentioned above. Examples include: R (a statistical engine), Apache Hadoop, Apache Mahout, Apache Jena, Apache Stanbol, Apache OpenNLP, and Apache UIMA. The number of books, courses, and conferences dedicated to these topics has increased dramatically over the last two years signalling an entrance into the mainstream.</li>
</ul>
In addition, the healthcare industry itself is currently going through a significant transformation from a volume-based business model (payment driven by the number of patients treated) to a value-based payment model. The Accountable Care Organization (ACO) is an example of this new model. This model puts an increased emphasis on meeting certain quality and performance metrics driven by the latest scientific evidence (this is called Evidence-Based Practice or EBP). <br />
<br />
Although very costly, Randomized Controlled Trials (RCTs) are considered the strongest form of evidence in EBP. Despite their inherent methodological challenges (lack of randomization leading to possible bias and confounding), observational studies (using real-world data) are increasingly recognized as complementary to RCTs and an important tool in clinical decision making and health policy. According to a report titled <i>"Clinical Practice Guidelines (CPGs) We Can Trust"</i> published by the Institute of Medicine (IOM):<br />
<blockquote class="tr_bq">
<i>"Randomized trials commonly have an under representation of important subgroups, including those with comorbidities, older persons, racial and ethnic minorities, and low-income, less educated, or low-literacy patients."</i></blockquote>
Investments into Comparative Effectiveness Research (CER) are increasing as well. CER, an emerging trend in Evidence Based Practice (EBP), has been defined by the Federal Coordinating Council for CER as <i>"the conduct and synthesis of research comparing the benefits and harms of different interventions and strategies to prevent, diagnose, treat and monitor health conditions in 'real world' settings."</i> CER is important not only for discovering what works and what doesn't work in practice, but also for an informed shared decision making process between the patient and her provider.<br />
<br />
The use of predictive risk models for personalized medicine is becoming a common practice. These models can predict the health risks of patients based on their individual health profiles (including genetic profiles). These models often take the form of logistic regression models. Examples include models for predicting cardiovascular disease, ICU mortality, and hospital readmission (an important ACO performance measure).<br />
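As a sketch of how such a logistic regression risk model is scored once fitted, the snippet below computes P(event) = 1 / (1 + e^-(b0 + b·x)). The coefficients and feature encoding are invented for illustration and do not come from any validated model:

```python
import math

def predicted_risk(coeffs, intercept, features):
    """Logistic model score: P(event | x) = 1 / (1 + e^-(b0 + b . x))."""
    z = intercept + sum(b * x for b, x in zip(coeffs, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a toy readmission-risk model whose features
# are [age / 100, number of prior admissions, comorbidity flag (0 or 1)]:
coeffs, intercept = [1.2, 0.8, 1.5], -4.0

low_risk = predicted_risk(coeffs, intercept, [0.40, 0, 0])   # 40 y/o, no history
high_risk = predicted_risk(coeffs, intercept, [0.80, 3, 1])  # 80 y/o, 3 prior stays
assert 0.0 < low_risk < high_risk < 1.0
```

In practice, the coefficients would be estimated from historical cohort data and the model calibrated and validated before being used at the point of care.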
<br />
Thanks to the Meaningful Use incentive program, adoption of electronic health record (EHR) systems by providers is rapidly increasing. This translates into the availability of huge amounts of EHR data which can be harvested to provide the Practice-Based Evidence (PBE) necessary to close the evidence loop. PBE is the key to a learning health system. The Institute of Medicine (IOM) released a report last year titled <i>"Digital Infrastructure for the Learning Health System: The Foundation for Continuous Improvement in Health and Health Care"</i>. The report describes a learning health system as:<br />
<blockquote class="tr_bq">
<i>"...delivery of best practice guidance at the point of choice, continuous learning and feedback in both health and health care, and seamless, ongoing communication among participants, all facilitated through the application of IT."</i></blockquote>
Both EBP and PBE will require not only rigorous scientific methodologies, but also a computing platform suitable for the era of Big Data in medicine. As William Osler (1849-1919) famously said: <br />
<blockquote class="tr_bq">
<i>"Medicine is a science of uncertainty and an art of probability."</i></blockquote>
Lastly, to be successful, the emergence of iHIT systems will require a human-centered design approach. This will be facilitated by the use of techniques that can enhance human cognitive abilities. Examples are: Electronic Checklists (an approach that originates from the aviation industry and has been proven to save lives in healthcare delivery as well) and Visual Analytics.<br />
<br />
Happy New Year to You and Your Family!<br />
<br />
<h3>
A Journey into Software Excellence</h3>
<br />
I am back in the blogosphere after a seven-month hiatus. It was about time I got my blogging act together. Software development has never been so much fun. In this post, I will share some thoughts on using tools, methods, and practices that can really help in your search for software excellence, from the initial prototyping of the user interface to deployment.<br />
<br />
<ol>
<li>With the rapid proliferation of mobile and desktop devices, adopt a Responsive Web Design (RWD) strategy to reach the largest audience possible. </li>
<li>Create responsive sketches, wireframes, or mockups and apply usability guidelines during the initial prototyping. The NHS Common User Interface (CUI) Program is a good example of usability guidelines for healthcare IT applications. Usability.gov has many interesting resources as well.</li>
<li>Perform usability testing to test your design ideas and obtain early feedback from future users of your product before actual development starts. Use metrics such as the System Usability Scale (SUS) to assess the results.</li>
<li>Carefully select the right HTML5, CSS3, and Javascript libraries and frameworks. The Single Page Application (SPA) architecture is becoming popular and can provide a more fluid user experience.</li>
<li>Consider "Specification By Example" and Behaviour Driven Development
(BDD) tools like Cucumber-JVM to create executable user stories. </li>
<li>Pattern languages like Domain Driven Design (DDD) can help you avoid a "Big Ball of Mud" in architecting your software. DDD concepts such as "Strategic Design", "Bounded Context", "Published Language", and "Anti-Corruption Layer" can help you put your architecture in the right perspective, particularly if there is a need to support industry interoperability standards such as HL7 and IHE. However, beware that the practice of DDD has evolved over the last 8 years and new lessons have been learned particularly in the area of "Aggregate" design. So keep up-to-date with new developments in the field in order to leverage the experience of the community. I also found the concept of "Hexagonal Architecture" very helpful in visualizing the complexity of an architecture from different angles. </li>
<li>Consider a peer review of the architecture using a methodology like the Architecture Tradeoff Analysis Method (ATAM). </li>
<li>Embrace Polyglot Persistence (the use of different persistence mechanisms such as relational, document, and graph databases within the same application). However, use the right application development framework to make this easy. Beware of the peculiarities of modeling data for NoSQL databases and remember that "Persistence Ignorance" is not always easy to achieve in practice.</li>
<li>Add a social dimension to your product by integrating the user experience with existing social networking sites that your users already belong to.</li>
<li>Make your application more intelligent through the use of techniques such as Machine Learning (e.g., a recommendation engine), ontologies and rule engines (e.g., automated reasoning), and Natural Language Processing (NLP) (e.g., automated question answering). As Richard Hamming said: <i>"The purpose of computing is insight, not numbers"</i>.</li>
<li>To enhance the user experience, adopt HTML5, SVG, and Javascript-based graphing and data visualization techniques for data-intensive applications. </li>
<li>Consider the benefits of deploying the application to the cloud and if you decide to deploy to the cloud, factor that into your entire design and development process including the selection of development tools. Choosing the right Platform-as-a-Service (PaaS) provider can facilitate the process.</li>
<li>Create a Continuous Delivery pipeline based on the core concept of automated testing. Leverage tools like Git (Distributed Version Control), Gradle (build), Jenkins (Continuous Integration), and Artifactory. Continuous Delivery allows you to go to market faster and with confidence in the quality of your product. Save infrastructure costs by using these tools in the cloud during development.</li>
<li>Although there is still a place for manual testing, all tests should be automated as much as possible. In addition to the traditional unit tests (using tools like JUnit, TestNG, and Mockito), embrace automated cross-device, cross-browser, and cross-platform user interface (UI) testing using a tool like Selenium.</li>
<li>Web services and performance testing should also become part of your build and Continuous Delivery pipeline using tools like soapUI and JMeter respectively. Performance testing should not be an afterthought.</li>
<li>Adopt automated code quality inspection with tools like Sonar, Checkstyle, FindBugs, and PMD. This can supplement your peer code review process and can provide you with concrete code quality metrics in addition to automatically flagging bugs (including insecure code) in your code base.</li>
<li>Write secure code by carefully studying the OWASP Top Ten. Adopt OWASP guidelines related to security testing and secure code reviews. Perform penetration testing to find vulnerabilities in your application before it is too late.</li>
<li>Do your due diligence in protecting the privacy of your users' data. Put the users in control of their privacy in your system by adopting standards such as OAuth2, OpenID Connect, and the User Managed Access (UMA) protocol of the Kantara Initiative. Consider increasing the strength of authentication using multi-factor authentication (e.g., two-factor authentication using the user's phone).</li>
<li>Invest in learning and training your development team. Software excellence can only be achieved by skilled professionals.</li>
<li>Relax, have fun, and remember that software excellence is a journey. </li>
</ol>
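On the multi-factor authentication point above, here is a compact sketch of how the time-based one-time passwords behind many phone-based two-factor schemes are generated, following RFC 4226 (HOTP) and RFC 6238 (TOTP). This is an illustration only; a production system should use a vetted library and a securely provisioned per-user shared secret:

```python
import hashlib
import hmac
import struct
import time

def hotp(secret, counter, digits=6):
    """HOTP (RFC 4226): HMAC-SHA1 over a big-endian counter, truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset (low nibble of last byte)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret, at=None, step=30):
    """TOTP (RFC 6238): HOTP keyed by the current 30-second time window."""
    t = time.time() if at is None else at
    return hotp(secret, int(t // step))

# Test vectors from RFC 4226 Appendix D (shared secret "12345678901234567890"):
secret = b"12345678901234567890"
assert hotp(secret, 0) == "755224"
assert totp(secret, at=59) == "287082"  # 59s falls in counter window 1
```

Because the server and the user's device derive the same code from the shared secret and the clock, possession of the device becomes a second authentication factor.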
<h3>
How to Add Arbitrary Metadata to Any Element of an HL7 CDA Document</h3>
<br />
There has been a lot of buzz lately about metadata tagging in the health IT community. In this post, I describe an approach to annotating HL7 CDA documents (or any other XML documents) without actually editing the document that is being annotated. Metadata tagging is just one example of annotation. The underlying principle of this approach is that Anyone can say Anything about Anything (the AAA slogan), which is well known in the Semantic Web community. In other words, anyone (e.g., patient, caregiver, physician, provider organization) should have the ability to add arbitrary metadata to any element of a CDA document. For the sake of "Separation of Concerns", a fundamental principle in software engineering, the metadata should be kept out of the CDA document. The benefits of keeping the metadata or annotations out of the CDA document include:<br />
<ul>
<li>Reuse of the same metadata by distinct elements from potentially multiple clinical documents.</li>
<li>The ability to update the metadata without affecting the target CDA documents.</li>
<li>The ability for any individual, organization, or community of interest (e.g., privacy or medical device manufacturers) to create a metadata vocabulary without going through the process of modifying the normative CDA specification (or one of its derivatives like the CCD, the C32, or the Consolidated CDA) or the XDS metadata specifications. </li>
</ul>
<br />
<h3>
History and Current Status of Metadata Standards in Health IT</h3>
<br />
The CDA specification defines some metadata in the header of a CDA document. In addition, the XD* family of specifications (XDS, XDR, and XDM) also defines a comprehensive set of metadata to be used in cross enterprise document exchange. NIEM is being used currently in several health IT projects. In a previous post titled "<a href="http://efasoft.blogspot.ca/2010/12/toward-universal-exchange-language-for.html" target="_blank"><i>Toward a Universal Exchange Language for Healthcare</i></a>", I described how the NIEM metadata approach could be adapted to the healthcare domain.<br />
<br />
The President's Council of Advisors on Science and Technology (PCAST)
published a report in December 2010 entitled: "<i>Realizing the Full Potential
of Health Information Technology to Improve Healthcare for Americans:
The Path Forward</i>". To describe the proposed approach to metadata tagging, the report provides an example based on the exchange of mammograms:<br />
<blockquote>
<span style="font-style: italic;">"The physician would be able to securely search for, retrieve, and display these privacy protected data elements in much the way that web surfers retrieve results from a search engine when they type in a simple query.</span></blockquote>
<blockquote>
<span style="font-style: italic;">What enables this result is the metadata attached to each of these data elements (mammograms), which would include (i) enough identifying information about the patient to allow the data to be located (not necessarily a universal patient identifier), (ii) privacy protection information-who may access the mammograms, either identified or deidentified, and for what purposes, (iii) the provenance of the data-the date, time, type of equipment used, personnel (physician, nurse, or technician), and so forth."</span></blockquote>
The HIT Standards Committee (HITSC) Metadata Tiger Team made specific recommendations to the ONC in June 2011. These recommendations included the use of:<br />
<br />
<ul>
<li>Policy Pointers: URLs that point to external policy documents affecting the tagged data element.</li>
<li>Content Metadata: the actual metadata with datatype (clinical category) and sensitivity (e.g., substance abuse and mental health).</li>
<li>Use of the HL7 CDA R2 with headers.</li>
</ul>
<br />
Based on those recommendations, the ONC published a Notice of Proposed Rule Making (NPRM) in August 2011 to receive comments on proposed metadata standards.<br />
<br />
The Data Segmentation Working Group of the ONC Standards and Interoperability Framework is currently working on metadata tagging for compliance with privacy policies and consent directives. <br />
<br />
<br />
<h3>
The Annotea Protocol </h3>
<br />
The capability to add arbitrary metadata to documents without modifying them has been available in the Semantic Web for at least a decade. Indeed, it is hard to talk about metadata without a reference to the Semantic Web. I will use the W3C Annotea Protocol (which is implemented by the Amaya open source project) to demonstrate this capability. I will also show that this approach does not require the use of the Resource Description Framework (RDF) format and related Semantic Web technologies like OWL and SPARQL. The approach can be adapted to alternative representation formats such as XML, JSON, or the Atom syndication format.
Let's assume that I need to add metadata tags to the CDA document below. The CDA document has only one problem entry, for substance abuse disorder (SNOMED CT code 66214007), and my goal is to attach privacy metadata prohibiting the disclosure of that information (the most relevant elements are highlighted in red):<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"><ClinicalDocument></span><br />
<span style="font-family: "Courier New",Courier,monospace;">.....<br /><component><br /><structuredBody><br /><component><br /><!--Problems--><br /><section><br /><templateId root="2.16.840.1.113883.3.88.11.83.103"<br /> assigningAuthorityName="HITSP/C83"/><br /><templateId root="1.3.6.1.4.1.19376.1.5.3.1.3.6"<br /> assigningAuthorityName="IHE PCC"/><br /><templateId root="2.16.840.1.113883.10.20.1.11" assigningAuthorityName="HL7 CCD"/><br /><!--Problems section template--><br /><code code="11450-4" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC"<br /> displayName="Problem list"/><br /><title>Problems</title><br /><text>...</text><br /><b style="color: red;"><entry typeCode="DRIV"></b><br /><act classCode="ACT" moodCode="EVN"><br /> <templateId root="2.16.840.1.113883.3.88.11.83.7"<br /> assigningAuthorityName="HITSP C83"/><br /> <templateId root="2.16.840.1.113883.10.20.1.27"<br /> assigningAuthorityName="CCD"/><br /> <templateId root="1.3.6.1.4.1.19376.1.5.3.1.4.5.1"<br /> assigningAuthorityName="IHE PCC"/><br /> <templateId root="1.3.6.1.4.1.19376.1.5.3.1.4.5.2"<br /> assigningAuthorityName="IHE PCC"/><br /> <!-- Problem act template --><br /> <id root="6a2fa88d-4174-4909-aece-db44b60a3abb"/><br /> <code nullFlavor="NA"/><br /> <statusCode code="completed"/><br /> <effectiveTime><br /> <low value="1950"/><br /> <high nullFlavor="UNK"/><br /> </effectiveTime><br /> <performer typeCode="PRF"><br /> <assignedEntity><br /> <id extension="PseudoMD-2" root="2.16.840.1.113883.3.72.5.2"/><br /> <addr/><br /> <telecom/><br /> </assignedEntity><br /> </performer><br /> <entryRelationship typeCode="SUBJ" inversionInd="false"><br /> <b style="color: red;"><observation classCode="OBS" moodCode="EVN"></b><br /> <templateId root="2.16.840.1.113883.10.20.1.28"<br /> assigningAuthorityName="CCD"/><br /> <templateId root="1.3.6.1.4.1.19376.1.5.3.1.4.5"<br /> assigningAuthorityName="IHE PCC"/><br /> <!--Problem observation template - NOT episode template--><br /> <id 
root="d11275e7-67ae-11db-bd13-0800200c9a66"/><br /> <code code="64572001" displayName="Condition"<br /> codeSystem="2.16.840.1.113883.6.96"<br /> codeSystemName="SNOMED-CT"/><br /> <text><br /> <reference value="#PROBSUMMARY_1"/><br /> </text><br /> <statusCode code="completed"/><br /> <effectiveTime><br /> <low value="1950"/><br /> </effectiveTime><br /> <b style="color: red;"><value displayName="Substance Abuse Disorder" code="66214007" codeSystemName="SNOMED" codeSystem="2.16.840.1.113883.6.96"/></b><br /> <entryRelationship typeCode="REFR"><br /> <observation classCode="OBS" moodCode="EVN"><br /> <templateId root="2.16.840.1.113883.10.20.1.50"/><br /> <!-- Problem status observation template --><br /> <code code="33999-4" codeSystem="2.16.840.1.113883.6.1"<br /> displayName="Status"/><br /> <statusCode code="completed"/><br /> <value code="55561003"<br /> codeSystem="2.16.840.1.113883.6.96"<br /> displayName="Active"><br /> <originalText><br /> <reference value="#PROBSTATUS_1"/><br /> </originalText><br /> </value><br /> </observation><br /> </entryRelationship><br /> </observation><br /> </entryRelationship><br /></act><br /></entry><br /></section><br /></component><br /></structuredBody><br /></component><br /></ClinicalDocument></span><br />
<br />
<br />
<br />
The following is a separate annotation document containing some metadata pointing to the Substance Abuse Disorder entry in the target CDA document:<br />
<br />
<div style="font-family: "Courier New",Courier,monospace;">
<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"<br />
xmlns:a="http://www.w3.org/2000/10/annotation-ns#"<br />
xmlns:d="http://purl.org/dc/elements/1.1/"><br />
<r:Description><br />
<r:type r:resource="http://www.w3.org/2000/10/annotation-ns#Annotation"/><br />
<r:type r:resource="http://www.w3.org/2000/10/annotationType#Metadata"/><br />
<b style="color: red;"><a:annotates r:resource="http://hospitalx.com/ehrs/cda.xml"/></b><br />
<div style="color: red;">
<b> <a:context>http://hospitalx.com/ehrs/cda.xml#xpointer(/ClinicalDocument/component/structuredBody/component[1]/section[1]/entry[1])</a:context></b></div>
<d:title>Sample Metadata Tagging</d:title><br />
<d:creator>Bob Smith</d:creator><br />
<a:created>2011-10-14T12:10Z</a:created><br />
<d:date>2011-10-14T12:10Z</d:date><br />
<b style="color: red;"> <a:body>Do Not Disclose</a:body></b><br />
</r:Description><br />
</r:RDF></div>
<br />
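As a quick sanity check of the addressing mechanism, the XPath expression carried by the XPointer above can be evaluated programmatically. The sketch below uses Python's standard-library ElementTree against a heavily simplified, hypothetical stand-in for the CDA document (namespaces, template IDs, and most elements omitted for brevity; ElementTree supports only a subset of XPath, so the ancestor:: axis discussed later in this post would require a fuller XPath engine):

```python
import xml.etree.ElementTree as ET

# A heavily simplified, hypothetical stand-in for the CDA document above.
cda = ET.fromstring("""
<ClinicalDocument>
  <component><structuredBody><component><section>
    <entry typeCode="DRIV">
      <act><entryRelationship><observation>
        <value code="66214007" displayName="Substance Abuse Disorder"/>
      </observation></entryRelationship></act>
    </entry>
  </section></component></structuredBody></component>
</ClinicalDocument>
""")

# The XPointer's child path, evaluated relative to the document root:
entry = cda.find("component/structuredBody/component[1]/section[1]/entry[1]")
assert entry is not None and entry.get("typeCode") == "DRIV"

# An attribute predicate locating the coded value element directly:
value = cda.find(".//value[@code='66214007']")
assert value.get("displayName") == "Substance Abuse Disorder"
```

An annotation service resolving the XPointer would apply the same evaluation against the real, namespaced CDA document to find the annotated element.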
Please note a few interesting facts about the annotation document:<br />
<br />
<ul>
<li>As explained by the original specification: <i>"The Annotea protocol works without modifying the original document; that is, there is no requirement that the user have write access to the Web page being annotated." </i></li>
<li>The annotation itself has metadata using the well known Dublin Core metadata specification to specify who created this annotation and when.</li>
<li>The document being annotated is cda.xml located at <span style="font-family: "Courier New",Courier,monospace;">http://hospitalx.com/ehrs/cda.xml</span>. This is described by the element <span style="font-family: "Courier New",Courier,monospace;"><a:annotates r:resource="http://hospitalx.com/ehrs/cda.xml"/>.</span></li>
<li>The specific element that is being annotated within the target CDA document is specified by the <span style="font-family: "Courier New",Courier,monospace;">context </span>element:
<span style="font-family: "Courier New",Courier,monospace;"><a:context>http://hospitalx.com/ehrs/cda.xml#xpointer(/ClinicalDocument/component/structuredBody/component[1]/section[1]/entry[1])</a:context></span><span style="font-family: "Courier New",Courier,monospace;"> </span>using <a href="http://www.w3.org/TR/2003/REC-xptr-framework-20030325/" target="_blank">XPointer</a>, a specification described by the W3C as <i>"the language
to be used as the basis for a fragment identifier for any URI reference that
locates a resource whose Internet media type is one of <span style="font-size: small;"><code>text/xml</code>, <code>application/xml</code>, <code>text/xml-external-parsed-entity</code>,
or <code>application/xml-external-parsed-entity</code>.</span>"</i></li>
<li style="font-family: inherit;">The XPath expression <span style="font-family: "Courier New",Courier,monospace;">/ClinicalDocument/component/structuredBody/component[1]/section[1]/entry[1]</span> within the XPointer is used to target the <span style="font-family: "Courier New",Courier,monospace;">entry</span> element in the CDA document.</li>
<li style="font-family: inherit;">Using XPath (1.0 or 2.0) allows us to address any element (or node) in an XML document. For example, this XPath <span style="font-family: "Courier New",Courier,monospace;">//value[@code='66214007']/ancestor::entry</span> will point to any <span style="font-family: "Courier New",Courier,monospace;">entry </span>element which contains a <span style="font-family: "Courier New",Courier,monospace;">value </span>element with an attribute <span style="font-family: "Courier New",Courier,monospace;">code='66214007'</span> (essentially targeting all <span style="font-family: "Courier New",Courier,monospace;">entry </span>elements which contain a Substance Abuse Observation). The combination of XPath, XPointer, and standard medical terminology codes gives the ability to attach any annotation or metadata to any element having interoperable semantics.</li>
<li style="font-family: inherit;">The <span style="font-family: "Courier New",Courier,monospace;">body </span>element contains the actual annotation: <span style="font-family: "Courier New",Courier,monospace;"><a:body>Do Not Disclose</a:body>. </span>However, the <span style="font-family: "Courier New",Courier,monospace;">body </span>of the annotation can also be located outside of the annotation (e.g., in a shared metadata registry) in which case the <span style="font-family: "Courier New",Courier,monospace;">body </span>element will be marked up as in the following example:<span style="font-family: "Courier New",Courier,monospace;"> <a:body r:resource="http://metadataregistry.com/myconsentdirectives.xml"/></span></li>
</ul>
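To make this concrete, here is a minimal sketch in Python, using only the standard library, of how an application might extract the XPath expression from the xpointer() fragment identifier and locate the annotated element. The CDA snippet below is a simplified stand-in for the real document, and a full XPointer processor handles more than bare XPath expressions:

```python
import re
import xml.etree.ElementTree as ET

# A simplified stand-in for the CDA document at http://hospitalx.com/ehrs/cda.xml
cda = """<ClinicalDocument>
  <component>
    <structuredBody>
      <component>
        <section>
          <entry><observation>Substance Abuse Observation</observation></entry>
        </section>
      </component>
    </structuredBody>
  </component>
</ClinicalDocument>"""

# The a:context value from the annotation document
context = ("http://hospitalx.com/ehrs/cda.xml#xpointer("
           "/ClinicalDocument/component/structuredBody/"
           "component[1]/section[1]/entry[1])")

# Extract the XPath expression from the xpointer() fragment identifier
xpath = re.search(r"#xpointer\((.+)\)$", context).group(1)

root = ET.fromstring(cda)
# ElementTree evaluates paths relative to the context node; the leading
# /ClinicalDocument step refers to the root element itself, so strip it off
relative = "./" + xpath.split("/", 2)[2]
entry = root.find(relative)
print(entry.tag)  # the annotated element: entry
```

Note that ElementTree only supports a subset of XPath 1.0; an expression using axes such as <span style="font-family: "Courier New",Courier,monospace;">ancestor::</span> would require a full XPath engine.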
<br />
<h3>
Alternative Representations</h3>
As mentioned before, for those who for one reason or another don't want to use RDF and related Semantic Web technologies, the annotation can easily be converted to a plain XML (as opposed to RDF/XML), JSON, or Atom representation. The original <a href="http://www.w3.org/2002/12/AnnoteaProtocol-20021219" target="_blank">Annotea Protocol</a> defines a RESTful interface with the following operations: posting, querying, downloading, updating, and deleting annotations. The <a href="http://www.ietf.org/rfc/rfc5023.txt" target="_blank">Atom Publishing Protocol (APP)</a> is a newer RESTful protocol that is well adapted to the Atom syndication format.<br />
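As a purely hypothetical illustration (the Annotea Protocol does not define a standard JSON serialization, so the field names below are made up), the annotation shown earlier could be rendered in JSON along these lines:

```python
import json

# Hypothetical JSON rendering of the Annotea annotation shown earlier.
# The field names are illustrative only; Annotea itself specifies an
# RDF/XML representation, not JSON.
annotation = {
    "type": "http://www.w3.org/2000/10/annotationType#Metadata",
    "annotates": "http://hospitalx.com/ehrs/cda.xml",
    "context": ("http://hospitalx.com/ehrs/cda.xml#xpointer("
                "/ClinicalDocument/component/structuredBody/"
                "component[1]/section[1]/entry[1])"),
    "title": "Sample Metadata Tagging",
    "creator": "Bob Smith",
    "created": "2011-10-14T12:10Z",
    "date": "2011-10-14T12:10Z",
    "body": "Do Not Disclose",
}
print(json.dumps(annotation, indent=2))
```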
<br />
<br />
<h3>
Processing Annotations with XPointer</h3>
<br />
How the annotations are processed and consumed is limited only by the requirements of a specific application and the imagination of the developers writing it. For example, an application can read both the annotation document and the target CDA document and overlay the annotations on top of the entries in the CDA document while displaying the latter in a web browser. Another example is the enforcement of privacy policies and preferences prior to exchanging the CDA document. One practical question is how to process the XPointer fragment identifiers. XPointer uses XPath, a well-established XML addressing mechanism supported by many XML processing APIs across programming languages. For those of you who use XSLT2 to process CDA documents, there is the open source <a href="http://www.rotorz.com/xml/2009/XPointer/" target="_blank">XPointer Framework for XSLT2</a> for use with the Saxon XSLT2 engine.<br />
<br />
<h3>
Toward Intelligent Health IT (iHIT) Systems: Getting Out of the Box</h3>
<br />
In this post, I describe a new type of application that I refer to as iHIT.
iHIT stands for Intelligent Health IT.<br /><br /><span style="font-weight:bold;">The Architecture of Traditional Health IT Systems</span><br /><br />Traditional software architectures for health IT systems typically include the following:<br /><ul><br /><li>Dependency Injection (DI)<br /><br /><li>Object Relational Mapping (ORM)<br /><br /><li>An architectural pattern for the presentation layer such as the Model View Controller (MVC) pattern<br /><br /><li>HTML5, CSS3, and a JavaScript library like jQuery or jQuery Mobile<br /><br /><li>Other architectural patterns including GoF Design Patterns, SOLID Principles, and Domain Driven Design (DDD)<br /><br /><li>Structured Query Language (SQL)<br /><br /><li>Enterprise Integration Patterns (EIPs) implemented through an Enterprise Service Bus (ESB) using HL7 messages as the "Published Language"<br /><br /><li>REST or SOAP-based web services.<br /></ul><br />An entire generation of developers has been trained in these techniques. They represent proven best practices accumulated over several decades of object-oriented design and relational data management. Although pervasive in today's clinical systems, these applications lack basic intelligent features such as the ability to capture and execute expert knowledge, make inferences, or make predictions about the future based on the analysis of historical data. Some of these systems look like little more than glorified data entry systems.<br /><br />With the explosion of medical knowledge and the growing availability of real-world observational EHR data, these intelligent features will become increasingly important in assisting clinicians with medical decision making at the point of care by reducing their cognitive load.<br /><br /><span style="font-weight:bold;">Intelligent Health IT (iHIT) Systems</span><br /><br />iHIT systems process huge quantities of both structured and unstructured data to provide clinicians with specific recommendations.
iHIT systems play an important role in translating Comparative Effectiveness Research (CER) findings into clinical practice. CER, an emerging trend in Evidence-Based Medicine (EBM), has been defined by the Federal Coordinating Council for CER as "<span style="font-style:italic;">the conduct and synthesis of research comparing the benefits and harms of different interventions and strategies to prevent, diagnose, treat and monitor health conditions in 'real world' settings.</span>" For example, based on the clinical profile of a patient, CER can help determine the best treatment option for breast cancer among the various options available, such as chemotherapy, radiation therapy, and surgery (mastectomy and lumpectomy).<br /><br />The following are examples of key characteristics displayed by iHIT systems:<br /><ul><br /><li>The ability to analyze patient data as well as very large historical observational data sets in order to make probability-based predictions about the future and to recommend specific actions that can yield the best clinical outcomes given the clinical profile of a patient.<br /><br /><li>The ability to capture and execute expert knowledge such as the medical knowledge contained in Clinical Practice Guidelines (CPGs).
This includes the ability to mediate between different CPGs to arrive at a specific recommendation by merging and reconciling the medical knowledge in multiple CPGs, as is the case for patients with comorbidities.<br /><br /><li>The ability to perform automated reasoning by inferring new implicit clinical facts from existing explicit facts and by exploiting semantic relationships between concepts and entities.<br /><br /><li>The ability to retrieve knowledge from unstructured data sources such as the biomedical research literature from sources like PubMed in order to answer clinical questions sometimes posed in natural language.<br /><br /><li>The ability to learn over time (and hence become smarter) as the amount of processed data continues its exponential growth.<br /><br /><li>Very fast response time to queries over very large data sets.<br /></ul><br /><br />Sounds like Artificial Intelligence (AI)? I believe we are indeed witnessing the resurgence of AI and even the ideas of the Semantic Web in the healthcare industry. As healthcare costs and quality become national priorities for many countries around the world, the boundaries of computing will continue to be pushed further. Actually, some of the underlying principles of intelligent systems were originally developed decades and even centuries ago in the field of biomedical research. William Osler (1849-1919) famously said: <br /><blockquote><br /><span style="font-style:italic;">Medicine is a science of uncertainty and an art of probability</span>.</blockquote><br />Technologically advanced and competitive industries like financial services (e.g., credit eligibility and fraud detection), online retail (e.g., recommendation engines), and logistics (e.g., delivery route optimization) have adopted some of these technologies. Health IT developers now need to embrace them as well.
This will require thinking out of the box.<br /><br /><br /><span style="font-weight:bold;">The Ingredients of iHIT Systems</span><br /><br />iHIT systems represent not one, but the integration of many different technologies. Mathematical models, statistical analysis, and machine learning algorithms play an important role in iHIT systems. Examples include:<br /><ul><br /><li>Logistic Regression models<br /><br /><li>Decision Trees<br /><br /><li>Association Rules<br /><br /><li>Bayesian Networks<br /><br /><li>Neural Networks<br /><br /><li>Random Forests<br /><br /><li>Time Series for temporal reasoning<br /><br /><li>k-means Clustering<br /><br /><li>Support Vector Machines (SVM)<br /><br /><li>Probabilistic Graphical Models (PGMs) based on methods such as Bayesian networks and Markov Networks for making clinical decisions under uncertainty.<br /></ul><br />These algorithms can be used not only for making therapeutic predictions (e.g., the future hospitalization risk of a patient with asthma), but also for dividing a population into subgroups based on the clinical profile of patients in order to achieve the best treatment outcomes. <br /><br />Clinical Practice Guidelines (CPGs) are usually based on Systematic Reviews (SRs) of Randomized Controlled Trials (RCTs), which are essentially scientific experiments. According to a report titled "<span style="font-style:italic;">Clinical Practice Guidelines We Can Trust</span>" which was published last year by the Institute of Medicine (IOM):<br /><br /><blockquote><span style="font-style:italic;">However, even when studies are considered to have high internal validity, they may not be generalizable to or valid for the patient population of guideline relevance. Randomized trials commonly have an underrepresentation of important subgroups, including those with comorbidities, older persons, racial and ethnic minorities, and low-income, less educated, or low-literacy patients. 
Many RCTs and observational studies fail to include such "typical patients" in their samples; even when they do, there may not be sufficient numbers of such patients to assess them separately or the subgroups may not be properly analyzed for differences in outcomes.</span></blockquote><br />On the other hand, observational studies using statistical analysis and machine learning algorithms operate on large, real-world observational data sets and can therefore provide feedback on the effectiveness of different therapeutic interventions in actual use. Although very costly, RCTs are still considered the strongest form of evidence in EBM. Despite their inherent methodological challenges (lack of randomization leading to possible bias and confounding), observational studies are increasingly recognized as complementary to RCTs and an important tool in clinical decision making and health policy. iHIT systems play an important role in translating Comparative Effectiveness Research (CER) findings into clinical practice in the form of clinical decision support (CDS) interventions at the point of care.<br /><br />iHIT systems also use business rules engines to capture and execute expert knowledge such as the medical knowledge contained in Clinical Practice Guidelines (CPGs). Examples include rules engines based on forward-chaining inference, also known as production rule systems. These rules engines can be combined with Complex Event Processing (CEP) and Business Process Management (BPM) for intelligent decision making.<br /><br />iHIT systems support ontologies such as those represented in the Web Ontology Language (OWL), providing reasoning capabilities as well as the ability to navigate semantic relationships between concepts and entities.<br /><br />More advanced iHIT systems have Natural Language Processing (NLP) and Automatic Speech Recognition (ASR) capabilities in order to answer clinical questions posed in natural language.
They rely on Information Retrieval techniques such as probabilistic methods for scoring the relevance of a document given a query, and on supervised machine learning classification methods such as decision trees, Naive Bayes, k-Nearest Neighbors (kNN), and Support Vector Machines (SVM).<br /><br />In some cases, the responsibilities of an iHIT system are performed by Intelligent Agents: autonomous entities capable of observing the clinical environment and acting upon those observations.<br /><br />For scalability and performance, iHIT systems often sit on top of NoSQL databases and run on massively parallel computing platforms like Apache Hadoop while leveraging the elasticity of the cloud.<br /><br />Integrating these technologies is the main challenge posed by iHIT systems. An example is the integration between statistical and machine learning models, business rules, ontologies, and more traditional forms of computing such as object-oriented programming. Various solutions to these challenges have been proposed and implemented.<br /><br /><span style="font-weight:bold;">Human-Centered Design</span><br /><br />Finally, iHIT systems fully embrace a human-centered design approach. They provide a seamless integration between automated decision logic and clinical workflows, and they give the clinician detailed explanations of the rationale behind the actions they recommend. In addition, they use techniques like Visual Analytics to enhance human cognitive abilities and facilitate analytical reasoning over very large data sets.
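To make the probability-based prediction capability discussed earlier concrete, here is a toy sketch of a logistic regression model, trained with plain gradient descent on synthetic data, estimating the hospitalization risk of a patient with asthma. The features, coefficients, and data are entirely made up for illustration; a real iHIT system would train validated models on large observational EHR data sets:

```python
import math
import random

# Illustrative only: synthetic asthma cohort with two features,
# prior ER visits (0-5) and medication adherence (0.0-1.0).
random.seed(42)

def make_patient():
    er_visits = random.randint(0, 5)
    adherence = random.random()
    # Synthetic ground truth: more ER visits and poorer adherence
    # raise the probability of future hospitalization.
    logit = 1.2 * er_visits - 3.0 * adherence - 1.0
    label = 1 if random.random() < 1 / (1 + math.exp(-logit)) else 0
    return [er_visits, adherence], label

data = [make_patient() for _ in range(2000)]

# Batch gradient descent on the logistic loss
weights, bias, lr = [0.0, 0.0], 0.0, 0.05
for _ in range(200):
    gw, gb = [0.0, 0.0], 0.0
    for x, y in data:
        p = 1 / (1 + math.exp(-(weights[0] * x[0] + weights[1] * x[1] + bias)))
        err = p - y
        gw[0] += err * x[0]
        gw[1] += err * x[1]
        gb += err
    weights = [w - lr * g / len(data) for w, g in zip(weights, gw)]
    bias -= lr * gb / len(data)

def risk(er_visits, adherence):
    """Predicted probability of future hospitalization."""
    z = weights[0] * er_visits + weights[1] * adherence + bias
    return 1 / (1 + math.exp(-z))

# A patient with frequent ER visits and poor adherence should score
# higher than one with no ER visits and good adherence.
print(risk(5, 0.1), risk(0, 0.9))
```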