
Sunday, November 2, 2014

Toward a Reference Architecture for Intelligent Systems in Clinical Care

A Software Architecture for Precision Medicine


Intelligent systems in clinical care leverage the latest innovations in machine learning, real-time data stream mining, visual analytics, natural language processing, ontologies, production rule systems, and cloud computing to provide clinicians with the best knowledge and information at the point of care for effective clinical decision making. In this post, I propose a unified open reference architecture that combines all these technologies into a hybrid cognitive system for clinical decision support. Truly intelligent systems are capable of reasoning; the goal, however, is not to replace clinicians, but to provide them with cognitive support during clinical decision making. Furthermore, Intelligent Personal Assistants (IPAs) such as Apple's Siri, Google's Google Now, and Microsoft's Cortana have raised our expectations of how intelligent systems interact with users through voice and natural language.

In the strict sense of the term, a reference architecture should be abstracted away from concrete technology implementations. However, to enable a better understanding of the proposed approach, I take the liberty of explaining how available open source software can be used to realize the intent of the architecture. There is an urgent need for an open and interoperable architecture that can be deployed across devices and platforms. Unfortunately, this is not the case today with solutions like Apple's HealthKit and ResearchKit.

The specific open source software mentioned in this post can be substituted with other tools which provide similar capabilities. The following diagram is a depiction of the architecture.


Clinical Data Sources


Clinical data sources are represented on the left of the architecture diagram. Examples include electronic medical record systems (EMR) commonly used in routine clinical care, clinical genome databases, genome variant knowledge bases, medical imaging databases, data from medical devices and wearable sensors, and unstructured data sources such as biomedical literature databases. The approach implements the Lambda Architecture enabling both batch and real-time data stream processing and mining.


Predictive Modeling, Real-Time Data Stream Mining, and Big Data Genomics


The back-end provides various tools and frameworks for advanced analytics and decision management. The analytics workbench includes tools for creating predictive models and for real-time data stream mining. The decision management workbench includes a production rule system (providing seamless integration with clinical events and processes) and an ontology editor.

The incoming clinical data likely meet the Big Data criteria of volume, velocity, and variety (this is particularly true for physiological time series from wearable sensors). Therefore, specialized frameworks for large-scale cluster computing like Apache Spark are used to analyze and process the data. Statistical computing and Machine Learning tools like R are used here as well. The goal is knowledge and pattern discovery using Machine Learning algorithms such as Decision Trees, k-Means Clustering, Logistic Regression, Support Vector Machines (SVMs), Bayesian Networks, Neural Networks, and the more recent Deep Learning techniques. The latter hold great promise in applications such as Natural Language Processing (NLP), medical image analysis, and speech recognition.

These Machine Learning algorithms can support diagnosis, prognosis, simulation, anomaly detection, care alerting, and care planning. For example, anomaly detection can be performed at scale with the k-means clustering algorithm in Apache Spark, as sketched below. In addition, Apache Spark allows the implementation of the Lambda Architecture and can also be used for genome Big Data analysis at scale.
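
The following is a minimal sketch of this idea using the Spark MLlib RDD-based API in Scala. The input path, the feature layout, the number of clusters, and the distance threshold are all hypothetical assumptions; a production system would tune them against validated historical data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object VitalSignsAnomalyDetection {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("vital-signs-anomaly-detection"))

    // Hypothetical input: one observation per line, e.g. "heartRate,systolicBP,respiratoryRate"
    val observations = sc.textFile("hdfs:///data/vital-signs.csv")
      .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
      .cache()

    // Cluster the observations; k and the number of iterations are illustrative only
    val model = KMeans.train(observations, 5, 20)

    // Flag observations that are far from their nearest cluster centroid as potential anomalies
    val threshold = 3.0 // hypothetical distance threshold
    val anomalies = observations.filter { v =>
      val centroid = model.clusterCenters(model.predict(v))
      val distance = math.sqrt(
        v.toArray.zip(centroid.toArray).map { case (a, b) => (a - b) * (a - b) }.sum)
      distance > threshold
    }

    println(s"Flagged ${anomalies.count()} potentially anomalous observations")
    // The trained model could also be exported as PMML (model.toPMML) for run-time scoring
    sc.stop()
  }
}
```

In practice, the distance threshold would be chosen by examining the distribution of distances to the centroids on historical data rather than hard-coded as above.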

In another post titled How Good is Your Crystal Ball?: Utility, Methodology, and Validity of Clinical Prediction Models, I discuss quantitative measures of performance for clinical prediction models.


Visual Analytics


Visual Analytics tools like D3.js, rCharts, Plotly, googleVis, ggplot2, and ggvis can help obtain deep insight for effective understanding, reasoning, and decision making through the visual exploration of massive, complex, and often ambiguous data. Of particular interest is Visual Analytics of real-time data streams like physiological time series. As a multidisciplinary field, Visual Analytics combines several disciplines such as human perception and cognition, interactive graphic design, statistical computing, data mining, spatio-temporal data analysis, and even Art. For example, similar to Minard's map of Napoleon's Russian Campaign of 1812-1813 (see graphic below), Visual Analytics can help in comparing different interventions and care pathways and their respective clinical outcomes over a period of time by displaying causes, variables, comparisons, and explanations.


Production Rule System, Ontology Reasoning, and NLP


The architecture also includes a production rule engine and an ontology editor (Drools and Protégé, respectively). This makes it possible to leverage existing clinical domain knowledge available from clinical practice guidelines (CPGs) and biomedical ontologies like SNOMED CT. This approach complements the probabilistic approach of machine learning algorithms to clinical decision making under uncertainty. The production rule system can translate CPGs into executable rules which are fully integrated with clinical processes (workflows) and events. The ontologies can provide automated reasoning capabilities for decision support.
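
As a sketch of how such executable rules could be invoked from application code, the snippet below creates a Drools (KIE) session and fires rules against a patient fact. The session name, the Patient fact class, and the rule content (which would live in DRL files derived from a CPG) are hypothetical assumptions, not part of the architecture itself.

```scala
import org.kie.api.KieServices

// Hypothetical fact class representing data extracted from the patient's record
case class Patient(id: String, age: Int, systolicBP: Int, onAnticoagulant: Boolean)

object GuidelineRuleRunner {
  def main(args: Array[String]): Unit = {
    val kieServices = KieServices.Factory.get()
    // Loads the KieContainer defined by kmodule.xml on the classpath
    val kieContainer = kieServices.getKieClasspathContainer

    // "ksession-cds" is a hypothetical session name configured in kmodule.xml;
    // the corresponding DRL rules would encode recommendations from a CPG
    val kieSession = kieContainer.newKieSession("ksession-cds")
    try {
      kieSession.insert(Patient("123", 72, 165, onAnticoagulant = false))
      val rulesFired = kieSession.fireAllRules()
      println(s"$rulesFired guideline rule(s) fired")
    } finally {
      kieSession.dispose()
    }
  }
}
```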

NLP includes capabilities such as:
  • Text classification, text clustering, document and passage retrieval, text summarization, and more advanced clinical question answering (CQA) capabilities which can be useful for satisfying clinicians' information needs at the point of care; and
  • Named entity recognition (NER) for extracting concepts from clinical notes.
The data tier supports the efficient storage of large amounts of time series data and is implemented with tools like Cassandra and HBase. The system can run in the cloud, for example using the Amazon Elastic Compute Cloud (EC2). For real-time processing of distributed data streams, cloud-based solutions like Amazon Kinesis and AWS Lambda can be used.
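
Below is a minimal sketch of what the time-series storage might look like, writing heart-rate observations to Cassandra with the DataStax Java driver from Scala. The contact point, keyspace, and table schema are hypothetical.

```scala
import java.util.Date
import com.datastax.driver.core.Cluster

object VitalSignsStore {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    // Hypothetical keyspace and table: one partition per patient per day,
    // clustered by observation time so that range scans over a time window are efficient
    session.execute(
      """CREATE KEYSPACE IF NOT EXISTS vitals
        |WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}""".stripMargin)
    session.execute(
      """CREATE TABLE IF NOT EXISTS vitals.heart_rate (
        |  patient_id text, day text, observed_at timestamp, bpm int,
        |  PRIMARY KEY ((patient_id, day), observed_at))""".stripMargin)

    val insert = session.prepare(
      "INSERT INTO vitals.heart_rate (patient_id, day, observed_at, bpm) VALUES (?, ?, ?, ?)")
    session.execute(insert.bind("patient-123", "2014-11-02", new Date(), Int.box(78)))

    cluster.close()
  }
}
```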


Clinical Decision Services


The clinical decision services provide intelligence at the point of care, typically using deployed predictive models, clinical rules, text mining outputs, and ontology reasoners. For example, Machine Learning models can be exported in Predictive Model Markup Language (PMML) format for run-time scoring based on the clinical data of individual patients, enabling what is referred to as Personalized Medicine. Clinical decision services include:

  • Diagnosis and prognosis
  • Simulation
  • Anomaly detection 
  • Data visualization
  • Information retrieval (e.g., clinical question answering)
  • Alerts and reminders
  • Support for care planning processes.
The clinical decision services can be deployed in the cloud as well. Other clinical systems can consume these services through a SOAP or REST-based web service interface (using the HL7 vMR and DSS specifications for interoperability) and single sign-on (SSO) standards like SAML2 and OpenID Connect.


Intelligent Personal Assistants (IPAs)


Clinical decision services can also be delivered to patients and clinicians through IPAs. IPAs can accept inputs in the form of voice, images, and the user's context, and respond in natural language. IPAs are also expanding to wearable technologies such as smart watches and glasses. The accuracy of speech recognition, natural language processing, and computer vision is improving rapidly with the adoption of Deep Learning techniques and tools. Accelerated hardware technologies like GPUs and FPGAs are improving the performance and reducing the cost of deploying these systems at scale.


Hexagonal, Reactive, and Secure Architecture


Intelligent Health IT systems are not just capable of discovering knowledge and patterns in data. They are also scalable, resilient, responsive, and secure. To achieve these objectives, several architectural patterns have emerged during the last few years:

  • Domain Driven Design (DDD) puts the emphasis on the core domain and domain logic and recommends a layered architecture (typically user interface, application, domain, and infrastructure) with each layer having well defined responsibilities and interfaces for interacting with other layers. Models exist within "bounded contexts". These "bounded contexts" communicate with each other typically through messaging and web services using HL7 standards for interoperability.

  • The Hexagonal Architecture defines "ports and adapters" as a way to design, develop, and test an application in a way that is independent of the various clients, devices, transport protocols (HTTP, REST, SOAP, MQTT, etc.), and even databases that could be used to consume its services in the future. This is particularly important in the era of the Internet of Things in healthcare.

  • Microservices decompose large monolithic applications into smaller services, following the good old principles of service-oriented design and single responsibility to achieve modularity, maintainability, scalability, and ease of deployment (for example, using Docker).

  • CQRS/ES: Command Query Responsibility Segregation (CQRS) and Event Sourcing (ES) are two architectural patterns which use event-driven messaging and an Event Store to separate commands (the write side) from queries (the read side), relying on the principle of Eventual Consistency. CQRS/ES can be implemented in combination with microservices to deliver new capabilities such as temporal queries, behavioral analysis, complex audit logs, and real-time notifications and alerts (see the sketch after this list).

  • Functional Programming: Functional Programming languages like Scala have several benefits that are particularly important for applying Machine Learning algorithms on large data sets. Like functions in mathematics, pure functions in Scala have no side effects, which provides referential transparency; Machine Learning algorithms are in fact based on Linear Algebra and Calculus. Scala also supports higher-order functions, and its emphasis on immutable values greatly simplifies concurrency. For all these reasons, Machine Learning libraries like Apache Mahout have embraced Scala, moving away from the Java MapReduce paradigm.

  • Reactive Architecture: The Reactive Manifesto makes the case for a new breed of applications called "Reactive Applications". According to the manifesto, the Reactive Application architecture allows developers to build "systems that are event-driven, scalable, resilient, and responsive."  Leading frameworks that support Reactive Programming include Akka and RxJava. The latter is a library for composing asynchronous and event-based programs using observable sequences. RxJava is a Java port (with a Scala adaptor) of the original Rx (Reactive Extensions) for .NET created by Erik Meijer.

    Based on the Actor Model and built in Scala, Akka is a framework for building highly concurrent, asynchronous, distributed, and fault tolerant event-driven applications on the JVM. Akka offers location transparency, fault tolerance, asynchronous message passing, and a non-deterministic share-nothing architecture. Akka Cluster provides a fault-tolerant decentralized peer-to-peer based cluster membership service with no single point of failure or single point of bottleneck.

    Also built with Scala, Apache Kafka is a scalable message broker which provides high-throughput, fault-tolerance, built-in partitioning, and replication  for processing real-time data streams. In the reference architecture, the ingestion layer is implemented with Akka and Apache Kafka.

  • Web Application Security: special attention is given to security across all layers, notably the proper implementation of authentication, authorization, encryption, and audit logging. The implementation of security is also driven by deep knowledge of application security patterns, threat modeling, and enforcing security best practices (e.g., OWASP Top Ten and CWE/SANS Top 25 Most Dangerous Software Errors) as part of the continuous delivery process.
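
To make the CQRS/ES pattern referenced above more concrete, here is a minimal, framework-free sketch of an event-sourced aggregate in Scala. The clinical alert domain and all names are illustrative assumptions: commands are validated by the aggregate, accepted changes are captured as immutable events, and the current state is rebuilt by replaying the event stream from the event store.

```scala
// Immutable events capturing everything that has happened to the aggregate
sealed trait AlertEvent
case class AlertRaised(alertId: String, patientId: String, message: String) extends AlertEvent
case class AlertAcknowledged(alertId: String, clinicianId: String) extends AlertEvent

// Commands arriving from the write side (e.g. the user interface)
sealed trait AlertCommand
case class RaiseAlert(alertId: String, patientId: String, message: String) extends AlertCommand
case class AcknowledgeAlert(alertId: String, clinicianId: String) extends AlertCommand

// The aggregate root: behavior and invariants live here; state is immutable
case class Alert(alertId: String, patientId: String, acknowledged: Boolean) {

  // Decide which events (if any) a command produces; invariants are enforced here
  def handle(command: AlertCommand): List[AlertEvent] = command match {
    case AcknowledgeAlert(id, clinician) if id == alertId && !acknowledged =>
      List(AlertAcknowledged(id, clinician))
    case _ => Nil // e.g. an alert can only be acknowledged once
  }

  // Apply an event to produce the next (immutable) state
  def applyEvent(event: AlertEvent): Alert = event match {
    case AlertAcknowledged(_, _) => copy(acknowledged = true)
    case _                       => this
  }
}

object Alert {
  // Rebuild the aggregate by replaying its event stream from the event store
  def replay(events: List[AlertEvent]): Option[Alert] = events match {
    case AlertRaised(id, patientId, _) :: rest =>
      Some(rest.foldLeft(Alert(id, patientId, acknowledged = false))(_ applyEvent _))
    case _ => None
  }
}
```

On the read side, denormalizers would subscribe to the same events to maintain query-optimized views, which is what enables the temporal queries, audit trails, and real-time notifications mentioned above.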

An Interface that Works across Devices and Platforms


The front-end uses a Mobile First approach and a Single Page Application (SPA) architecture with JavaScript-based frameworks like AngularJS to create very responsive user experiences. It also allows us to bring the following software engineering best practices to the front-end:

  • Dependency Injection
  • Test-Driven Development (Jasmine, Karma, PhantomJS)
  • Package Management (Bower or npm)
  • Build system and Continuous Integration (Grunt or Gulp.js)
  • Static Code Analysis (JSLint and JSHint), and 
  • End-to-End Testing (Protractor). 
For mobile devices, Apache Cordova can be used to access native functions when desired. The main goal is to provide a user interface that works across devices and platforms such as iOS, Android, and Windows Phone.

Interoperability


Interoperability will always be a key requirement in clinical systems. Interoperability is needed between all players in the healthcare ecosystem including providers, payers, labs, knowledge artifact developers, quality measure developers, and public health agencies like the CDC. These standards exist today and are implementation-ready. However, only health IT buyers have the leverage to demand interoperability from their vendors.

Standards related to clinical decision support (CDS) include:

  • The HL7 Fast Healthcare Interoperability Resources (FHIR) specification (see the client sketch after this list)
  • The HL7 virtual Medical Record (vMR)
  • The HL7 Decision Support Services (DSS) specification
  • The HL7 CDS Knowledge Artifact specification
  • The DMG Predictive Model Markup Language (PMML) specification.
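
As an illustration of how a client application might consume one of these standards, the following sketch reads a Patient resource from a FHIR DSTU2 server using the open source HAPI FHIR library. The server URL and resource id are placeholders.

```scala
import ca.uhn.fhir.context.FhirContext
import ca.uhn.fhir.model.dstu2.resource.Patient

object FhirClientExample {
  def main(args: Array[String]): Unit = {
    // FhirContext is expensive to create; applications typically create it once and reuse it
    val ctx = FhirContext.forDstu2()
    val client = ctx.newRestfulGenericClient("http://fhir.example.org/baseDstu2") // placeholder URL

    // Read a single Patient resource by its logical id (placeholder id)
    val patient = client.read().resource(classOf[Patient]).withId("example").execute()

    println(patient.getNameFirstRep.getNameAsSingleString)
  }
}
```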

Overcoming Barriers to Adoption


In a previous post, I discussed a practical approach to addressing challenges to the adoption of clinical decision support (CDS) systems.


Sunday, December 19, 2010

How Not to Build A Big Ball of Mud

If you are a software developer or architect, in addition to the ever changing business requirements, you also need to deal with the myriad of application development frameworks and design patterns out there. There are frameworks for: the User Interface (UI), Dependency Injection (DI), Aspect Oriented Programming (AOP), and Object-Relational Mapping (ORM). On top of that, you will probably need a web services framework and perhaps an Enterprise Service Bus (ESB) if you need to integrate applications. As an architect, you also need to keep an eye on scalability, availability, security, usability, industry standards, and government regulations.

In such an environment, the lack of a disciplined approach to software architecture can quickly lead to a Big Ball of Mud. In a paper presented in 1997 at the Fourth Conference on Pattern Languages of Programs, Brian Foote and Joseph Yoder describe the Big Ball of Mud:

A BIG BALL OF MUD is haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle. We’ve all seen them. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.


The Big Ball of Mud remains the most pervasive architecture today. Note that these problems can be exacerbated by an agile software development approach that leaves little or no room for design and strategic thinking (see my previous post on software architecture documentation in agile projects).

Domain Driven Design (DDD) is a set of patterns introduced by Eric Evans in his book "Domain-Driven Design: Tackling Complexity in the Heart of Software". I won't go into the details of what those patterns are; I do recommend that you read the book, and there are other free DDD resources on the web as well. However, I will share some key DDD principles that have been helpful to me in wrapping my head around software architecture complexity:

  • Collaboration between software developers and domain experts is important to create a common understanding of the concepts of the domain. Note that we're not talking about UI components such as screens or fields here, nor are we talking about computer science abstractions such as classes and objects. We are talking about what the domain is made of conceptually. These domain concepts are expressed in a Ubiquitous Language that is used not only in conversations and software documentation, but also in the code. So in essence, DDD is model-driven design: the model can be translated into code and frameworks like Naked Objects can help you do exactly that. Continuously refine the model.

  • Experimentation and rapid prototyping are an efficient way to collaborate with domain experts and business analysts. This is where the Naked Objects Framework can be very helpful. Note that you can use Naked Objects for the initial prototyping and domain modeling, and then use your preferred frameworks for the remaining layers of your application.

  • Pay attention to the correct implementation of DDD building blocks such as: entities (behaviour-rich with business rules), value objects, aggregates, aggregate roots, domain events, factories, repositories, and services. Avoid an anemic domain model, and know how to recognize value objects and persist them properly. Value objects (such as money and time interval) are immutable and are manipulated through side-effect-free functions. Move complexity and behaviour out of your entities into those value objects (see the sketch at the end of this post).

  • Domain concepts are grouped into modules to delineate what Eric Evans calls the "conceptual contours" of the domain. To reduce coupling, OO principles such as the dependency inversion principle, the interface segregation principle, and the acyclic dependencies principle are applied.

  • DDD recommends the following four layers: the presentation layer, the application layer, the domain layer, and the infrastructure layer. Although the idea of a multi-tier architecture is not new, the anti-patterns are typically: a fat application layer, an anemic domain model, and a tangled mess in general. So, properly layer your architecture:

    • Repository interfaces are in the domain layer, but their implementations are in the infrastructure layer to allow "Persistence Ignorance"
    • Both the interface and implementation of factories are in the domain layer
    • Domain and infrastructure services are injected into entities using dependency injection (some argue that DDD is not possible without DI, AOP, and ORM)
    • The application layer takes care of cross-cutting concerns such as transactions and security. It can also mediate between the presentation layer and domain layer through Data Transfer Objects (DTOs).


  • DDD enables Object Oriented User Interfaces (OOUI) which expose the richness of the domain layer as opposed to obscuring it.

  • Models exist within bounded contexts and the latter should be clearly identified. In his book, Eric Evans talks about "strategic design" and "context maps" and suggests the following options for integrating applications:

    • Published language
    • Open host service
    • Shared kernel
    • Customer/supplier
    • Conformist
    • Anti-corruption layer
    • Separate ways.


    In industries such as healthcare where an XML-based data exchange standard exists, the "Published Language" approach is the pattern typically used. Each healthcare application participating in an exchange represents a separate context. On the other hand, an "Anti-Corruption Layer" can be created as an adapter to isolate the model against an industry standard model that is not considered best practice in data modeling, is inconsistent, immature, or subject to change. However, since there is tremendous value in exchanging data, we hope not to go "Separate Ways".

  • DDD is a solid foundation for next-generation architecture based on the Command Query Responsibility Segregation (CQRS) pattern. The UI sends commands which are handled by command handlers. These command handlers change the state of aggregate roots. However, the aggregate roots still define behavior and business rules and are responsible for maintaining invariants. Changes to aggregate roots generate events that are stored in an event store (this is called event sourcing). Aggregate roots are persisted by storing these streams of events in the event store. That way, the aggregate roots can be reconstructed by replaying the events from the event store. The events are published to subscribers including denormalizers for enhanced query performance. The separation of the read side from the write side allows:

    • Increased performance and scalability on the read and reporting side particularly when combined with a cloud deployment model
    • Complete audit trails through the event store
    • New data mining capabilities leveraging temporal queries
    • The opportunity to eliminate the need for Object Relational Mapping (ORM) through the use of high performance NoSQL databases.
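
To ground some of these building blocks, here is a minimal Scala sketch (the invoicing domain and all names are illustrative assumptions): an immutable value object manipulated through side-effect-free functions, a behaviour-rich entity that enforces a business rule, and a repository interface that lives in the domain layer while its implementation would live in the infrastructure layer.

```scala
// Value object: immutable, compared by value, manipulated through side-effect-free functions
case class Money(amount: BigDecimal, currency: String) {
  require(amount >= 0, "amount must not be negative")

  def plus(other: Money): Money = {
    require(currency == other.currency, "cannot add amounts in different currencies")
    copy(amount = amount + other.amount)
  }
}

// Entity: identified by its id, behaviour-rich, responsible for its own invariants
class Invoice(val id: String, private var total: Money, private var paid: Boolean) {
  def addLineItem(price: Money): Unit = {
    require(!paid, "cannot modify a paid invoice") // business rule enforced inside the entity
    total = total.plus(price)
  }
  def currentTotal: Money = total
}

// Repository interface: declared in the domain layer ("Persistence Ignorance");
// an infrastructure-layer class (JPA, NoSQL, ...) provides the implementation
trait InvoiceRepository {
  def findById(id: String): Option[Invoice]
  def save(invoice: Invoice): Unit
}
```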