
Sunday, November 2, 2014

Toward a Reference Architecture for Intelligent Systems in Clinical Care

A Software Architecture for Precision Medicine


Intelligent systems in clinical care leverage the latest innovations in machine learning, real-time data stream mining, visual analytics, natural language processing, ontologies, production rule systems, and cloud computing to provide clinicians with the best knowledge and information at the point of care for effective clinical decision making. In this post, I propose a unified open reference architecture that combines all of these technologies into a hybrid cognitive system for clinical decision support. Truly intelligent systems are capable of reasoning; the goal is not to replace clinicians, but to provide them with cognitive support during clinical decision making. Furthermore, Intelligent Personal Assistants (IPAs) such as Apple's Siri, Google's Google Now, and Microsoft's Cortana have raised our expectations of how intelligent systems interact with users through voice and natural language.

In the strict sense of the term, a reference architecture should be abstracted away from concrete technology implementations. However, to enable a better understanding of the proposed approach, I take the liberty of explaining how available open source software can be used to realize the intent of the architecture. There is an urgent need for an open and interoperable architecture that can be deployed across devices and platforms. Unfortunately, this is not the case today with solutions like Apple's HealthKit and ResearchKit.

The specific open source software mentioned in this post can be substituted with other tools that provide similar capabilities. The following diagram is a depiction of the architecture.

 

Clinical Data Sources


Clinical data sources are represented on the left of the architecture diagram. Examples include electronic medical record (EMR) systems commonly used in routine clinical care, clinical genome databases, genome variant knowledge bases, medical imaging databases, data from medical devices and wearable sensors, and unstructured data sources such as biomedical literature databases. The approach implements the Lambda Architecture, enabling both batch and real-time data stream processing and mining.


Predictive Modeling, Real-Time Data Stream Mining, and Big Data Genomics


The back-end provides various tools and frameworks for advanced analytics and decision management. The analytics workbench includes tools for creating predictive models and for data stream mining. The decision management workbench includes a production rule system (providing seamless integration with clinical events and processes) and an ontology editor.

The incoming clinical data likely meet the Big Data criteria of volume, velocity, and variety (this is particularly true for physiological time series from wearable sensors). Therefore, specialized frameworks for large-scale cluster computing like Apache Spark are used to analyze and process the data. Statistical computing and Machine Learning tools like R are used here as well. The goal is knowledge and pattern discovery using Machine Learning algorithms like Decision Trees, k-Means Clustering, Logistic Regression, Support Vector Machines (SVMs), Bayesian Networks, Neural Networks, and the more recent Deep Learning techniques. The latter hold great promise in applications such as Natural Language Processing (NLP), medical image analysis, and speech recognition.

These Machine Learning algorithms can support diagnosis, prognosis, simulation, anomaly detection, care alerting, and care planning. For example, anomaly detection can be performed at scale using the k-means clustering machine learning algorithm in Apache Spark. In addition, Apache Spark allows the implementation of the Lambda Architecture and can also be used for genome Big Data analysis at scale.
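
As a minimal sketch of how this could look in practice (assuming numeric feature vectors have already been extracted from the sensor stream; the file path, k, and the distance threshold are illustrative), the following Java code trains a k-means model with Spark MLlib and flags observations that fall far from their nearest cluster center:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.clustering.KMeans;
    import org.apache.spark.mllib.clustering.KMeansModel;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    public class VitalSignsAnomalyDetector {

        // Squared Euclidean distance between an observation and a cluster center
        static double sqDist(Vector a, Vector b) {
            double sum = 0.0;
            double[] x = a.toArray(), y = b.toArray();
            for (int i = 0; i < x.length; i++) {
                double d = x[i] - y[i];
                sum += d * d;
            }
            return sum;
        }

        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("vitals-anomaly-detection"));

            // Each line holds pre-extracted features (e.g., heart rate, SpO2, respiration)
            JavaRDD<Vector> vitals = sc.textFile("hdfs:///data/vitals.csv")
                .map(line -> {
                    String[] cols = line.split(",");
                    double[] features = new double[cols.length];
                    for (int i = 0; i < cols.length; i++) {
                        features[i] = Double.parseDouble(cols[i]);
                    }
                    return Vectors.dense(features);
                });
            vitals.cache();

            // Cluster the observations; k and the iteration count are tuning choices
            KMeansModel model = KMeans.train(vitals.rdd(), 5, 20);

            // Flag observations whose distance to the nearest center exceeds a
            // threshold calibrated on historical data (the value below is a placeholder)
            double threshold = 4.0;
            long anomalies = vitals.filter(v ->
                sqDist(v, model.clusterCenters()[model.predict(v)]) > threshold).count();
            System.out.println("Anomalous observations: " + anomalies);
            sc.stop();
        }
    }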

In another post titled How Good is Your Crystal Ball?: Utility, Methodology, and Validity of Clinical Prediction Models, I discuss quantitative measures of performance for clinical prediction models.


Visual Analytics


Visual Analytics tools like D3.js, rCharts, Plotly, googleVis, ggplot2, and ggvis can help obtain deep insight for effective understanding, reasoning, and decision making through the visual exploration of massive, complex, and often ambiguous data. Of particular interest is Visual Analytics of real-time data streams like physiological time series. As a multidisciplinary field, Visual Analytics combines several disciplines such as human perception and cognition, interactive graphic design, statistical computing, data mining, spatio-temporal data analysis, and even Art. For example, similar to Minard's map of the Russian Campaign of 1812-1813, Visual Analytics can help in comparing different interventions and care pathways and their respective clinical outcomes over a certain period of time by displaying causes, variables, comparisons, and explanations.





Production Rule System, Ontology Reasoning, and NLP


The architecture also includes a production rule engine and an ontology editor (Drools and Protégé respectively). This is done in order to leverage existing clinical domain knowledge available from clinical practice guidelines (CPGs) and biomedical ontologies like SNOMED CT. This knowledge-based approach complements the probabilistic approach that machine learning algorithms take to clinical decision making under uncertainty. The production rule system can translate CPGs into executable rules which are fully integrated with clinical processes (workflows) and events. The ontologies can provide automated reasoning capabilities for decision support.
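
As an illustration of the production rule side, here is a minimal Drools sketch (the fact class, session name, and the threshold in the rule are illustrative, not taken from an actual guideline; the rule itself would live in a DRL file on the classpath, shown here as a comment):

    import org.kie.api.KieServices;
    import org.kie.api.runtime.KieContainer;
    import org.kie.api.runtime.KieSession;

    public class GuidelineRunner {

        // A simple fact representing a patient's latest observation
        public static class Patient {
            private final String id;
            private final double systolicBp;
            public Patient(String id, double systolicBp) {
                this.id = id;
                this.systolicBp = systolicBp;
            }
            public String getId() { return id; }
            public double getSystolicBp() { return systolicBp; }
        }

        /* A guideline recommendation translated into DRL might read:
         *
         * rule "Flag severely elevated systolic blood pressure"
         * when
         *     $p : Patient( systolicBp >= 160 )
         * then
         *     System.out.println("Alert for patient " + $p.getId());
         * end
         */
        public static void main(String[] args) {
            KieServices ks = KieServices.Factory.get();
            KieContainer container = ks.getKieClasspathContainer();
            // The session name refers to a ksession declared in kmodule.xml
            KieSession session = container.newKieSession("guidelines-session");
            try {
                session.insert(new Patient("pt-001", 172.0));
                session.fireAllRules(); // evaluates all rules against the inserted facts
            } finally {
                session.dispose();
            }
        }
    }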

NLP includes capabilities such as:
  • Text classification, text clustering, document and passage retrieval, text summarization, and more advanced clinical question answering (CQA) capabilities which can be useful for satisfying clinicians' information needs at the point of care; and
  • Named entity recognition (NER) for extracting concepts from clinical notes.
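
As a sketch of the NER capability (using the generic Apache OpenNLP API; the model file is a hypothetical clinical model, and production systems would more likely use a dedicated clinical NLP pipeline such as Apache cTAKES):

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.Arrays;
    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.util.Span;

    public class ClinicalNerSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical model trained to tag problems and medications in clinical text
            try (InputStream in = new FileInputStream("clinical-ner-model.bin")) {
                NameFinderME finder = new NameFinderME(new TokenNameFinderModel(in));
                String[] tokens = {"Patient", "denies", "chest", "pain", "and",
                                   "takes", "metformin", "daily", "."};
                for (Span span : finder.find(tokens)) {
                    // span.getType() carries the entity label the model was trained with
                    System.out.println(span.getType() + ": " + String.join(" ",
                        Arrays.copyOfRange(tokens, span.getStart(), span.getEnd())));
                }
            }
        }
    }
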
The data tier supports the efficient storage of large amounts of time series data and is implemented with tools like Cassandra and HBase. The system can run in the cloud, for example using the Amazon Elastic Compute Cloud (EC2). For real-time processing of distributed data streams, cloud-based solutions like Amazon Kinesis and Lambda can be used.

 

Clinical Decision Services


The clinical decision services provide intelligence at the point of care, typically using deployed predictive models, clinical rules, text mining outputs, and ontology reasoners. For example, predictive models can be exported in the Predictive Model Markup Language (PMML) format for run-time scoring based on the clinical data of individual patients, enabling what is referred to as Personalized Medicine. Clinical decision services include:

  • Diagnosis and prognosis
  • Simulation
  • Anomaly detection 
  • Data visualization
  • Information retrieval (e.g., clinical question answering)
  • Alerts and reminders
  • Support for care planning processes.
The clinical decision services can be deployed in the cloud as well. Other clinical systems can consume these services through a SOAP or REST-based web service interface (using the HL7 vMR and DSS specifications for interoperability) and single sign-on (SSO) standards like SAML2 and OpenID Connect.
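
As a rough sketch of what consuming such a decision service over REST might look like (the endpoint URL, JSON payload, and token are placeholders; a real integration would follow the HL7 DSS and vMR payload definitions):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Scanner;

    public class DecisionServiceClient {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint exposed by the clinical decision services tier
            URL url = new URL("https://cds.example.org/services/evaluate");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("Authorization", "Bearer <access-token>");
            conn.setDoOutput(true);

            String patientData = "{\"patientId\":\"pt-001\",\"age\":64,"
                + "\"conditions\":[\"diabetes-type-2\"]}";
            try (OutputStream os = conn.getOutputStream()) {
                os.write(patientData.getBytes(StandardCharsets.UTF_8));
            }
            // The response would carry alerts, scores, or care plan recommendations
            try (Scanner s = new Scanner(conn.getInputStream(), "UTF-8")) {
                System.out.println(s.useDelimiter("\\A").next());
            }
        }
    }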


Intelligent Personal Assistants (IPAs)


Clinical decision services can also be delivered to patients and clinicians through IPAs. IPAs can accept inputs in the form of voice, images, and the user's context, and respond in natural language. IPAs are also expanding to wearable technologies such as smart watches and glasses. The precision of speech recognition, natural language processing, and computer vision is improving rapidly with the adoption of Deep Learning techniques and tools. Accelerated hardware technologies like GPUs and FPGAs are improving the performance and reducing the cost of deploying these systems at scale.


Hexagonal, Reactive, and Secure Architecture


Intelligent Health IT systems are not just capable of discovering knowledge and patterns in data. They are also scalable, resilient, responsive, and secure. To achieve these objectives, several architectural patterns have emerged during the last few years:

  • Domain Driven Design (DDD) puts the emphasis on the core domain and domain logic and recommends a layered architecture (typically user interface, application, domain, and infrastructure) with each layer having well defined responsibilities and interfaces for interacting with other layers. Models exist within "bounded contexts". These "bounded contexts" communicate with each other typically through messaging and web services using HL7 standards for interoperability.

  • The Hexagonal Architecture defines "ports and adapters" as a way to design, develop, and test an application in a way that is independent of the various clients, devices, transport protocols (HTTP, REST, SOAP, MQTT, etc.), and even databases that could be used to consume its services in the future. This is particularly important in the era of the Internet of Things in healthcare.

  • Microservices decompose large monolithic applications into smaller services, following good old principles of service-oriented design and single responsibility, to achieve modularity, maintainability, scalability, and ease of deployment (for example, using Docker).

  • CQRS/ES: Command Query Responsibility Segregation (CQRS) and Event Sourcing (ES) are two architectural patterns that use event-driven messaging and an Event Store to separate commands (the write side) from queries (the read side), relying on the principle of Eventual Consistency (a minimal sketch follows this list). CQRS/ES can be implemented in combination with microservices to deliver new capabilities such as temporal queries, behavioral analysis, complex audit logs, and real-time notifications and alerts.

  • Functional Programming: Functional programming languages like Scala have several benefits that are particularly important for applying Machine Learning algorithms on large data sets. Like functions in mathematics, pure functions in Scala have no side effects, which provides referential transparency. Machine Learning algorithms are in fact based on Linear Algebra and Calculus. Scala also supports higher-order functions, and its emphasis on immutable values greatly simplifies concurrency. For all these reasons, Machine Learning libraries like Apache Mahout have embraced Scala, moving away from the Java MapReduce paradigm.

  • Reactive Architecture: The Reactive Manifesto makes the case for a new breed of applications called "Reactive Applications". According to the manifesto, the Reactive Application architecture allows developers to build "systems that are event-driven, scalable, resilient, and responsive."  Leading frameworks that support Reactive Programming include Akka and RxJava. The latter is a library for composing asynchronous and event-based programs using observable sequences. RxJava is a Java port (with a Scala adaptor) of the original Rx (Reactive Extensions) for .NET created by Erik Meijer.

    Based on the Actor Model and built in Scala, Akka is a framework for building highly concurrent, asynchronous, distributed, and fault tolerant event-driven applications on the JVM. Akka offers location transparency, fault tolerance, asynchronous message passing, and a non-deterministic share-nothing architecture. Akka Cluster provides a fault-tolerant decentralized peer-to-peer based cluster membership service with no single point of failure or single point of bottleneck.

    Also built with Scala, Apache Kafka is a scalable message broker which provides high throughput, fault tolerance, built-in partitioning, and replication for processing real-time data streams. In the reference architecture, the ingestion layer is implemented with Akka and Apache Kafka.

  • Web Application Security: Special attention is given to security across all layers, notably the proper implementation of authentication, authorization, encryption, and audit logging. The implementation of security is also driven by deep knowledge of application security patterns, threat modeling, and enforcing security best practices (e.g., the OWASP Top Ten and the CWE/SANS Top 25 Most Dangerous Software Errors) as part of the continuous delivery process.
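
To make the CQRS/ES pattern above concrete, here is a minimal, framework-free Java sketch (all class names are illustrative; the projection is updated synchronously for brevity, whereas a real system would propagate events asynchronously and persist them durably):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class CqrsEsSketch {

        // An immutable domain event, the unit of storage in Event Sourcing
        static class MedicationPrescribed {
            final String patientId, medication;
            MedicationPrescribed(String patientId, String medication) {
                this.patientId = patientId;
                this.medication = medication;
            }
        }

        interface Projection { void apply(Object event); }

        // Write side: commands are validated, then recorded in an append-only event log
        static class EventStore {
            private final List<Object> log = new ArrayList<>();
            private final List<Projection> subscribers = new ArrayList<>();
            void append(Object event) {
                log.add(event);
                subscribers.forEach(p -> p.apply(event));
            }
            void subscribe(Projection p) { subscribers.add(p); }
        }

        // Read side: a denormalized view optimized for queries
        static class ActiveMedicationsView implements Projection {
            final Map<String, List<String>> byPatient = new HashMap<>();
            public void apply(Object event) {
                if (event instanceof MedicationPrescribed) {
                    MedicationPrescribed e = (MedicationPrescribed) event;
                    byPatient.computeIfAbsent(e.patientId, k -> new ArrayList<>())
                             .add(e.medication);
                }
            }
        }

        public static void main(String[] args) {
            EventStore store = new EventStore();
            ActiveMedicationsView view = new ActiveMedicationsView();
            store.subscribe(view);

            store.append(new MedicationPrescribed("pt-001", "metformin"));
            store.append(new MedicationPrescribed("pt-001", "lisinopril"));

            System.out.println(view.byPatient.get("pt-001")); // [metformin, lisinopril]
        }
    }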

An Interface that Works across Devices and Platforms


The front-end uses a Mobile First approach and a Single Page Application (SPA) architecture with Javascript-based frameworks like AngularJS to create very responsive user experiences. It also allows us to bring the following software engineering best practices to the front-end:

  • Dependency Injection
  • Test-Driven Development (Jasmine, Karma, PhantomJS)
  • Package Management (Bower or npm)
  • Build system and Continuous Integration (Grunt or Gulp.js)
  • Static Code Analysis (JSLint and JSHint), and 
  • End-to-End Testing (Protractor). 
For mobile devices, Apache Cordova can be used to access native functions when desired. The main goal is to provide a user interface that works across devices and platforms such as iOS, Android, and Windows Phone.

Interoperability


Interoperability will always be a key requirement in clinical systems. Interoperability is needed between all players in the healthcare ecosystem including providers, payers, labs, knowledge artifact developers, quality measure developers, and public health agencies like the CDC. These standards exist today and are implementation-ready. However, only health IT buyers have the leverage to demand interoperability from their vendors.

Standards related to clinical decision support (CDS) include:

  • The HL7 Fast Healthcare Interoperability Resources (FHIR)
  • The HL7 virtual Medical Record (vMR)
  • The HL7 Decision Support Services (DSS) specification
  • The HL7 CDS Knowledge Artifact specification
  • The Data Mining Group (DMG) Predictive Model Markup Language (PMML) specification.

Overcoming Barriers to Adoption


In a previous post, I discussed a practical approach to addressing challenges to the adoption of clinical decision support (CDS) systems.


Monday, September 15, 2014

Single Sign-On (SSO) for Cloud-based SaaS Applications

Single Sign-On (SSO) is a key capability for Software as a Service (SaaS) applications particularly when there is a need to integrate with existing enterprise applications. In the enterprise world dominated by SOAP-based web services, security has been traditionally achieved with standards like WS-Security, WS-SecurityPolicy, WS-SecureConversation, WS-Trust, XML Encryption, XML Signatures, the WS-Security SAML Token Profile, and XACML.

During the last few years, the popularity of Web APIs, mobile technology, and Cloud-based software services has led to the emergence of light-weight security standards in support of the new REST/JSON paradigm with specifications like OAuth2 and OpenID Connect.

In this post, I discuss the state of the art in standards for SSO.

SAML2 Web SSO Profile


SAML2 Web SSO Profile (not to be confused with the WS-Security SAML Token Profile mentioned earlier) is not a new standard. It was approved as an OASIS standard in 2005. SAML2 Web SSO Profile is still a force to be reckoned with today when it comes to enabling SSO within the enterprise. In a post titled SAML vs OAuth: Which One Should I Use?, Anil Saldhana, former Lead Identity Management Architect at Red Hat, offered the following suggestions:

  • If your use case involves SSO (when at least one actor or participant is an enterprise), then use SAML.
  • If your use case involves providing access (temporary or permanent) to resources (such as accounts, pictures, files, etc.), then use OAuth.
  • If you need to provide access to a partner or customer application to your portal, then use SAML.
  • If your use case requires a centralized identity source, then use SAML (Identity Provider).
  • If your use case involves mobile devices, then OAuth2 with some form of Bearer Tokens is appropriate.

Salesforce.com, arguably the leader in cloud-based SaaS services, supports SAML2 Web SSO Profile as one of its main SSO mechanisms (see the Salesforce Single Sign-On Implementation Guide). The Google Apps platform supports SAML2 Web SSO Profile as well.

Federal Identity, Credential, and Access Management (FICAM), a US Federal Government initiative, has selected the SAML2 Web SSO Profile for Levels of Assurance (LOA) 1 to 4 as defined by NIST Special Publication 800-63-2 (see the ICAM SAML 2.0 Web Browser SSO Profile). This is significant given the challenges associated with identity federation at the scale of a large organization like the US federal government.

SAML bindings define how SAML protocol messages are carried over underlying transport protocols. They include:

  • HTTP Redirect Binding
  • HTTP POST Binding
  • HTTP Artifact Binding
  • SAML SOAP Binding.

SAML profiles define how the SAML assertions, protocols, and bindings are combined to support particular usage scenarios. The Web Browser SSO Profile and the Single Logout Profile are the most commonly used profiles.

Identity Provider (IdP) initiated SSO with the POST binding is one of the most popular implementations (see the diagram from the OASIS SAML Technical Overview for a typical authentication flow).



The SAML2 Web SSO ecosystem is very mature, cross-platform, and scalable, and a number of open source implementations are available as well. However, things are constantly changing in technology, and identity federation is no exception. At the Cloud Identity Summit in 2012, Craig Burton, a well-known analyst in the identity space, declared:

SAML is the Windows XP of Identity. No funding. No innovation. People still use it. But it has no future. There is no future for SAML. No one is putting money into SAML development. No one is writing new SAML code. SAML is dead.

Craig Burton further clarified his remarks by saying:

SAML is dead does not mean SAML is bad. SAML is dead does not mean SAML isn't useful. SAML is dead means SAML is not the future.

At the time, this provoked a storm in the Twitterverse because of the significant investments enterprise customers had made to implement SAML2 for SSO.


WS-Federation


There is an alternative to the SAML2 Web SSO Profile called WS-Federation, which Microsoft has strongly promoted and implemented in products like Active Directory Federation Services (ADFS), Windows Identity Foundation (WIF), and Azure Active Directory. There is also a popular open source identity server on the .NET platform called Thinktecture IdentityServer v2 that supports WS-Federation.

For enterprise SSO scenarios between business partners exclusively using Microsoft products and development environments, WS-Federation could be a serious contender. However, SAML2 is more widely supported and implemented outside of the Microsoft world. For example, Salesforce.com and Google Apps do not support WS-Federation for SSO. Note that Microsoft ADFS implements the SAML2 Web SSO Profile in addition to WS-Federation.

OpenID Connect


OpenID Connect is a simple identity layer on top of OAuth2. It was ratified by the OpenID Foundation in February 2014 after several years of development. Nat Sakimura's Dummy’s guide for the Difference between OAuth Authentication and OpenID is a good resource for understanding the differences between OpenID, OAuth2, and OpenID Connect. In particular, it explains why OAuth2 alone is not strictly an authentication standard. The following diagram from the OpenID Connect specification represents the components of the OpenID Connect stack.



Also note that OAuth2 tokens can be JSON Web Tokens (JWTs) or SAML assertions.

The following is the basic flow as defined in the OpenID Connect specification:

  1. The RP (Client) sends a request to the OpenID Provider (OP).
  2. The OP authenticates the End-User and obtains authorization.
  3. The OP responds with an ID Token and usually an Access Token.
  4. The RP can send a request with the Access Token to the UserInfo Endpoint.
  5. The UserInfo Endpoint returns Claims about the End-User.
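
To make steps 1 to 3 concrete, here is a minimal Java sketch of an RP exchanging an authorization code for the ID Token and Access Token at the OP's token endpoint (the endpoint URL, client credentials, and code value are placeholders):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;
    import java.util.Scanner;

    public class TokenEndpointClient {
        public static void main(String[] args) throws Exception {
            // Placeholder OP token endpoint
            URL tokenEndpoint = new URL("https://op.example.org/token");
            HttpURLConnection conn = (HttpURLConnection) tokenEndpoint.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            conn.setDoOutput(true);

            // Parameters defined by OAuth2/OpenID Connect for the authorization code flow
            String body = "grant_type=authorization_code"
                + "&code=" + URLEncoder.encode("<authorization-code>", "UTF-8")
                + "&redirect_uri=" + URLEncoder.encode("https://rp.example.org/cb", "UTF-8")
                + "&client_id=my-client-id"
                + "&client_secret=my-client-secret";
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }

            // The JSON response carries the ID Token (a JWT) and the Access Token
            try (Scanner s = new Scanner(conn.getInputStream(), "UTF-8")) {
                System.out.println(s.useDelimiter("\\A").next());
            }
        }
    }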

There are two subsets of the Core functionality with corresponding implementer’s guides:

  • Basic Client Implementer’s Guide – for a web-based Relying Party (RP) using the OAuth code flow
  • Implicit Client Implementer’s Guide – for a web-based Relying Party using the OAuth implicit flow


OpenID Connect is particularly well-suited for modern applications which offer RESTful Web APIs, support JSON payloads, run on mobile devices, and are deployed to the Cloud. Despite being a relatively new standard, OpenID Connect boasts an impressive list of implementations across platforms. It is already supported by big players like Google, Microsoft, PayPal, and Salesforce. In particular, Google is consolidating all federated sign-in support onto the OpenID Connect standard. Open source OpenID Connect Identity Providers include the Java-based OpenAM and the .NET-based Thinktecture IdentityServer v3.


From WS* to JW* and JOSE


As can be seen from the diagram above, a complete identity federation ecosystem based on OpenID Connect will also require standards for representing security assertions, digital signatures, encryption, and cryptographic keys. These standards include:

  • JSON Web Token (JWT)
  • JSON Web Signature (JWS)
  • JSON Web Encryption (JWE)
  • JSON Web Key (JWK)
  • JSON Web Algorithms (JWA).

There is a new acronym for these emerging JSON-based identity and security protocols: JOSE, which stands for Javascript Object Signing and Encryption. It is also the name of the IETF Working Group developing JWS, JWE, and JWK. A Java-based open source implementation called jose4j is available.
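
As an illustration, here is a sketch of producing a signed JWT with jose4j (the claims are illustrative, and the RSA key pair is generated on the fly for brevity; a real deployment would load keys from a keystore or a published JWK set):

    import org.jose4j.jwk.RsaJsonWebKey;
    import org.jose4j.jwk.RsaJwkGenerator;
    import org.jose4j.jws.AlgorithmIdentifiers;
    import org.jose4j.jws.JsonWebSignature;
    import org.jose4j.jwt.JwtClaims;

    public class JwtSigningSketch {
        public static void main(String[] args) throws Exception {
            // For illustration only: generate a throwaway RSA key pair as a JWK
            RsaJsonWebKey jwk = RsaJwkGenerator.generateJwk(2048);

            // Standard JWT claims (issuer, subject, audience, expiration)
            JwtClaims claims = new JwtClaims();
            claims.setIssuer("https://idp.example.org"); // placeholder issuer
            claims.setSubject("user@example.org");
            claims.setAudience("my-client-id");
            claims.setExpirationTimeMinutesInTheFuture(10);
            claims.setIssuedAtToNow();

            // Sign the claims, producing the compact serialization (the JWT string)
            JsonWebSignature jws = new JsonWebSignature();
            jws.setPayload(claims.toJson());
            jws.setKey(jwk.getPrivateKey());
            jws.setAlgorithmHeaderValue(AlgorithmIdentifiers.RSA_USING_SHA256);
            System.out.println(jws.getCompactSerialization());
        }
    }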


Access Control with User-Managed Access (UMA)


According to the UMA Core specification,

User-Managed Access (UMA) is a profile of OAuth 2.0. UMA defines how resource owners can control protected-resource access by clients operated by arbitrary requesting parties, where the resources reside on any number of resource servers, and where a centralized authorization server governs access based on resource owner policy.

In the UMA protocol, OpenID Connect provides federated SSO and is also used to convey user claims to the authorization server. In a previous post titled Patient Privacy at Web Scale, I discussed the application of UMA to the challenges of patient privacy.

Sunday, November 10, 2013

Toward Polyglot Programming on the JVM

In my previous post titled Treating Javascript as a first class language, I wrote about how the Java Virtual Machine (JVM) is evolving with new languages and frameworks like Groovy, Grails, Scala, Akka, and the Play Framework. In this post, I report on my experience in learning and evaluating these emerging technologies and their roles in the Java ecosystem.

A KangaRoo on the JVM


On a previous project, I used Spring Roo to jumpstart the software development process. Spring Roo was created by Ben Alex, an Australian engineer who is also the creator of Spring Security. Spring Roo was a big productivity boost and generated a significant amount of code and configuration based on the specification of the domain model. Spring Roo automatically generated the following:

  • The domain entities with support for JPA annotations.
  • Repository and service layers. In addition to JPA, Spring Roo also supports NoSQL persistence for MongoDB based on the Spring Data repository abstraction.
  • A web layer with Spring MVC controllers and JSP views with support for Tiles-based layout, theming, and localization. The JSP views were subsequently replaced with a combination of Thymeleaf (a next generation server-side HTML5 template engine) and Twitter Bootstrap to support a Responsive Web Design (RWD) approach. Roo also supports GWT and JSF.
  • REST and JSON remoting for all domain types.
  • Basic configuration for Spring Security, Spring Web Flow, Spring Integration, JMS, Email, and Apache Solr.
  • Entity mocking, automatic generation of test data ("Data on Demand"), in-container integration testing, and end-to-end Selenium integration tests.
  • A Maven build file for the project and full integration with Spring STS.
  • Deployment to Cloud Foundry.
Roo also supports other features such as database reverse engineering and Ajax. Another benefit of using Roo is that it helped enforce Spring best practices and other architectural concerns such as proper application layering.

For my future projects, I am looking forward to taking developer productivity and innovation to the next level. There are several criteria in my mind:

  • Being able to do more with less. This means being able to write code that is concise, expressive, requires less configuration and boilerplate coding, and is easier to understand and maintain (particularly for difficult concerns like concurrency which is a key factor in scalability).
  • Interoperability with the Java language and being able to run on the JVM, so that I can take advantage of the larger and rich Java ecosystem of tools and frameworks.
  • Lastly, my interest in responsive, massively scalable, and fault-tolerant systems has picked up recently.


Getting Groovy


Maven has been a very powerful build system for several projects that I have worked on. My goal now is to support continuous delivery pipelines as a pattern for achieving high quality software. Large open source projects like Hibernate, Spring, and Android have already moved to Gradle. Gradle builds are written in a Groovy DSL and are more concise than Maven POM files which are based on a more verbose XML syntax. Gradle supports Java, Groovy, and Scala out-of-the box. It also has other benefits like incremental builds, multi-project builds, and plugins for other essential development tools like Eclipse, Jenkins, SonarQube, Ivy, and Artifactory.

Grails is a full-stack framework based on Groovy, leveraging its concise syntax (which includes Closures), dynamic language programming, metaprogramming, and DSL support. The core principle of Grails is "convention over configuration". Grails also integrates well with existing and popular Java projects like Spring Security, Hibernate, and Sitemesh. Roo generates code at development time and makes use of AOP. Grails on the other hand generates code at run-time, allowing the developer to do more with less code. The scaffolding mechanism is very similar in Roo and Grails.

Grails has its own view technology called Groovy Server Pages (GSP) and its own ORM implementation called Grails Object Relational Mapping (GORM) which uses Hibernate under the hood. There is also decent support for REST/JSON and URL routing to controller actions. This makes it easy to use Grails together with Javascript MVC frameworks like AngularJS in creating more responsive user experiences based on the Single Page Application (SPA) architectural pattern.

There are many factors that can influence the decision to use Roo vs. Grails (e.g., the learning curve associated with Groovy and Grails for a traditional Java team). There is also a new high-productivity framework called Spring Boot that is emerging as part of the soon to be released Spring Framework 4.0.


Becoming Reactive


I am also interested in massively scalable and fault-tolerant systems. This is no longer a requirement solely for big internet players like Google, Twitter, Yahoo, and LinkedIn that need to scale to millions of users. These requirements (including response time and uptime) are also essential in mission-critical applications such as healthcare.

The recently published "Reactive Manifesto" makes the case for a new breed of applications called "Reactive Applications". According to the manifesto, the Reactive Application architecture allows developers to build "systems that are event-driven, scalable, resilient, and responsive." That is the premise of the other two prominent languages on the JVM: Scala and Clojure. They are based on a programming paradigm different from traditional OOP, called Functional Programming, which is becoming very popular in the multi-core era.

Twitter uses Scala and has open-sourced some of their internal Scala resources like "Effective Scala" and "Scala School". One interesting framework based on Scala is Akka, a concurrency framework built on the Actor Model.

The Play Framework 2 is a full-stack web application framework based on Scala and currently used by LinkedIn (which has over 225 million registered users worldwide). In addition to its elegant design, Play's unique benefits include:

  • An embedded Java NIO (New I/O) non-blocking server based on JBoss Netty, providing the ability to call collaborating services asynchronously without relying on thread pools to handle I/O. This new breed of servers is called "Evented Servers" (NodeJS is another implementation) as opposed to the old "Threaded Servers". Older frameworks like Spring MVC use a threaded and synchronous approach which is more difficult to scale.
  • The ability to make changes to the source code and just refresh the browser page to see the changes (this is called hot reload).
  • Type-safe Scala templates (errors are displayed in the browser during development).
  • Integrated support for Akka which provides (among other benefits) fault-tolerance, the ability to quickly recover from failure.
  • Asynchronous responses (based on the concepts of "Future" and "Promise" also found in AngularJS), caching, iteratees (for processing large streams of data), and support for real-time push-based technologies like WebSockets and Server-Sent Events.
The biggest challenge in moving to Scala is that Functional Programming can represent a significant learning curve for developers with a traditional OOP background in Java. Functional Programming is not new. Languages like Lisp and Haskell are functional programming languages. More recently, XML processing languages like XSLT and XQuery have adopted functional programming ideas.


Bringing Clojure to the JVM


Clojure is a dialect of Lisp and a dynamically typed functional programming language which compiles to JVM bytecode. Clojure supports multithreaded programming and immutable data structures. One interesting application of Clojure is Incanter, a statistical computing and data visualization environment enabling big data analysis on the JVM.

Sunday, April 28, 2013

How I Make Technology Decisions

The open source community has responded to the increasing complexity of software systems by creating many frameworks which are supposed to facilitate the work of developing software. Software developers spend a considerable amount of time researching, learning, and integrating these frameworks to build new software products. Selecting the wrong technology can cost an organization millions of dollars. In this post, I describe my approach to selecting these frameworks. I also discuss the frameworks that have made it to my software development toolbox.

Understanding the Business


The first step is to build a strong understanding of the following:

  • The business goals and challenges of the organization. For example, the healthcare industry is currently shifting to a value-based payment model in an increasingly tightening regulatory environment. Healthcare organizations are looking for a computing infrastructure that supports new demands such as the Accountable Care Organization (ACO) model, patient-centered outcomes, patient engagement, care coordination, quality measures, bundled payments, and Patient-Centered Medical Homes (PCMH).

  • The intended buyers and users of the system and their concerns. For example, what are their pain points? Which devices are they using? What are their security and privacy concerns?

  • The standards and regulations of the industry.

  • The competitive landscape in the industry. To build a system that is relevant, it is important to have some idea about the following: Who is the competition? What are the current capabilities of their systems? What is on their road map? What are customers saying about their products? This knowledge can help shape a Blue Ocean Strategy.

  • Emerging trends in technologies.

This type of knowledge comes with industry experience and a habit of continuously paying attention to these issues. For example, on a daily basis, I read industry news as well as scientific and technical publications. As a member of the American Medical Informatics Association (AMIA), I receive the latest issue of the Journal of the American Medical Informatics Association (JAMIA), which allows me to access cutting-edge research in medical informatics. I speak at industry conferences when possible; this allows me not only to hone my presentation skills, but also to attend all sessions for free or at a discounted price. For the latest in software development, I turn to publications like InfoQ, DZone, and TechCrunch.

To better understand the users and their needs and concerns, I perform early usability testing (using sketches, wireframes, or mockups) to test design ideas and obtain feedback before actual development starts. For generating innovative design ideas, I recommend the following book: Universal Methods of Design: 100 Ways to Research Complex Problems, Develop Innovative Ideas, and Design Effective Solutions by Bruce Hanington and Bella Martin.

 

Architecting the Solution


Armed with a solid understanding of the business and technological landscape as well as the domain, I can start creating a solution architecture. Software development projects can be chaotic. Based on my experience working on many software development projects across industries, I found that Domain Driven Design (DDD) can help foster a disciplined approach to software development. For more on my experience with DDD, see my previous post entitled How Not to Build A Big Ball of Mud, Part 2.

Frameworks evolve over time. So, I make sure that the architecture is framework-agnostic and focused on supporting the domain. This allows me to retrofit the system in the future with new frameworks as they emerge.


 

Due Diligence


Software development is a rapidly evolving field. I keep my eyes on the radar and try not to drink the vendors' Kool-Aid. For example, not all vendors have a good track record in supporting standards, interoperability, and cross-platform solutions.

The ThoughtWorks Technology Radar is an excellent source of information and analysis on emerging trends in software. Its contributors include software thought leaders like Martin Fowler and Rebecca Parsons. I also look at surveys of the developer community to determine the popularity, community size, and usage statistics of competing frameworks and tools. Sites like InfoQ often conduct these types of surveys, such as the recent InfoQ survey on Top JavaScript MVC Frameworks. I also like Matt Raible's Comparing JVM Web Frameworks.

I value the opinion of recognized experts in the field of interest. I read their books, blogs, and watch their presentations. Before formulating my own position, I make sure that I read expert opinions on opposing sides of the argument. For example, in deciding on a pure Java EE vs. Spring Framework approach, I read arguments by experts on both sides (experts like Arun Gupta, Java EE Evangelist at Oracle and Adrian Colyer, CTO at SpringSource).

Finally, consider a peer review of the architecture using a methodology like the Architecture Tradeoff Analysis Method (ATAM). Simply going through the exercise of explaining the architecture to stakeholders and receiving feedback can significantly help in improving it.


Rapid Prototyping 

 

It's generally a good idea to create a rapid prototype to quickly learn and demonstrate the capabilities and value of the framework to the business. This can also generate excitement in the development team, particularly if the framework can enhance the productivity of developers and make their life easier.

 

The Frameworks I've Selected


The Spring Framework

I am a big fan of the Spring Framework. I believe it is really designed to support the needs of developers from a productivity standpoint. In addition to dependency injection (DI), Aspect Oriented Programming (AOP), and Spring MVC, I like the Spring Data repository abstraction for JPA, MongoDB, Neo4J, and Hadoop. Spring supports Polyglot Persistence and Big Data today. I use Spring Roo for rapid application development, and this allows me to focus on modeling the domain. I use the Roo scaffolding feature to generate a lot of Spring configuration and Java code for the domain, repository (Roo supports JPA and MongoDB), service, and web layers (Roo supports Spring MVC, JSF, and GWT). Spring also supports unit and integration testing with the recent release of Spring MVC Test.
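
For instance, the Spring Data repository abstraction reduces a typical data access layer to an interface; a sketch (the entity and query method are illustrative):

    import java.util.List;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import org.springframework.data.jpa.repository.JpaRepository;

    @Entity
    class Patient {
        @Id @GeneratedValue
        private Long id;
        private String lastName;
        // getters and setters omitted for brevity
    }

    // Spring Data derives the query implementation from the method name at runtime
    public interface PatientRepository extends JpaRepository<Patient, Long> {
        List<Patient> findByLastName(String lastName);
    }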

I use Spring Security, which allows me to use AOP and annotations to secure methods and supports advanced features like Remember Me and regular expressions for URLs. I think that JAAS is too low-level. Spring Security allows me to meet all OWASP Top Ten requirements (see my previous post entitled Application-Level Security in Health IT Systems: A Roadmap).

Spring Social makes it easy to connect a Spring application to social network sites like Facebook, Twitter, and LinkedIn using the OAuth2 protocol. From a tooling standpoint, Spring STS supports many Spring features, and I can deploy directly to Cloud Foundry from the IDE. I look forward to evaluating Grails and the Play Framework, which use convention over configuration and are built on Groovy and Scala respectively.

Thymeleaf, Twitter Bootstrap, and jQuery

I use Twitter Bootstrap because it is based on HTML5, CSS3, jQuery, and LESS, and also supports a Responsive Web Design (RWD) approach. The size of the components library and the community is quite impressive.

Thymeleaf is an HTML5 templating engine and a replacement for traditional JSP. It is well integrated with Spring MVC and supports a clear division of labor between back-end and front-end developers. Twitter Bootstrap and Thymeleaf work well together.


AngularJS

For Single Page Applications (SPA), my definitive choice is AngularJS. It provides everything I need including a clean MVC pattern implementation, directives, view routing, Deep Linking (for bookmarking), dependency injection, two-way data binding, and BDD-style unit testing with Jasmine. AngularJS has its own dedicated debugging tool called Batarang. There are also several learning resources (including books) on AngularJS.

Check this page comparing the performance of AngularJS vs. KnockoutJS, and this survey of the popularity of top JavaScript MVC frameworks.

 

D3.js 

D3.js is my favorite for data visualization in data-intensive applications. It is based on HTML5, SVG, and Javascript. For simple charting and plotting, I use jqPlot, which is based on jQuery. See my previous post entitled Visual Analytics for Clinical Decision Making.

 

R

I use R for statistical computing, data analysis, and predictive analytics. See my previous post entitled Statistical Computing and Data Mining with R.


Development Tools


My development tools include: Git (Distributed Version Control), Maven or Gradle (build), Jenkins (Continuous Integration), Artifactory (Repository Manager), and Sonar (source code quality management). My testing toolkit includes Mockito, DBUnit, Cucumber JVM, JMeter, and Selenium.

Wednesday, September 22, 2010

The Future of Healthcare Data Exchange Standards

The Meaningful Use Final Rule has finally been released, and I think now is a good time to start thinking about where we want to be five years from now in terms of healthcare data exchange standards.


Listening to the Concerns of Implementers


I think it is very important that we listen to the concerns of the implementers of the current set of standards. They are the users of those standards and good software engineers like to get feedback from their end users to fix bugs and improve their software. The following post details some of the concerns out there regarding the current HL7 v3 XML schema development methodology and I believe they should not be ignored: Why Make It Simple If It Can Be Complicated?


Using an Industry Standard XML Schema: A Developer's Perspective

XML documents are not just viewed by human eyeballs through the use of an XSLT stylesheet. The XML schema has become an important part of the service contract in Service Oriented Architecture (SOA). SOA has emerged during the last few years as a set of design principles for integrating applications within and across organizational boundaries.

In the healthcare sector for example, the Nationwide Health Information Network (NHIN) and many Health Information Exchanges (HIEs) are being built on a decentralized service-oriented architecture using web services standards such as SOAP, WSDL, WS-Addressing, MTOM, and WS-Policy. The Web Services Interoperability (WS-I) Basic Profile and Basic Security Profile provide additional guidelines that should be followed to ensure cross-platform interoperability, for example between .NET and Java EE platforms. Some of the constraints defined by the WS-I Basic Profile are related to the design of XML schemas used in web services.

An increasingly popular alternative to the WS-* stack is to use RESTful web services. The REST architectural style does not mandate the use of web services contracts such as XML schema, WSDL, and WS-Policy. However, the Web Application Description Language (WADL) has been proposed to play the role of service contract for RESTful web services. This post will not engage in the SOAP vs. REST debate except to mention that both are used in implementation projects today.

On top of these platform-agnostic web services standards, each platform defines a set of specifications and tooling for building web services applications. In the Java world, these specifications include:

  • The Java API for XML Web Services (JAX-WS)
  • The Java Architecture for XML Binding (JAXB)
  • The Java API for RESTful Web Services (JAX-RS)
  • The Streaming API for XML (StAX).

JAX-WS and JAXB allow developers to generate a significant amount of Java code from the WSDL and XML schema with tools like WSDL2Java. The quality of a standard XML schema largely depends on how well it supports the web services development process, which is why I believe that creating a reference implementation should be a necessary step before the release of new standards. An industry standard XML schema that is hard to use will translate directly into high implementation costs resulting from development project delays, for example.
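
For example, here is a minimal sketch of the JAXB side of that binding, with a hand-annotated class standing in for one that xjc would generate from a standard schema:

    import java.io.StringReader;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.annotation.XmlRootElement;

    public class JaxbBindingSketch {

        // A stand-in for a class generated from the standard XML schema
        @XmlRootElement(name = "patient")
        public static class Patient {
            public String name;
            public String mrn;
        }

        public static void main(String[] args) throws Exception {
            String xml = "<patient><name>Jane Doe</name><mrn>12345</mrn></patient>";
            Unmarshaller u = JAXBContext.newInstance(Patient.class).createUnmarshaller();
            Patient p = (Patient) u.unmarshal(new StringReader(xml));
            System.out.println(p.name + " / " + p.mrn); // Jane Doe / 12345
        }
    }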


Embracing Design Patterns

Beyond our personal preferences (such as the NIEM vs. HL7 debate), there are well established engineering practices and methodologies that we can agree on. In terms of software development, design patterns have emerged as a well known approach to building effective software solutions. For example, the following two books have had a strong influence in the fields of object-oriented design and enterprise application integration respectively (and they sit proudly on my bookshelf):

  • Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
  • Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions by Gregor Hohpe and Bobby Woolf.

An interesting design pattern from the "Enterprise Integration Patterns" book that is relevant to the current discussion on industry standard XML schemas is the "Canonical Data Model" design pattern. Enterprise data architects tasked with creating such canonical data models often reuse components from industry standard XML schemas. That approach makes sense but cannot succeed if the industry standard XML schema is not designed to support reusability, extensibility, and a clearly specified versioning strategy.


Modeling Data In Transit vs. Data at Rest

Modeling data at rest (e.g. data stored in relational databases) is a well established discipline. For example, data modeling patterns for relational data have been captured by Len Silverston and Paul Agnew in their book entitled "The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling".

There is a need to apply the same engineering rigor to modeling data in transit (e.g. data in web services messages). The XML Schema specification became a W3C Recommendation more than 9 years ago and I think there is now enough implementation experience to start building consensus around a set of XML Schema Design Patterns. The latter should address the following issues:

  1. Usability: the factors that affect the ability of an average developer to quickly learn and use an XML schema in a software development project
  2. Component Reusability
  3. Web services cross-platform interoperability constraints. Some of those constraints are defined by the WS-I Basic Profile
  4. Issues surrounding the use of XML databinding tools such as JAXB. This is particularly important since developers use those tools for code generation in creating web services applications. It is well known that existing databinding tools do not provide adequate support for all XML Schema language features
  5. Ability to manipulate instances with XML APIs such as StAX
  6. Schema extensibility, versioning, and maintainability.

These design patterns should be packaged into a Naming and Design Rules (NDR) document to ensure a consistent and proven approach to developing future XML vocabularies for the healthcare domain.

The XML Schema 1.1 specification is currently a W3C Candidate Recommendation. It defines new features such as conditional type assignments and assertions which allow schema developers to consolidate structural and business rules constraints into a single schema. This could help alleviate some of the pain associated with the multiple layers of Schematron constraints currently specified by HITSP C32, IHE PCC, and the HL7 CCD (sometimes referred to as the "HITSP Onion"). Saxon supports some of these new features.

Developing Standards the Way We Develop Software

The final point I'd like to make is that we should start creating healthcare standards the same way we develop software. I am a proponent of agile development methodologies such as Extreme Programming and Scrum. These methodologies are based on practices such as user stories, iteration (sprint) planning, unit test first, refactoring, continuous integration, and acceptance testing. Agile programming helps create better software and I believe it can help create better healthcare standards as well.

Wednesday, June 9, 2010

Data Modeling for Electronic Health Records (EHR) Systems

Getting the data model right is of paramount importance for an Electronic Health Records (EHR) system. The factors that drive the data model include but are not limited to:

  • Patient safety
  • Support for clinical workflows
  • Different uses of the data such as input to clinical decision support systems
  • Reporting and analytics
  • Regulatory requirements such as Meaningful Use criteria.

Model First

Proven methodologies like contract-first web service design and model driven development (MDD) put the emphasis on deriving application code from the data model and not the other way around. Thousands of lines of code can be auto-generated from the model, so it's important to get the model right.


Requirements Gathering

The objective here is to determine the entities, their attributes, and the relationships between those entities. For example, what are the attributes that are necessary to describe a patient's condition and how do you express the fact that a condition is a manifestation of an allergy? The data modeler should work closely with clinicians to gather those requirements. Industry standards should be leveraged as well. For example, HITSP C32 defines the data elements for each EHR data module such as conditions, medications, allergies, and lab results. These data elements are then mapped to the HL7 Continuity of Care Document (CCD) XML schema.

The HL7 CCD is itself derived from the HL7 Reference Information Model (RIM). The latter is expressed as a set of UML class diagrams and is the foundation model for health care and clinical data. A simpler alternative to the CCD is the ASTM Continuity of Care Record (CCR). Both the CCD and the CCR provide an XML schema for data exchange and are Meaningful Use criteria. Another relevant data model is the HL7 vMR (Virtual Medical Record), which aims to define a data model for the input and output of Clinical Decision Support Systems (CDSS).

These standards can be cumbersome to use as such from a software development perspective. Nonetheless, they can inform the design of the data model for an EHR system. Alignment with the CCD and CCR will facilitate data exchange with other providers and organizations. The following are Meaningful Use criteria for data exchange:

  1. Electronically receive a patient summary record, from other providers and organizations including, at a minimum, diagnostic test results, problem list, medication list, medication allergy list, immunizations, and procedures and upon receipt of a patient summary record formatted in an alternative standard specified in Table 2A row 1, displaying it in human readable format.

  2. Enable a user to electronically transmit a patient summary record to other providers and organizations including, at a minimum, diagnostic test results, problem list, medication list, medication allergy list, immunizations, and procedures in accordance with the standards specified in Table 2A row 1.



Applying Data Modeling Patterns

Applying data modeling patterns improves model consistency and quality. Relational data modeling is a well established discipline. My favorite resource for relational data modeling patterns is: The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling.

Some XML Schema best practices can be found here.


Data Stores

Today, options for data store are no longer limited to relational databases. Alternatives include: native XML databases (e.g. DB2 pureXML), Entity-Attribute-Value with Classes and Relationships (EAV/CR), and Resource Description Framework (RDF) stores.

Native XML databases are more resilient to schema changes and do not require handling the impedance mismatch between XML documents, Java objects, and relational tables which can introduce design complexity, performance, and maintainability issues.

Storing EHRs in an RDF store can enable the inference of medical facts based on existing explicit medical facts. Such inferences can be driven by an ontology expressed in OWL or a set of rules expressed in a rule language such as SWRL. Semantic Web technologies can also be helpful in checking the consistency of a model, in data and knowledge integration across domains (e.g., the genomics and clinical domains), and for managing classification schemes like medical terminologies. RDF, OWL, and SWRL have been successfully implemented in Clinical Decision Support Systems (CDSS).
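
As a small illustration of such inference (using the Apache Jena API with its built-in rule reasoner; the ontology namespace and class names are illustrative):

    import org.apache.jena.ontology.Individual;
    import org.apache.jena.ontology.OntClass;
    import org.apache.jena.ontology.OntModel;
    import org.apache.jena.ontology.OntModelSpec;
    import org.apache.jena.rdf.model.ModelFactory;

    public class InferenceSketch {
        public static void main(String[] args) {
            // An in-memory OWL model backed by a simple built-in rule reasoner
            OntModel model =
                ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
            String ns = "http://example.org/clinical#"; // hypothetical namespace

            // Explicit knowledge: type 2 diabetes is a kind of chronic disease
            OntClass chronic = model.createClass(ns + "ChronicDisease");
            OntClass diabetes = model.createClass(ns + "DiabetesType2");
            chronic.addSubClass(diabetes);

            // Explicit fact: this condition instance is a DiabetesType2
            Individual condition = diabetes.createIndividual(ns + "condition-42");

            // Inferred fact: the reasoner also classifies it as a ChronicDisease
            System.out.println(condition.hasOntClass(chronic)); // true
        }
    }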

The data modeling notation used should be independent of the storage model or at least compatible with the latter. For example, if native XML storage is used, then a relational modeling notation might not be appropriate. In general, UML provides the right level of abstraction for implementation-agnostic modeling.


Due Diligence

When adopting a "noSQL" storage model, it is important to ensure that (a) the database can meet performance and scalability criteria and (b) the team has the skills to develop and maintain the database. Due diligence should be performed through benchmarking using a tool such as IBM's Transaction Processing over XML (TPoX) benchmark. The team might need formal training in a new query language like XQuery or SPARQL.


A Longitudinal View of the Patient Health

Maintaining an up-to-date and truly longitudinal view of a patient's medical history requires merging and reconciling data from heterogeneous sources including providers' EMR systems, lab companies, medical devices, and payers' claim transaction repositories. The data model should facilitate the assembly of data from such diverse sources. XML tools based on XSLT, XQuery, or XQuery Update can be used to automate the merging.
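
A sketch of driving such a merge from Java with the JDK's built-in XSLT support (the file names are placeholders and merge.xslt is a hypothetical stylesheet; XSLT 2.0 or XQuery would require an external processor such as Saxon):

    import java.io.File;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class RecordMerger {
        public static void main(String[] args) throws Exception {
            // The stylesheet would implement the merging and reconciliation logic
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("merge.xslt")));
            t.transform(new StreamSource(new File("encounters.xml")),
                        new StreamResult(new File("longitudinal-record.xml")));
        }
    }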


The Importance of Data Validation

Data validation can be performed at the database layer, the application layer, and the UI layer. The data model should support the validation of the data. The following are examples of techniques that can be used for data validation:

  • XML Schema for structural validation of XML documents (see the sketch after this list)
  • ISO Schematron (based on XPath 2.0 and XSLT 2.0) for business rules validation of XML documents
  • A business rules engine like Drools
  • A data processing framework like Smooks
  • The validation features of a UI framework such as JSF2
  • The built-in validation features of the database.
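
As a sketch of the first technique, here is structural validation against an XML schema using the JDK's built-in validation API (file names are placeholders):

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;

    public class StructuralValidation {
        public static void main(String[] args) throws Exception {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new File("patient-summary.xsd"));
            Validator validator = schema.newValidator();
            // Throws SAXException with details if the document is invalid
            validator.validate(new StreamSource(new File("patient-summary.xml")));
            System.out.println("Document is structurally valid");
        }
    }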


The Future: Modeling with the NIEM IEPD


The HHS ONC issued an RFP for using the National Information Exchange Model (NIEM) Information Exchange Package Documentation (IEPD) process for healthcare data exchange. The ONC will release a NIEM Concept of Operations (ConOps). The NIEM IEPD process is explained here.

Tuesday, August 11, 2009

Adding Semantics to SOA

What can Semantic Web technologies such as RDF, OWL, SKOS, SWRL, and SPARQL bring to Web Services? One of the most difficult challenges of SOA is data model transformation. This problem occurs when services don't share a canonical XML schema. XML transformation languages such as XSLT and XQuery are typically used for data mediation in such circumstances.

While it is relatively easy to write these mappings, the real difficulty lies in mapping concepts across domains. This is particularly important in B2B scenarios involving multiple trading partners. In addition to proprietary data models, it is not uncommon to have multiple competing XML standards in the same vertical. In general, these data interoperability issues can be syntactic, structural, or semantic in nature. Many SOA projects can trace their failure to those data integration issues.

This is where Semantic Web technologies can add significant value to SOA. The Semantic Annotations for WSDL and XML Schema (SAWSDL) is a W3C Recommendation which defines the following extension attributes that can be added to WSDL and XML Schema components:

  • The modelReference extension attribute associates a WSDL or XML Schema component with a concept in a semantic model such as OWL. The semantic representation is not restricted to OWL (for example, it could be a SKOS concept). The modelReference extension attribute is used to annotate XML Schema type definitions, element and attribute declarations as well as WSDL interfaces, operations, and faults.
  • The liftingSchemaMapping and loweringSchemaMapping extension attributes typically point to an XSLT or XQuery mapping file for transforming between XML instances and ontology instances.

A typical example of how SAWSDL might be used is in an electronic commerce network where trading partners use various standards such as EDI, UBL, ebXML, and RosettaNet. In this case, the modelReference extension attribute can be used to map a WSDL or XML Schema component to a concept in a common foundational ontology such as one based on the Suggested Upper Merged Ontology (SUMO). In addition, lifting and lowering XSLT transforms are attached to XML Schema components in the SAWSDL with liftingSchemaMapping and loweringSchemaMapping extension attributes respectively. Note that any number of those transforms can be associated with a given XML schema component.

Traditionally, when dealing with multiple services (often across organizational boundaries), an Enterprise Services Bus (ESB) provides mediation services such as business process orchestration, business rules processing, data format and data model transformation, message routing, and protocol bridging. Semantic mediation services can be added as a new type of ESB service. The SAWSDL4J API defines an object model that allows SOA developers to access and manipulate SAWSDL annotations.

Ontologies have been developed for some existing e-commerce standards such as EDI X12, RosettaNet, and ebXML. When required, ontology alignment can be achieved with OWL constructs such as subClassOf, equivalentClass, and equivalentProperty.

Semantic annotations provided by SAWSDL can also be leveraged in orchestrating business processes using the business process execution language (BPEL). To facilitate service discovery in SOA Registries and Repositories, interface definitions in WSDL documents can be associated with a service taxonomy defined in SKOS. In addition, once an XML message is lifted to an ontology instance, the data in the message becomes available to Semantic Web tools like OWL and SWRL reasoners and SPARQL query engines.

Sunday, April 26, 2009

Thoughts on SOAP vs. REST

REST is now an increasingly popular architectural style for building web services. The question for developers is: should REST always be the preferred mechanism for building web services or is SOAP still relevant for certain use cases?

In my opinion, REST is usually a no-brainer when you are exposing a public API over the internet and all you need is basic CRUD operations on your data. However, when designing a truly RESTful web services interface (as opposed to some HTTP API), care must be taken to adhere to key principles:

  • Everything is a URI addressable resource
  • Representations (media types such as XHTML, JSON, RDF, and Atom) describe resources and use links to describe the relationships between those resources
  • These links drive changes in application state (hence Representational State Transfer or REST)
  • The only type that is significant for clients is the representation media type
  • URI templates as opposed to fixed or hard coded resource names
  • Generic HTTP methods (no RPC-style overloaded POST)
  • Statelessness (the server keeps no client session state between requests)
  • Cacheability.

Adherence to these principles is what enables massive scalability. One good place to start is the AtomPub protocol, which embodies these principles. In the Java space, the recently approved Java API for RESTful Web Services (JAX-RS) specification greatly simplifies REST development with simple annotated POJOs.
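
As an illustration of how lightweight this can be, here is a minimal JAX-RS resource sketch; the path, media type, and payload are hypothetical rather than taken from any particular project.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;

// A hypothetical read-only resource exposed at GET /orders/{id}
@Path("/orders/{id}")
public class OrderResource {

    @GET
    @Produces("application/atom+xml")
    public String getOrder(@PathParam("id") String id) {
        // A real service would render a full Atom entry for the order;
        // a placeholder representation keeps the sketch short
        return "<entry xmlns=\"http://www.w3.org/2005/Atom\">"
             + "<title>Order " + id + "</title></entry>";
    }
}
```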

Within the enterprise and in B2B scenarios, SOAP (and its WS-* family of specifications) is still very attractive. This is not to say that REST is not enterprise ready. In fact, there are known successful RESTful implementations in mission critical applications such as banking. However, enterprise applications can have specific requirements in the areas of security, reliable messaging, business process execution, and transactions for which SOAP, the WS-* specifications, and supporting tools provide solutions.

These specifications include:

  • WS-Addressing
  • WS-Policy
  • WS-ReliableMessaging
  • WS-SecureConversation
  • WS-Security
  • WS-SecurityPolicy
  • WS-Trust
  • WS-AtomicTransaction
  • WS-BPEL (Business Process Execution Language)

RESTafarians will tell you that REST can handle these requirements as well. For example, RESTful transactions can be implemented by treating the transactions themselves as URI addressable REST resources. This approach can work, but is certainly not trivial to implement. In fact, it is often difficult to support some of these requirements without resorting to overloaded POST, which works more like SOAP and is a clear departure from a pure REST architectural style.

One characteristic of enterprise SOA is the need to expose pieces of application logic (as opposed to data) as web services, and this can be more amenable to a SOAP-based approach. Existing SOAP web services toolkits such as Apache CXF provide support for the WS-* specifications. More importantly, they greatly simplify the development process by providing various tools such as the ability to create new services with a contract-first approach, where JAX-WS annotated services and server stubs can be automatically generated from an existing WSDL.
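
Whether the service class is generated from a WSDL or written code-first, the resulting JAX-WS endpoint is a plainly annotated POJO. The following is a hedged sketch; the service name, operation, and port address are hypothetical.

```java
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

// A hypothetical order status service; with a contract-first approach,
// a tool such as Apache CXF's wsdl2java would generate the service
// endpoint interface and stubs from the WSDL instead
@WebService(serviceName = "OrderStatusService")
public class OrderStatusServiceImpl {

    @WebMethod
    public String getOrderStatus(String orderId) {
        // Business logic only; the JAX-WS runtime handles the SOAP plumbing
        return "Order " + orderId + ": ACCEPTED";
    }

    public static void main(String[] args) {
        // Publish the endpoint locally for testing
        Endpoint.publish("http://localhost:8080/orderStatus",
                new OrderStatusServiceImpl());
    }
}
```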

Furthermore, during the last ten years, organizations have made significant investments in SOAP-based infrastructure such as Enterprise Service Buses (ESBs) and Business Process Management (BPM) software based on WS-BPEL. The Content Management Interoperability Services (CMIS) specification, which is currently being developed by OASIS, specifies protocol bindings for both SOAP and AtomPub. The SOAP binding will allow organizations to leverage those investments in building interoperable content repositories.

Architecting an SOA solution is a balancing act. It's important not to dismiss any particular approach too soon. Both SOAP and REST should be carefully considered for new web services projects.

Thursday, October 23, 2008

AtomPub Use Cases in the Aviation Industry

At the XML 2007 Conference in Boston, I introduced my concept of an Integrated Documentation Environment for Aircraft Support (IDEAS) based on AtomPub and OpenSearch. The following are some use cases that illustrate how such an approach could facilitate integration and technical information publishing in the aviation industry:

  • Notification. Nancy is an aircraft mechanic. When she gets to work in the morning, she opens her feeds aggregator to get new and updated content from all of the following sources: the airframer, the engine manufacturer, component manufacturers, the FAA, and her airline's own policies. Essentially, Nancy doesn't want to log in to the support sites of all those content providers to find out what is new and updated. She wants content pushed to her instead. This use case is implemented with the Atom syndication format.

  • Federated search. While Nancy is repairing the hydraulic tank, she wants to perform a single search against all those content repositories. She wants the results aggregated and returned to her as Atom entries, so that she can subscribe to those items that she is interested in and receive updates via web feeds. This use case is implemented with the OpenSearch specification.

  • Airline Originated Changes. Judy is an engineer working on a new engineering order (EO) to be performed on the hydraulic tank. The airline's technical documents are hosted by the aircraft manufacturer. Judy uses an XML editor which is also an AtomPub client to post the EO to the remote content repository (an AtomPub server).

  • Distributed Aircraft Manufacturing. Future Composites Inc. is a supplier of composite aircraft structures to X-Aero, a major airframer (systems integrator). Future Composites is also responsible for providing technical content in S1000D to X-Aero on those composite structures. After a failed attempt to connect to X-Aero's repository using their SOAP and WS-* interface, Future Composites and X-Aero mutually agree to go back to the basics and use AtomPub and its simple, generic RESTful HTTP interface to CRUD (create, retrieve, update, and delete) documents in X-Aero's content repository.


The main arguments in favor of this approach are simplicity and scalability. I am glad to see that the software industry is moving in that direction. Having been involved in complex WS-*-based integration projects in the airline industry, I believe this new approach is a breath of fresh air. The RESTful approach is also more amenable to agile software development, as opposed to the waterfall approach that is typical when the big up-front purchase of a proprietary ESB is involved.

Integration projects are becoming critical to the success of new aircraft projects. Speaking about the repeated postponement of the 787 maiden flight in an internal memo sent to Boeing employees on April 21, 2008 (and obtained by the Seattle Times), Boeing CEO Jim McNerney wrote:
I expect we’ll modify our approach somewhat on future programs—possibly drawing the lines in different places with regard to what we ask our partners to do, but also sharpening our tools for overseeing overall supply chain activities.

Why AtomPub specifically? Because too many people have been putting the "REST" label on their unRESTful chefs-d'oeuvre (HTTP APIs) lately. AtomPub is a good embodiment of the principles of the REST architectural style and a good place to start.

So, what are the key principles of RESTful design?

  • Everything is a URI addressable resource
  • Representations (media types such as XHTML, JSON, and Atom) describe resources and use links to describe the relationships between those resources.
  • These links drive changes in application state (hence Representational State Transfer or REST).
  • The only type that is significant for clients is the representation media type, not any other resource type
  • URI templates (a la OpenSearch) as opposed to fixed or hard coded resource names
  • Generic HTTP methods (no RPC-style overloaded POST)
  • Statelessness (the server keeps no client session state)
  • Cacheability

Adherence to these principles is what drives massive scalability. Security in a RESTful application can be achieved with any of the following existing solutions:

  • XML Signature and Encryption
  • OpenID
  • HTTP Authentication
  • SSL

How does the aviation industry get started with this new approach? This will require leadership from aviation IT specialists, particularly from original equipment manufacturers (OEMs). I don't think that another Air Transport Association (ATA) standards committee is needed. Such committees are plagued by vendor politics. By the time they finish their work, someone may have invented a better solution than AtomPub.

In the Java space, the recently approved Java API for RESTful Web Services (JAX-RS) specification greatly simplifies REST development with simple annotated POJOs. Jersey is the open source reference implementation of JAX-RS. Apache Abdera is an AtomPub implementation with Spring Framework integration. The latest release of Abdera features a collection of pre-bundled Atom Publishing Protocol adapters for JDBC, JCR, and filesystems.
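
As a quick sketch of the Abdera programming model, here is how an Atom entry for a service bulletin might be built; the identifier, title, and author are made up for illustration.

```java
import java.util.Date;
import org.apache.abdera.Abdera;
import org.apache.abdera.model.Entry;

public class ServiceBulletinEntry {
    public static void main(String[] args) throws Exception {
        // Build an Atom entry announcing a (hypothetical) service bulletin
        Abdera abdera = new Abdera();
        Entry entry = abdera.getFactory().newEntry();
        entry.setId("urn:uuid:sb-2008-1234"); // hypothetical identifier
        entry.setTitle("Service Bulletin SB-2008-1234: Hydraulic Tank");
        entry.setUpdated(new Date());
        entry.addAuthor("X-Aero Technical Publications");
        entry.setSummary("Revised removal/installation procedure.");
        entry.writeTo(System.out); // serialize as application/atom+xml
    }
}
```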

The following is an excellent technical article from InfoQ that explains how REST and AtomPub facilitate integration: "How to Get a Cup of Coffee".

Monday, August 11, 2008

Recession-Proof Computing

The past three years have been very exciting at Efasoft. We've delivered value to a number of customers in a variety of industries including automotive, pharmaceutical, homeland security, wireless internet, aerospace, defense, insurance, and customer loyalty management. We've also learned a lot in the process. These projects have allowed us to strengthen our expertise in XML, Java EE, and SOA. In the aerospace vertical, where we have a strong expertise, we took the initiative to propose new ideas, such as using ISO Schematron to exchange and validate S1000D business rules, and using the AtomPub protocol and the Atom syndication format for the efficient exchange of up-to-date aircraft technical data between airlines and aerospace manufacturers.

Going forward, our objective will continue to be the success of our customers. We'll achieve that by researching and implementing best practices as always. We understand that technology must be aligned with strategic business goals, but should also take into consideration the context within which the business operates.

So, key questions that a lot of business leaders are now asking include:

  • How can we continue to invest in much needed strategic IT initiatives in the current economic downturn under tight budgets?
  • Given the high rate of IT project failures, how do we minimize risk?
  • Which software development methodology can help us deliver quality software on time and under budget?
  • What are the tools that we can use to help our developers in their work and keep them productive?
  • Are outsourcing and offshoring the right approach? And if we do outsource, how do we maintain control, quality, and ownership of our intellectual property?

Open source software (OSS) is the answer to some of these questions. OSS lowers the total cost of ownership (TCO). By providing full access to the source code (including unit and functional tests), OSS provides transparency into enterprise software. By supporting standards and open frameworks, OSS allows organizations to avoid vendor lock-in, protect their investments, and find talent in the open job market to maintain and support their software assets in the future (in case the software vendor goes out of business).

SOA and Web 2.0 technologies allow organizations to gain a competitive advantage by supporting business process efficiency and by facilitating collaboration and online communities.

On the SOA front, OSS tools such as Apache CXF (web services framework), Apache Tuscany (SCA implementation), Intalio BPMS (BPMN), Apache Axis2, Mule ESB, Apache ODE (BPEL), and Apache ServiceMix (ESB) have demonstrated their strength in supporting SOA projects in mission critical applications in industries such as banking. Based on carefully researched SOA design principles and patterns, our SOA offering includes the following:

  • Business process analysis using BPMN
  • A model driven development (MDD) approach where appropriate
  • SOA implementation using emerging standards such as BPEL and SCA (Service Component Architecture)
  • SOA Governance using open source SOA Repositories.

On the Web 2.0 front, we really like the Liferay enterprise portal and the Alfresco document/web content management platforms, particularly their built-in social networking features, which enable enterprise collaboration and online communities. Document management is one area where we can leverage our expertise in XML and related technologies (XInclude, XSLT, XQuery, XForms, ISO Schematron, and S1000D) to help our customers bring their knowledge assets under control. We'll continue to support the eXist XQuery-enabled native XML database to build dynamic XML content applications. The XRX (XForms, REST, XQuery) architecture with eXist and the Orbeon XForms engine enables what we call "Web 2.0 XML authoring and publishing". We've acquired a strong expertise in document management for maintenance and operation documentation in the aerospace industry (our traditional forte) and drug-related documentation in the pharmaceutical industry.

JBoss Seam is a very compelling application development framework because it not only brings together Java EE frameworks such as Hibernate, JPA, EJB 3, Spring, JSF, Facelets, and Java portlets, but also integrates human workflow capabilities (jBPM), full-text search (Hibernate Search), a business rules engine (Drools), and an integration testing facility. We like the ability to leverage third party AJAX-enabled JSF component libraries such as ICEFaces and Apache MyFaces to quickly create rich internet applications (RIA). However, JBoss Seam is not limited to JSF and can also integrate Flex 3 front-ends.

Going forward, all of these open source tools will be part of our toolkit as we craft innovative software solutions for our customers.

At Efasoft, we are proponents of agile development methodologies such as Extreme Programming and Scrum. These methodologies are based on practices such as user stories, iteration (sprint) planning, pair programming, writing unit tests first, refactoring, continuous integration, and acceptance tests. Agile programming helps create better software that is also easier to maintain. We've witnessed the success of agile first hand and believe that it can help IT organizations achieve success. For more on Efasoft's approach to quality, see my previous post, Addressing Software Quality Head-On.

Saturday, July 26, 2008

Architecting SOA Solutions with a Model Driven Development (MDD) Approach

How do you architect an SOA solution to ensure that it is driven by the business and can respond rapidly and efficiently to ever changing business requirements? A Model Driven Development (MDD) approach can help provide that level of agility. The goal with the MDD approach to SOA is to auto-generate service artifacts such as WSDL, XSD, SCA composites, and BPEL code from the service model.

First, the business articulates its vision for the SOA project in requirements documents or in the form of use cases. Business analysts (BAs) then model business processes that realize the use cases by leveraging the Business Process Modeling Notation (BPMN). With the help of the right tools, the BAs can specify Key Performance Indicators (KPIs) such as those required by service level agreements (SLAs). They can also run simulations to validate the proposed business processes.

SOA is indeed all about reengineering and supporting organizational business processes. Back in 1993, Michael Hammer and James Champy made the case and outlined the management framework for reengineering in their book entitled "Reengineering the Corporation: A Manifesto for Business Revolution". Today, SOA is the software architecture that enables and facilitates the reengineering of business processes.

BPMN is an effective tool for BAs (as opposed to UML) because they should only focus on the business and operational aspects of the business process and shouldn't have to worry about IT concerns such as service loose coupling, reusability, reliability, security, persistence, and transactions. While direct transformation from BPMN to executable BPEL code (so-called BPMN-BPEL round-tripping) may be effective for simple business processes, it cannot always satisfy those IT concerns. More complex business processes will require advanced modeling and coding by SOA architects and developers.

For example, the SOA architect will have to decompose the proposed business process into task, entity, and utility service layers in order to satisfy the SOA principles of loose coupling, reusability, and composability. That will also give the SOA architect the opportunity to apply SOA design principles and patterns and check the enterprise SOA Repository or Registry to reuse existing services or legacy assets.

After decomposing the proposed business process to identify reuse opportunities and address other IT concerns, the SOA architect can then build an assembly of service components based on the Service Component Architecture (SCA). SCA implementation types can be Spring beans, EJBs, C++, COBOL, WS-BPEL, PHP, XSLT, XQuery, and OSGi bundles. SCA supports different bindings such as SOAP/HTTP Web services, JMS, RSS, and Atom.

The tool of choice for software architects is UML 2.0. In the case of SOA, UML can help abstract the service model from technology-specific implementation details. Basic UML artifacts such as activity and collaboration diagrams can be auto-generated from the BPMN diagrams produced by the BAs to bootstrap the SOA Architect's modeling effort.

To help SOA architects in crafting service-oriented solution logic, a UML Profile for service-oriented design should be adopted. The profile should define a number of stereotypes that can be applied to UML artifacts in order to refine the transformation from UML artifacts to service artifacts.

The automatic generation of the service artifacts from the UML model should be part of the build and continuous integration process, which should also include automated tests (for example, to ensure that the generated XSD and WSDL are syntactically correct, WS-I Basic Profile compliant, and backward compatible).
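
For example, a build-time check on a generated contract could be as simple as the following JUnit sketch based on WSDL4J; the path to the generated WSDL is hypothetical.

```java
import javax.wsdl.Definition;
import javax.wsdl.factory.WSDLFactory;
import javax.wsdl.xml.WSDLReader;
import junit.framework.TestCase;

public class GeneratedContractTest extends TestCase {

    public void testGeneratedWsdlIsWellFormed() throws Exception {
        WSDLReader reader = WSDLFactory.newInstance().newWSDLReader();
        // "target/generated/OrderService.wsdl" is a hypothetical build output
        Definition def = reader.readWSDL("target/generated/OrderService.wsdl");

        // Basic sanity checks; WS-I BP compliance and backward
        // compatibility would require additional tooling
        assertNotNull(def.getTargetNamespace());
        assertFalse(def.getPortTypes().isEmpty());
    }
}
```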

The benefits of an MDD approach to crafting SOA solutions include: increased development productivity, traceability to business requirements, responsiveness to changing requirements, quality, and overall agility.

Wednesday, July 9, 2008

SOA in the Java Space: State of the Union

In the Java space, an SOA Architect starting a new SOA project will have to make some strategic as well as tactical decisions regarding which approach and technologies are appropriate for their project. Of course, the SOA project should be aligned with the organization’s long term business goals.

Technically, there is a myriad of specifications to choose from:

  • The Business Process Modeling Notation (BPMN)
  • The Java API for XML Web Services (JAX-WS)
  • The WS-* specifications including the WS-I Basic Profile, WS-Addressing, WS-Policy, WS-Reliable Messaging, and WS-Security
  • The Java API for RESTful Web Services (JAX-RS)
  • The Java Business Integration (JBI)
  • The Web Services Business Process Execution Language (WS-BPEL)
  • The Service Component Architecture (SCA).

Each of these specifications has its raison d'être and should be part of the architect's toolkit. However, I find SCA quite intriguing. SCA defines a language neutral programming model for the assembly and deployment of services. SCA implementation types include Java, C++, COBOL, WS-BPEL, PHP, Spring, XSLT, XQuery, and OSGi bundles. SCA supports different bindings such as SOAP/HTTP Web services, JMS, RSS, and Atom. SCA applications can be hosted in web containers, application servers, and OSGi runtimes. SCA is geared toward the developer and can apply policies such as reliability, security, and transactions to services in a declarative manner. SCA has the support of big players including Oracle, SAP, and IBM. At the time of this writing, Sun Microsystems' support for SCA is less than clear (to me anyway).
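
To give a feel for the SCA programming model, here is a hedged sketch of a Java component using the OSOA annotations; the service interfaces and the pricing margin are hypothetical.

```java
import org.osoa.sca.annotations.Reference;
import org.osoa.sca.annotations.Service;

// Two hypothetical service contracts within the composite
interface QuoteService {
    double getQuote(String partNumber);
}

interface PricingService {
    double lookupPrice(String partNumber);
}

// An SCA component implementation: the container injects the pricing
// reference, which the composite file may wire to a local bean or to a
// remote service over any supported binding (SOAP/HTTP, JMS, ...)
@Service(QuoteService.class)
public class QuoteComponent implements QuoteService {

    @Reference
    protected PricingService pricing;

    public double getQuote(String partNumber) {
        // Apply a (hypothetical) 10% margin on top of the list price
        return pricing.lookupPrice(partNumber) * 1.10;
    }
}
```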

JBI is implemented by a number of Enterprise Service Bus (ESB) products. JBI defines a runtime architecture that allows plugins such as binding components and service engines to interoperate via a Normalized Message Router (NMR). Binding components (BCs) use communication protocols such as JMS, FTP, XMPP, and HTTP/S to connect external services to the JBI environment. Service engines (SEs) provide application logic in the JBI environment. Examples of SEs are XSLT/XQuery data transformation engines, rules engines, and WS-BPEL engines. BCs and SEs do not communicate directly. They only communicate through the NMR. IBM, BEA (now part of Oracle), and SAP did not vote in favor of the JBI Java Specification Request (JSR 208).

When starting a new SOA project, SOA architects will have to look beyond vendor politics and make some judgment calls about the best approach based on their business goals and functional requirements.

My personal take on this is to adopt an agile approach where new functionalities are implemented in an iterative manner. For example, instead of starting with an ESB infrastructure, a project can start by service enabling existing applications (code-first approach) with JAX-WS annotation capabilities or by creating new services with a contract-first approach where JAX-WS annotated services and server stubs are generated from a WSDL. Alternatively, the Java API for RESTful Web Services (JAX-RS) could be used when a RESTful approach seems more appropriate.

An ESB could later come into the picture in a context where you are connecting to multiple services (often across organizational boundaries) and there is a need for mediation services such as business process orchestration, business rules processing, data model transformation, message routing, and protocol bridging. In that context, JBI provides plug-and-play functionality in ESBs for service engines such as business rules engines and BPEL engines. For that reason, JBI can help avoid ESB vendor lock-in (perhaps a reason why proprietary ESB vendors are not backing JBI).

However, SOA architects should carefully consider the benefits of the programming language and binding agnostic service assembly model proposed by SCA. This is essentially a choice between centralized mediation and decentralized assembly. The Open Service Oriented Architecture (OSOA) group believes that JBI and SCA can actually work together, for example to allow SCA components to call JBI components or to use JBI runtime containers to deploy SCA composites.

Decentralized assembly is agile and looks a lot more like the way the web itself works. So I believe that while the JBI model is fine for integrating legacy enterprise applications, new and future service-oriented applications will embrace the SCA approach.

So is the state of the union strong? There is certainly a risk of fragmentation. But choice will always drive innovation forward and that’s what attracts me to the Java platform in the first place.

Sunday, May 4, 2008

SOA and ROA Design Principles and Patterns

I've been compiling a list of design patterns and anti-patterns on Service Oriented Architecture (SOA) and Resource Oriented Architecture (ROA). I find the following resources quite useful.

If you're looking for design patterns in building RESTful applications, the best way to start is to look at the Atom Publishing Protocol (AtomPub), which is a good embodiment of the principles of the REST architectural style. The Google Data API (GData) is a real world implementation of AtomPub. At the XML 2007 Conference, I proposed a RESTful approach to aviation technical data management called "Integrated Documentation Environment for Aircraft Support (IDEAS)" (more on that in my previous post, RESTful IDEAS).

Another good resource is the book "RESTful Web Services" by Leonard Richardson and Sam Ruby. Chapter 8 entitled "REST and ROA Best Practices" is a must read and also addresses potential REST implementation issues such as asynchronous operations and transactions. Chapter 10 entitled "The Resource-Oriented Architecture Versus Big Web Services" offers ROA alternatives to WS-* specifications.

For SOA design patterns and anti-patterns, here are some useful resources:

I don't believe that ROA is the answer to all SOA project failures out there. However, I do believe that certain requirements and use cases are more amenable to the REST architectural style (more on that in a future post).

Sunday, April 13, 2008

SOA Governance Tools

First, an important caveat: SOA governance is not a tool or a product. SOA governance is about people and leadership. No tool will deliver good governance out of the box if the human factor is not taken into consideration. However, the right tool can facilitate and provide transparency into SOA governance.

It’s important to make a distinction between design-time SOA governance and run-time SOA governance. The main objective of run-time SOA governance is the enforcement of QoS and SLAs. Design-time governance focuses on the enforcement of industry-recognized SOA design principles and patterns.

The goal of these patterns is not to kill the creativity of SOA developers or police their work, but instead to avoid SOA anti-patterns that are known to undermine the success of SOA projects. For example, SOA developers cannot reuse services if they are not aware of the existence of these services in the enterprise. Even if they know these services exist and where to find them, they cannot reuse the services if they don't understand them. Therefore, the ability to easily discover well-specified service metadata is one good design principle that can help deliver on SOA's promise of service reuse across the enterprise.

One of the key aspects of design-time SOA Governance is the management of the lifecycle of service artifacts and the dependencies between them. This is accomplished through a new breed of tools called SOA Repositories/Registries. The following are what I consider important requirements for an SOA Repository/Registry.

The first requirement is the indexing of XML-formatted artifacts such as WSDL, XML Schemas, Schematron rules, XSLT transforms, Spring and Hibernate configuration files, data mapping specifications, WS-Policy documents, etc. Ideally, users should be able to use languages such as XSLT and XQuery to manipulate artifacts and query the Repository/Registry. Requirements and specification documents should be stored in XML (as opposed to MS Word or Excel) if possible so that they can be processed the same way. The SOA Repository/Registry should sit on an XQuery-compliant native XML database. This would provide powerful visualization and reporting capabilities to the registry. I should be able to run an XQuery search against the SOA Repository/Registry to return all artifacts that contain a reference to a certain XML element, so that I can visualize the impact that a change to that element would have. Automatic detection of certain dependencies (e.g., between WSDLs and XML Schemas) should be supported as well.

Policy enforcement is also important as artifacts are added to the registry. For example, Schematron can be used to enforce XML Schema best practices. WS-I Basic Profile compliance and XML schema backward compatibility may also need to be enforced.
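
Since a Schematron schema is commonly compiled to XSLT (for example, with the ISO Schematron skeleton) and then applied with a standard transformer, a registry-side policy check might look like the following sketch; both file names are hypothetical.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SchemaPolicyCheck {
    public static void main(String[] args) throws Exception {
        // "naming-rules.xsl" stands for a Schematron schema that has
        // already been compiled to XSLT
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer check = tf.newTransformer(
                new StreamSource(new File("naming-rules.xsl")));

        // Validate a candidate XML Schema before it is promoted to the
        // registry; the resulting SVRL report lists any violations
        check.transform(new StreamSource(new File("PurchaseOrder.xsd")),
                new StreamResult(new File("svrl-report.xml")));
    }
}
```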

The repository should also support a RESTful API for all CRUD (create, read, update, delete) operations on artifacts. Ideally, I would prefer support for the AtomPub specification and the Atom syndication format for pushing updates to stakeholders. Competing SOA Registry protocols and APIs include: UDDI v3, the Java API for XML Registries (JAXR), JSR 170/283 (Java Content Repository API), and IBM WebSphere Registry and Repository (WSRR). However, open source vendors such as MuleSource and WSO2 have adopted AtomPub for its simplicity, while Red Hat is building its upcoming SOA Repository/Registry (JBoss DNA) on JSR 170. Mule Galaxy sits on the Apache Jackrabbit JCR repository. The JSR 170 repository model could also be adopted in the Java space as a standardized repository model for SOA registries.

Of course, authentication, authorization, audit trails, workflow, and versioning should be expected in any SOA Repository/Registry.

Sunday, July 22, 2007

RESTful IDEAS

About a year ago, I published a white paper entitled: "Beyond S1000D: an SOA Enabled Interoperability Framework for the Aerospace Industry".

The white paper proposed a framework called "Integrated Documentation Environment for Aircraft Support (IDEAS)" for the interoperability of enterprise content management and publishing systems within the aerospace industry. The goal was to allow new capabilities such as the remote access to library services, cross-repository exchange, cross-repository aggregation, and cross-repository observation.

Global aerospace organizations acquire technical publications from multiple suppliers and business partners. They must address the following challenges:

  • The elimination of the high costs associated with paper libraries and the shipping of physical products such as paper, CDs, and DVDs.

  • The safety and regulatory compliance concerns related to the slow distribution of supplements to field sites.

  • The need for a single point of access to the multitude of technical documentation needed to maintain and operate aerospace equipment.


The IDEAS concept was created to address current inefficiencies in technical data management processes within the industry by taking advantage of Service-Oriented Architecture (SOA) and emerging content management standards such as the JSR 170 Content Repository for Java Technology API.

On the Java EE platform, JSR 170 is enjoying a lot of success in terms of adoption and implementation. In the Open Source world, the Apache Jackrabbit project continues to evolve, and there is now a Spring JSR 170 Module to simplify development with the very popular Spring Framework.

For cross-platform interoperability, SOA based solutions have traditionally relied on web services standards such as SOAP, WSDL, and UDDI. However, in today's Web 2.0 world, alternative approaches such as the Representational State Transfer (REST) architectural style and the OpenSearch specification (for federated searches) are getting a lot of attention for their simplicity and scalability.

REST is based on the notion that resources on the web are URI-addressable and that all CRUD (Create, Retrieve, Update and Delete) operations on those resources can be implemented through a generic interface (e.g., HTTP GET, POST, PUT, DELETE). In contrast, RPC-based mechanisms such as SOAP use many custom methods and expose a single endpoint URI or only a few. It turns out that the requirements for interoperable enterprise content management systems are more amenable to the REST architectural style.
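
The following sketch illustrates this uniform interface with nothing more than java.net.HttpURLConnection; the resource URI and the payload are hypothetical.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class UniformInterfaceDemo {
    public static void main(String[] args) throws Exception {
        // A hypothetical URI for a document resource in a content repository
        URL resource = new URL("http://repo.example.org/docs/AMM-29-10-00");

        // Retrieve the current representation with GET
        HttpURLConnection get = (HttpURLConnection) resource.openConnection();
        get.setRequestMethod("GET");
        System.out.println("GET -> " + get.getResponseCode());

        // Replace the representation with PUT: same URI, generic verb
        HttpURLConnection put = (HttpURLConnection) resource.openConnection();
        put.setRequestMethod("PUT");
        put.setDoOutput(true);
        put.setRequestProperty("Content-Type", "application/xml");
        OutputStream out = put.getOutputStream();
        out.write("<task>Updated hydraulic procedure</task>".getBytes("UTF-8"));
        out.close();
        System.out.println("PUT -> " + put.getResponseCode());
    }
}
```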

The resurgence of REST can be felt across the application development landscape. Struts 2 introduced a REST-style improvement to action mapping called Restful2ActionMapper (itself inspired by the REST support in Ruby on Rails). Support for RESTful web applications is being added to JSF through the RestFaces project. REST APIs are also easy to implement with scripting languages such as JavaScript and FreeMarker.

The technical documentation needed to operate and maintain an airline's fleet is supplied by several manufacturers including aircraft, engine, and component manufacturers. Regulatory agencies (the FAA and the NTSB) also publish documents such as Advisory Circulars (ACs), Airworthiness Directives (ADs), and various forms and regulations. If all these organizations expose their content repositories via OpenSearch, then an airline technician will be able to perform a federated search across all those repositories to obtain technical information about particular equipment. The results could be formatted in Atom to allow the technician to receive updates via web feeds.

To expose a library service with a REST-style API, a content management system would typically need to provide the following:

  1. A description of the service including URI templates, HTTP method binding, authentication, transaction, response content types, and response status

  2. The specification of the code (script or Java) that is executed on the invocation of the URI

  3. Response templates


JSR 311, the Java API for RESTful Web Services, will define a set of Java APIs for the development of Web services built according to the REST architectural style.