Adventures in Computing: DDD

Showing posts with label DDD. Show all posts

Sunday, November 2, 2014

Toward a Reference Architecture for Intelligent Systems in Clinical Care

A Software Architecture for Precision Medicine

Intelligent systems in clinical care leverage the latest innovations in machine learning, real-time data stream mining, visual analytics, natural language processing, ontologies, production rule systems, and cloud computing to provide clinicians with the best knowledge and information at the point of care for effective clinical decision making. In this post, I propose a unified open reference architecture that combines all these technologies into a hybrid cognitive system for clinical decision support. Indeed, truly intelligent systems are capable of reasoning. The goal is not to replace clinicians, but instead to provide them with cognitive support during clinical decision making. Furthermore, Intelligent Personal Assistants (IPAs) such as Apple's Siri, Google's Google Now, and Microsoft's Cortana have raised our expectations on how intelligent systems interact with users through voice and natural language.

In the strict sense of the term, a reference architecture should be abstracted away from concrete technology implementation. However in order to enable a better understanding of the proposed approach, I take liberty in explaining how available open source software can be used to realize the intent of the architecture. There is an urgent need for an open and interoperable architecture which can be deployed across devices and platforms. Unfortunately, this is not the case today with solutions like Apple's HealthKit and ResearchKit.

The specific open source software mentioned in this post can be substituted with other tools which provide similar capabilities. The following diagram is a depiction of the architecture (click to enlarge).

Clinical Data Sources

Clinical data sources are represented on the left of the architecture diagram. Examples include electronic medical record systems (EMR) commonly used in routine clinical care, clinical genome databases, genome variant knowledge bases, medical imaging databases, data from medical devices and wearable sensors, and unstructured data sources such as biomedical literature databases. The approach implements the Lambda Architecture enabling both batch and real-time data stream processing and mining.

Predictive Modeling, Real-Time Data Stream Mining, and Big Data Genomics

The back-end provides various tools and frameworks for advanced analytics and decision management. The analytics workbench includes tools for creating predictive models and data streaming mining. The decision management workbench includes a production rule system (providing seamless integration with clinical events and processes) and an ontology editor.

The incoming clinical data likely meet the Big Data criteria of volume, velocity, and variety (this is particularly true for physiological time series from wearable sensors). Therefore, specialized frameworks for large scale cluster computing like Apache Spark are used to analyze and process the data. Statistical computing and Machine Learning tools like R are used here as well. The goal is knowledge and patterns discovery using Machine Learning model builders like Decision Trees, k-Means Clustering, Logistic Regression, Support Vector Machines (SVMs), Bayesian Networks, Neural Networks, and the more recent Deep Learning techniques. The latter hold great promise in applications such as Natural Language Processing (NLP), medical image analysis, and speech recognition.

These Machine Learning algorithms can support diagnosis, prognosis, simulation, anomaly detection, care alerting, and care planning. For example, anomaly detection can be performed at scale using the k-means clustering machine learning algorithm in Apache Spark. In addition, Apache Spark allows the implementation of the Lambda Architecture and can also be used for genome Big Data analysis at scale.

In another post titled How Good is Your Crystal Ball?: Utility, Methodology, and Validity of Clinical Prediction Models, I discuss quantitative measures of performance for clinical prediction models.

Visual Analytics

Visual Analytics tools like D3.js, rCharts, ploty, googleVis, ggplot2, and ggvis can help obtain deep insight for effective understanding, reasoning, and decision making through the visual exploration of massive, complex, and often ambiguous data. Of particular interest is Visual Analytics of real-time data streams like physiological time series. As a multidisciplinary field, Visual Analytics combines several disciplines such as human perception and cognition, interactive graphic design, statistical computing, data mining, spatio-temporal data analysis, and even Art. For example, similar to Minard's map of the Russian Campaign of 1812-1813 (see graphic below), Visual Analytics can help in comparing different interventions and care pathways and their respective clinical outcomes over a certain period of time by displaying causes, variables, comparisons, and explanations.

Production Rule System, Ontology Reasoning, and NLP

The architecture also includes a production rule engine and an ontology editor (Drools and Protégé respectively). This is done in order to leverage existing clinical domain knowledge available from clinical practice guidelines (CPGs) and biomedical ontologies like SNOMED CT. This approach complements machine learning algorithms' probabilistic approach to clinical decision making under uncertainty. The production rule system can translate CPGs into executable rules which are fully integrated with clinical processes (workflows) and events. The ontologies can provide automated reasoning capabilities for decision support.

NLP includes capabilities such as:

Text classification, text clustering, document and passage retrieval, text summarization, and more advanced clinical question answering (CQA) capabilities which can be useful for satisfying clinicians' information needs at the point of care; and
Named entity recognition (NER) for extracting concepts from clinical notes.

The data tier supports the efficient storage of large amounts of time series data and is implemented with tools like Cassandra and HBase. The system can run in the cloud, for example using the Amazon Elastic Compute Cloud (EC2). For real-time processing of distributed data streams, cloud-based solutions like Amazon Kinesis and Lambda can be used.

Clinical Decision Services

The clinical decision services provide intelligence at the point of care typically using deployed predictive models, clinical rules, text mining outputs, and ontology reasoners. For example, Machine Learning algorithms can be exported in predictive markup language (PMML) format for run-time scoring based on the clinical data of individual patients, enabling what is referred to as Personalized Medicine. Clinical decision services include:

Diagnosis and prognosis
Simulation
Anomaly detection
Data visualization
Information retrieval (e.g., clinical question answering)
Alerts and reminders
Support for care planning processes.

The clinical decision services can be deployed in the cloud as well. Other clinical systems can consume these services through a SOAP or REST-based web service interface (using the HL7 vMR and DSS specifications for interoperability) and single sign-on (SSO) standards like SAML2 and OpenID Connect.

Intelligent Personal Assistants (IPAs)

Clinical decision services can also be delivered to patients and clinicians through IPAs. IPAs can accept inputs in the form of voice, images, and user's context and respond in natural language. IPAs are also expanding to wearable technologies such as smart watches and glasses. The precision of speech recognition, natural language processing, and computer vision is improving rapidly with the adoption of Deep Learning techniques and tools. Accelerated hardware technologies like GPUs and FPGAs are improving the performance and reducing the cost of deploying these systems at scale.

Hexagonal, Reactive, and Secure Architecture

Intelligent Health IT systems are not just capable of discovering knowledge and patterns in data. They are also scalable, resilient, responsive, and secure. To achieve these objectives, several architectural patterns have emerged during the last few years:

Domain Driven Design (DDD) puts the emphasis on the core domain and domain logic and recommends a layered architecture (typically user interface, application, domain, and infrastructure) with each layer having well defined responsibilities and interfaces for interacting with other layers. Models exist within "bounded contexts". These "bounded contexts" communicate with each other typically through messaging and web services using HL7 standards for interoperability.

The Hexagonal Architecture defines "ports and adapters" as a way to design, develop, and test an application in a way that is independent of the various clients, devices, transport protocols (HTTP, REST, SOAP, MQTT, etc.), and even databases that could be used to consume its services in the future. This is particularly important in the era of the Internet of Things in healthcare.

Microservices consist in decomposing large monolithic applications into smaller services following good old principles of service-oriented design and single responsibility to achieve modularity, maintainability, scalability, and ease of deployment (for example, using Docker).

CQRS/ES: Command Query Responsibility Segregation (CQRS) and Event Sourcing (ES) are two architectural patterns which consist in the use of event-driven messaging and an Event Store for separating commands (write-side) from queries (read-side) relying on the principle of Eventual Consistency. CQRS/ES can be implemented in combination with microservices to deliver new capabilities such as temporal queries, behavioral analysis, complex audit logs, and real-time notifications and alerts.

Functional Programming: Functional Programming languages like Scala have several benefits that are particularly important for applying Machine Learning algorithms on large data sets. Like functions in mathematics, functions in Scala have no side effects. This provides referential transparency. Machine Learning algorithms are in fact based on Linear Algebra and Calculus. Scala supports high-order functions as well. Variables are immutable witch greatly simplifies concurrency. For all those reasons, Machine Learning libraries like Apache Mahout have embraced Scala, moving away from the Java MapReduce paradigm.

Reactive Architecture: The Reactive Manifesto makes the case for a new breed of applications called "Reactive Applications". According to the manifesto, the Reactive Application architecture allows developers to build "systems that are event-driven, scalable, resilient, and responsive." Leading frameworks that support Reactive Programming include Akka and RxJava. The latter is a library for composing asynchronous and event-based programs using observable sequences. RxJava is a Java port (with a Scala adaptor) of the original Rx (Reactive Extensions) for .NET created by Erik Meijer.

Based on the Actor Model and built in Scala, Akka is a framework for building highly concurrent, asynchronous, distributed, and fault tolerant event-driven applications on the JVM. Akka offers location transparency, fault tolerance, asynchronous message passing, and a non-deterministic share-nothing architecture. Akka Cluster provides a fault-tolerant decentralized peer-to-peer based cluster membership service with no single point of failure or single point of bottleneck.

Also built with Scala, Apache Kafka is a scalable message broker which provides high-throughput, fault-tolerance, built-in partitioning, and replication for processing real-time data streams. In the reference architecture, the ingestion layer is implemented with Akka and Apache Kafka.

Web Application Security: special attention is given to security across all layers, notably the proper implementation of authentication, authorization, encryption, and audit logging. The implementation of security is also driven by deep knowledge of application security patterns, threat modeling, and enforcing security best practices (e.g., OWASP Top Ten and CWE/SANS Top 25 Most Dangerous Software Errors) as part of the continuous delivery process.

An Interface that Works across Devices and Platforms

The front-end uses a Mobile First approach and a Single Page Application (SPA) architecture with Javascript-based frameworks like AngularJS to create very responsive user experiences. It also allows us to bring the following software engineering best practices to the front-end:

Dependency Injection
Test-Driven Development (Jasmine, Karma, PhantomJS)
Package Management (Bower or npm)
Build system and Continuous Integration (Grunt or Gulp.js)
Static Code Analysis (JSLint and JSHint), and
End-to-End Testing (Protractor).

For mobile devices, Apache Cordova can be used to access native functions when desired. The main goal is to provide a user interface that works across devices and platforms such as iOS, Android, and Windows Phone.

Interoperability

Interoperability will always be a key requirement in clinical systems. Interoperability is needed between all players in the healthcare ecosystem including providers, payers, labs, knowledge artifact developers, quality measure developers, and public health agencies like the CDC. These standards exist today and are implementation-ready. However, only health IT buyers have the leverage to demand interoperability from their vendors.

Standards related to clinical decision support (CDS) include:

The HL7 Fast Healthcare Interoperability Resources (FHIR)
The HL7 virtual Medical Record (vMR)
The HL7 Decision Support Services (DSS) specification
The HL7 CDS Knowledge Artifact specification
The DMG Predictive Model Markup Language (PMML) specification.

Overcoming Barriers to Adoption

In a previous post, I discussed a practical approach to addressing challenges to the adoption of clinical decision support (CDS) systems.

Sunday, January 12, 2014

Navigating in Scala Land

In a previous post titled Toward Polyglot Programming on the Java Virtual Machine (JVM), I described my preliminary exploration of the other languages and frameworks on the JVM including Groovy, Gradle, Grails, Scala, Akka, Clojure, and the Play Framework. I made the switch from Maven to Gradle, a Groovy-based build language that combines the best of Ant and Maven. I was seduced by the scaffolding capabilities of Grails, but decided to make the jump to Scala and functional programming. So I have been navigating in Scala land recently. In this post, I described my journey.

I am still learning, but I can tell you that I never had so much fun learning a new programming language. It probably has something to do with the use of pure mathematical functions in Scala. I did spend a year studying pure mathematics at the University of Abomey-Calavi after graduating from high school. My first exposure to functional programming was with XSLT and XQuery and I very much enjoyed programming without side effects when using those languages. XSLT 3.0 is a fully-fledged functional programming language with support for functions as first class values and high-order functions. XQuery 3.0 is a typed functional language for processing and querying XML data.

Scala is a complex and ambitious language as it supports both object-oriented and functional programming. When I started to learn Scala, I took the wrong directions several times and had to make several U-turns. So the following steps have been effective for me:

The Coursera class Functional Programming Principles in Scala taught by Martin Odersky (the designer of Scala) is a good place to start. It explains the motivations behind Scala and emphasizes its mathematical and functional nature without trying to map pre-existing knowledge (of Java or Python, or any other language) to Scala. This is a refreshing approach because Scala is a different language although it can interoperate with Java. A good companion to this course is the book Programming in Scala: A Comprehensive Step-by-Step Guide, 2nd Edition by Martin Odersky, Lex Spoon and Bill Venners.

If you're a Java programmer moving to Scala, then the book Scala for the Impatient by Cay S. Horstmann would be a good reference.

If you're interested in building a web application in Scala, then I would recommend the book Play for Scala by Peter Hilton, Erik Bakker and Francisco Canedo. Scala and Play come with their own ecosystem of tools. This includes a Scala-based build system called sbt (simple build tool), testing tools (like Spec2 and ScalaTest), IDEs (Eclipse-based Scala IDE and IntelliJ), database drivers (like ReactiveMongo and Slick), and authentication/authorization (SecureSocial and Deadbolt). Play has first-class support for JSON and REST, supports asynchronous responses (based on the concepts of "Future" and "Promise"), reactive programming with Akka, caching, iteratees (for processing large streams of data), and real-time push-based technologies like WebSockets and Server-Sent Events.

A this point, if you decide to dive into the deep waters of Scala, you might want to consider learning Reactive Programming and purely functional data structures.

There is a second Scala course at Coursera titled Principles of Reactive Programming taught by Martin Odersky, Erik Meijer, and Roland Kuhn. The Reactive Manifesto makes the case for a new breed of applications called "Reactive Applications". According to the manifesto, the Reactive Application architecture allows developers to build "systems that are event-driven, scalable, resilient, and responsive." In Scala land, there are currently two leading frameworks that support Reactive Programming: Akka and RxJava. The latter is a library for composing asynchronous and event-based programs using observable sequences. RxJava is a Java port (with a Scala adaptor) of the original Rx (Reactive Extensions) for .NET created by Erik Meijer. Based on the Actor Model, Akka is a framework for building highly concurrent, asynchronous, distributed, and fault tolerant event-driven applications on the JVM (it supports both Java and Scala).

For purely functional data structures, there is a Scala-based library called Scalaz. The book Functional Programming in Scala by Paul Chiusano and Rúnar Bjarnason is a good resource for exploring Scalaz.

Readers of this blog probably know that I am a proponent of Domain Driven Design (DDD) in building complex software systems. So I have been investigated how DDD principles can be implemented with a functional and reactive approach. Vaughn Vernon recently presented a podcast on Reactive DDD with Scala and Akka. In a post titled Functional Patterns in Domain Modeling - Anemic Models and Compositional Domain Behaviors, Debasish Ghosh provides an interesting perspective on the subject of anemic domain models in DDD done within a functional programming language as opposed to an object-oriented one.

For me, Big Data is real, not just a buzzword. I believe in analyzing humongous amounts of data to find hidden patterns and obtain insight for solving complex problems. Dean Wampler called copious data, the killer app for functional programming. Scalding by Twitter can be used for writing MapReduce jobs in Scala. Apache Spark which is written in Scala can run Machine Learning programs up to 100x faster than Hadoop MapReduce in memory. A talk titled Why Spark is the Next Top Compute Model by Dean Wampler explains why Spark has emerged as the most likely replacement for MapReduce in Hadoop applications.

Sunday, October 20, 2013

Treating Javascript as a first-class language

With the emergence of the Single Page Application (SPA) architecture as an approach to creating more fluid and responsive user experiences in the browser, Javascript is gaining prominence as a platform for modern application development. Paypal, a large online payment service, announced recently that it has achieved significant performance and productivity gains by shifting its server-side development from Java to Javascript. From a software architecture and development perspective, what do expressions like "Javascript as a first-class language" or "Javascript as a platform" actually mean?

Let's consider a well-established first-class language and platform like Java. By the way, I still consider Java a strong and safe bet for developing applications. What makes Java strong is not just the language, but the rich ecosystem of free and open source tools and frameworks built around it (e.g., Eclipse, Tomcat, JBoss Application Server, Drools, Maven, Jenkins, Solr, Hibernate, Spring, Hadoop to name just a few). The JVM is evolving with new languages and frameworks like Groovy, Grails, Clojure, Scala, Akka, and the Play Framework which aim to enhance developer's productivity. It is also well-known that big internet companies like Twitter have achieved significant gains in performance, scalability, and other architectural concerns by shifting a lot of back-end code from Ruby on Rails to the JVM. There are a number of architectural patterns and software development practices that have been adopted over the years in successfully building quality Java applications. These include:

Design patterns such as the Gang of Four (GoF), Dependency Injection, Model View Controller (MVC), Enterprise Integration Patterns (EIP), Domain Driven Design (DDD), and modularity patterns like those based on OSGi.
Test-Driven Development (TDD) using tools like JUnit, TestNG, Mockito (mocking), Cucumber-JVM (for behavior-driven development or BDD), and Selenium (for automated end-to-end testing).
Build tools like Maven and Gradle.
Static analysis with tools like FindBugs, Checkstyle, PMD, and Sonar.
Continuous integration and delivery with tools like Jenkins.
Performance testing with JMeter.
Web application vulnerability testing with Burp.

As we move to a rich client application paradigm based on Javascript and the Single Page Application (SPA) architecture, it is clear that Javascript can no longer be considered a toy language for front-end developers and so we need to bring the same engineering discipline to Javascript. As I said previously, the JVM remains my platform of choice for back-end development. For example, I find that AngularJS (a client-side Javascript MVC framework) works well with Spring back-end capabilities (like Spring Security and REST support in Spring MVC, HATEOAS, or Grail). However, I also keep an eye on server-side Javascript frameworks like Node.js.

The good news is that the community is coming up with patterns, tools, and practices that are helping elevate Javascript to the status of first-class language. The following is a list of patterns and tools that I find interesting and promising so far:

Javascript design patterns including the application of the GoFs to Javascript. The MVC and Dependency Injection patterns are both implemented in AngularJS, my favorite Javascript MVC framework. There are also modularity patterns like Asyncronous Module Definition (AMD) supported by RequireJS.
Functional programming support in Javascript (e.g., higher-order functions and closures) is emerging as a best practice in writing quality Javascript code.
Behavior-Driven Development (BDD) testing with Jasmine.
Static analysis with Javascript code quality tools like JSLint and JSHint.
Build with Grunt, a Javascript task runner.
Karma, a test runner for Javascript.
Protractor, an end-to-end test framework built on top of Selenium WedDriverJS.
Single Page Applications are subject to common web application vulnerabilities like Cookie Snooping, Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), and JSON Injection. Security is mainly the responsibility of the server, although client-side frameworks like AngularJS also provide some features to enhance the security of Single Page Applications.

Sunday, April 28, 2013

How I Make Technology Decisions

The open source community has responded to the increasing complexity of software systems by creating many frameworks which are supposed to facilitate the work of developing software. Software developers spend a considerable amount of time researching, learning, and integrating these frameworks to build new software products. Selecting the wrong technology can cost an organization millions of dollars. In this post, I describe my approach to selecting these frameworks. I also discuss the frameworks that have made it to my software development toolbox.

Understanding the Business

The first step is to build a strong understanding of the following:

The business goals and challenges of the organization. For example, the healthcare industry is currently shifting to a value-based payment model in an increasingly tightening regulatory environment. Healthcare organizations are looking for a computing infrastructure that support new demands such as the Accountable Care Organization (ACO) model, patient-centered outcomes, patient engagement, care coordination, quality measures, bundled payments, and Patient-Centered Medical Homes (PCMH).

The intended buyers and users of the system and their concerns. For example, what are their pain points? which devices are they using? and what are their security and privacy concerns?

The standards and regulations of the industry.

The competitive landscape in the industry. To build a system that is relevant, it is important to have some ideas about the following: what is the competition? what are the current capabilities of their systems? what is on their road map? and what are customers saying about their products. This knowledge can help shape a Blue Ocean Strategy.

Emerging trends in technologies.

This type of knowledge comes with industry experience and a habit of continuously paying attention to these issues. For example, on a daily basis, I read industry news as well as scientific and technical publications. As a member of the American Medical Informatics Association (AMIA), I receive the latest issue of the Journal of the American Medical Informatics Association (JAMIA) which allows me to access cutting-edge research in medical informatics. I speak at industry conferences when possible and this allows me not only to hone my presentation skills, but also attend all sessions for free or at a discounted price. For the latest in software development, I turn to publications like InfoQ, DZone, and TechCrunch.

To better understand the users and their needs and concerns, I perform early usability testing (using sketches, wireframes, or mockups) to test design ideas and obtain feedback before actual development starts. For generating innovative design ideas, I recommend the following book: Universal Methods of Design: 100 Ways to Research Complex Problems, Develop Innovative Ideas, and Design Effective Solutions by Bruce Hanington and Bella Martin.

Architecting the Solution

Armed with a solid understanding of the business and technological landscape as well as the domain, I can start creating a solution architecture. Software development projects can be chaotic. Based on my experience working on many software development projects across industries, I found that Domain Driven Design (DDD) can help foster a disciplined approach to software development. For more on my experience with DDD, see my previous post entitled How Not to Build A Big Ball of Mud, Part 2.

Frameworks evolve over time. So, I make sure that the architecture is framework-agnostic and focused on supporting the domain. This allows me to retrofit the system in the future with new frameworks as they emerge.

Due Diligence

Software development is a rapidly evolving field. I keep my eyes on the radar and try not to drink the vendors Kool-Aid. For example, not all vendors have a good track record in supporting standards, interoperability, and cross-platform solutions.

The ThoughtWorks Technology Radar is an excellent source of information and analysis on emerging trends in software. Its contributors include software thought leaders like Martin Fowler and Rebecca Parson. I also look at surveys of the developers community to determine the popularity, community size, and usage statistics of competing frameworks and tools. Sites like InfoQ often conduct these types of surveys like the recent InfoQ survey on Top JavaScript MVC Frameworks. I also like Matt Raible's Comparing JVM Web Frameworks.

I value the opinion of recognized experts in the field of interest. I read their books, blogs, and watch their presentations. Before formulating my own position, I make sure that I read expert opinions on opposing sides of the argument. For example, in deciding on a pure Java EE vs. Spring Framework approach, I read arguments by experts on both sides (experts like Arun Gupta, Java EE Evangelist at Oracle and Adrian Colyer, CTO at SpringSource).

Finally, consider a peer review of the architecture using a methodology like the Architecture Tradeoff Analysis Method (ATAM). Simply going through the exercise of explaining the architecture to stakeholders and receiving feedback can significantly help in improving it.

Rapid Prototyping

It's generally a good idea to create a rapid prototype to quickly learn and demonstrate the capabilities and value of the framework to the business. This can also generate excitement in the development team, particularly if the framework can enhance the productivity of developers and make their life easier.

The Frameworks I've Selected

The Spring Framework

I am a big fan of the Spring Framework. I believe it is really designed to support the need of developers from a productivity standpoint. In addition to dependency injection (DI), Aspect Oriented Programming (AOP), and Spring MVC, I like the Spring Data repository abstraction for JPA, MongoDB, Neo4J, and Hadoop. Spring supports Polyglot Persistence and Big Data today. I use Spring Roo for rapid application development and this allows me to focus on modeling the domain. I use the Roo scaffolding feature to generate a lot of Spring configuration and Java code for the domain, repository (Roo supports JPA and MongDB), service, and web layers (Roo supports Spring MVC, JSF, and GWT). Spring also support for unit and integration testing with the recent release of Spring MVC Test.

I use Spring Security which allows me to use AOP and annotations to secure methods and supports advanced features like Remenber Me and regular expressions for URLs. I think that JAAS is too low-level. Spring Security allows me to meet all OWASP Top Ten requirements (see my previous post entitled Application-Level Security in Health IT Systems: A Roadmap).

Spring Social makes it easy to connect a Spring application to social network sites like Facebook, Twitter, and LinkedIn using the OAuth2 protocol. From a tooling standpoint, Spring STS supports many Spring features and I can deploy directly to Cloud Foundry from Spring STS. I look forward to evaluating Grails and the Play Framework which use convention over configuration and are built on Groovy and Scala respectively.

Thymeleaf, Twitter Boostrap, and JQuery

I use Twitter Boostrap because it is based on HTML5, CSS3, JQuery, LESS, and also supports a Responsive Web Design (RWD) approach. The size of the components library and the community is quite impressive.

Thymeleaf is an HTML5 templating engine and a replacement for traditional JSP. It is well integrated with Spring MVC and supports a clear division of labor between back-end and front-end developers. Twitter Boostrap and Thymeleaf work well together.

AngularJS

For Single Page Applications (SPA) my definitive choice is AngularJS. It provides everything I need including a clean MVC pattern implementation, directives, view routing, Deep Linking (for bookmarking), dependency injection, two-way databinding, and BDD-style unit testing with Jasmine. AngularJS has its own dedicated debugging tool called Batarang. There are also several learning resources (including books) on AngularJS.

Check this page comparing the performance of AngulaJS vs. KnockoutJS. This is a survey of the popularity of Top JavaScript MVC Frameworks.

D3.js

D3.js is my favorite for data visualization in data-intensive applications. It is based on HTML5, SVG, and Javascript. For simple charting and plotting, I use jqPlot which is based on JQuery. See my previous post entitled Visual Analytics for Clinical Decision Making.

R

I use R for statistical computing, data analysis, and predictive analytics. See my previous post entitled Statistical Computing and Data Mining with R.

Development Tools

My development tools include: Git (Distributed Version Control), Maven or Gradle (build), Jenkins (Continuous Integration), Artifactory (Repository Manager), and Sonar (source code quality management). My testing toolkit includes Mockito, DBUnit, Cucumber JVM, JMeter, and Selenium.

Sunday, March 10, 2013

How Not to Build A Big Ball of Mud, Part 2

In a previous post entitled How not to build a big ball of mud, I described the complexity of modern software systems and the challenges faced today by software developers and architects. Domain Driven Design (DDD) is a proven pattern language that can foster a disciplined approach to software development. DDD was first introduced by Eric Evans nine years ago in a seminal book entitled: Domain-Driven Design: Tackling Complexity in the Heart of Software. Over the last 9 years, a community of practice has emerged around DDD and many lessons have been learned in applying DDD to real world complex software development projects. During that time, software complexity has also increased significantly. Changes in the field of software development during the last few years include:

The proliferation of client devices which requires a Responsive Web Design (RWD) approach. RWD is made possible by open web standards like HTML5, CSS3, and Javascript which have displaced proprietary user interface technologies like Flex and Silverlight. RWD frameworks like Twitter Boostrap and Javascript Libraries like JQuery have become very popular with developers. The demands put on Javscript on the client side have created the need for Javascript MVC frameworks like AngularJS and EmberJS.

The importance of the user experience in a competitive online marketplace. Performing usability testing early in the software development life cycle (using wireframes or mockups) to test design ideas and obtain early feedback from future users is extremely valuable for creating the right solution. Metrics such as the System Usability Scale (SUS) can be used to assess the results of usability testing.

The prevalence of REST, JSON, OAuth2, and Web APIs for achieving web scale.

The emergence of Polyglot Persistence or the use of different persistence mechanisms such as relational, document, and graph databases within the same application. Developers are discovering that modeling data for NoSQL databases has many benefits, but also its own peculiarities.

The demands for quality and faster time-to-market have led to new techniques like test automation and continuous delivery.

The open source community has responded to these challenges by creating many frameworks which are supposed to facilitate the work of developing software. Software developers spend a considerable amount of time researching, learning, and integrating these various frameworks to build a system. Some of these frameworks can indeed be very helpful when used properly. However, DDD puts a big emphasis on understanding the domain. Here is what I learned from applying DDD over the last few years:

DDD is a significant intellectual investment, but with a potential for big rewards. To be successful in applying DDD, one must take the time to understand and digest the underlying principles, from the building blocks (entities, aggregates, value objects, modules, domain events, services, repositories, and factories) to the strategic aspects of applying DDD. For example, understanding the difference between an aggregate, a value object, and an entity is essential. Learning the right approach to designing aggregates is also very important as this can significantly impact transactions and performance. I highly recommend reading the recently published Implementing Domain Driven Design by Vaughn Vernon. The book provides a contemporary approach to applying DDD. For example, it covers important topics in applying DDD to modern software systems such as: sub-domains, domain events, event stores and event sourcing, rules for aggregate design, transactions, eventual consistency, REST, NoSQL, and enterprise application integration with concrete examples.

Proper application layering (user interface, application, domain, and infrastructure), understanding the responsibility of each layer (for example, an anemic domain model and a fat application layer are anti-pattern), and coding to interfaces. DDD is object-oriented (OO) design done right. The SOLID Principles of OO design are still applicable.

Determine if DDD is right for your project. Most of my work during the last few years has been in the healthcare domain. The HL7 CCDA and the Virtual Medical Record (vMR) define an information model for Electronic Healthcare Records (EHR) and Clinical Decision Support (CDS) systems respectively. Interoperability is an important and challenging issue in healthcare. DDD concepts such as "Strategic Design", "Context Map", "Bounded Context", and "Published Language" are very helpful in addressing and navigating this type of complexity.

As I mentioned earlier, DDD puts a big emphasis on understanding the domain. Developers applying DDD should be prepared to dedicate a considerable amount of time to learning about the domain, for example by collaborating and carefully listening to domain experts and by reading as much as they can about the domain. This is also the key to creating a rich domain model with behavior (as opposed to an anemic one). I found that simply reading industry standards and regulations is a great way to understand a domain. So understanding the domain is not just the responsibility of the Business Analyst. The code is the expression of the domain, so the coder needs to understand the domain in order to express it with code.

Some developers blame popular frameworks for encouraging anemic domain models. I found that a lack of understanding of the domain and its business rules is a major contributing factor to anemia in the domain model. A rule engine like Drools can help externalize these business rules in the form of declarative rules that can be maintained by domain experts through a DSL, spreadsheet, or web-based user interface.

There are opportunities in using recent ideas like Event Sourcing and the Command Query Responsibility Segregation (CQRS). These opportunities include: scalability, true audit trails, data mining, temporal queries, application integration. However, being pragmatic can help avoid unnecessary complexity.

I recommend exploring tools that are specifically designed to support a DDD or Model-Driven Development (MDD) approach. Apache Isis, Roma Meta Framework, Tynamo, and Naked Objects are examples of such tools. These tools can automatically generate all the layers of an application based on the specification of a domain model. By doing so, these tools allow you to really focus your time and attention on exploring and understanding the domain as opposed to framework and infrastructure concerns. For architects, these tools can serve as design pattern automation, constraining the development process to conform to DDD principles and patterns. I believe this is part of a larger trend in automating software development which also includes the essential practice of test automation. We software developers like to automate the job of other people. However, many tasks that we perform (including coding itself) are still very manual. Aspect-Oriented Programming (AOP) (using AspectJ for example) can also be used to enable this type of design pattern automation through compile-time weaving.

Check my previous post for 20 techniques for achieving software excellence.

Saturday, December 8, 2012

A Journey into Software Excellence

I am back in the blogosphere after a seven month hiatus. It was about time I get my blogging act together. Software development has never been so much fun. In this post, I will share some thoughts on using tools, methods, and practices that can really help in your search for software excellence from the initial prototyping of the user interface to deployment.

With the rapid proliferation of mobile and desktop devices, adopt a Responsive Wed Design (RWD) strategy to reach the largest audience possible.
Create responsive sketches, wireframes, or mockups and apply usability guidelines during the initial prototyping. The NHS Common User Interface (CUI) Program is a good example of usability guidelines for healthcare IT applications. Usability.gov also has many interesting resources as well.
Perform usability testing to test your design ideas and obtain early feedback from future users of your product before actual development starts. Use metrics such as the System Usability Scale (SUS) to assess the results.
Carefuly select the right HTML5, CSS3, and Javascript libraries and frameworks. The Single Page Application (SPA) architecture is becoming popular and can provide a more fluid user experience.
Consider "Specification By Example" and Behaviour Driven Development (BDD) tools like Cucumber-JVM to create executable user stories.
Pattern languages like Domain Driven Design (DDD) can help you avoid a "Big Ball of Mud" in architecting your software. DDD concepts such as "Strategic Design", "Bounded Context", "Published Language", and "Anti-Corruption Layer" can help you put your architecture in the right perspective, particularly if there is a need to support industry interoperability standards such as HL7 and IHE. However, beware that the practice of DDD has evolved over the last 8 years and new lessons have been learned particularly in the area of "Aggregate" design. So keep up-to-date with new developments in the field in order to leverage the experience of the community. I also found the concept of "Hexagonal Architecture" very helpful in visualizing the complexity of an architecture from different angles.
Consider a peer review of the architecture using a methodology like the Architecture Tradeoff Analysis Method (ATAM).
Embrace Polyglot Persistence (the use of different persistence mechanisms such as relational, document, and graph databases within the same application). However, use the right application development framework to make this easy. Beware of the peculiarities of modeling data for NoSQL databases and remember that "Persistence Ignorance" is not always easy to achieve in practice.
Add a social dimension to your product by integrating the user experience with existing social networking sites that your users already belong to.
Make your application more intelligent through the use of techniques such as Machine Learning (e.g., a recommendation engine), ontologies and rule engines (e.g., automated reasoning), and Natural Language Processing (NLP) (e.g., automated question answering). As Richard Hamming said: "The purpose of computing is insight, not numbers".
To enhance the user experience, adopt HTML5, SVG, and Javascript-based graphing and data visualization techniques for data-intensive applications.
Consider the benefits of deploying the application to the cloud and if you decide to deploy to the cloud, factor that into your entire design and development process including the selection of development tools. Choosing the right Platform-as-a-Service (PaaS) provider can facilitate the process.
Create a Continuous Delivery pipeline based on the core concept of automated testing. Leverage tools like Git (Distributed Version Control), Gradle (build), Jenkins (Continuous Integration), and Artifactory. Continuous Delivery allows you to go to market faster and with confidence in the quality of your product. Save infrastructure costs by using these tools in the cloud during development.
Although there is still a place for manual testing, all tests should be automated as much as possible. In addition to the traditional unit tests (using tools like JUnit, TestNG, and Mockito), embrace automated cross-device, cross-browser, and cross-platform user interface (UI) testing using a tool like Selenium.
Web services and performance testing should also become part of your build and Continuous Delivery pipeline using tools like soapUI and JMeter respectively. Performance testing should not be an afterthought.
Adopt automated code quality inspection with tools like Sonar, Checkstyle, FindBugs, and PMD. This can supplement your peer code review process and can provide you with concrete code quality metrics in addition to automatically flagging bugs (including insecure code) in your code base.
Write secure code by carefully studying the OWASP Top Ten. Adopt OWASP guidelines related to security testing and secure code reviews. Perform penetration testing to find vulnerabilities in your application before it is too late.
Do your due diligence in protecting the privacy of your users data. Put the users in control of their privacy in your system by adopting standards such as OAuth2, OpenID Connect, and the User Managed Access (UMA) protocol of the Kantara Initiative. Consider increasing the strength of authentication using multi-factor authentication (e.g., two-factor authentication using the user's phone).
Invest in learning and training your development team. Software excellence can only be achieved by skilled professionals.
Relax, have fun, and remember that software excellence is a journey.

Sunday, December 19, 2010

How Not to Build A Big Ball of Mud

If you are a software developer or architect, in addition to the ever changing business requirements, you also need to deal with the myriad of application development frameworks and design patterns out there. There are frameworks for: the User Interface (UI), Dependency Injection (DI), Aspect Oriented Programming (AOP), and Object-Relational Mapping (ORM). On top of that, you will probably need a web services framework and perhaps an Enterprise Service Bus (ESB) if you need to integrate applications. As an architect, you also need to keep an eye on scalability, availability, security, usability, industry standards, and government regulations.

In such an environment, the lack of a disciplined approach to software architecture can quickly lead to a Big Ball of Mud. In a paper presented in 1997 at the Fourth Conference on Patterns Languages of Programs, Brian Foote and Joseh Yoder describe the Big Ball of Mud:

A BIG BALL OF MUD is haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle. We’ve all seen them. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.

The Big Ball of Mud remains the most pervasive architecture today. Note that these problems can be exacerbated by an agile software development approach that leaves little or no room for design and strategic thinking (see my previous post on software architecture documentation in agile projects).

Domain Driven Design (DDD) is a set of patterns that have been introduced by Eric Evans in his book entitled: "Domain-Driven Design: Tackling Complexity in the Heart of Software". I won't go into the details of what those patterns are. I do recommend that you read the book and there are other free DDD resources on the web as well. However, I will share with you some key DDD principles that have been helpful to me in wrapping my head around software architecture complexity:

Collaboration between software developers and domain experts is important to create a common understanding of the concepts of the domain. Note that we're not talking about UI components such as screens or fields here, nor are we talking about computer science abstractions such as classes and objects. We are talking about what the domain is made of conceptually. These domain concepts are expressed in a Ubiquitous Language that is used not only in conversations and software documentation, but also in the code. So in essence, DDD is model-driven design: the model can be translated into code and frameworks like Naked Objects can help you do exactly that. Continuously refine the model.
Experimentation and rapid prototyping are an efficient way to collaborate with domain experts and business analysts. This is where the Naked Objects Framework can be very helpful. Note that you can use Naked Objects for the initial prototyping and domain modeling, and then use your preferred frameworks for the remaining layers of your application.
Pay attention to the correct implementation of DDD building blocks such as: entities (behaviour-rich with business rules), value objects, aggregates, aggregate roots, domain events, factories, repositories, and services. Avoid an anemic domain model and know how to recognize value objects and persist them properly. Value objects (such as money and time interval) are immutable and are manipulated through side-effect free functions. Move complexity and behaviour out of your entities into those value objects.
Domain concepts are grouped into modules to delineate what Eric Evans calls the "conceptual contours" of the domain. To reduce coupling, OO patterns such as the dependency inversion principle, the interface segregation principle, and the acyclic dependency principles are applied.
DDD recommends the following four layers: the presentation layer, the application layer, the domain layer, and the infrastructure layer. Although the idea of a multi-tier architecture is not new, the anti-patterns are typically: a fat application layer, an anemic domain model, and a tangled mess in general. So, properly layer your architecture:
- Repository interfaces are in the domain layer, but their implementation are in the infrastructure layer to allow "Persistence Ignorance"
- Both the interface and implementation of factories are in the domain layer
- Domain and infrastructure services are injected into entities using dependency injection (some argue that DDD is not possible without DI, AOP, and ORM)
- The application layer takes care of cross-cutting concerns such as transactions and security. It can also mediate between the presentation layer and domain layer through Data Transfer Objects (DTOs).
DDD enables Object Oriented User Interfaces (OOUI) which expose the richness of the domain layer as opposed to obscuring it.
Models exist within bounded contexts and the latter should be clearly identified. In his book, Eric Evans talks about "strategic design" and "context maps" and suggests the following options for integrating applications:
- Published language
- Open host service
- Shared kernel
- Customer/supplier
- Conformist
- Anti-corruption layer
- Separate ways.
In industries such as healthcare where an XML-based data exchange standard exists, the "Published Language" approach is the pattern typically used. Each healthcare application participating in an exchange represents a separate context. On the other hand, an "Anti-Corruption Layer" can be created as an adapter to isolate the model against an industry standard model that is not considered best practice in data modeling, is inconsistent, immature, or subject to change. However, since there is tremendous value in exchanging data, we hope not to go "Separate Ways".
DDD is a solid foundation for next-generation architecture based on the Command Query Responsibility Segregation (CQRS) pattern. The UI sends commands which are handled by command handlers. These command handlers change the state of aggregate roots. However, the aggregate roots still define behavior and business rules and are responsible for maintaining invariants. Changes to aggregate roots generate events that are stored in an event store (this is called event sourcing). Aggregate roots are persisted by storing these streams of events in the event store. That way, the aggregate roots can be reconstructed by replaying the events from the event store. The events are published to subscribers including denormalizers for enhanced query performance. The separation of the read side from the write side allows:
- Increased performance and scalability on the read and reporting side particularly when combined with a cloud deployment model
- Complete audit trails through the event store
- New data mining capabilities leveraging temporal queries
- The opportunity to eliminate the need for Object Relational Mapping (ORM) through the use of high performance NoSQL databases.