Wednesday, December 3, 2008

Keeping Data Under Control in SOA with XSLT 2.0 and XQuery

In a typical SOA project, many artifacts are represented in XML. These include SOAP messages, XML Schema definitions (XSDs), WSDL, WS-Policy, BPMN, BPEL, and various configuration files. The following are some examples of how XSLT 2.0 and XQuery can be leveraged in an SOA project.

Data Model and Data Format Transformation

When services don't share the same data model (XSD) or the same data format (e.g. EDI vs. XML), the data has to be transformed. An Enterprise Service Bus (ESB) typically provides data transformation as part of its mediation services. Some developers will find XQuery easier to use than XSLT 2.0 for transforming XML data because XQuery has a SQL-like syntax. In Oracle AquaLogic Service Bus, XQuery is used for data transformation, while the Apache ServiceMix ESB provides support for Saxon-based XSLT 2.0 and XQuery transformations. An XSLT/XQuery engine can be deployed as a Service Component Architecture (SCA) implementation type or a Java Business Integration (JBI) service engine.

One aspect of data model or data format transformation that can quickly get out of control and become difficult to manage is the mapping specification, which states that field X in message A maps to field Y in message B and also defines the business rules that apply to the mapping. This specification is often produced as Excel spreadsheets by business analysts and handed over to programmers, who then code the transformation script. You now need to maintain and synchronize the Excel mapping document, the source XSD, the target XSD, and the XSLT 2.0 or XQuery transformation script. On top of that, if a user interface is involved, you need to keep it in sync with those changes as well.
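To make the idea of a mapping specification concrete, here is a minimal sketch in Python (standard library only), used as a stand-in for what an XSLT 2.0 or XQuery script would do. The message shapes, the element names, and the `MAPPING` table are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping specification:
# path of field X in message A -> name of field Y in message B.
MAPPING = [
    ("customer/name", "FullName"),
    ("customer/id", "CustomerNumber"),
    ("total", "Amount"),
]

# Hypothetical source message (message A).
ORDER = (
    "<order><customer><id>42</id><name>Ada Lovelace</name></customer>"
    "<total>99.50</total></order>"
)


def transform(source_xml: str) -> str:
    """Build message B (an <Invoice>) from message A using MAPPING."""
    source = ET.fromstring(source_xml)
    target = ET.Element("Invoice")
    for source_path, target_name in MAPPING:
        node = source.find(source_path)
        if node is None:
            continue  # field absent in message A: skip it
        ET.SubElement(target, target_name).text = node.text
    return ET.tostring(target, encoding="unicode")


if __name__ == "__main__":
    print(transform(ORDER))
```

Because the mapping lives in a single table rather than being scattered through the script, it is the one place that has to agree with the analysts' spreadsheet.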

One technique that can be useful is to use an xsd:appinfo element to capture and keep metadata close to the XSD declarations:


  • Data mapping specifications
  • Business rules, for example as inline ISO Schematron rules
  • Labels, alerts, and appearances of UI components such as XForms controls


This allows you to use XSLT 2.0 or XQuery to automatically generate data mapping reports in Excel or even generate UI components by transforming the XSD into XForms controls.
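As a rough illustration of report generation from annotated schemas, the following Python sketch (standard library only, standing in for an XSLT 2.0 or XQuery script) reads mapping metadata from xsd:appinfo and writes a CSV report that Excel can open. The `urn:example:mapping` namespace and the `m:source` element with its `message` and `field` attributes are assumptions made for this example, not part of any standard.

```python
import csv
import io
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
MAP = "urn:example:mapping"  # hypothetical namespace for mapping metadata

# Hypothetical target schema with mapping metadata kept in xs:appinfo.
XSD = f"""
<xs:schema xmlns:xs="{XS}" xmlns:m="{MAP}">
  <xs:element name="FullName" type="xs:string">
    <xs:annotation>
      <xs:appinfo><m:source message="Order" field="customer/name"/></xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:element name="Amount" type="xs:decimal">
    <xs:annotation>
      <xs:appinfo><m:source message="Order" field="total"/></xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
""".strip()


def mapping_report(xsd_text: str) -> str:
    """Return a CSV mapping report derived from the appinfo annotations."""
    schema = ET.fromstring(xsd_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["target_element", "source_message", "source_field"])
    appinfo_path = f"{{{XS}}}annotation/{{{XS}}}appinfo/{{{MAP}}}source"
    for element in schema.iter(f"{{{XS}}}element"):
        for source in element.iterfind(appinfo_path):
            writer.writerow(
                [element.get("name"), source.get("message"), source.get("field")]
            )
    return out.getvalue()


if __name__ == "__main__":
    print(mapping_report(XSD), end="")
```

With the metadata stored next to each declaration, the report is regenerated from the XSD instead of being maintained by hand.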


Managing Artifacts and Promoting Reuse with an SOA Repository


One of the key aspects of design-time SOA Governance is managing the lifecycle of service artifacts and the dependencies between them. This is accomplished through a new breed of tools called SOA Repositories. Some of these repositories are being built on top of a JCR-compliant repository such as Apache Jackrabbit. JCR supports querying compliant repositories using XPath 2.0. Suppose that I need to add a new element to my XSD. To ensure that I am reusing existing schema constructs, I first query the SOA Repository with XPath 2.0 to find all schema components (types, elements, and attributes) that contain a certain keyword inside their xsd:documentation element. XPath 2.0 can also help in detecting dependencies between artifacts (e.g. WSDL and XSD definitions) for change impact analysis. Open source SOA repositories such as Mule Galaxy, JBoss DNA, and WSO2 Repository have adopted this approach.
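The keyword search described above can be sketched as follows. This is not a repository query: it is a small Python example (standard library only) that scans a single in-memory XSD, and the sample schema and its component names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Schema components we consider reusable: elements, attributes, and types.
COMPONENT_TAGS = {
    f"{{{XS}}}{name}"
    for name in ("element", "attribute", "complexType", "simpleType")
}

# Hypothetical schema used only for illustration.
SAMPLE_XSD = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:complexType name="PostalAddress">
    <xs:annotation>
      <xs:documentation>Mailing address of a customer.</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="CustomerId" type="xs:string">
    <xs:annotation>
      <xs:documentation>Unique customer identifier.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="Amount" type="xs:decimal">
    <xs:annotation>
      <xs:documentation>Invoice total.</xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
""".strip()


def find_components(xsd_text: str, keyword: str) -> list:
    """Return (kind, name) for named components documented with keyword."""
    schema = ET.fromstring(xsd_text)
    keyword = keyword.lower()
    doc_path = f"{{{XS}}}annotation/{{{XS}}}documentation"
    hits = []
    for node in schema.iter():  # document order
        if node.tag not in COMPONENT_TAGS or node.get("name") is None:
            continue
        for doc in node.iterfind(doc_path):
            if keyword in "".join(doc.itertext()).lower():
                hits.append((node.tag.split("}")[1], node.get("name")))
                break  # one match per component is enough
    return hits


if __name__ == "__main__":
    for kind, name in find_components(SAMPLE_XSD, "customer"):
        print(kind, name)
```

A real repository would run the equivalent query across every stored schema, which is what makes the reuse check practical before adding a new declaration.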

Functional Testing of Web Services

Automated testing is a key principle in agile software development. SoapUI is an open source functional testing framework for web services. It lets testers perform XSD validation of SOAP messages and also specify assertions on the structure and content of those messages using XPath 2.0 and XQuery. SoapUI can be easily integrated into a continuous integration process.
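The kind of structure and content assertion mentioned above can be illustrated outside SoapUI with a short Python sketch (standard library only, using the limited XPath subset that ElementTree supports). The `urn:example:orders` namespace, the `GetOrderResponse` message, and the expected values are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",  # SOAP 1.1 envelope
    "ord": "urn:example:orders",  # hypothetical service namespace
}

# Hypothetical SOAP response used only for illustration.
RESPONSE = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ord:GetOrderResponse xmlns:ord="urn:example:orders">
      <ord:status>SHIPPED</ord:status>
      <ord:item sku="A1"/>
      <ord:item sku="B2"/>
    </ord:GetOrderResponse>
  </soap:Body>
</soap:Envelope>
""".strip()


def check_response(xml_text: str) -> list:
    """Return a list of failed assertions (empty when the message passes)."""
    envelope = ET.fromstring(xml_text)
    failures = []
    # Structure: the body must not carry a SOAP fault.
    if envelope.find("soap:Body/soap:Fault", NS) is not None:
        failures.append("unexpected SOAP fault")
    # Content: the order status must have the expected value.
    status = envelope.findtext(
        "soap:Body/ord:GetOrderResponse/ord:status", namespaces=NS
    )
    if status != "SHIPPED":
        failures.append(f"status was {status!r}")
    # Structure: exactly two line items are expected.
    items = envelope.findall(".//ord:item", NS)
    if len(items) != 2:
        failures.append(f"expected 2 items, found {len(items)}")
    return failures


if __name__ == "__main__":
    print(check_response(RESPONSE) or "all assertions passed")
```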

Data Integration

XQuery can alleviate performance and scalability issues related to the marshalling/unmarshalling of Java objects to/from XML (databinding) and object-relational mapping (ORM) for persisting XML data in relational databases. XQuery is a natural solution for querying and aggregating data coming from heterogeneous sources such as relational databases, native XML databases, LDAP, file systems, and legacy data formats such as EDI and CSV.
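As a small illustration of aggregating across heterogeneous sources, the sketch below joins customer records held in CSV with orders held in XML and totals the orders per customer. It is written in Python (standard library only) rather than XQuery, and the data, field names, and attribute names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from decimal import Decimal

# Hypothetical legacy source: customers exported as CSV.
CUSTOMERS_CSV = "id,name\n1,Acme\n2,Globex\n"

# Hypothetical XML source: orders referring to customers by id.
ORDERS_XML = (
    "<orders>"
    "<order customer='1' total='10.50'/>"
    "<order customer='2' total='5.00'/>"
    "<order customer='1' total='4.50'/>"
    "</orders>"
)


def totals_by_customer(customers_csv: str, orders_xml: str) -> dict:
    """Join both sources on customer id and sum order totals per name."""
    names = {
        row["id"]: row["name"]
        for row in csv.DictReader(io.StringIO(customers_csv))
    }
    totals = defaultdict(Decimal)  # Decimal() starts each sum at zero
    for order in ET.fromstring(orders_xml).iter("order"):
        name = names.get(order.get("customer"))
        if name is None:
            continue  # order for an unknown customer: ignore it here
        totals[name] += Decimal(order.get("total"))
    return dict(totals)


if __name__ == "__main__":
    for name, total in totals_by_customer(CUSTOMERS_CSV, ORDERS_XML).items():
        print(name, total)
```

An XQuery processor with the appropriate connectors would express the same join declaratively, without the intermediate dictionaries.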

One promising specification in the data integration space is the W3C XQuery Scripting Extension (XQSE). By extending XQuery with imperative features such as state management, XQSE (pronounced "excuse") provides developers with additional XML processing power without the need to embed XQuery in a host language such as Java. At the time of this writing, XQSE is still a W3C working draft but is already supported by the Oracle AquaLogic Data Services Platform (ALDSP 3.0).
