Friday, July 4, 2008

XML Schema Design Strategies for SOA Projects

The schema modeling and design effort should be an integrated part of an agile approach that implements practices such as user stories, acceptance tests, test-first unit testing, refactoring, short iterations, a common code base, and continuous integration.

In the recommended contract-first approach to web services development, the XML Schema and WSDL artifacts are the foundation of an SOA project. For example, with Apache CXF, you can use the wsdl2java tool to automatically generate a JAX-WS annotated service interface and server stub from your WSDL.

The first step is to adopt schema naming and design rules (NDR). Industry and government standards initiatives such as the Universal Business Language (UBL) and the National Information Exchange Model (NIEM) have published such NDRs.

If you’re building a new schema from scratch, then the schema should be designed in an iterative and collaborative manner. During each iteration, add just enough components to your schema to support the specific user stories that are being implemented. As the schema grows, refactor as required.

One option is to start the modeling effort with a domain model in the form of UML class diagrams to facilitate collaboration between non-technical subject matter experts (SMEs), the modeler, and the technical team. An XML schema can then be generated automatically from the UML class diagram with a tool such as Hypermodel. David Carlson, creator of Hypermodel, proposed a UML Profile for XSDs which defined a number of stereotypes that can be added to UML class diagrams to refine the mapping from class diagrams to XML schemas. Alternatively, you could export the UML model to the XML Metadata Interchange (XMI) format and use an XSLT transform to map the XMI into an XML Schema and even an XML instance. This Model Driven Development (MDD) approach to XSD provides agility in the face of constant changes in business requirements.

If you are reusing an industry or government schema such as UBL or NIEM, extend it using the methodology recommended by the applicable NDR or, in the case of NIEM, by the Information Exchange Package Documentation (IEPD) process. Extensions to the standard schema should be clearly defined in a new custom namespace and documented properly. The following are some strategies for extending an XML schema:

  • Wildcards xs:any and xs:anyAttribute
  • Element substitution and abstract elements
  • Type substitution via xsi:type and abstract types
  • Concrete Extension (creating a new type by extending an existing type to include additional local elements).
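
As an illustration of the first strategy, here is a minimal sketch using only the JDK's javax.xml.validation API. The schema, the namespaces urn:example:base and urn:example:ext, and the element names are all made up for this example; a real UBL or NIEM extension would follow the applicable NDR instead. The base schema leaves an xs:any extension point that accepts elements from any other namespace, so a custom-namespace element passes validation while an undeclared element in the base namespace does not.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class WildcardExtensionSketch {
    // Hypothetical base schema: Person has a Name plus an extension point
    // that admits elements from any namespace other than the base one.
    static final String BASE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:example:base' elementFormDefault='qualified'>"
      + "<xs:element name='Person'><xs:complexType><xs:sequence>"
      + "<xs:element name='Name' type='xs:string'/>"
      + "<xs:any namespace='##other' processContents='lax'"
      + " minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Instance that adds an element from a custom extension namespace.
    static final String EXTENDED =
        "<Person xmlns='urn:example:base'><Name>Ada</Name>"
      + "<ext:Badge xmlns:ext='urn:example:ext'>42</ext:Badge></Person>";

    // Instance that sneaks an undeclared element into the base namespace.
    static final String UNDECLARED_BASE_ELEMENT =
        "<Person xmlns='urn:example:base'><Name>Ada</Name>"
      + "<Nickname>A</Nickname></Person>";

    // Returns true when the document is valid against BASE_XSD.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(BASE_XSD)));
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalid) {
                return false; // the instance violates the schema
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or input could not be read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("extension element accepted: " + isValid(EXTENDED));
        System.out.println("undeclared base element accepted: "
            + isValid(UNDECLARED_BASE_ELEMENT));
    }
}
```

The wildcard trades validation strength for flexibility: with processContents='lax', the extension content is only checked if a declaration for it happens to be available.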

The schema should be tested for quality against the NDRs. The US National Institute of Standards and Technology (NIST) has published an XML Schema Quality of Design Tool (or QoD Tool) which combines Schematron rules with rules written for JESS (a Java-based rule engine) to validate schemas against NDRs.

For unit testing, the XMLUnit framework can be helpful in testing the schema as you refactor and implement new user stories. XMLUnit for Java allows you to make assertions about the validity of an XML document against an XML Schema. The execution of these tests should be part of your build and continuous integration process.
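
The kind of check such a test makes can be sketched without any third-party library. The example below deliberately uses the JDK's javax.xml.validation API rather than XMLUnit itself, and the Order schema and sample documents are invented for illustration. It collects every validation error instead of stopping at the first one, which tends to give more useful feedback when a refactoring breaks several sample instances at once.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaUnitTestSketch {
    // Hypothetical schema under test: an Order with a positive Quantity.
    static final String ORDER_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Order'><xs:complexType><xs:sequence>"
      + "<xs:element name='Quantity' type='xs:positiveInteger'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Validates xml against xsd and returns all reported error messages
    // (an empty list means the document is valid).
    static List<String> validationErrors(String xsd, String xml) {
        List<String> errors = new ArrayList<>();
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // keep going after an error
                }
                @Override public void fatalError(SAXParseException e)
                        throws SAXException {
                    throw e; // malformed XML cannot be validated further
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema or document unreadable", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String good = "<Order><Quantity>3</Quantity></Order>";
        String bad = "<Order><Quantity>0</Quantity></Order>";
        System.out.println("good: " + validationErrors(ORDER_XSD, good));
        System.out.println("bad:  " + validationErrors(ORDER_XSD, bad));
    }
}
```

In a real project, each user story would contribute a few valid and invalid sample instances, and a test runner would call a helper like validationErrors on each of them during the build.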

The automatic generation of the XSD code from the UML model should also be part of the build and continuous integration process.

Business rules or modeling requirements that are beyond the capabilities of XML Schema 1.0 should be implemented with an assertion-based language such as ISO Schematron or with the new assertion and conditional type assignment (co-constraints) capabilities in XML Schema 1.1.
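
To make the idea of a co-constraint concrete, the sketch below checks one invented rule: an Order marked expedited='true' must carry a ShipDate. This is not Schematron and not XML Schema 1.1; it only approximates a single assertion of that style with the XPath 1.0 engine that ships with the JDK. The element and attribute names are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class CoConstraintSketch {
    // True when an expedited Order has no ShipDate child (a violation).
    // XML Schema 1.0 cannot express this dependency between an attribute
    // value and the presence of an element.
    static final String VIOLATION =
        "boolean(/Order[@expedited='true'][not(ShipDate)])";

    static boolean violatesRule(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(
                VIOLATION,
                new InputSource(new StringReader(xml)),
                XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("document could not be evaluated", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(violatesRule(
            "<Order expedited='true'><Item>Widget</Item></Order>"));
        System.out.println(violatesRule(
            "<Order expedited='true'><ShipDate>2008-07-04</ShipDate></Order>"));
    }
}
```

A Schematron rule or an XML Schema 1.1 assertion would state the same condition declaratively next to the schema, which keeps the rule visible to schema reviewers rather than buried in application code.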

Although ISO Schematron and XML Schema 1.1 can be quite powerful when used with XPath 2.0, some complex business rules will be easier to handle with rule engines such as JBoss Drools and JESS. The rule engine can be deployed as a dedicated and reusable utility web service to validate messages.

When modeling XML schemas for an SOA project, careful consideration should be given to the issue of data transformations. When services don’t share the same data model and XML Schema, the XML data must be transformed using technologies such as XQuery and XSLT. This can introduce additional design complexity and runtime performance issues. Data transformations should be avoided unless absolutely required.
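
To show what that extra step looks like, here is a small sketch of an XSLT mapping between two invented message shapes, run with the JDK's javax.xml.transform API. The Customer/FullName and Party/Name vocabularies are hypothetical. Every pair of services with mismatched schemas needs a stylesheet like this one, along with the code to apply it on each message.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformSketch {
    // Hypothetical mapping from one service's Customer to another's Party.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/Customer'>"
      + "<Party><Name><xsl:value-of select='FullName'/></Name></Party>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given message and returns the result.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<Customer><FullName>Ada Lovelace</FullName></Customer>"));
    }
}
```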

The XML Schema xs:appinfo element can be used to capture and keep metadata close to the XSD declarations:

  • Metadata such as data transformation specifications
  • Business rules using inline ISO Schematron rules
  • Labels, alerts, and appearances of UI components such as XForms controls. This provides the opportunity to auto-generate UI components from your XSD using a transformation language like XSLT or XQuery. Keeping UI and XSD components in sync can be a challenge in SOA projects.
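
As a rough sketch of the UI case, the example below reads a label out of each xs:appinfo block with the JDK's DOM parser. The ui:label element and its urn:example:ui namespace are invented for illustration and are not part of XForms or any standard; a real project would pick its own metadata vocabulary and would more likely do this extraction in XSLT or XQuery, as described above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AppinfoLabelsSketch {
    static final String XS = "http://www.w3.org/2001/XMLSchema";
    static final String UI = "urn:example:ui"; // hypothetical metadata namespace

    // Hypothetical schema whose declarations carry UI labels in xs:appinfo.
    static final String XSD =
        "<xs:schema xmlns:xs='" + XS + "' xmlns:ui='" + UI + "'>"
      + "<xs:element name='Quantity' type='xs:positiveInteger'>"
      + "<xs:annotation><xs:appinfo><ui:label>Quantity ordered</ui:label>"
      + "</xs:appinfo></xs:annotation></xs:element>"
      + "<xs:element name='ShipDate' type='xs:date'>"
      + "<xs:annotation><xs:appinfo><ui:label>Ship date</ui:label>"
      + "</xs:appinfo></xs:annotation></xs:element>"
      + "</xs:schema>";

    // Maps each annotated component's name to the label in its xs:appinfo.
    static Map<String, String> labels(String xsd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xsd)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList infos = doc.getElementsByTagNameNS(XS, "appinfo");
            for (int i = 0; i < infos.getLength(); i++) {
                Element info = (Element) infos.item(i);
                // xs:appinfo sits inside xs:annotation, whose parent is the
                // component being annotated.
                Element owner = (Element) info.getParentNode().getParentNode();
                NodeList found = info.getElementsByTagNameNS(UI, "label");
                if (owner.hasAttribute("name") && found.getLength() > 0) {
                    result.put(owner.getAttribute("name"),
                               found.item(0).getTextContent());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("schema could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        labels(XSD).forEach((name, label) ->
            System.out.println(name + " -> " + label));
    }
}
```

Because the labels live in the schema itself, regenerating the form after a schema change picks up renamed or added fields without a separate UI update.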

Finally, to maximize reuse, an enterprise-wide SOA Repository/Registry should be used to publish, centralize, and discover schema components (see my previous post on SOA Governance tools).

1 comment:

Dan Diephouse said...

Hi Joel, You lay out some good guidelines here for building XML schemas. We should get these built into Galaxy :-).