XML is in your future

By Gary A. Mintchell, CONTROL ENGINEERING November 1, 1999

Chances are if you use a personal computer and access data using web technologies, then XML is in your future. XML is the three-letter acronym for eXtensible Markup Language. It is a markup meta-language and syntax used to create declarative languages. Its standard is owned by the World Wide Web Consortium (W3C) and can be found on the web at www.w3.org.

XML is based on SGML (Standard Generalized Markup Language), as is HTML, the commonly used page description language. Rather than describe a page, XML describes data and information objects. Among the goals the W3C describes for XML are to:

  • Enable internationalized media-independent electronic publishing;

  • Allow industries to define platform-independent protocols for data exchange, especially electronic commerce data;

  • Deliver information to user agents in a form allowing automatic processing after receipt;

  • Make it easy for people to process data using inexpensive software; and

  • Provide metadata (data about information) that will help people find information and help information producers and consumers find each other.

There are two ‘flavors’ of XML coding: well-formed and valid. In a well-formed document, start and end tags match, empty elements use a special form, elements do not overlap, and attribute values are quoted. Valid XML is well-formed and, in addition, adheres to a structure defined by a Document Type Definition (DTD) or schema.
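The well-formedness rules can be checked mechanically by any XML parser. The sketch below uses Python's standard-library `xml.etree.ElementTree`; the sample tag names are hypothetical, and each sample either follows or breaks one rule. It tests well-formedness only: the standard-library parsers do not validate against a DTD, so checking validity would need a validating parser such as the third-party lxml package.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples, each following or breaking one well-formedness rule.
SAMPLES = {
    "matching start and end tags": "<city>Smallville</city>",
    "mismatched end tag": "<city>Smallville</town>",
    "empty element in its special form": "<discount/>",
    "overlapping elements": "<b><i>text</b></i>",
    "unquoted attribute value": "<address country=US/>",
}

for label, sample in SAMPLES.items():
    print(f"{label}: {is_well_formed(sample)}")
```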

The following simple example illustrates an XML document. A company sells products online. Its marketing descriptions are written in HTML, but customer names and addresses, as well as prices and discounts, are marked up with XML. Here is the information describing a customer:

Acme Pharmaceuticals Co.

7301 Smokey Boulevard
Smallville
Indiana
l94571

XML syntax uses matching start and end tags to mark up information. A piece of information enclosed by a start tag and its matching end tag is called an element. Elements may be further enriched by attaching name-value pairs, called attributes (for example, country='US').
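To make elements and attributes concrete, here is a minimal sketch that marks up the name and street-address fields of the customer record with Python's `xml.etree.ElementTree`. The tag names (`customer`, `name`, `address`, `street`, `city`, `state`) are hypothetical choices for illustration, not the article's original markup; `country="US"` is attached to the address as an attribute.

```python
import xml.etree.ElementTree as ET


def build_customer():
    """Mark up the customer record; all tag names here are hypothetical."""
    customer = ET.Element("customer")
    ET.SubElement(customer, "name").text = "Acme Pharmaceuticals Co."
    # country="US" is an attribute: a name-value pair attached to an element
    address = ET.SubElement(customer, "address", country="US")
    ET.SubElement(address, "street").text = "7301 Smokey Boulevard"
    ET.SubElement(address, "city").text = "Smallville"
    ET.SubElement(address, "state").text = "Indiana"
    return customer


xml_text = ET.tostring(build_customer(), encoding="unicode")
print(xml_text)

# Reading the record back: elements are found by path, attributes by name.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("address/city"))
print(parsed.find("address").get("country"))
```

Because each value sits in a named element, a receiving program can pull out the city or the country without knowing anything about page layout, which is the point of describing data rather than pages.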

An XML vocabulary defines elements: it determines element names and their attributes, and it can be formal or informal. A formal vocabulary is specified with a DTD or a schema. DTDs are already defined in the W3C XML 1.0 Recommendation, while the standard for schemas is still under development. It is likely that there will be defined schemata for each industry; for example, the process control and data acquisition industry will have defined data types and structures for important data, so that the data can be shared across application boundaries.

A DTD treats all data as character strings; it has no notion of numeric or date types. Even so, a DTD is a good way to control the creation of data, because it allows the programmer to:

  • Define a specific set of tags with specific relationships to one another;

  • Define default values for attributes;

  • Define additional text and binary entities, along with their associated notations; and

  • Indicate the starting (root) element.
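Several of the DTD capabilities listed above can be observed with Python's standard-library expat parser. In this minimal sketch the internal DTD subset, the element names, the `acme` text entity, and the default `country` value are all hypothetical. Expat is non-validating: it expands the entity and supplies the default attribute value, but it does not enforce the content models declared with `<!ELEMENT>`.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD subset. Element names, the
# "acme" text entity, and the default country value are illustrative only.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE customer [
  <!ELEMENT customer (name, address)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (street, city, state)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ATTLIST address country CDATA "US">
  <!ENTITY acme "Acme Pharmaceuticals Co.">
]>
<customer>
  <name>&acme;</name>
  <address>
    <street>7301 Smokey Boulevard</street>
    <city>Smallville</city>
    <state>Indiana</state>
  </address>
</customer>
"""


def read_with_dtd(document):
    """Report the root element, attributes, and text seen by the parser."""
    result = {"root": None, "attributes": {}, "text": {}}
    open_elements = []  # stack of element names currently open
    buffers = {}        # character data collected per open element

    def start_doctype(name, system_id, public_id, has_internal_subset):
        result["root"] = name  # the DOCTYPE declaration names the root

    def start_element(name, attrs):
        open_elements.append(name)
        # attrs includes values defaulted from <!ATTLIST>, because
        # specified_attributes is left at its default (False)
        result["attributes"][name] = attrs

    def end_element(name):
        open_elements.pop()
        text = buffers.pop(name, "").strip()
        if text:
            result["text"][name] = text

    def char_data(data):
        # entity references such as &acme; arrive already expanded
        if open_elements:
            name = open_elements[-1]
            buffers[name] = buffers.get(name, "") + data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(document, True)
    return result


info = read_with_dtd(DOCUMENT)
print(info["root"])
print(info["attributes"]["address"])
print(info["text"]["name"])
```

Note that the `<address>` start tag in the document carries no attributes; the `country` value comes entirely from the DTD default.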

Some applications need more informative definitions of markup constructs, or constraints on document structure that are tighter than, looser than, or simply different from those a DTD can express. There is also a widespread desire to specify markup constructs and constraints in an XML-based syntax, so that the tools used on XML documents can also be used on the specifications themselves.

The W3C XML Schema Working Group is addressing schema definition for structure, data typing, and conformance. Goals for schema structure include defining incomplete constraints on the content of an element type, integrating structural schemata with primitive data types, and inheritance: mechanisms that make kind-of relations explicit, rather than only the part-of relations that are expressed today. Primitive data typing covers definitions for integers, dates, and byte sequences, based on experience with SQL and Java primitive types. The working group is also investigating methods of conformance checking.
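The difference that primitive data typing makes can be sketched in a few lines. The example below assumes a hypothetical type-declaration format written in XML syntax, so the same parser reads both declarations and documents; it is NOT the W3C XML Schema syntax, and every element and attribute name in it is invented for illustration. Declared fields are converted to integers, dates, or byte sequences; undeclared fields stay strings, which is all a DTD offers.

```python
import base64
import datetime
import xml.etree.ElementTree as ET

# Hypothetical type declarations in XML syntax (not W3C XML Schema syntax).
DECLARATIONS = """<types>
  <field name="quantity" type="integer"/>
  <field name="ship-date" type="date"/>
  <field name="checksum" type="bytes"/>
</types>"""

# Converters for the three kinds of primitive type discussed above.
CONVERTERS = {
    "integer": int,
    "date": datetime.date.fromisoformat,  # expects YYYY-MM-DD
    "bytes": lambda s: base64.b64decode(s, validate=True),  # base64 text
}


def load_types(declarations):
    """Map each declared field name to its type name."""
    root = ET.fromstring(declarations)
    return {f.get("name"): f.get("type") for f in root.iter("field")}


def typed_values(document, types):
    """Convert child-element text using the declared types.

    Undeclared fields stay strings. A value that does not match its
    declared type raises ValueError.
    """
    values = {}
    for child in ET.fromstring(document):
        raw = (child.text or "").strip()
        kind = types.get(child.tag)
        values[child.tag] = CONVERTERS[kind](raw) if kind else raw
    return values


ORDER = """<order>
  <quantity>12</quantity>
  <ship-date>1999-11-01</ship-date>
  <checksum>AQID</checksum>
  <note>rush</note>
</order>"""

types = load_types(DECLARATIONS)
print(typed_values(ORDER, types))
```

A document with `<quantity>twelve</quantity>` is still well-formed, but conversion fails with `ValueError`, which is the kind of error a typed schema can catch and a DTD cannot.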

The idea is that an XML schema language can be used to define, describe, and catalog XML vocabularies for classes of XML documents. Supervisory control and data acquisition (SCADA) is one example. Managing and using networked devices involves exchanging data and control messages. A server can use schemata to ensure that its outgoing messages are valid, and a client can use them to determine which parts of a message it understands. In a multivendor environment, schemata let an application distinguish data governed by different schemata (industry-standard or vendor-specific), know when it is safe to ignore information it does not understand and when it should raise an error instead, and so provide transparency control.
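One way a client might separate industry-standard data from vendor-specific data is sketched below. It assumes, as a hypothetical design choice not stated in the article, that each vocabulary is identified by an XML namespace; the namespace URIs, element names, and the tag `TI-101` are invented. The client reads the standard readings it understands, skips anything in the vendor namespace as safe to ignore, and raises an error for any other unrecognized element.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs identifying two vocabularies.
STANDARD = "urn:example:industry-standard"
VENDOR = "urn:example:vendor-extensions"


def qname(ns, local):
    """Build the {namespace}local name that ElementTree uses for tags."""
    return f"{{{ns}}}{local}"


def read_message(text):
    """Return (readings understood, vendor tags skipped).

    Assumed policy: vendor-specific elements are safe to ignore, but any
    other unrecognized element is an error.
    """
    root = ET.fromstring(text)
    if root.tag != qname(STANDARD, "message"):
        raise ValueError(f"not a standard message: {root.tag}")
    readings, skipped = [], []
    for child in root:
        if child.tag == qname(STANDARD, "reading"):
            point = child.findtext(qname(STANDARD, "point"))
            value = float(child.findtext(qname(STANDARD, "value")))
            readings.append((point, value))
        elif child.tag.startswith("{" + VENDOR + "}"):
            skipped.append(child.tag)  # vendor data: safe to ignore
        else:
            raise ValueError(f"unrecognized element: {child.tag}")
    return readings, skipped


MESSAGE = f"""<message xmlns="{STANDARD}" xmlns:v="{VENDOR}">
  <reading><point>TI-101</point><value>72.5</value></reading>
  <v:diagnostics>fan speed nominal</v:diagnostics>
</message>"""

print(read_message(MESSAGE))
```

Treating unknown standard elements as errors while tolerating vendor extensions is only one possible policy; a real system would take that rule from the governing schemata rather than hard-coding it.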

As Web-enabled devices and browser-based interfaces become increasingly popular, standard communication structures become more important. COM, DCOM, and OPC are already widely used for multi-vendor application communications. Look for XML to become another important data communications tool.

Comments?
E-mail gmintchell@cahners.com