XML: The future of data management

By Laura Zurawski July 1, 2000

XML (eXtensible Markup Language) is derived from SGML (Standard Generalized Markup Language). SGML became an international standard (ISO 8879) in 1986 and provides a standard format for embedding descriptive markup into documents. Descriptive markup refers to pieces of code (often called “tags”) that identify the structure and meaning of a document’s content rather than dictating its appearance.

XML simplifies SGML, dropping some of its more complex options to make the language better suited for use on the World Wide Web. What makes XML different from other web languages such as HTML (Hypertext Markup Language, itself an application of SGML) is that its markup tags are not pre-defined; authors define their own.

How XML works

HTML is fine for simply displaying documents. However, if there is a need to actually make use of the information contained in these documents, HTML can’t do much to help. This is where XML can come in handy. User-defined tags created in XML can be processed by other web applications and used in much more dynamic ways than was previously possible with HTML.

For example, consider the HTML in fig. 1:

What is returned looks like a bulleted list. This is fine if you want to simply display the information, but what if you want to use it for something else? Perhaps you want to pull the information into another web page somewhere else on the server, in a different format. Then you would need XML, which you could use to produce a file like fig. 2:

This looks different from the HTML, and it works in a different way. Tags in the HTML file describe how the information they contain should look. Tags in the XML file describe what the information is.

Associated with the XML file is a document type definition (DTD), which declares the user-defined tags and the order in which they may appear. A DTD for the above XML example might look like fig. 3:

This DTD gives the parameters of the tags. The element “mailinglist” may or may not contain an intro, and must contain a name, address, city, and country, in that order. The other elements, defined as “#PCDATA” (parsed character data), are all plain text, defined by what the user types in.
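To make the content model concrete, here is a minimal Python sketch that checks whether the children of “mailinglist” appear in the order the DTD declares. It is a hand-written stand-in, not a real DTD validator: the regular expression and the function name matches_content_model are assumptions made for illustration, and the check does not confirm that each child holds only plain text.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-in for the content model in fig. 3:
# (intro?, name, address, city, country)
CONTENT_MODEL = re.compile(r"(intro,)?name,address,city,country,")


def matches_content_model(xml_text: str) -> bool:
    """Return True if the children of <mailinglist> appear in the declared order."""
    root = ET.fromstring(xml_text)
    if root.tag != "mailinglist":
        return False
    # Join child tag names into a string such as "intro,name,address,city,country,"
    child_sequence = "".join(child.tag + "," for child in root)
    return CONTENT_MODEL.fullmatch(child_sequence) is not None


with_intro = (
    "<mailinglist><intro>This is some information:</intro>"
    "<name>John Q. Public</name><address>1234 Main Street</address>"
    "<city>Anytown</city><country>USA</country></mailinglist>"
)
without_city = (
    "<mailinglist><name>John Q. Public</name>"
    "<address>1234 Main Street</address><country>USA</country></mailinglist>"
)

print(matches_content_model(with_intro))    # True
print(matches_content_model(without_city))  # False
```

A validating parser would read these rules from the DTD itself; the sketch only shows what “optional intro, then four required elements” means in practice.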

Because the DTD spells out the structure of the XML file, an application can rely on it to extract the information and place it somewhere else, such as a program that creates mailing labels.
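As a rough illustration of the mailing-label idea, the following Python sketch reads the record from fig. 2 and pulls out each field by name. It uses the standard library’s xml.etree.ElementTree module; the function name mailing_label and the label layout are assumptions, not part of any particular labeling program.

```python
import xml.etree.ElementTree as ET

# The record from fig. 2
RECORD = """\
<mailinglist>
  <intro>This is some information:</intro>
  <name>John Q. Public</name>
  <address>1234 Main Street</address>
  <city>Anytown</city>
  <country>USA</country>
</mailinglist>
"""


def mailing_label(xml_text: str) -> str:
    """Build a plain-text label from the named fields; the optional intro is ignored."""
    root = ET.fromstring(xml_text)
    fields = ("name", "address", "city", "country")
    # findtext returns the text of the first child element with the given tag
    return "\n".join(root.findtext(field, default="").strip() for field in fields)


print(mailing_label(RECORD))
```

Because each tag names what it holds, the program can ask for “city” directly instead of guessing which bullet in an HTML list contains it.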

Valid, well-formed

An XML document that has an associated DTD, and conforms to it, is called valid. However, one of the ways in which XML differs from SGML is that it does not necessarily need to be valid to function. An XML file can be created independently of a DTD, the caveat being that the tags used in the file are defined only within that one file. By using a DTD, the user can create multiple files that share the same tags.

Whether or not the file uses a DTD, it must be well-formed. This means that it must adhere to certain guidelines to ensure that the file is parsed correctly. Some of these guidelines include:

All elements must have a start and end tag;

All “empty” tags (tags that do not require an end tag in HTML) must end with “/>”;

If no DTD is used, the file should be declared “standalone” in its XML declaration;

All attribute values must be in quotes; and

Tags must be nested in proper order.
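Several of these guidelines can be tried out with any XML parser. The Python sketch below, offered as an illustration only, feeds a few short samples to the standard library’s xml.etree.ElementTree parser, which rejects text that is not well-formed. The helper name is_well_formed, the sample element names, and the “title” attribute are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "complete element": "<city>Anytown</city>",
    "missing end tag": "<city>Anytown",
    "empty element closed with />": "<entry><br/></entry>",
    "empty element left open": "<entry><br></entry>",
    "quoted attribute": '<name title="Mr">John Q. Public</name>',
    "unquoted attribute": "<name title=Mr>John Q. Public</name>",
    "improper nesting": "<b><i>text</b></i>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Note that this parser checks well-formedness only; it does not validate a file against a DTD.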

What’s ahead

XML has been in existence for only a short time, but there are already initiatives underway to improve it. XML enthusiasts are developing schemas that go beyond the capabilities of regular DTDs. These schemas may eventually lead to industry standards.

One such development is Microsoft’s (Redmond, Wash.) BizTalk initiative, which aims to provide an open design framework for implementing XML schemas throughout business and industry. Sequencia (Phoenix, Ariz.) is one of the first companies to develop BizTalk schemas for the process industries. XML is the driving force behind its latest Internet endeavor, ProcessPoint.com, a business-to-business portal based on published BizTalk schemas to enable interaction between industry professionals.

For more information about XML and BizTalk, go to www.controleng.com.

Author Information

Laura Zurawski, web editor lzurawski@cahners.com

Fig. 1

<font face="arial" size="4">
<b><i>This is some information:</i></b>
<ul>
<li>John Q. Public</li>
<li>1234 Main Street</li>
<li>Anytown</li>
<li>USA</li>
</ul>
</font>

Fig. 2

<mailinglist>
<intro>This is some information:</intro>
<name>John Q. Public</name>
<address>1234 Main Street</address>
<city>Anytown</city>
<country>USA</country>
</mailinglist>

Fig. 3

<!DOCTYPE mailinglist [
<!ELEMENT mailinglist (intro?, name, address, city, country)>
<!ELEMENT intro (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT country (#PCDATA)>
]>