by Billie Peterson, Baylor University 
 
Dear Tech Talk--  

In a previous column you discussed Cascading Style Sheets and their impact on web pages.  I've recently heard about a new mark-up language, XML. Will I have to completely redesign my library's instructional web pages so they work with this new standard?  

--Xpecting an Xplanation about XML
  
  

Dear XXX--  

In the beginning (1986), there was SGML (Standard Generalized Markup Language, ISO 8879), an international standard for defining descriptions of the structure and content of different types of electronic documents.  SGML is the "mother tongue" used for describing thousands of different document types from transcriptions of ancient languages to technical documentation of sophisticated machines.  

HTML is only one of these SGML document types.  It defines a single, fixed type of document that lets you describe a simple, office-style report (headings, paragraphs, lists, illustrations, etc.), with some provision for hypertext and multimedia.  HTML is relatively easy to learn, but as was mentioned in the December 1997 Tech Talk column on Cascading Style Sheets, HTML is rife with limitations.  (See Mace, "Weaving a Better Web," for additional details.)  To overcome these limitations, HTML needs to be extended, and there are only two ways to "extend" HTML:  

  1. The World Wide Web Consortium could approve a new HTML standard, a very slow and cumbersome process.  

  2. Browser developers could implement new features that are not part of the HTML standard and, therefore, are not uniformly supported by other browsers, causing incompatibility and design problems.  
  
  

Because SGML is completely extensible, it could be used to overcome all of the limitations associated with HTML, but SGML is very complex and difficult for a lay person to use.  Hence, the development of the eXtensible Markup Language, XML, a bridge between the rich complexity of SGML and the restrictive simplicity of HTML.  Whereas HTML describes how information is presented (to a certain extent), XML describes the content and the hierarchy of the information that is presented.  XML makes use of three companion pieces: the Document Type Definition (DTD), which defines a document's elements and their attributes as well as the relationships among them; the eXtensible Style Language (XSL), which provides style sheets for XML documents; and the eXtensible Link Language (XLL), which increases the power of links in web pages.  
  

The beauty of XML is that new tags and hierarchies can be developed by any web page author, without waiting for a new standard; and, as long as the document is "well-formed," any XML (or SGML) application will be capable of interpreting the information.  What's the catch?  As of this writing, no HTML browser is optimized for XML.  Internet Explorer currently offers limited support for XML, and Netscape promises that the next major upgrade (5.0) of its Navigator software will be XML compliant.  
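As a rough illustration of that point, the sketch below uses the XML parser in Python's standard library (xml.etree.ElementTree) to read a small document.  The tag names (<library_guide>, <title>, <audience>) are invented for this example and belong to no standard; the parser needs no advance knowledge of them, only that the document is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: these tag names are made up for illustration only.
GUIDE = """<library_guide>
  <title>Finding Journal Articles</title>
  <audience level="beginner">First-year students</audience>
</library_guide>"""


def describe(xml_text):
    """Return (tag, attributes, text) for each child of the root element."""
    root = ET.fromstring(xml_text)
    return [
        (child.tag, dict(child.attrib), (child.text or "").strip())
        for child in root
    ]


guide_fields = describe(GUIDE)
for tag, attrs, text in guide_fields:
    print(tag, attrs, text)
```

The same function would work unchanged on a document with entirely different tags, which is the practical meaning of "extensible."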

In many ways, XML documents appear similar to HTML documents, except that XML documents may contain tags that are not part of standard HTML.  So, does that mean that HTML pages are automatically XML compliant?  If the HTML document is "well-formed," then it is XML compliant; otherwise, it is not.  But what makes a document "well-formed"?  

 1. All tags must be properly nested and must match, and there must be an enclosing element for the whole document.  

 2. All attribute values must be enclosed in quotes, for example, <font size="5" color="blue"> is correct but <font size=5 color=blue> is incorrect.  

 3. All elements with empty content must end with "/>" instead of ">" (or be followed immediately by an explicit closing tag, as in <br></br>).  For example, the HTML tags <br>, <hr>, and <img src="picture.gif"> would have to be changed to <br/>, <hr/>, and <img src="picture.gif"/>.  This is required in XML because the "parser" needs to know that the <br> tag is empty so it won't look for a matching </br> tag later in the document.  
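The three rules above can be tried out with any XML parser.  The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample fragments are made up for this illustration.  It simply reports whether the parser accepts each fragment.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, one per rule discussed above.
SAMPLES = {
    "nested, quoted, closed": '<p><font size="5" color="blue">Hi</font><br/></p>',
    "unquoted attributes": "<p><font size=5 color=blue>Hi</font></p>",
    "unclosed empty element": "<p>Line one<br>Line two</p>",
    "improper nesting": "<p><b><i>Hi</b></i></p>",
    "no enclosing element": "<p>One</p><p>Two</p>",
}

results = {label: is_well_formed(markup) for label, markup in SAMPLES.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```

Only the first fragment passes; each of the others breaks one of the rules, even though a forgiving HTML browser would probably display all of them.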

If all web page developers precisely followed the HTML standards, their web pages would be XML compliant.  However, web browsers are purposely forgiving of "incorrectly" written HTML code, so there are millions of web pages that work with current browsers but are not XML compliant.  With some time and patience, any HTML page can be converted to XML, but it's probably not necessary because XML and HTML are meant to complement, not compete with, each other.  For details on converting HTML documents to "well-formed" XML documents, see the XML FAQ (Frequently Asked ...).  

According to Jon Bosak, chair of the XML Working Group that developed XML, the best applications for XML will be those that can't be accomplished within HTML's current limitations:  
  

  1. Applications that require the Web client to mediate between two or more heterogeneous databases.  

  2. Applications that attempt to distribute a significant proportion of the processing load from the Web server to the Web client.  

  3. Applications that require the Web client to present different views of the same data to different users.  

  4. Applications in which Web intelligent agents attempt to tailor information discovery to the needs of individual users. (Bosak "XML . . .")

Another real advantage to XML is the power of the eXtensible Link Language.  According to Neil Randall, with XLL, web authors "can provide a link that will take users to a particular resource" just as HTML currently does;  but in XML, "a cross-reference link will then show all the links that lead to that resource, and the user can follow these links to their sources."  In addition, "XML authors can . . . specify what happens when a link is not found," with possibilities including following the link without further action on the user's part or perhaps even embedding the linked document within the original (319).  

Where does XML fit in with library web pages?  To a certain extent, it's too early to say.  However, a couple of possibilities come to mind:  

  1. Designing library or library instruction web pages that change, based on the user's sophistication or physical capabilities.  

  2. Instructing users to use Internet search agents or databases that employ the XML standard.  Use of tags (fields) that describe the content of specific elements of the database should result in the retrieval of more relevant information, just as it does in standard library databases.  Given the three examples of XML documents in the sidebar, imagine the difference in search results for the topic "chip" if your search agent could look for "chip" as part of the <computer> tag or the <processor> tag.  
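To make the second possibility concrete, here is a small sketch of a tag-aware search, again assuming Python's standard xml.etree.ElementTree module.  The catalog and its records are hypothetical stand-ins invented for this example (they are not the sidebar documents), and the search function is an illustration, not a description of any existing search agent.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; only <computer> and <processor> come from the column.
CATALOG = """<catalog>
  <record id="1"><computer><processor>Pentium II chip</processor></computer></record>
  <record id="2"><snack>potato chip</snack></record>
  <record id="3"><tool>wood chip mulcher</tool></record>
</catalog>"""


def search(xml_text, term, tag=None):
    """Return ids of records containing term, optionally only inside <tag>."""
    root = ET.fromstring(xml_text)
    hits = []
    for record in root.findall("record"):
        # With no tag given, every element in the record is searched.
        elements = record.iter(tag) if tag else record.iter()
        if any(term in "".join(el.itertext()) for el in elements):
            hits.append(record.get("id"))
    return hits


everywhere = search(CATALOG, "chip")
in_processor = search(CATALOG, "chip", tag="processor")
print("anywhere:", everywhere)
print("inside <processor>:", in_processor)
```

The unrestricted search matches all three records, while the search limited to the <processor> tag matches only the first, much as a field-limited search narrows results in a standard library database.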

How will libraries make use of XML?  Only time will tell, as browsers become XML compliant and XML development tools evolve.  

Examples  

For more information:  
  

Beale, Stephen.  "XML Ascends on the Web:  New Web Authoring Standard Offers Advantages over HTML."  Macworld 15 (February 1998):28-29.  

Bosak, Jon.  "XML, Java, and the Future of the Web."  10 March 1997.  <http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm> (19 April 1998).  

Connolly, Dan.  XML Principles, Tools, and Techniques.  Sebastopol, CA: O'Reilly & Associates, 1997.  

Extensible Mark-up Language.  22 April 1998. <http://www.w3.org/XML>  (23 April 1998).  

Extensible Mark-up Language (XML) 1.0.  10 Feb. 1998.  <http://www.w3.org/TR/1998/REC-xml> (19 April 1998).  

Frequently Asked Questions About the Extensible Markup Language:  The  XML FAQ.  3 Feb. 1998.  <http://www.ucc.ie/xml/> (19 April 1998).  

Gee, William and John Gartner.  "Xpand Your Site With XML."  TechTools 25 March 1998.  <http://www.techweb.com/tools/proddesign/9803/980325xml.html> (19 April 1998).  

Light, Richard.  Presenting XML.  Indianapolis, IN: Sams.net Publisher, 1997.  

Mace, Scott, et al.  "Weaving a Better Web."  BYTE (March 1998):58-68.  <http://www.byte.com/art/9803/sec5/sec5.htm> (19 April 1998).  

Randall, Neil.  "XML: A Second Chance for Web Markup."  PC Magazine  16 (November 4, 1997):319-320.  

XML.com.  11 April 1998.  <http://xml.com> (19 April 1998).  



As always, send questions and comments to: 
 
Snail Mail: 
 
Tech Talk 
Billie Peterson 
Moody Memorial Library 
P. O. Box 97148 
Waco, TX  76798-7148 
 
E-Mail:   petersonb@baylor.edu 
 



LIRT News, June 1998. Volume 20, number 4.
To report problems, please contact the LIRT News Production editor at edwards@ufl.edu
