XML and Associated Technologies

30.10.99

Ander Tenno (93974A), Yaron Lev-Ran (93979F)

Abstract

XML is a method for putting structured data (spreadsheets, address books, configuration parameters, financial transactions, technical drawings, etc.) in a text file. XML consists of a set of rules for designing text formats for such data, in a way that produces files that are easy for a computer to generate and read, and that avoid common pitfalls such as lack of extensibility and of platform independence. In addition to XML 1.0, there is a growing set of optional modules that provide sets of tags and attributes, or guidelines for specific tasks. These technologies include MathML, XLink, XSL, Namespaces, and others.

1 Introduction

The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web. In addition to XML 1.0, a large number of additional technologies extend XML into different, more specialized areas.

2 Technologies

In this section, we look at XML and some of the technologies associated with it, starting with XML itself and then briefly covering the others.

2.1 XML 1.0

XML is not a single, predefined markup language: it’s a metalanguage (a language for describing other languages) which lets users design their own markup. (A predefined markup language like HTML defines a way to describe information in one specific class of documents; XML lets users define their own customized markup languages for different classes of documents.) It can do this because it’s written in SGML (Standard Generalized Markup Language), the international standard metalanguage for markup. XML is currently partially supported (through CSS) by Microsoft Internet Explorer 5. A new version of Netscape Communicator (currently being developed by mozilla.org) will also support XML.
XML is designed to make it easy and straightforward to use SGML on the Web: easy to define document types, easy to author and manage SGML-defined documents, and easy to transmit and share them across the Web [4]. It is defined as an extremely simple dialect of SGML which is completely described in the XML Specification. The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML [4].

What this means is that SGML is the 'mother tongue', used for describing thousands of different document types in many fields of human activity, from transcriptions of ancient Irish manuscripts to the technical documentation for stealth bombers, and from patients’ clinical records to musical notation [9]. HTML is also one of these document types. XML, on the other hand, is an abbreviated and simplified version of SGML, designed to make it easier for users to define their own document types and easier for programmers to write programs that handle them. It omits the more complex and less-used parts of SGML in return for the benefits of being easier to write applications for, easier to understand, and more suited to delivery and interoperability over the Web. [9]

XML files are text files, but they are meant to be read by humans even less than HTML files are. They are text files because that allows experts (such as programmers) to debug applications more easily, and in emergencies a simple text editor can be used to fix a broken XML file. But the rules for XML files are much stricter than for HTML. A forgotten tag or an attribute without quotes makes the file unusable, while in HTML such practice is often explicitly allowed, or at least tolerated. The official XML specification states this explicitly: applications are not allowed to try to second-guess the creator of a broken XML file; if the file is broken, an application has to stop right there and issue an error.
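The strictness described above can be observed with any conforming parser. The following is a minimal sketch in Python, assuming the standard-library module xml.etree.ElementTree as the parser. The three sample documents and the helper name is_well_formed are invented for this illustration and are not part of any specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, invented for this illustration.
WELL_FORMED = '<book><entry name="Ann">ann@example.org</entry></book>'
# The closing </entry> tag has been forgotten.
MISSING_END_TAG = '<book><entry name="Ann">ann@example.org</book>'
# The attribute value is not enclosed in quotes.
UNQUOTED_ATTRIBUTE = '<book><entry name=Ann>ann@example.org</entry></book>'


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser stops at the first error instead of guessing a repair.
        return False
    return True


if __name__ == "__main__":
    samples = [
        ("well-formed", WELL_FORMED),
        ("missing end tag", MISSING_END_TAG),
        ("unquoted attribute", UNQUOTED_ATTRIBUTE),
    ]
    for label, document in samples:
        print(label, "->", is_well_formed(document))
```

Only the first document is accepted. The other two are rejected outright, whereas an HTML browser would typically try to display comparable markup anyway.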
[2,4] Since XML is a text format, and it uses tags to delimit the data, XML files are nearly always larger than comparable binary formats. That was a conscious decision by the XML developers. The advantages of a text format are evident (see above), and the disadvantages can easily be solved at a different level. Disk space is no longer as expensive as it used to be, and programs like zip and gzip can compress files very well and very fast. Those programs are available for nearly all platforms (and are usually free). In addition, communication protocols such as modem protocols and HTTP/1.1 (the core protocol of the Web) can compress data on the fly, thus saving bandwidth as effectively as a binary format. [2] Also, XML is license-free, platform-independent and well-supported [2].

2.2 XSL and XSLT

XSL (Extensible Stylesheet Language) is perhaps the most important technology associated with XML, because an XML document requires a stylesheet to determine how the information it contains is displayed on the screen. The official definition reads as follows. XSL is a language for expressing stylesheets. Given a class of structured documents or data files in XML, designers use an XSL stylesheet to express their intentions about how that structured content should be presented; that is, how the source content should be styled, laid out and paginated onto some presentation medium such as a window in a Web browser or a set of physical pages in a book, report, pamphlet, or memo [7].

XSL uses XML syntax (an XSL stylesheet is actually an XML file) but combines formatting features from both DSSSL (Document Style Semantics and Specification Language, the international standard for stylesheets for SGML documents) and CSS (Cascading Style Sheets, which provide a simple syntax for assigning styles to elements and are currently partially implemented in newer HTML browsers). XSL has already attracted support from several major vendors.
[9] XSLT (XSL Transformations) has also been defined. XSLT is a language for transforming XML documents into other XML documents. XSLT is designed for use as part of XSL, but it can also be used independently of XSL. [5]

2.3 MathML

MathML stands for Mathematical Markup Language. MathML is an XML application for describing mathematical notation (both its structure and content). The goal of MathML is to enable mathematics to be served, received, and processed on the Web, just as HTML has enabled this functionality for text [10]. The sophistication can range from short math expressions through simple inline equations to display equations.
Figure 1. Example of possibilities with MathML. [9]

While MathML is human-readable, it is anticipated that, in all but the simplest cases, authors will use equation editors, conversion programs, and other specialized software tools to generate MathML [10]. Several early versions of such MathML tools already exist, and a number of others, both freely available software and commercial products, are in development.

2.4 Namespaces

A single XML document may contain elements and attributes that are defined for and used by multiple software modules. One motivation for this is modularity: if a set of such elements and attributes (also called a 'markup vocabulary') exists that is well understood and for which useful software is available, it is better to re-use this markup than to re-invent it. [3]

Such documents, containing multiple markup vocabularies, pose problems of recognition and collision. Software modules need to be able to recognize the tags and attributes which they are designed to process, even in the face of "collisions" occurring when markup intended for some other software package uses the same element type or attribute name. [3]

In practice, this means that with multiple vocabularies, for example A and B, both vocabularies may define an element or attribute with the same name, for example WindowColor. When both A and B are active (in scope) at some point in the document, it is not possible to determine which vocabulary is meant, so some mechanism is needed to tell which WindowColor we want. XML Namespaces accomplish this by attaching a namespace prefix to the element or attribute name, where each prefix is bound in the document to a URI that identifies the vocabulary. So, in our example, we can refer to vocabulary A by writing A:WindowColor and, similarly, to vocabulary B by writing B:WindowColor.

2.5 XLink and XPointer

The linking abilities of XML systems are much more powerful than those of HTML.
Existing HREF-style links will remain usable, but the new linking technology is based on the lessons learned in the development of other standards involving hypertext, such as TEI (Text Encoding Initiative, an international project to develop guidelines for the preparation and interchange of electronic texts for scholarly research) and HyTime (Hypermedia/Time-based Structuring Language), which let users manage bi-directional and multi-way links, as well as links to a span of text (within the current document or other documents) rather than to a single point. [9]

The XLink [8] and XPointer [11] draft specifications describe these mechanisms in detail. An XML link can be either a URL or a TEI-style Extended Pointer (XPointer), or both. A URL on its own is assumed to be a resource (as with HTML); if an XPointer follows it, the XPointer is assumed to identify a sub-resource of that URL; an XPointer on its own is assumed to point to a location in the current document. Unlike an HTML bookmark, an XPointer can also specify the end of the linked text fragment. [9,11]

2.6 XPath

XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. XPath uses a compact, non-XML syntax to facilitate its use within URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document rather than its surface syntax, modeling the XML document as a tree of nodes. [6]

2.7 3DXML

3DXML is a way of describing websites and other structured information spaces in XML and publishing them in VRML (Virtual Reality Modeling Language). The exact implementation of 3DXML has yet to be determined. The VRML could be created on the server side by a Perl script, or on the client side using ActiveX controls or JavaBeans. 3DXML could be a standalone format or it could function as a 3D stylesheet.
One of the main objectives of 3DXML is to give authors and designers an easy way to experiment with conveying information in 3D, using a simpler and more direct syntax than straight VRML. [1] Some parties believe and hope that 3DXML will eventually replace HTML (3DXML is intended to be a front end to XML). Whether this happens remains to be seen.
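Returning to the WindowColor collision from section 2.4 and the path-based addressing from section 2.6, the two ideas can be sketched together in Python. This is only an illustration under stated assumptions: the sample document, the two namespace URIs, and the helper name window_color are invented here, and the standard-library module xml.etree.ElementTree implements only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the URIs and the sample values are invented for this
# illustration; only the name WindowColor and the prefixes A and B come from
# the text.
DOC = """\
<settings xmlns:A="http://example.org/vocab-a"
          xmlns:B="http://example.org/vocab-b">
  <A:WindowColor>blue</A:WindowColor>
  <B:WindowColor>#00FF00</B:WindowColor>
</settings>
"""

# Prefix-to-URI bindings used when evaluating path expressions.
NAMESPACES = {
    "A": "http://example.org/vocab-a",
    "B": "http://example.org/vocab-b",
}


def window_color(root, prefix):
    """Return the text of the WindowColor child from the given vocabulary."""
    element = root.find(f"./{prefix}:WindowColor", NAMESPACES)
    return element.text if element is not None else None


root = ET.fromstring(DOC)

if __name__ == "__main__":
    print("A:", window_color(root, "A"))
    print("B:", window_color(root, "B"))
    # Internally the parser stores the URI, not the prefix, in the tag name.
    print(root[0].tag)
```

The two elements share the local name WindowColor but remain distinguishable, because the parser resolves each prefix to its URI. The prefix itself is only a shorthand chosen by the document author.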
References

[1] Ancona, D., 3DXML, 17.08.1998 [referred 30.10.1999]
[2] Bos, B., XML in 10 Points, 27.03.1999 [referred 30.10.1999] <http://www.w3.org/XML/1999/XML-in-10-points.html>
[3] Bray, T. & Hollander, D. & Layman, A., Namespaces in XML, 14.01.1999
[4] Bray, T. & Paoli, J. & Sperberg-McQueen, C. M., Extensible Markup
[5] Clark, J., XSL Transformations (XSLT), 13.08.1999 [referred 30.10.1999]
[6] Clark, J. & DeRose, S., XML Path Language (XPath), 13.08.1999
[7] Deach, S., Extensible Stylesheet Language (XSL) Specification, 21.04.1999
[8] DeRose, S. & Orchard, D. & Trafford, B., XML Linking Language (XLink)
[9] Flynn, P., The XML FAQ, 01.06.1999 [referred 30.10.1999]
[10] Ion, P. & Miner, R. & Buswell, S. & Poppelier, N., Mathematical Markup
[11] Maler, E. & DeRose, S., XML Pointer Language (XPointer), 03.03.1998
[1] Bosak, J., XML, Java and The Future of the Web, 10.03.1997 [referred 30.10.1999] <http://metalab.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm>
[2] Clark, J., Associating Style Sheets with XML documents, 29.06.1999 [referred 30.10.1999] <http://www.w3.org/TR/xml-stylesheet/>
[3] Harvard Computing Group, XML - The Adoption Curve, 15.01.1999
[4] Walsh, N., A Technical Introduction to XML, 10.09.1997