Lecture 23

XSLT—Extensible Stylesheet Language Transformations

Technical documentation: http://www.w3.org/TR/xslt

A bit of jargon:

XSL: Extensible Stylesheet Language, a family of languages built around XML, which deal with access, transformation, and presentation.
XPath: a little language for describing sets of nodes in an XML document.
XSLT: Extensible Stylesheet Language Transformations, a language for transforming XML documents.
XSL-FO: XSL Formatting Objects, an XML-based formatting system.

XSLT was originally envisioned as a part of the XSL system, but it has taken on a life of its own.

The basic idea is that an XSLT transformation performs a two-step process:

  1. It transforms one (or more) XML trees into an XML tree.
  2. It outputs the result tree in one of a number of formats (XML, text, etc).

The transformation is described in terms of a number of templates. Templates can be either named or matched (i.e., associated with nodes via XPath expressions). Matched templates may have modes (these are just strings), which gives the effect of being able to match multiple, distinctly named templates to the same node of the source tree.
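As a minimal sketch of modes, the stylesheet below visits every bar element twice, once in each of two modes. The mode names "plain" and "bracketed" and the element name bar are illustrative assumptions, not something XSLT prescribes.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Matched template for the root: process the bar elements once per mode. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//bar" mode="plain"/>
    <xsl:apply-templates select="//bar" mode="bracketed"/>
  </xsl:template>

  <!-- Mode "plain" (hypothetical name): print the element's text as-is. -->
  <xsl:template match="bar" mode="plain">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Same match pattern, different mode: wrap the text in brackets. -->
  <xsl:template match="bar" mode="bracketed">
    <xsl:text>[</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>]&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Both matched templates use the pattern "bar"; the mode on the xsl:apply-templates call decides which one is used.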

If two templates with the same mode match the same node, then one of them is selected according to a set of precedence rules, the general thrust of which is that more specific match patterns have higher precedence than less specific match patterns.
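To make the precedence rule concrete, here is a small sketch with two templates that can both match a bar element. The element names foo and bar are illustrative assumptions. Under the XSLT 1.0 default priorities, a bare name pattern such as "bar" has priority 0, while a multi-step pattern such as "foo/bar" has priority 0.5, so the second template wins for any bar that is a child of a foo.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//bar"/>
  </xsl:template>

  <!-- Less specific pattern: default priority 0. -->
  <xsl:template match="bar">
    <xsl:text>some bar: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- More specific pattern: default priority 0.5, so it is
       chosen for bar elements whose parent is a foo. -->
  <xsl:template match="foo/bar">
    <xsl:text>bar inside foo: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

When the defaults do not give the desired result, a template can carry an explicit priority attribute, e.g. priority="2", which overrides the default for that template.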

The simplest stylesheet (what we call an XSLT program) looks like this:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    </xsl:stylesheet>

Let's do a brief digression here on the topic of namespaces. Namespaces provide a mechanism for distinguishing one XML vocabulary (an application, in XML-jargon) from another. In particular, vocabularies can (and often do) overlap. Namespaces were not a part of the original XML specification, but were added later in a way that was intended to be backwards compatible. Distinct namespaces are represented by URIs (Uniform Resource Identifiers), which is just a fancy name for a string. Often (as is the case here), the URI is a URL, pointing to a web page that documents the vocabulary. Note that both tags and attribute keys have namespace associations, but associating attribute keys with a non-default namespace is not common.

There are two distinct mechanisms used in XML to associate namespaces with tags. The first is a simple pseudo-attribute xmlns=URI, which puts the associated tag and all enclosed tags (but not attribute keys) into the indicated namespace. The second is a pseudo-attribute of the form xmlns:prefix=URI, which associates the given URI with the given prefix. The scope of this definition includes the element in which the pseudo-attribute occurs, and all enclosed elements. The namespace is then associated with qualified names that take the form prefix:name. Note that the name component of a qualified name is sometimes called its local part, and that a name could be either a tag or an attribute key.

The important thing to understand here is that qualified names and prefixes exist to support the namespace association; they are not the association itself. Thus tags that have different qualified names can be the same: <bar:foo xmlns:bar="bar"> is the same as <baz:foo xmlns:baz="bar">, and the same as <foo xmlns="bar">, because the effect in all three cases is to put the tag <foo> into the "bar" namespace. Conversely, tags that have the same qualified name might be in distinct namespaces, e.g., <bar:foo xmlns:bar="bar"> vs. <bar:foo xmlns:bar="baz">.
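The same point can be seen from the XSLT side. The sketch below counts foo elements in a given namespace using yet another prefix, n, which never appears in the input. The namespace URIs http://example.org/bar and http://example.org/baz are hypothetical stand-ins for the "bar" and "baz" strings used above (absolute URIs avoid warnings from some parsers), and the sample input in the comment is likewise an assumption.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!--
  Hypothetical input document:

  <root>
    <bar:foo xmlns:bar="http://example.org/bar"/>
    <baz:foo xmlns:baz="http://example.org/bar"/>
    <foo xmlns="http://example.org/bar"/>
    <bar:foo xmlns:bar="http://example.org/baz"/>
  </root>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:n="http://example.org/bar">
  <xsl:output method="text"/>

  <!-- n:foo matches on (namespace URI, local part), not on the
       prefix that the input document happened to use. -->
  <xsl:template match="/">
    <xsl:text>foo elements in the bar namespace: </xsl:text>
    <xsl:value-of select="count(//n:foo)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the input shown in the comment, the count is 3: the first three foo elements share a namespace even though their qualified names differ, and the fourth is excluded even though its qualified name matches the first.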

Returning to the empty stylesheet: there are default rules that make it useful. The defaults are that text nodes get printed, and the text and element children of an element are visited in order. The effect is to strip all markup. Thus, if we had a simple XML file like this:

    <?xml version="1.0" encoding="utf-8"?>
    <root>
      <foo>
        <bar>bar #1</bar>
      </foo>
      <bar>bar #2</bar>
    </root>

the output of processing it with the empty stylesheet would look like this:

    <?xml version="1.0"?>
    bar #1
    bar #2
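The default rules can be written out as ordinary templates. The sketch below spells out the built-in template rules as given in the XSLT 1.0 specification; a stylesheet containing only these should behave like the empty stylesheet above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements and the root: just keep walking down the tree. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy their string value to the output.
       (Attributes are only reached if a template selects them explicitly.) -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: produce nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

Writing a stylesheet then amounts to overriding these rules for the nodes we care about, which is what the next example does.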

If we wanted text output, we could specify it by adding a top-level instruction. We'll clean things up a bit more, and illustrate some new features:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="text"/>

      <xsl:template name="newline">
        <xsl:text>&#0010;</xsl:text>
      </xsl:template>

      <xsl:template match="/">
        <xsl:apply-templates select="//bar"/>
      </xsl:template>

      <xsl:template match="bar">
        <xsl:text>Element: </xsl:text>
        <xsl:value-of select="."/>
        <xsl:call-template name="newline"/>
      </xsl:template>

    </xsl:stylesheet>

Now we're specifying a “text” output method. We've also added some template rules. Processing of the input source starts with matching the root. The template rule for the root tells us to apply templates to the bar elements of the document, in document order. The template rule for the bar elements just prints out the text "Element: ", followed by the element's value, followed by a newline. Note the definition and use of the named "newline" template.

    $ xsltproc simpletest.xslt test.xml
    Element: bar #1
    Element: bar #2
    $

For a much more complicated example, consider the following transformation of our XML course list file into an XHTML document. First, we have the XML:

    <?xml version="1.0" encoding="utf-8"?>
    <?xml-stylesheet type="text/xsl" href="courses.xslt"?>
    <course-schedule>
      <faculty>
        <name>Babai</name>
        <!-- more names -->
      </faculty>
      <quarter term="Autumn" year="2013">
        <course>
          <number>CMSC 15300</number>
          <title>Foundations of Software</title>
          <instructor>Kurtz</instructor>
        </course>
        <!-- more courses -->
      </quarter>
      <!-- more quarters -->
    </course-schedule>

One thing to note is the xml-stylesheet processing instruction. Standards-compliant web browsers will use the linked stylesheet to transform the input XML document into HTML or XHTML, which will then be displayed.

Here is our XSLT, in pieces:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns="http://www.w3.org/1999/xhtml">

      <xsl:output method="html"
          doctype-public="-//W3C//DTD XHTML 1.1//EN"
          doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"/>

      <!-- top level -->
      <xsl:template match="/">
        <html xml:lang="en">
          <head>
            <title>UCCS Theory Teaching Schedule</title>
            <link rel="stylesheet" href="courses.css" type="text/css" media="all"/>
            <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
          </head>
          <body>
            <h1>UCCS Theory Teaching Schedule</h1>
            <xsl:call-template name="by-quarter"/>
            <xsl:call-template name="by-instructor"/>
            <xsl:call-template name="by-course"/>
          </body>
        </html>
      </xsl:template>

      <!-- other sections -->

    </xsl:stylesheet>

There's actually a fair bit going on in this fragment. Note the use of the doctype-public and doctype-system attributes, which contain the XHTML 1.1 public and system identifiers. Note also the xmlns=".." attribute on the root element: this ensures that any unprefixed elements in the stylesheet will belong to the XHTML namespace.

The template for the root is little more than an XHTML web page, with CSS used for styling, and three internal named template calls. The first template produces XHTML that is organized pretty much the way the XML file is organized.

    <!-- by-quarter -->
    <xsl:template name="by-quarter">
      <h2>By Quarter</h2>
      <xsl:for-each select="//quarter">
        <h3><xsl:value-of select="@term"/> Quarter, <xsl:value-of select="@year"/></h3>
        <table summary="{@term} Quarter, {@year} schedule" class="by-quarter">
          <xsl:for-each select="course">
            <tr>
              <td class="number"><xsl:value-of select="number"/></td>
              <td class="title"><xsl:value-of select="title"/></td>
              <td class="instructor">
                <xsl:for-each select="instructor">
                  <xsl:value-of select="."/><xsl:text> </xsl:text>
                </xsl:for-each>
              </td>
              <td class="note"><xsl:value-of select="note"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </xsl:for-each>
    </xsl:template>

There are a few things to note here.

There are two distinct ways to call a template: by providing a matching set of nodes, or by name. Also, it is possible to have various modes, which can be thought of as states, as an additional means of controlling which rules are used. The by-instructor template that follows is an example of a named template that makes a call to apply templates with a particular mode.

In the strings used to define attribute values, we can include XPath expressions in curly braces {..}. Note how this is used to define the summary attribute of the <table>. Also, the <xsl:for-each> element expands its nested template once per matching node. The order in which the expansion occurs can be controlled by <xsl:sort> elements, but defaults to document order (which is fine for us...).
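As a small sketch of both features together, the stylesheet below lists every course in the schedule, sorted by title rather than in document order, and uses curly-brace expressions to compute two attribute values. It is a simplified illustration that assumes the course list structure shown earlier; the choice of a <ul> list and of the id and title attributes is hypothetical, and the XHTML namespace and doctype are omitted to keep it short.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <ul>
      <xsl:for-each select="//course">
        <!-- Override the default document order: sort by course title. -->
        <xsl:sort select="title"/>
        <!-- Each {..} is evaluated as an XPath expression relative to
             the current course; translate() turns "CMSC 15300" into
             "CMSC-15300" so the value has no spaces. -->
        <li id="{translate(number, ' ', '-')}"
            title="{../@term} {../@year}">
          <xsl:value-of select="title"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

Note that xsl:sort must come first inside xsl:for-each, before any of the content that is instantiated for each node.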

    <!-- by-instructor -->
    <xsl:template name="by-instructor">
      <h2>By Instructor</h2>
      <xsl:apply-templates select="//name" mode="by-instructor">
        <xsl:sort/>
      </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="faculty/name" mode="by-instructor">
      <xsl:param name="me" select="."/>
      <h3><xsl:value-of select="."/></h3>
      <table summary="courses taught by {.}">
        <xsl:for-each select="//course[instructor=$me]">
          <tr>
            <td class="quarter">
              <xsl:value-of select="parent::quarter/@term"/>
              <xsl:text>, </xsl:text>
              <xsl:value-of select="parent::quarter/@year"/>
            </td>
            <td class="number"><xsl:value-of select="number"/></td>
            <td class="title"><xsl:value-of select="title"/>
              <xsl:if test="count(instructor)>1"><xsl:text>*</xsl:text></xsl:if>
            </td>
            <td class="note"><xsl:value-of select="note"/></td>
          </tr>
        </xsl:for-each>
      </table>
    </xsl:template>

Here, the first problem we have to solve is selecting a set of nodes that names each instructor once. The templates above do this by selecting the name children of <faculty>, since the faculty list is expected to name each instructor once. The second tricky problem is how to deal with the context dependence of "." within XPath expressions. We want to select based on equality with the context node, but inside a predicate "." refers to the node being tested, not to XSLT's notion of the current node. The template above gets around this by binding the current node to the parameter $me before entering the predicate. XSLT also adds current() to the XPath functions, which does the same trick.
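For comparison, here is a sketch of how both problems might be handled without relying on a faculty list. The "first occurrence" expression is one common XPath 1.0 idiom for picking out distinct values and may not be the expression the lecture originally had in mind. The text output format is also just an illustrative assumption.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Keep an instructor element only if no earlier instructor
         element has the same string value, i.e. select the first
         instructor node for each distinct instructor. -->
    <xsl:apply-templates
        select="//instructor[not(. = preceding::instructor)]">
      <xsl:sort/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="instructor">
    <xsl:value-of select="."/>
    <xsl:text>: </xsl:text>
    <!-- Inside the predicate, "." would mean the course being tested;
         current() still refers to the instructor this template is
         processing. -->
    <xsl:value-of select="count(//course[instructor = current()])"/>
    <xsl:text> course(s)&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The preceding:: test compares each instructor against all earlier ones, so it is quadratic in the number of instructor elements; that is unlikely to matter for a file the size of a course schedule.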

Also note how we obtain the quarter associated with a particular class.

Finally, the listing by courses is straightforward:

    <!-- by-course -->
    <xsl:template name="by-course">
      <h2>By Course</h2>
      <table summary="course listings">
        <xsl:apply-templates select="//course[instructor]" mode="by-course">
          <xsl:sort/>
        </xsl:apply-templates>
      </table>
    </xsl:template>

    <xsl:template match="course" mode="by-course">
      <tr>
        <td class="number"><xsl:value-of select="number"/></td>
        <td class="quarter">
          <xsl:value-of select="parent::quarter/@term"/>
          <xsl:text>, </xsl:text>
          <xsl:value-of select="parent::quarter/@year"/>
        </td>
        <td class="title"><xsl:value-of select="title"/></td>
        <td class="instructor">
          <xsl:for-each select="instructor">
            <xsl:value-of select="."/><xsl:text> </xsl:text>
          </xsl:for-each>
        </td>
      </tr>
    </xsl:template>

Exercise 23.1 Write an XSLT program that converts the XML Element Database (mass.xml) to XHTML so that when you view it in a browser, you see a web page that contains two tables, (1) presenting the elements by atomic number, and (2) presenting the elements sorted by abbreviation.