Papers and Presentations
by Mulberry Staff
We have given numerous talks on XML, document analysis, XSLT and
XPath, and SGML at conferences, as well as to industry and user groups
in the US, Canada, Asia, Australia, and Europe. We have also published
articles in several industry publications. Some of Mulberry's recent
papers, public presentations, and publications (and a few old
favorites) are described below.
And we "walk the walk": our presentations are developed
in XML, with either or both PDF and HTML "slide" renditions. Selected
descriptions below have links to versions of the papers to download,
either a PDF file or an HTML subdirectory, compressed in .zip
format.
If you are interested in the technology we use for this,
find our application along with others in our collection of
XML-based presentation tools.
2008
Cool or useful
B. Tommie Usdin (Balisage 2008, Montréal)
True versus Useful, or True versus Likely-to-be-useful, are tradeoffs we find ourselves making in document modeling and many other markup-related situations all the time. But Cool versus Useful is a far more difficult tradeoff, especially since our world now includes a number of very cool techniques, tools, and specifications. Cool toys can have a lot of gravitational pull, attracting attention, users, projects, and funding. Unfortunately, there is sometimes a disconnect between the appeal of a particular tool or technology and its applicability in a particular circumstance.
A non-backwards-compatible update: a difficult decision
Deborah A. Lapeyre (International Symposium on Versioning XML Documents and Vocabularies, Montréal)
The U.S. National Library of Medicine (NLM) Journal/Book Tag Sets have been widely adopted by libraries, archives, and commercial publishers. The users are widely distributed, generally unknown to each other, and in many cases unknown to the Tag Set advisory group, owners, and secretariat. The first five revisions to the Tag Sets were backwards compatible, but the most recent is not. The decision to make a non-backwards-compatible revision was not taken lightly. It was made based on several factors, including a decision to favor the needs of future users over the convenience of current users.
Introductory Schematron
Deborah A. Lapeyre and Wendell A. Piez (DC XML Users Group, January 2008, Washington, D.C.)
This tutorial discusses Schematron, a rules-based validation/reporting language that works by making assertions about patterns found in XML documents and reporting back messages about the truth (or otherwise) of those assertions. While Schematron can work with many tree-querying languages, the tutorial illustrates Schematron as it is most commonly used, with XPath, the tree-walking and expression language used with XQuery and XSLT. [Link to our Schematron page, which includes this tutorial's slides]
2007
Silverchair interviews Tommie Usdin about her experience with markup languages, the work of Mulberry Technologies, and the use of XML in publishing.
Creating XML transformations in two separate tasks, Mapping and Coding, not only maximizes the skills of various team members, but also reduces development time and cost, and increases correctness of the finished code. [Link to the presentation slides or our example mapping specification]
Introductory Schematron
Deborah A. Lapeyre and Wendell A. Piez (XML 2007, Boston)
This tutorial discusses Schematron, a rules-based validation/reporting language that works by making assertions about patterns found in XML documents and reporting back messages about the truth (or otherwise) of those assertions. While Schematron can work with many tree-querying languages, the tutorial illustrates Schematron as it is most commonly used, with XPath, the tree-walking and expression language used with XQuery and XSLT. [Link to our Schematron page, which includes this tutorial's slides]
At the Text Encoding Initiative Consortium Members' Meeting (University of Maryland, College Park), B. Tommie Usdin delivers a Keynote presentation discussing the TEI's accomplishments and influence on the computing world over the last 20 years and posing questions, the answers to which will define the TEI's goals for the future. [Link to the text of the Keynote]
Tommie Usdin discusses the implications of XML's success: whether the work is over (or just starting), whether XML is (or should be) going underground, and whether the markup community has misconceptions about its role in XML's success.
LMNL (Layered Markup and Annotation Language)
Wendell Piez (International Workshop on Markup of Overlapping Structures, August 2007, Montréal)
As part of a panel discussion, Wendell Piez explores the potential of LMNL as a way to handle overlapping markup.
2006
XSLT for Quality Checking in the Publication Workflow
Wendell Piez (Mulberry's Seminar Series)
[Link to the seminar slides. Those wishing to download the sample stylesheets demonstrated at the seminar may find the link in the seminar's penultimate slide (#38).]
A brief report on some design decisions recently made by the Ad Hoc LMNL Group about the LMNL (Layered Markup and aNnotation Language) syntax and design model. A simplified version of layers is presented, along with a review of LMNL that includes previously unpublished material on non-character atoms and namespaces. (Although this paper is not represented in the conference proceedings, an author package is available as part of the proceedings.)
What Is XML and Why Should You Care?
B. Tommie Usdin and Debbie Lapeyre (XPlor Mid Atlantic, April 2006, Miami Beach)
More and more organizations are moving their content to XML. Some are asking for XML as well as
pages from their printers; some are sending XML to their printers. This presentation discusses who is moving to XML and what they hope to get from it, as well as how XML works and how participants should approach
XML. The basic vocabulary needed to talk about XML and an overview of the
logical components of an XML application are provided. [Link to the HTML presentation copy]
EXPLOR Global (February 2006, Miami Beach)
Tommie Usdin and Debbie Lapeyre
- Introduction to XSLT Concepts
You keep hearing that XML is exciting; that once you have your content in XML you can do anything with it; that XML is powerful and flexible. Then you look at an XML file and don't see what all the fuss is about! XSLT (the XML transformation language) is the language that makes XML powerful and flexible. Using XSLT you can convert XML into display formats (HTML, PDF, etc.); convert XML into tool-specific formats (such as typesetting languages); and automatically add numbering, cross-references, tables of contents, and generated text to your pages. You can also use XSLT to convert documents tagged according to your DTD/schema into documents tagged according to someone else's tag set! XSLT changes the way you'll think about XML. This introduction covers the principles of XSLT, its processing model, what it can and can't do, and how it is being used in real environments. This is a concept course, showing "just enough" syntax. [Link to the HTML Slides for "Introduction to XSLT Concepts" or PDF of Handouts for "Introduction to XSLT Concepts"]
- Introduction to XSL-FO Concepts (Printing Directly from XML)
XSL-FO (Extensible Stylesheet Language - Formatting Objects) is a specification for formatting XML documents for print or web display. Publishers, catalog producers, and financial institutions (among many others) are using XSL-FO to go directly from XML into PDF, PostScript, PCL, etc. This conceptual session introduces XSL-FO: what it is, how it works, how it can be used, and what it is capable of producing (and what it can't!). Using a stylesheet-in-development, we illustrate the logical components of an XSL-FO formatting system and how the page geometry works, and show you the basic vocabulary of "formatting objects": blocks, wrappers, Cascading-Stylesheet-like attributes, and pages. Why isn't everyone using XSL-FO? Should your company consider it? [Link to the HTML Slides for "Introduction to XSL-FO Concepts" or PDF of Handouts for "Introduction to XSL-FO Concepts"]
- What is XML and Why Should You Care?
XML is a data format that manages text and content as named objects. XML documents with their "tags" can be part of cost-effective solutions for content reuse, repurposing, internationalization and more. This session provides the vocabulary you need to talk about XML, a look at how XML works, some real world examples, and a glimpse at the logical components of an XML application. [Link to the HTML Slides for "What is XML and Why Should You Care?" or PDF of Handouts for "What is XML and Why Should You Care?"]
- How and Why are Companies Using XML?
More and more organizations are moving their content to XML. Some are asking for XML as well as pages from their printers; some are sending XML to their printers. Who is moving to XML, and what do they hope to get from it? How can designers and printers serve their XML customers? If you understand why your customer wants XML and what they want to do with it, you can help them meet their goals, and thus increase your value as a supplier! [Link to the HTML Slides for "How and Why are Companies Using XML?" or PDF of Handouts for "How and Why are Companies Using XML?"]
- Moving to XML: The Investment
XML has many benefits, but no one ever said it came for free. Moving to XML will change the way you work, the flow of content through your organization, staffing skills (and possibly staffing levels), and the opportunities you have. Where do tags come from and when? What does your staff need to know? Does added value mean added work? How can XML help in QA? What are some of the known problems and pitfalls you might avoid? [Link to the HTML Slides for "Moving to XML: The Investment" or PDF of Handouts for "Moving to XML: The Investment"]
- Why XML for Print?
Should your organization be making print publications from XML? The current XML hype is focused on web portals, XML-service-oriented architectures, and e-business applications. But while the use of XML in traditional print publishing may be less trendy and newsworthy, it is equally powerful. Working with XML can help publishers improve quality and timeliness, and allows them to repurpose, reuse, and reformat content from a single source. XML allows publishers to create high-quality print publications using source data that can also support electronic publication, electronic archives, enhanced search and retrieval, and new product opportunities. [Link to the HTML Slides for "Why XML for Print?" or PDF of Handouts for "Why XML for Print?"]
- XML in Print Production
Most of the ways of adding XML to print production come down to a variation on one of three themes: making pages and then XML, introducing XML during composition, and working with XML from as far in front of composition as you can manage. What are the implications and advantages of each style? Why would you prefer one to another? If you do make XML early in the production cycle, how do you get from XML to pages? There are many methods, each with its own set of pros and cons, that can be used in combination for multiple content reuse. [Link to the HTML Slides for "XML in Print Production" or PDF of Handouts for "XML in Print Production"]
Introduction to XPath 2.0
Wendell Piez and Debbie Lapeyre (DC XML Users Group, January 2006, Washington, D.C.)
An introduction to the concepts and syntax of XPath 2.0 (the new XML query and tree-traversal language from the W3C, now at the Working Draft stage) and the differences between XPath 1.0 and XPath 2.0. The data model has changed; there are powerful new functions and operators; and XPath 2.0 is closer to a programming language than ever. [Link to the HTML presentation copy]
2005
A tutorial introducing the proposed new W3C XML document query and traversal
language. For this well-received tutorial, we provided sample files
for participants to experiment with. [Download samples (link to a 9 KB .zip file).]
XSLT can be applied to a range of tasks besides generating final output
formats, including the automation and semi-automation of editorial and
copy-editing chores, extra-schema validation, data aggregation, filtering,
indexing, file management and more.
XML DTDs and schemas are used to specify what tagging is allowed in a set of XML documents. Originally XML had only one way to express these rules; now there are many, each reflecting not only different conceptions of the functional requirements for constraint languages, but also different approaches to meeting those requirements. This talk provides a clear look at the nature and strengths of each of the major schema languages (XML DTD, W3C XML Schema (XSD), RELAX NG, and Schematron), without hype and without advocating any of them. After a discussion of the uses of XML schemas in general, each language is examined, highlighting its major features, what sorts of constraints (rules) it can and cannot express, and the environments in which it is most popular. The talk ends with factors in selecting appropriate schema language(s) and a discussion of ways in which many organizations are using multiple schema languages in the same projects to do different tasks. [Download .zip of HTML.]
In Praise of the Edge Case
B. Tommie Usdin (Extreme Markup Languages® 2005, Montréal)
The Extreme Markup Languages® conference, the organizers are sometimes told, devotes too much time to edge cases. This complaint inspires reflection on the value of exploring, learning about, and learning from the technological edge. Remember: today’s mainstream application was yesterday’s edge case.
An examination of a practical and theoretical question in markup language
design, using as a counter-example the unorthodox "Web Graphic Layout
Language" project of the author.
Visualizing TEI using SVG
Wendell Piez (ALLC/ACH 2005, Victoria, British Columbia)
A poster; also presented at the 2005 TEI Members' Meeting, Sofia, Bulgaria, and at XML 2005, Atlanta, Georgia.
2004
Microsoft PowerPoint is ubiquitous, and therefore controversial. Most critiques, both of the software and of its widespread adoption in educational settings, express concerns that are not particular to PowerPoint alone, but apply to "slideware" presentations generally. The reliance on sequences and hierarchies of bullet points (a poor means of presenting some kinds of complex information), the foregrounding of visual gimmicks over content, the displacement of attention from the speaker and her message onto summary arguments presented dumbly on screen: far from being necessary features of presentation technology, these (according to the critics) prove to be shortcomings that interfere with, rather than enhance, a presenter's ability to communicate.
This paper presents an alternative to slideware, in the form of SVG graphics used for presentation. Why SVG? It meets all our functional requirements for a presentation technology; but, even more importantly, as an XML-based format, Scalable Vector Graphics is well-suited to an XML-based production framework. Going far beyond sequences of bullet points, SVG supports open-ended, innovative uses of visual media in presentation. This becomes practical because the complexities of SVG coding can be relegated to a processing layer, following the classic design pattern of XML publishing. [This paper won the Best Speakers Award at XML 2004. An early version
was presented at ALLC/ACH 2004 (Gothenburg, Sweden).]
Half-steps toward LMNL
Wendell Piez (Extreme Markup Languages®, Montréal)
Overlap in markup occurs where some markup structures do not nest, such as where the sentence and phrase boundaries of a poem and the metrical line structure describe different hierarchies. LMNL (Layered Markup and Annotation Language) is a model for representing textual data, designed to recognize and account for layer separation and markup overlap. LMNL is specified as a data model, not as a syntax, but without a syntax and an API it is very difficult to experiment with the model. The author demonstrates a subset of LMNL using an XML syntax and some severe restrictions on LMNL (thus "half-LMNL").
The TEI has grown and matured greatly in recent years, both in the number and breadth of its applications, and in their sophistication. It can be taken as a sign of the success and state of health of the TEI that there are persistent efforts to push its boundaries. The author discusses one area repeatedly cited as a place where the TEI "should" provide a competitive alternative but apparently does not: the realm of authoring, or original composition, by scholars and writers.
2003 - 2002 - 2001
In March 2003, the National Library of Medicine (NLM) released into the public domain a suite of DTD modules for describing journal literature, books, and many kinds of textual material. The full suite was developed by the National Center for Biotechnology Information (NCBI) and the XML consulting firms Inera, Inc. (funded by the Andrew W. Mellon Foundation) and Mulberry Technologies, Inc. (funded by NCBI). Also in March, the first two public DTDs developed from this suite were released: the Journal Archiving and Interchange DTD, and the Journal Publishing DTD, which defines a common format for the creation of journal content in XML. This presentation discusses use of the DTDs so far, future plans, and the work of the advisory board.
In this technical exhibition of XSL-FO tools, each product representative provided a sample and rendered the samples provided by the other participants. As far as we know, this was the first public demonstration of interchange of typesetting files. The participants received only XSL-FO instances, without any guidance on what the formatted document should look like, and each formatted as many of the samples as they could. None of the tools succeeded with all of the samples, and some of them required manipulation of the documents before they could be rendered at all. At the end of a very exciting demonstration of XSL-FO rendering tools, the conclusions were: XSL-FO rendering is practical for many applications; there are a variety of high-quality XSL-FO tools available; each tool has strengths and weaknesses; and none is clearly superior to all others for all uses.
Editorial work will always require the judgement of informed and sensitive human beings. Nonetheless, XML-based applications, even at a small scale, can support and complement, rather than detract from, the work of human beings in providing the kind of care and attention to information through the publishing process that is, ultimately, the only thing that can assure the quality of published works. This paper examines, in concrete detail (using the XML behind an XML 2003 conference paper as an example test bed), how one particular XML technology, XSLT, can be brought to bear in such applications.
Few classes of narrative document can be as tightly
specified as most business documents can. But many can usefully be
specified more tightly than they are. This talk illustrates the costs
of underspecifying content models. It is important to recognize the
difference between "it doesn't matter; there is no information
here" and "it can't be specified because the content creator will
supply it".
A schema’s role is to mediate and adjudicate between human and machine semantics; recognizing this can help us manage our schemas better. Some practitioners work solely with an operational semantics, according to which the meaning of a tag is what we want it to cause the processing software to do with the data. A better understanding is reached if we adopt the structuralist view that a sign is the (arbitrary) relation between a signifier and a signified. In metalanguages (including schema languages) the signified is itself a sign; in some languages the signifier may likewise be a sign. Proper understanding of the relationship among sign, signifier, signified, metalanguage, and connotative system will allow us to layer our systems more effectively and to obtain useful results even in fluid systems where our understanding of the underlying reality cannot, or should not, be fixed.
Representing multiple hierarchies within a single document has always been a problem for XML. To try to address the problems of representing multiple hierarchies and of annotating existing tree structures with type information (as in the PSVI), we have developed a layered data model based on the Core Range Algebra presented at Extreme 2002 by Gavin Nicol. This data model views documents as strings spanned by a number of named ranges, each of which can itself have associated metaranges with their own internal structure. To aid experimentation with this data model, we developed a markup notation to reflect it, the Layered Markup and Annotation Language (LMNL), and have constructed several prototype applications to facilitate the extraction of single views, as XML structures, from LMNL documents. (Although this paper is not represented in the conference proceedings, an author package is available as part of the proceedings.)
XML and Print
Debbie Lapeyre (Seybold New York, 2002; other locations previous years)
This tutorial explores the relevance of XML as a data
format for creating high-quality print publications that can later
support electronic publication, electronic archives, and enhanced
search and retrieval. XML's ability to assist management of
multi-author publications, revisions, and approvals; and its potential
for fast reuse and repurposing of content are highlighted. [Download .zip of 2001 version, in PDF.]
XML for Publishing Managers
Debbie Lapeyre (Seybold New York, 2002; other locations previous years)
A 3-hour tutorial that starts by defining XML and
goes on to explain the benefits of XML applications, the use of XML in
multimedia publishing, application integration, information
repositories, and database publication. The impact of XML on workflow
and staffing is also discussed, as well as the staff skills needed for
XML-based data distribution. [Download .zip of August 2001 version, in PDF.]
Document Analysis for DTD or Schema Development
Debbie Lapeyre and Tonya Gaylord (XML 2001, Orlando)
A tutorial on the principles of information analysis.
An interactive sample document analysis is used to demonstrate basic
concepts of structured markup, the distinction between "useful" versus
"possible" information, and the relationships between information
components. [Download .zip of
PDF.]
From HTML to XML
Wendell Piez (XML 2001, Orlando)
Migrating data from a web format (HTML) into a more
versatile and manageable XML format involves a range of decisions based
on what shape the source code is in, what kinds of functions and
operations the new XML-encoded data needs to be able to support, and
design tradeoffs between the power and versatility of markup on the one
hand, and the expense of tagging and maintenance of strong data on the
other. [This is a short version of a longer presentation, co-written with
Tommie Usdin (see "Converting HTML to XML" below).] [Download .zip of HTML.]
A paper considering markup design strategies from a
theoretical point of view. Sometimes "semantic opacity" is a feature,
not a bug. Because they sometimes work to mask even while they
communicate, markup languages can be usefully considered as a species
of rhetoric.
Introduction to Information (Document) Analysis
Tommie Usdin (XML One, XML Europe)
A one-day tutorial on the principles of information
analysis. Following the morning tutorial on the principles of analysis
(such as element granularity, use of attributes, and the distinction
between "useful" versus "possible" elements), an interactive sample
analysis is used to demonstrate basic concepts of identifying
information components and the relationships between those components. [An earlier version was delivered at XTech '99, San Jose, March 1999.] [Download .zip of HTML.]
Previous years
A Manager's Introduction to XML
Wendell Piez (XML 2000, Washington, D.C.)
A tutorial providing a non-technical introduction to
XML, including its historical origins and its business application.
Also discussed are the XML "family" of standards, i.e., those
standards (XSL, XSLT, XSL-FO) related to XML. [Download .zip of PDF.]
XSL: Characteristics, Status and Potentials for the
Humanities
Wendell Piez (2000 Joint Conference of the Association for Computing in
the Humanities and the Association for Literary and Linguistic
Computing, Glasgow)
A conference paper providing an overview of XSL with
reference to applications in Humanities disciplines, particularly as
concerns digital text encoding projects (such as digital libraries) and
Humanities-oriented analytical text processing. [Download .zip of HTML.]
Practical Guide to SGML/XML Filters
Introduction
Debbie Lapeyre
Introduction for Norman E. Smith's book on SGML/XML
Filters (Publisher: Wordware Publishing, Inc., Plano, TX, 1997 [1st
edition], 1998 [2d edition]), noting the value of SGML when combined
with translation programs for output across various media, e.g., print,
voice synthesis. As a prelude to the book's discussion of several
languages for SGML manipulation, the importance of such filters in the
authoring context to enable creation of SGML from diverse sources, such
as desktop publishing tools or spreadsheets, is likewise highlighted.
XML for SGMLers
Tommie Usdin and Debbie Lapeyre (XML'98, Chicago)
A one-day tutorial on the details of XML syntax and
the differences between XML and SGML, highlighting the features,
functionality, and "funkiness" XML excludes. Hands-on instruction
includes emphasis on the changes necessary to convert SGML documents
into well-formed XML, the conversion of SGML DTDs, and the trade-offs
in various conversion approaches. [Download .zip of PDF.]
XML: Not a Silver Bullet, but a Great Pipe Wrench
Tommie Usdin and Tony Graham (ACM StandardView 6(3):125-132, 1998)
An article discussing the potential uses and benefits
of XML, while questioning whether the excitement surrounding it has been fully merited.
Washington Technologies White Papers
Several early statements (1997) on the business case for SGML/XML. [Download .zip of HTML.]