Papers and Presentations
by Mulberry Staff
We have given numerous talks on XML, analysis, XSLT and XPath, and SGML at conferences, as well as to industry and user groups in the US, Canada, Asia, Australia, and Europe. We have also published articles in several industry publications. Some of Mulberry’s recent papers, public presentations, and publications (and a few old favorites) are described below.
And we “walk the walk”: our presentations are developed in XML, with “slide” renditions in PDF, HTML, or both. Selected descriptions below have links to the slides, sometimes in PDF, sometimes HTML.
2023
The Secret Garden
B. Tommie Usdin (Balisage 2023)
We in the markup community have built ourselves a beautiful and ever-improving place to work. We can move content into markup, we have a variety of tools to manipulate marked-up content, we can move at will from tool to tool, we create a variety of products from that marked up content, and we believe our marked up content will be long lived. We frequently lament that most of the world doesn’t live in our techno-garden, and we occasionally admit that most of the world doesn’t even know it exists. At Balisage this year we will learn about ways in which our technology is improving. We will hear about some of the projects we are doing with markup and some of the problems we are having. And we will hear (a little) about how we are opening the gate to our garden and interacting with the outside world.
PIDs and JATS: How Do I Tag Those Crucial but Pesky Identifiers?
Jeffrey Beck and Deborah Lapeyre (JATS-Con, June 2023)
By a PID (Persistent Identifier or Persistent Unique Identifier) we mean “a string of letters and numbers used to distinguish between and locate different objects, people, or concepts.” The idea is that PIDs will be long-lasting references to resources or objects, mostly digital resources.
Well-known PIDs include DOIs (Digital Object Identifiers), which are used to identify digital objects such as journal articles, books, and datasets, and ORCiDs (Open Researcher and Contributor IDentifiers), which identify people such as authors and researchers.
As use case examples, PIDs can be used to find: Who exactly is this author? What else have they written? With which institutions are they affiliated? Who is providing the funding? Have they collaborated with any of these authors? Can I access the source data behind this article? PIDs can be used to distinguish between people/institutions/departments with the same name and unite the records of these same people/institutions/departments when that name changes.
We are not going to talk about the economic or strategic advantages of PIDs or why you should use PIDs in your JATS. We will try to be agnostic on which PIDs to use and which repositories and PID infrastructures to favor. We will concentrate on how to record the PIDs you need in your JATS articles.
New PIDs for new situations (or to replace older systems of PIDs) are being devised every day. How do we handle the PIDs we use now? How do we handle the PIDs we hope to add soon? How do we handle new PIDs that are just coming into being? How can we all help fulfill the promise of PIDs to connect and link many worlds, building an “interconnected declarative informational fabric?”
For the types of PIDs we can both imagine and adequately describe, we will provide specific how-this-could-be-tagged in JATS examples. Conversely, we will identify what JATS elements/attributes might be used for which specific identifiers. Where there is more than one valid interpretation, we may discuss alternatives. [Link to the paper]
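As a rough illustration of the kind of tagging the abstract above refers to (not taken from the paper itself), the sketch below parses a tiny JATS-like fragment and lists the PIDs it carries. It assumes a DOI recorded in `<article-id pub-id-type="doi">` and an ORCID iD recorded in `<contrib-id contrib-id-type="orcid">`. The DOI value is a made-up placeholder, the ORCID iD is ORCID's published sample iD, and the helper name `collect_pids` is hypothetical.

```python
import xml.etree.ElementTree as ET

# A minimal, illustrative JATS-like fragment. The DOI is a placeholder;
# the ORCID iD is the well-known sample iD (Josiah Carberry).
JATS_FRAGMENT = """
<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.5678</article-id>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1825-0097</contrib-id>
          <name><surname>Carberry</surname><given-names>Josiah</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>
"""


def collect_pids(xml_text):
    """Return (type, value) pairs for article-level and contributor-level PIDs.

    Hypothetical helper: it only looks at <article-id> and <contrib-id>,
    which is an assumption made for this sketch, not a complete PID inventory.
    """
    root = ET.fromstring(xml_text)
    pids = []
    for el in root.iter("article-id"):
        pids.append((el.get("pub-id-type"), el.text.strip()))
    for el in root.iter("contrib-id"):
        pids.append((el.get("contrib-id-type"), el.text.strip()))
    return pids


if __name__ == "__main__":
    for pid_type, value in collect_pids(JATS_FRAGMENT):
        print(pid_type, value)
```

Keeping the identifier type in an attribute and the identifier itself as element content is what lets one generic routine pick up several kinds of PID. Other identifiers may call for other elements or attributes, as the paper discusses.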
NISO Update 1: Journal Article Tag Suite (JATS)
Tommie Usdin and Jeffrey Beck (NISO Plus 2023: Global Conversations / Global Connections)
DTD updates, providing the latest news on JATS.
2022
Update from NISO STS
B. Tommie Usdin (Standards Technology Forum, November 2022)
[Link to the presentation, at 4:58]
What’s Happening in JATS, BITS, and STS in 2022
Debbie Lapeyre (eXtyles User Group Meeting 2022)
DTD updates, providing the latest news on JATS, BITS, and STS. [Link to the presentation at XUG 2022 or on Mulberry’s site]
Destructive Consistency
B. Tommie Usdin (Balisage 2022)
It seems to be in the nature of people drawn to discussions of markup, that is, people likely to be at Balisage, that we value consistency. We want both our physical and virtual worlds to make sense and we expect consistency. We chafe when forced to drive on the “wrong” side of the road. We want the names used in our markup vocabularies to be formed in consistent ways — either use CamelCase or don’t, for example. We want things that are parallel to be handled in parallel ways. This feels comfortable to us, and we think we are doing good by pushing the world toward our comfort zone. This, it seems to me, is one of the reasons that explicit markup has not taken the world by storm. We are not meeting people where they are; we are trying to change where they are. Worse, we are (deliberately? frequently?) blind to their styles and desires. This just plain doesn’t work! Our obsession with consistency is destructive.
Thinking about a Convenience Subset of JATS
B. Tommie Usdin (JATS-Con, May 2022)
Open session presentation.
Processing Metadata (A New Feature in JATS 1.3)
Debbie Lapeyre (JATS-Con, May 2022)
Open session presentation.
2021
Update from NISO STS
B. Tommie Usdin (Standards Technology Forum, November 2021)
The JATS Family Grows: The Latest on JATS, BITS, and STS: What’s new and what’s next in the JATS family of DTDs
Debbie Lapeyre (eXtyles User Group Meeting 2021)
The (unspoken) XML gotcha
B. Tommie Usdin (Balisage 2021)
XML is a platform-neutral way to exchange, share, and manipulate information. But what persuades many to use XML is the claim that XML provides a long-term way to store information, independent of tools (both hardware and software) with their short life spans. Projects spend significant resources on XML setup and then settle into doing the real work, using that XML infrastructure to compile, write, analyze, or whatever it is they do. Until, one day — something doesn’t work. Hardware is retired; software is upgraded; specifications go into new releases. Users get stuck. And when they complain, we respond, “Of course that doesn’t work any more; you have been accumulating technical debt for years! It is time to reinvest.” They thought they had committed to a one-time cost, and now we tell them that it is an ongoing expense. If the user had put documents into their favorite spreadsheet, they complain, they could still import them into the current version. How do we answer that complaint? We (the XMLers) think we described the values of XML plainly and fairly. We (the XML users) think that the claim that XML documents last a long time is relying on a specious technicality, and we have been trapped dishonestly. I live on both sides of this: as a user I want to invest in infrastructure once and have it last; as a developer I want to be able to improve my product without the limitations imposed by backwards compatibility. We as a community often complain that not enough people are using XML. If we really want XML use to grow, we need to address the gotcha that too many XML users are feeling.
Encouraging Tag Set Branching without Creating a Briar Patch
B. Tommie Usdin (Markup UK 2021)
Customizing a tag set can be an easy way to get the vocabulary you need. It can also be a journey filled with dead ends, trap doors, and slowly revealed, difficult-to-identify problems. Like many public tag sets, JATS (the Journal Article Tag Suite) was designed to be customized. Our original expectation was that individual users would customize it, and while a few have done that to good effect, we have found that the major customizations have been by groups of users. BITS (the Book Interchange Tag Suite), NISO STS (Standards Tag Suite), and Manuscript Exchange Common Approach (MECA) are widely adopted customizations of JATS.
When users customize a tag set they expect to be able to use the existing infrastructure associated with that tag set, making changes to accommodate the changes they made. They often expect to intermingle their new documents with documents tagged to the original tag set and perhaps with documents tagged to other customizations of the source tag set. They expect this to work gracefully, easily, seamlessly. Sometimes it does, but sometimes it does not!
The “JATS Compatibility Meta-Model Description” was developed to help people who customize JATS create models that will coexist peacefully with existing JATS documents and with documents tagged to other JATS customizations.
It seems unlikely that the particulars of the JATS Compatibility Model will apply to other tag sets, but the principles behind the Meta-Model might be useful to other groups thinking about ways to make their families of tag sets flexible and compatible. [Links to the proceedings (PDF), presentation slides (HTML)]
How Much Tag Set Documentation is Needed? How much is Too Much?
Debbie Lapeyre (Markup UK 2021)
The more the better. Documentation is expensive, stick to the basics. If it isn’t well documented people won’t use it or, worse, won’t use it consistently. JATS (The Journal Article Tag Suite) has documentation. A LOT of documentation. Documentation designed to introduce new users to the tag set. Documentation designed to support experienced users. Documentation to support people who are customizing JATS, including both advice on the mechanics and logic of making customizations. There are definitions, helpful remarks, tagged examples, extended essays. There is an International Standard that meets political needs and a site with non-normative documentation that meets practical needs. There are third party sites advising users on how to use the tag sets for best interoperability, and many organizations that ingest JATS provide (and may insist on) local rules.
It is entirely possible that this is the most heavily documented XML tag set of all time. Do other tags need this much documentation? Are there any parts of this that others would find useful? After a guided tour of the JATS documentation, the audience can chime in: How much documentation does a tag set need? How much documentation does YOUR tag set need? What is useful? What is overkill? What audience most needs to be served? [Links to the proceedings (PDF), presentation slides (HTML)]
A Deep Dive into JATS Documentation
B. Tommie Usdin (JATS-Con, April 2021)
JATS use is supported by a wide variety of information resources. The JATS Standing Committee maintains the JATS Standard and the JATS Tag Libraries. In addition, there are recommendations, conference proceedings, user guidelines, and discussion lists. Similar resources support users of BITS and NISO STS.
The Standards are the de jure definition of the tag sets. Their value to users is mostly in the fact that they exist!!
The Tag Libraries are the de facto definition of the tag sets. They are the information resources users should turn to with questions about structure, usage, and definitions. The tag libraries are a rich and complex reference source. I believe that even regular users will be surprised by some of the information content and navigation tools provided in the tag libraries. The design of the tag libraries has recently been updated to make them more accessible and more responsive to the needs of frequent users.
Other resources include the proceedings of several conferences including JATS-Con, JATS4R Recommendations, the archive of JATS-List, Guidelines provided by major recipients of JATS articles, and tagged articles provided on the web sites of several JATS users. [Link to the paper]
2020
Welcome to Balisage 2020: Everything is the same, and everything is different
B. Tommie Usdin (Balisage 2020)
Balisage 2020 is both totally new and comfortably familiar. Balisage regulars will recognize many of this year’s presenters and welcome some new points of view on familiar topics. Logistically, technologically, we are on a new path. As markup designers, theorists, and practitioners, we are used to tiptoeing near the edge from time to time. I was saddened when I had to admit that Balisage-as-usual could not happen in 2020. I was delighted when it became clear that because we were now virtual, many old friends will be able to re-join us this year, and that this new format will let us welcome some newcomers to the Balisage community.
JATS, BITS, STS: Keeping Things in a “Family” and Backward Compatibility
Jeff Beck and Tommie Usdin (NISO Plus 2020 — The NISO Plus Conference, Baltimore, MD)
XML users are choosing to adopt existing public models instead of building bespoke models, and groups of users are building new community models by extending existing models. Users expect to realize significant benefits when they base a model on an existing, supported public model, including the ability to intermingle the documents in databases, to use tools created for the existing model for their new vocabulary with minimal additional work, and to adopt rendering/formatting applications and change only those aspects specific to the new vocabulary. Based on the presenters’ experience with JATS, BITS, and STS, these benefits are within reach, but only if some discipline is applied to the development and growth of the vocabularies. [Link to the presentation]
Standards Project Updates — JATS, JATS4R, STS, SSOS
Debbie Lapeyre, Melissa Harrison, Robin Dunford, and Robert Wheeler (NISO Plus 2020 — The NISO Plus Conference, Baltimore, MD)
Standards are at the heart of what NISO does, and NISO relies on its community to help decide what is needed and why, to develop new standards, and to update existing ones. As part of a panel representing current NISO working groups and committees, Debbie Lapeyre will provide insights and updates on JATS, JATS4R, STS, and SSOS (see https://www.niso.org/standards-committees for more information on each of these standards). [Link to the presentation]
2019
What is JATS 2.0, and Should You be Worried?
Debbie Lapeyre (eXtyles User Group Meeting 2019, Boston, MA)
Deborah A. Lapeyre, one of the developers of JATS and a member of the JATS secretariat, discusses the JATS Standing Committee’s consideration of a new, backwards-incompatible version of JATS (JATS 2.0) as a long-term possibility for the standard and the reasons for eventually producing a backwards-incompatible version to meet user requests, reduce redundancy and duplication existing presently, and improve processability of JATS documents. [Link to the presentation]
Hands-on Introduction to XML 2019
Deborah A. Lapeyre (XML Summer School, September 2019, St Edmund Hall, University of Oxford)
Three instructors share a 3-day course designed to introduce students to the many and varied aspects of XML. Students experience XML design, processing, and delivery through practical, hands-on classes where they create their own XML documents; validate with DTD, XSD, and RNG; and gain experience with XPath, Schematron, XSLT, XSL-FO, XML in the browser, ontologies, and more. Ms Lapeyre teaches the introductions to XPath and Schematron, beginning XSLT, and slightly more advanced XSLT; and she kibitzes on the other modules.
Explicit markup: a fool’s errand or the next big thing?
B. Tommie Usdin (Balisage 2019, Rockville, MD)
In 1998, at a Balisage predecessor conference, Brian Reid told us we couldn’t have the world we wanted. XML wouldn’t deliver. He used twenty-year-old slides, slides that he had originally presented at a conference in 1981, to make his point. I still want the world that Brian Reid told us we could not have; I still want Brian Reid to have been wrong. I still believe that separating meaning from format will enable our documents to be displayed in many forms and media, that a markup format that makes hierarchy explicit makes complex documents tractable, that when content creators author in systems that make declarative markup visible and use the author’s knowledge to add value to their content, we will be able to make documents sing! And I have the twenty-year-old slides to prove it.
Customizing JATS (Journal Article Tag Suite)
Deborah A. Lapeyre (Symposium on Markup Vocabulary Customization, July 2019, Rockville, MD)
The Journal Article Tag Suite (JATS), also known as ANSI/NISO Z39.96-2019, is a set of tag sets used for journal articles. JATS is used by publishers, archives, search and display tools, and libraries for the encoding and interchange of journal articles. In addition, JATS has been customized to create BITS (the Book Interchange Tag Suite) and NISO STS (ANSI/NISO Z39.102-2017 Standards Tag Suite). After a brief introduction to JATS, we will discuss the mechanisms built in to JATS for customization.
Now I Know what a Dragon Looks Like
Deborah Lapeyre (substituting for B. Tommie Usdin, Markup UK 2019, London)
We are overwhelmed by competing standards, technologies, and approaches to solving problems we may or may not understand and may or may not have anticipated. Each of us individually, and we as a community, have limited resources and want to concentrate our energies where they are most likely to be successful. Selection from among the cornucopia of options is often made more difficult by our preconceived notions of the shape, source, and promulgator of appropriate technologies. In many cases we seek, and occasionally we find, a powerful tool that seems to address all of our problems. The quest for such tools has led many of us to become standards junkies or technology evangelists. The belief that we have created or identified such a tool leads some of us to become missionaries promoting the use of a standard, a paradigm, or even a tool. In the children’s book Everyone Knows what a Dragon Looks Like, illustrated by Mercer Mayer, Jay Williams raises questions about the recognition and appropriate use of powerful tools. From it, we can learn to be a little more skeptical of our ability to know the tools we need when we see them. [Link to the presentation]
The Value of Descriptive Markup for Discovery, Utility, and Preservation
B. Tommie Usdin (Typefi User Conference 2019, Baltimore, MD)
Publishers and other content creators are finding themselves pushed towards XML. Many see this as an expense, a burden, yet another technical thing they don’t understand but are obligated to deal with. Part of this is valid: it is new, it can be expensive, and it is in your future. However, XML is not the point, it is a tool. The point is the logic that is encoded in the XML, which does not need to be mysterious and can be the source of significant value.
In this keynote, B. Tommie Usdin argues that the long-term value of XML is not in the XML itself, but in the nature of the markup. By focusing on “descriptive markup” rather than “XML syntax” we can talk about the real value in the push towards interchange of XML documents. It is descriptive markup that allows the identification of document parts by what they are in system-independent ways, and while understanding the details of XML may be a task for specialists, understanding the nature, structure, and parts of documents is a task for publishers. [Link to the presentation]
Introduction to BITS (Book Interchange Tag Suite)
Debbie Lapeyre (XML.com article, January 2019)
Deborah A. Lapeyre, one of the developers of BITS and a member of both the BITS committee and the JATS Standing Committee, introduces us to the BITS tag set for the archiving and interchange of technical books. [Link to the article]
2018
How Does JATS 1.2 (in vote) Differ From JATS 1.1?
Debbie Lapeyre (eXtyles User Group Meeting 2018, Boston, MA)
Deborah A. Lapeyre, one of the developers of JATS and a member of the JATS secretariat, discusses proposed changes to the JATS standard and how JATS 1.2 (still in development) will differ from its predecessor. [Link to the presentation]
Introduction to JATS (Journal Article Tag Suite)
Debbie Lapeyre (XML.com article, October 2018)
Deborah A. Lapeyre, one of the developers of JATS and a member of the JATS secretariat, introduces XML.com readers to the ANSI/NISO standard for the XML interchange of journal articles. [Link to the article]
How JATS empowers scholarly communication
Debbie Lapeyre (SciELO 20 Years: Conference, September 26-28, 2018; São Paulo, Brazil)
What are the advantages of using JATS (Journal Article Tag Suite) for publishing and interchanging journal articles? [Link to the presentation]
XML — Why and how: JATS
Debbie Lapeyre (SciELO 20 Years: SciELO Network Meeting, September 24-25, 2018; São Paulo, Brazil)
What is the Journal Article Tag Suite (JATS) and how does it impact journal publishing and Open Science? [Link to the presentation]
Hands-on Introduction to XML 2018
Deborah A. Lapeyre (XML Summer School, September 2018, St Edmund Hall, University of Oxford)
Three instructors share a 3-day course designed to introduce students to the many and varied aspects of XML. Students experience XML design, processing, and delivery through practical, hands-on classes where they create their own XML documents; validate with DTD, XSD, and RNG; and gain experience with XPath, Schematron, XSLT, XSL-FO, XML in the browser, ontologies, and more. Ms Lapeyre teaches the introductions to XPath and Schematron, beginning XSLT, and slightly more advanced XSLT; and she kibitzes on the other modules.
Trends and Transients
Deborah A. Lapeyre (XML Summer School, September 2018, St Edmund Hall, University of Oxford)
Each year there are more new technologies to keep track of, more ways to organize your life and your company’s information, and more ways to communicate. This all-day session introduces new and potentially over-hyped technologies; discusses older, overlooked technologies; and entertains at the same time. This year for T&T Day, XML Summer School faculty expert speakers will present their views on what is happening now. Deborah Lapeyre will talk about the demise of the bespoke schema for text publishing and the coalescence of schemas in some verticals.
YAMC? Why are we here? Why are we here again?
B. Tommie Usdin (Balisage 2018, Rockville, MD)
There is nothing new about markup, or even generic markup. (I have been working with generic markup for 40 years!) So what is there to talk about after all this time? What are we accomplishing by gathering at Balisage: The Markup Conference? Why do some of us find events like this one valuable? What can you do to make it valuable to you and to the others here? Not only is markup old hat, XML is 20 years old, and some people in the outside world keep trying to tell us that its time has passed.
Groups are still gathering to create shared markup vocabularies in order to enable high quality information sharing. Scholars are using bespoke markup vocabularies to enable them to focus on the works they are reading, interpreting, and writing. Trendy end user displays are being populated by solid maintainable XML content. An ever improving tool set is available to users of marked up documents. We learn from each other’s projects, tools, techniques, and experiences — and enjoy the process!
JATS/BITS/NISO STS
B. Tommie Usdin and Deborah A. Lapeyre (Symposium on Markup Vocabulary Ecosystems, July 2018, Rockville, MD)
The Journal Article Tag Suite is an application of NISO Z39.96-2015, which defines a set of XML elements and attributes for tagging journal articles. BITS, the Book Interchange Tag Suite, and NISO STS, the NISO Standards Tag Suite, are applications of NISO Z39.96-2015 for books and standards. All of the models share a common foundation, customized to meet the needs of specific document types.
Defense of the Lowly Angle Bracket
Deborah Lapeyre (Markup UK 2018, London)
Closing keynote [Link to the presentation]
Shared Tag Sets as Social Constructs
B. Tommie Usdin (Markup UK 2018, London)
At Markup UK we are likely to hear presentations about creation, maintenance, and use of specific tag sets. We are likely to hear more presentations that focus on some aspect of creation, interchange, manipulation, use, or archiving of document content encoded in one or another shared tag sets. In both cases, we will focus on the tag set or the documents, but not on the social forces that shape the tag set and that are shaped by the tag set.
The public tag sets: TEI, DITA, UBL, JATS, HL7, BITS, STS, DocBook, HTML and hundreds (perhaps thousands) of others were created by groups of people to meet specific needs. The needs of those groups, and the changing needs of the various members who have joined or left these activities, have shaped the tag sets. Things the membership, or the sponsors, or the most vocal/powerful members, think are important are enabled and/or emphasized. We are seeing more and more tag sets providing the ability to enhance documents with accessibility information as access becomes more important in the world in general.
Conversely, shared tag sets, and the assumptions that underlie them, shape the world view of their users. Before HTML it was common to build tagging structures that allowed paragraphs to contain lists; these days that is considered surprising. Before SGML and XML it was not unusual to discuss how to capture the overlapping structures in texts; now it may be considered a “corner case”, and to many it is simply unthinkable. The communities around various tag sets may pressure users to encode information they do not need in documents because the community expects it: TEI users may feel pressure to provide metadata, JATS users to provide richly encoded citations, HTML users to use more generic tags in place of visually descriptive tags.
Shared tag sets are much more than lists of codes, they are powerful and dynamic social constructs that influence and are influenced by the world. [Links to the proceedings (PDF), presentation slides (HTML)]
Fighting the “Inevitable” Expansion of the JATS Tag Sets
B. Tommie Usdin (JATS-Con, April 2018, Bethesda, MD)
The tag set maintenance process is biased toward expansion and loosening. The JATS Standing Committee wants to satisfy the people who request changes while resisting any suggestions that would be backwards-incompatible. Requests include adding new metadata, enriching the expressive capabilities of the tag suite, accommodating the needs of new document types, and better supporting document display. Models can be tightened to better support interchange through the use of layers on top of the Standard, including guidelines and Schematron published by other players in the community including vendors and groups such as JATS4R and STS4i. We need to support these efforts and to resist pressure to increase the scope of JATS. [Link to the paper]
JATS Mini Tagging Tutorials
Jeffrey Beck, Gerrit Imsieke, Deborah A. Lapeyre, Evan Owens, Laura Randall, and Bruce Rosenblum (JATS-Con, April 2018, Bethesda, MD)
Deborah Lapeyre presented two tutorials:
- Using the Vocabulary Attributes [Link to the tutorial]
- What’s New in JATS 1.2d2 and BITS 2.1, with Bruce Rosenblum [Link to the tutorial]
What is NISO STS?
B. Tommie Usdin (XML.com article, January 2018)
B. Tommie Usdin introduces XML.com readers to NISO STS, the new standard for the XML interchange of standards. [Link to the article]
2017
XPath: The Secret to Success with XSLT, XQuery, and Schematron (post-conference tutorial)
Debbie Lapeyre (eXtyles User Group Meeting 2017, Cambridge, MA)
XPath is the language for navigating XML documents. XPath is shared by XSLT, XQuery, and Schematron. While beginners can do useful work in XML with even a rudimentary knowledge of XPath, a deeper knowledge of XPath is the key to fully empowering XML tools. In this fast-paced, technical overview of XPath we will cover: tree terminology, the XPath data model, relative and absolute location paths, node tests, functions and operators, and short and long syntax. We will cover most of XPath 1.0, key features of XPath 2.0, and introduce XPath 3.0. [Link to the tutorial]
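To give a flavor of the location paths and predicates the tutorial abstract mentions, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. This is an assumption made for illustration, not the tutorial's own material: ElementTree implements only a limited subset of XPath in abbreviated syntax, evaluated relative to a context element, so it cannot show the long (axis-based) syntax or the XPath 2.0/3.0 features the tutorial covers. The sample document and variable names are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample document; the root element <book> is the context node.
DOC = ET.fromstring(
    "<book>"
    "<chapter id='c1'><title>Trees</title><para>One</para><para>Two</para></chapter>"
    "<chapter id='c2'><title>Paths</title><para>Three</para></chapter>"
    "</book>"
)

# Relative location path: two child steps from the context node.
chapter_titles = [t.text for t in DOC.findall("chapter/title")]

# Abbreviated descendant step: every <para> anywhere below the context node.
all_paras = [p.text for p in DOC.findall(".//para")]

# Attribute predicate: the title of the chapter whose id is 'c2'.
c2_title = DOC.find("chapter[@id='c2']/title").text

# Positional predicate: the first <para> child of each chapter.
first_para_each = [p.text for p in DOC.findall("chapter/para[1]")]

if __name__ == "__main__":
    print(chapter_titles)
    print(all_paras)
    print(c2_title)
    print(first_para_each)
```

A full XPath engine, such as those built into XSLT, XQuery, and Schematron processors, accepts the same abbreviated paths along with the long syntax, functions, and operators.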
Just Enough JATS, BITS, and STS (post-conference tutorial)
Tommie Usdin (eXtyles User Group Meeting 2017, Cambridge, MA)
This course is targeted to people who need to make high-level decisions about JATS, BITS, and STS. If you are deciding whether, when, and how to adopt or convert to JATS, BITS, or STS, or if you want to know how they are organized and how they relate to each other, this is the class for you.
The goal of this course is to give people enough working knowledge of JATS, BITS, and STS so they can make informed business decisions and participate fully in decisions about subsetting and customizing.
Starting with a description of the original goals and current uses of JATS, BITS, and STS, we will discuss ways in which they are similar and ways in which they differ, both technically and organizationally. The key design principles are the same for all of these “cousin” tag sets: for example, they are enabling not enforcing. We will discuss the implications of these design principles for production and interchange. These tag sets are also based on the same structural principles: for example, separation of metadata from display content, nested recursive sections, and the ability to do very rich encoding of citations. Because the tag sets are all quite loose, many users find it convenient to subset the model they adopt. We will discuss the reasons for subsetting (and supersetting) the public models and methods of doing so. Finally, we will show the variety of documentation and resources available to users of JATS, BITS, and STS. [Link to the presentation]
Hands-on Introduction to XML
Deborah A. Lapeyre (XML Summer School, September 2017, St Edmund Hall, University of Oxford)
Built around the “real life” scenario of Erasmus Swift, a new age philosopher who decides to build a website using XML technology, this 3-day course is designed to introduce students to the many and varied aspects of XML design, processing, and delivery through practical, hands-on classes where they create their own XML documents. Students learn how to create data structures using an XML editor, create an XML schema model, and parse/validate the document structure. They also have the opportunity to gain an understanding of the latest XML tools and technologies in the marketplace, so that they can develop and implement their own XML solutions.
It is time to make ourselves clear
B. Tommie Usdin (Balisage 2017, Rockville, MD)
We, the markup community, have for too long pussy-footed around in a misguided effort to get along with the unenlightened. We have compromised, equivocated, and taken one thing after another into consideration. That time is over. It is time for us to insist that the world straighten up and fly right. To stand up and put our collective feet down! Start marking up documents with explicit tags, no more of this word-processor hide-the-markup stuff. Separate content from format! Make all publications accessible! Enable interoperability! We know what’s right; let’s do it and demand that others do, too!
Well, if they don’t mind. And if they can afford it. And if it won’t break any current systems, and nobody is offended. Of course.
Optimization Panel
Abel Braaksma, John Lumley, Adam Retter, and Tommie Usdin (Balisage 2017, Rockville, MD)
Panelists discussed how to optimize XML for interchange and interoperability. And, while they’re at it, reduce the file size, increase the readability, future-proof it by making sure it conforms to all applicable standards now and forever, etc. Conference participants chimed in with questions, opinions, and counter-examples on wide-ranging topics such as: Premature optimization is the root of all evil, yes, but what exactly is premature? What is the expected gestation period for optimization? Are we optimizing for file size, processing speed, retrieval speed, loading speed, longevity of data, ease of comprehension without having to check the manual to discover that “pglg” means “programListing”? Are we optimizing our XML, our XSLT, our XQuery, our XProc, or something else? By the end of the discussion, optimization no longer seemed quite as simple or straightforward as it did before — but attendees were able to do a much better job of it.
Bespoke, Bewildered, and Bebothered
Deborah Lapeyre (XML London 2017, London)
Closing keynote on the evolution of XML vocabulary design
Oxygen for training classes
B. Tommie Usdin (Oxygen Users Meetup, May 2017, Rockville, MD)
[Link to the presentation]
In pursuit of family harmony: Introducing the JATS Compatibility Meta Model
B. Tommie Usdin, Deborah A. Lapeyre, Laura Randall, and Jeffrey Beck (JATS-Con, April 2017, Bethesda, MD)
JATS is an Open Standard. Users may modify it by adding or removing elements and attributes to suit their needs. Some publishers have extended (added to) JATS based on their own requirements. And there are some public extensions like BITS, STS, and Taxpub. Users expect significant efficiencies from vocabularies based on JATS, including the ability to intermingle the documents in databases, to use tools created for JATS for their new vocabulary with minimal additional work, and to adopt rendering/formatting applications and change only those aspects specific to the new vocabulary. Some model changes create compatible documents, which can interoperate with JATS documents gracefully. But some model changes are disruptive. We discuss what types of changes to the JATS models can be integrated into existing XML environments and which may be disruptive. We propose a set of criteria to evaluate whether a proposed change will be seamless or might cause problems. [Link to the paper]
Circling in on the JATS Compatibility Meta-Model
B. Tommie Usdin, Deborah A. Lapeyre, Laura Randall, and Jeffrey Beck (JATS-Con, April 2017, Bethesda, MD)
The JATS Meta-Model was developed to guide people who want to customize JATS to meet local needs and have their JATS-based vocabularies work gracefully with existing JATS-based infrastructure. From analyzing content models to defining “social behaviors” of XML elements, the process of defining the JATS Compatibility Meta-Model was rarely straightforward and very often led us to surprising conclusions. Why, for instance, is whether or not something is metadata not a defining property of compatibility? This paper aims to explain the process and thinking behind the model — how we came to the conclusions about compatibility and what we even mean by compatibility. We’ll look at some of the assertions we started absolutely knowing to be important, and discuss why they’re ultimately not in the Meta-Model. By examining the process behind the model and sharing our successes and failures, we hope to improve understanding of the model and its broader implications. [Link to the paper]
XML for Standards: Options and Opportunities
B. Tommie Usdin (XML for Standards Publishers: A NISO Connections Live Event, April 2017, Washington, DC)
[Link to the presentation slides at NISO or presentation slides at slideshare.net]
2016
Hands-on Introduction to XML
Debbie Lapeyre (XML Summer School, September 2016, St Edmund Hall, University of Oxford)
This 3-day course is designed to introduce a student to aspects of XML basic principles, design, tagging, processing, Quality Assurance, and web delivery. This practical tutorial is taught as a series of hour-and-a-half segments: each segment begins with a one-hour lecture and ends with a half hour for hands-on exercises in the topic just discussed. Ms. Lapeyre wrote and teaches the lecture and exercises for the two XSLT segments and the Schematron segment, and assists with the XML schema segments (DTD, XSD, and RELAX NG).
Graceful tag set extension
B. Tommie Usdin, Deborah A. Lapeyre, Laura Randall, and Jeffrey Beck (Balisage 2016, North Bethesda, MD)
Tag Sets, or XML Vocabularies, are often created from other Tag Sets or Vocabularies. Users expect significant efficiencies from using derived or “based on” vocabularies, including the ability to intermingle the documents in databases, to use tools created for the original Tag Set with minimal additional work, and to adopt rendering/formatting applications and change only those aspects specific to the new vocabulary. Some model changes create compatible documents, which can interoperate with documents tagged to the source specification gracefully. Some model changes are disruptive. We discuss what types of changes can be integrated into existing XML environments and which may be disruptive.
So You Want to Adopt JATS. What Decisions Do You Need To Make?
B. Tommie Usdin (JATS-Con, April 2016, Bethesda, MD)
Newcomers to JATS need to make decisions about which tag set to use (Authoring, Publishing, or Archiving), which table model to adopt, and how to handle math. In addition, they should consider citation model and style, contributor names and affiliations, alternative languages and encodings, and adoption of tagging guidelines from PMC, JATS4R, and/or their publishing partners. [Link to the paper]
Citing Data with JATS
Deborah A. Lapeyre (Force11 Data Citation Implementation Pilot (DCIP) Project Kick-Off Workshop, February 2016, Boston, MA)
Data needs to be cited as a primary resource. The ANSI/NISO Z39.96-2015 Journal Article Tag Suite (JATS) is a widely used XML tag set for marking up journal articles. After a brief introduction to JATS, the presentation focused on how to cite data within JATS-tagged journal articles. Noting Force11’s recommendation that data be cited as bibliographic references, the talk explained JATS’ “mixed” citation element and its constituent elements that can be used to cite data. New elements were added to JATS at the request of Force11 to make better data citations. JATS’ limitations vis-a-vis machine resolvability were also discussed. The talk concluded with examples of JATS data citations for the Dryad Digital Repository, a GenBank Protein, an RNA Sequence, Figshare data, etc. An appendix with mapping of data fields to JATS elements is also included. [Link to the presentation slides (titled “Citing Data in Journal Articles using JATS”)]
2015
Manipulating XML Content: The Concepts and Practice of XSLT (post-conference tutorial)
Debbie Lapeyre (eXtyles User Group Meeting 2015, Cambridge, MA)
XSLT is a comparatively easy transformation tool that turns XML into “something else”, giving you the flexible single-source publishing you want. If you have XML, you need XSLT!
Using XSLT you can: convert XML files into display formats (such as HTML and eBook); make XML into tool-specific formats (such as typesetting languages); extract just a little of your XML for a catalog, report, or shopping cart; and in the process automatically add numbering, cross-reference links, tables of contents, and generated text. XSLT can convert documents tagged according to your XML into documents tagged according to someone else’s tag set.
XSLT is a programming language (sounds scary), but you can read it, write it, and understand it WITHOUT being a programmer. If you are a programmer, you need to know that XSLT is different but not difficult. If you’re a manager, you need to know what XSLT is good for. XSLT is easy to learn if you start with the data model and the processing model behind it. This introduction discusses the principles that underlie XSLT; introduces the data and processing models; demonstrates some XSLT transformations to highlight the sorts of things it is good for; and describes how XSLT is being used in a variety of environments.
Hands-on Introduction to XML
Debbie Lapeyre (XML Summer School, September 2015, St Edmund Hall, University of Oxford)
This 3-day course is designed to introduce a student to aspects of XML basic principles, design, tagging, processing, Quality Assurance, and web delivery. This practical tutorial is taught as a series of hour-and-a-half segments: each segment begins with a one-hour lecture and ends with a half hour for hands-on exercises in the topic just discussed. Ms. Lapeyre wrote and teaches the lecture and exercises for the two XSLT segments and the Schematron segment, and assists with the XML schema segments (DTD, XSD, and RELAX NG).
The art of the elevator pitch
B. Tommie Usdin (Balisage 2015, North Bethesda, MD)
Many of us at Balisage feel that the universe (or our organization, sponsor, client, or mother-in-law) doesn’t sufficiently appreciate or respect technologies we know could significantly improve the world. XSLT, techniques for processing overlap, DITA, XQuery, HTML5, even XML, are not given the attention they deserve. People aren’t listening! This is our fault, at least in part. We as a community need to learn to say less and communicate more, and more persuasively. [Abstract only]
Including Data in the Standardized Markup for Journal Articles (NISO JATS)
Deborah A. Lapeyre (Dataverse Workshop on Common Models and APIs for Data Publishing and Citation, June 2015, Cambridge, MA)
The ANSI/NISO Z39.96-2015 Journal Article Tag Suite (JATS) is a widely used XML tag set for marking up journal articles. New Force11 DataCite work adds new elements to JATS, making it easier than before to cite data sources. Several different types of data and datasets are tagged as examples. Since JATS is a descriptive rather than prescriptive tag set, there are always multiple ways to tag any one construct. An appendix illustrates potential mappings of data citation fields to JATS elements. [Link to the presentation slides (titled “Citing Data in Journal Articles using JATS”)]
Superimposing Business Rules on JATS
B. Tommie Usdin, Deborah A. Lapeyre, and Carter M. Glass (JATS-Con, April 2015, Bethesda, MD)
Publishers are stuck between a rock and a hard place. They want to use JATS for interchange but they want their model to help them maintain consistency and enforce their business rules, which JATS does not. We suggest a Schematron layer so they can have it both ways without having multiple models (a notion many publishers find confusing) or needing to transform their content on export (which many content creators find terrifying). [Link to the paper]
What’s New in JATS since 1.0? (tutorial)
B. Tommie Usdin and Deborah Lapeyre (JATS-Con, April 2015, Bethesda, MD)
There have been a number of updates made to the JATS article models since NISO Z39.96 was released officially in August of 2012. These updates comprise three Committee Draft releases: 1.1d1, December 2013; 1.1d2, December 2014; and 1.1d3, anticipated in Spring 2015.
We discuss in detail the changes specified in these three Committee Drafts (which we are expecting to be in the next official release). New capabilities include:
- Affiliation identifiers
- Additional locations for some metadata elements like <abstract> and <kwd-group>
- Ruby tagging
- A new parameter entity to make it easier to add global attributes (which now includes global @id and xml:base attributes)
- MathML 3
- <code> element
- NISO Access and License Indicators (ALI) elements <free_to_read> and <license_ref>
- Structures for citing data
In addition to discussing the new capabilities we will provide examples of each in use and answer questions about how they could/should be used.
2014
Schematron for QA and Reporting (post-conference tutorial)
Debbie Lapeyre (eXtyles User Group Meeting 2014, Cambridge, MA)
Schematron is a language for Quality Assurance and for ad hoc reporting on XML documents and collections of XML documents. With Schematron, users can identify all documents or portions of documents that have, or don’t have, a particular structure, value, or pattern. For example, a user can find all of the documents with sections that lack titles (or have empty titles), list all of the documents that have more than 25 figures, or show all of the figures that were never referenced. Schematron can identify things that may be valid XML but that are often incorrect, or at least worth examination. Schematron can be used for reporting as well; for example, how many of the authors in a particular journal are members of the sponsoring society, or which articles in an encyclopedia have no citations less than five years old.
Even if you already use a DTD or XSD, Schematron can provide a fast, customizable means of validating your XML even further, with pinpoint reporting of values and value ranges, as well as the presence or absence of elements and attributes, while also checking co-constraints and other hard-to-crack edge cases.
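The kinds of checks described above can be sketched outside Schematron as well. The following is a minimal illustration in Python, not Schematron itself and not material from the tutorial: the sample document, its element names (sec, title, fig, xref, and the rid attribute), and both function names are assumptions chosen only to mirror the "sections that lack titles" and "figures that were never referenced" examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are assumptions.
SAMPLE = """<article>
  <body>
    <sec id="s1"><title>Introduction</title><p>See <xref rid="f1"/>.</p></sec>
    <sec id="s2"><title>  </title><p>Whitespace-only title.</p></sec>
    <sec id="s3"><p>No title at all.</p></sec>
    <fig id="f1"/>
    <fig id="f2"/>
  </body>
</article>"""


def untitled_sections(root):
    """Return ids of <sec> elements whose <title> is missing or empty."""
    bad = []
    for sec in root.iter("sec"):
        title = sec.find("title")
        # A title counts as empty if it has no non-whitespace text content.
        if title is None or not "".join(title.itertext()).strip():
            bad.append(sec.get("id"))
    return bad


def unreferenced_figures(root):
    """Return ids of <fig> elements that no <xref rid="..."> points to."""
    referenced = {xref.get("rid") for xref in root.iter("xref")}
    return [fig.get("id") for fig in root.iter("fig")
            if fig.get("id") not in referenced]


root = ET.fromstring(SAMPLE)
print("Sections lacking titles:", untitled_sections(root))
print("Figures never referenced:", unreferenced_figures(root))
```

In Schematron proper, each of these would be expressed as an assertion with an XPath test and a message, so the rules live alongside the schema and need no custom program.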
Hands-on Introduction to XML
Debbie Lapeyre (XML Summer School, September 2014, St Edmund Hall, University of Oxford)
Designed to introduce the many and varied aspects of XML design, processing, and delivery, the three-day course offers the opportunity to gain an understanding of the latest XML tools and technologies in the marketplace. In hands-on exercises, students learn about content marked up in XML, validation using XML schemas, transformation with XSLT, output pagination with XSL-FO, and searching with XPath and XQuery. Additional topics include transferring structured data between applications, metadata and knowledge in XML (the Semantic Web), and service oriented architectures (web services).
When the “One Size Fits Most” tagset doesn’t fit you
B. Tommie Usdin (JATS-Con, April 2014, Bethesda, MD)
JATS does not actually claim to be a “one size fits all” specification. However, many information content consumers (libraries, archives, on-line services) accept only content that is valid to one of the JATS models, and in many cases specify a subset of the model defined in one of the JATS instantiations (Archiving, Publishing, or Authoring). Thus, content creators find that their vendors and tools often assume that they will be using one of the JATS models “out of the box”. This can present a real problem when a publisher has, and wants, information that is not modeled in JATS, or is not modeled in the JATS DTD their vendors and publishing partners require. In this case, the publisher has several options: drop the inconvenient information, use “Custom Metadata”, hide the inconvenient information in prose, abuse a tag, suggest a modification of the standard, or modify the tag set to encode the information that matters. None of these options is ideal, and which to choose depends in large part on circumstances. [Link to the paper]
Introduction to the Book Interchange Tag Suite (tutorial)
B. Tommie Usdin and Deborah Lapeyre (JATS-Con, March 2014, Bethesda, MD)
The Book Interchange Tag Suite is an XML model for STM books that is based on the Journal Article Tag Suite (JATS; ANSI/NISO Z39.96-2012). The intent of the BITS is to provide a common format in which publishers and archives can exchange book content, including book parts such as chapters. The Suite provides a set of XML schema modules that define elements and attributes for describing the textual and graphical content of books and book components as well as a package for book part interchange.
This half-day detailed “Introduction to the Book Interchange Tag Suite” will cover the major differences between JATS and BITS and discuss the book-specific features of the BITS. Among the topics covered will be:
- Books, book parts, and collections. BITS provides mechanisms for encoding complete books, portions of books such as chapters, and collections of books such as monographic series. The metadata for each identifies the document at the appropriate level, and metadata is used to define collections of book parts and of books.
- XInclude in BITS. Books are often too large to manage comfortably as single files. XInclude is an XML mechanism that allows large documents to be managed as a set of relatively small files and yet processed as one large logical entity.
- Table of Contents. Books and parts of books often have complex Tables of Contents which may or may not be simple compilations of headings of the structures in the material. BITS allows Table of Contents as tagged structures and allows Table of Contents information to occur throughout the material for compilation on display.
- Indexes. BITS enables two methods for tagging index information. Indexes can be tagged as structures that point into the document, or index information can be embedded in the document and compiled on display.
- Moving JATS articles into BITS. BITS is designed to make it as easy as possible to move a stand-alone JATS-encoded journal article into a book, for example, when a set of journal articles is collected as chapters of a book.
2013
JATS and BITS Update
Debbie Lapeyre (eXtyles User Group Meeting 2013, Cambridge, MA)
Exciting News! The Journal Article Tag Suite (JATS) has just become an ANSI/NISO standard (ANSI/NISO Z39.96-2012). This latest release of JATS permits users to choose MathML 2.0 or MathML 3.0 for their documents. Progress has also been made in internationalization for JATS, including such key features as Ruby tagging and the ability to describe the name of an author or an institution in more than one language or script. Users were invited to join the JATS effort and submit questions and requests through the NISO comment site. The Tag Library documentation has also been updated and the new look and feel was demonstrated.
In addition, there has been great progress in a JATS-based tag set for books, called Book Interchange Tag Suite (BITS). BITS is not yet at NISO; it is still a National Library of Medicine (NLM) initiative. BITS is a STEM XML tag set intended for use by publishers who are already tagging their journal articles with JATS. All the lower-level structures (paragraphs, figures, equations, tables, etc.) are the same in JATS and BITS, so a journal article could be easily transformed into the chapter of a book by adding a little book-specific metadata. BITS adds non-journal elements such as Indexes, Tables of Contents, questions and answers, and series metadata to JATS. BITS can be used for an entire book or to tag just a chapter. [Link to the presentation slides]
Hands-on Introduction to XML
Debbie Lapeyre (XML Summer School, September 2013, St Edmund Hall, University of Oxford)
Designed to introduce the many and varied aspects of XML design, processing, and delivery, the three-day course offers the opportunity to gain an understanding of the latest XML tools and technologies in the marketplace. In hands-on exercises, students learn about content marked up in XML, validation using XML schemas, transformation with XSLT, output pagination with XSL-FO, and searching with XPath and XQuery. Additional topics include transferring structured data between applications, metadata and knowledge in XML (the Semantic Web), and service oriented architectures (web services).
The semantics of “semantic”
B. Tommie Usdin (Balisage 2013, Montréal)
There was a time when I knew what the word “semantic” meant. That was a long time ago. Since then many people, on many occasions, in many contexts, have corrected my misunderstanding of the meaning of semantic. Perhaps it means nothing, or everything. Or perhaps I’m simply misinformed.
JATS: A New Standard from an Old Specification
Jeffrey Beck and B. Tommie Usdin (Information Standards Quarterly, Spring 2013, 25(1): 19-21)
The Journal Article Tag Suite (JATS) is a description of a set of elements and attributes that is used to build XML models of journal articles for archiving, publishing, and authoring. JATS became an American National Standard (ANSI/NISO Z39.96-2012) in August 2012, but it was already a well established specification (known by the colloquial name “NLM DTD”) by the time work began on standardization in late 2009. [Link to the paper]
2012
BITS — JATS for Books
B. Tommie Usdin and Deborah A. Lapeyre (Mulberry’s Seminar Series)
The National Library of Medicine has just announced the public availability of a new XML model: BITS (Book Interchange Tag Suite). BITS, now in draft for public comment, has been designed to meet the needs of publishers who are using JATS for journal articles and want to process their books in similar XML. JATS (the Journal Article Tag Suite) has been widely adopted for the XML encoding and interchange of journal articles, and has recently become an ANSI/NISO Standard (ANSI/NISO Z39.96-2012). The JATS models work well for much of the body of books, but there are significant differences between journal articles and books, which the new BITS model accommodates. This seminar will discuss the unique features of books that are modeled in BITS.
Mapping JATS to RDF using the SPAR (Semantic Publishing and Referencing) Ontologies
Silvio Peroni, Deborah Lapeyre, and David Shotton (JATS-Con, October 2012, Bethesda, MD)
We will present a mapping of the metadata and bibliographic references from the Journal Article Tag Suite (JATS) to RDF, using the SPAR (Semantic Publishing and Referencing) ontologies together with elements from other well-known vocabularies. This mapping will permit XML documents marked up using JATS to be converted automatically to RDF, enabling the information contained within those documents to be published to the Semantic Web in a manner that is (hopefully) unambiguous and universally understood. By so doing, we hope to facilitate the publication of bibliographic information on the web as linked open data and to enhance the toolkit for libraries, archives, and publishers who have chosen to encode their journal material in NISO JATS. [Link to the paper]
Introduction to Schematron (tutorial)
Deborah Lapeyre (JATS-Con, October 2012, Bethesda, MD)
This three-hour tutorial discusses Schematron, a rules-based validation/reporting language that works by making assertions about patterns found in XML documents and reporting back messages about the truth (or otherwise) of those assertions. Whether you are using XSD, DTD, or RELAX NG, there are some validations that those grammar-based schema languages just cannot express, or which, for practical or business reasons, you do not want to build into your basic XML models. Schematron can supplement your schema validation with targeted reporting on elements and attributes, testing their presence, absence, values or value ranges, checking co-constraints and other tricky situations, and warning about suspect occurrences that require further examination. To express its rules, Schematron relies on XPath, the tree-walking and expression language used with XQuery and XSLT.
Hands-on Introduction to XML
Debbie Lapeyre (XML Summer School, September 2012, St Edmund Hall, University of Oxford)
Designed to introduce the many and varied aspects of XML design, processing, and delivery, the three-day course offers the opportunity to gain an understanding of the latest XML tools and technologies in the marketplace. In hands-on exercises, students learn about content marked up in XML, validation using XML schemas, transformation with XSLT, output pagination with XSL-FO, and searching with XPath and XQuery. Additional topics include transferring structured data between applications, metadata and knowledge in XML (the Semantic Web), and service oriented architectures (web services).
Things change, or, the “real meaning” of technical terms
B. Tommie Usdin (Balisage 2012, Montréal)
Vocabulary is slippery, especially the sorts of technical jargon we are immersed in at events like Balisage. When we want to talk about a new idea, process, specification, or procedure we have two choices: make up a new word or use a word that is already in use to mean something else. New words may be difficult to remember and awkward to use. Re-purposing an existing word may cause confusion between the “old” and your “new” meaning. In either case, usage of terms changes. The usage of a technical term may mutate over time and may evolve differently in different communities. At times it is useful for a community to pressure users to use terms to mean what they meant when coined, but more often it is simple pedantry to insist that any usage other than that of the person who first introduced the term is incorrect. Our challenge is in finding that balance.
Luminescent: Parsing LMNL by XSLT upconversion
Wendell Piez (Balisage 2012, Montréal)
Among attempts to deal with the overlap problem, LMNL (Layered Markup and Annotation Language) has attracted its share of attention but has also never grown much past its origins as a thought experiment. LMNL's conceptual model differs from XML's, and by design its notation also differs from XML's. Nonetheless, a pipeline of XSLT transformations can parse LMNL input and construct an XML representation of LMNL, with the resulting benefit that further XML tools can be used to analyze and process documents originating from the alien notation. The key is to regard the task as an upconversion: structural induction performed over plain text.
Data Modeling in the Humanities: Three Questions and One Experiment
Wendell Piez (Workshop on Data Modeling and Knowledge Organization in the Humanities, March 2012, Brown University, Providence, RI)
Thinking about data modeling in the humanities leads directly to paradoxical questions regarding digital data, textual media, and their proper or possible relations in a system of representation. Rather than answer these questions directly, Wendell Piez poses three more. What do we mean by “data model”, and in particular, how can a data model be designed to support processes and methods that must be underspecified insofar as they are protean, contested, responsive to exigencies, and themselves objects of investigation? What about markup, and what is the relation of our data model to markup technologies? And what is the potential role of the schema, as an instrument of operations and transformations that can enable open-ended and experimental work? In order to help explore these issues, Wendell will demonstrate a prototype toolkit, parsing a markup syntax capable of representing arbitrary overlapping ranges and providing them with structured annotations. [Link to the presentation slides]
2011
Introduction to Multi-language Documents in NISO JATS
Deborah A. Lapeyre (JATS-Con, September 2011, Bethesda, MD)
The current JATS includes several structures that support encoding documents in which (some) metadata or text is provided in more than one language. In addition to the practically ubiquitous xml:lang attribute, there are elements specifically designed to contain multiple languages. Many metadata elements have been made repeatable and given an xml:lang language attribute, so that they can be present in the metadata once for each language. This introduction will explain the mechanisms for marking up multi-language content as well as examples of their use. You will learn how to encode an author's name in several languages (or language/script combinations) without creating the false impression that these variations represent additional authors. You will learn how to encode a table or a figure in several languages. Tagged metadata examples will illustrate the use of translated abstracts, titles and subtitles, keywords, and journal titles. Citation examples will illustrate multiple sources, reference authors in several scripts, and more. [Link to the paper]
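As a rough illustration of the author-name case, the sketch below reads one contributor whose name is given in two languages and reports it as a single author with language-keyed variants. It is not taken from the paper: the sample markup (a contrib containing name-alternatives with one name per xml:lang value) and the function name are assumptions made for the example, and Python's ElementTree is used only as a convenient reader.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: one author, two renderings of the same name.
SAMPLE = """<contrib contrib-type="author">
  <name-alternatives>
    <name xml:lang="ja"><surname>山田</surname><given-names>太郎</given-names></name>
    <name xml:lang="en"><surname>Yamada</surname><given-names>Taro</given-names></name>
  </name-alternatives>
</contrib>"""


def names_by_language(contrib):
    """Map each xml:lang value to the (surname, given-names) pair it labels."""
    variants = {}
    for name in contrib.iter("name"):
        lang = name.get(XML_LANG)
        surname = name.findtext("surname", default="")
        given = name.findtext("given-names", default="")
        variants[lang] = (surname, given)
    return variants


contrib = ET.fromstring(SAMPLE)
names = names_by_language(contrib)
# One <contrib> means one author, however many name variants it carries.
print("Name variants for a single author:", names)
```

Because both names sit inside one contributor element, a consumer counting authors counts contributors rather than names, which is what avoids the false impression of additional authors.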
Taming the Beast: JATS data, non-JATS data, and XML Namespaces
Wendell Piez (JATS-Con, September 2011, Bethesda, MD)
An introduction to basic concepts, gotchas, and rules of thumb for working with namespaces in JATS documents and processing systems, addressing questions including the following: What are namespaces in XML, why do I need them, and why am I so confused? What can I do about this? Can I avoid namespaces altogether? (Yes, sometimes, but mostly no, not in the real world.) If I can't avoid them, how do I work with them, and what practices do I follow so as to understand what's going on in my data, recognize and fix problems when they arise, and prevent them from ever arising? What are the rules of good namespace hygiene? [Link to the paper]
The future of the JATS: the probable, the possible, and the unlikely
B. Tommie Usdin (JATS-Con, September 2011, Bethesda, MD)
In the year since JATS-Con 2010 the JATS has made one major step (becoming a draft NISO Standard) and many small steps (changes to the Tag Suite itself). This is a good time to think about the future of the JATS. Will it, and should it, be basically stationary from now on? Will it be gracefully and gradually extended? What sorts of changes are probable? Are there revolutionary changes we should be thinking about? [Link to the paper]
Hands-on Introduction to XML
Debbie Lapeyre (XML Summer School, September 2011, St Edmund Hall, University of Oxford)
Designed to introduce the many and varied aspects of XML design, processing, and delivery, the three-day course offers the opportunity to gain an understanding of the latest XML tools and technologies in the marketplace. In hands-on exercises, students learn about content marked up in XML, validation using XML schemas, transformation with XSLT, output pagination with XSL-FO, and searching with XPath and XQuery. Additional topics include transferring structured data between applications, metadata and knowledge in XML (the Semantic Web), and service oriented architectures (web services).
Serendipity
B. Tommie Usdin (Balisage 2011, Montréal)
Conferences are ostensibly structured around presentations, papers, and posters; and these are key to the success of any conference. By common agreement, the informal aspects of conferences — lunch, coffee breaks, and overheard conversations — are of lesser importance. Balisage produces persistent electronic proceedings, but the conference itself is face to face only. It is human interaction and serendipity that provide the most value. You are likely to attend both presentations you know yourself to be interested in and talks on topics you know little about. You may expand your areas of interest; you may learn something useful. A talk about a topic totally foreign to you may prompt you to think of a solution to one of your current problems or a new approach to a long-standing problem. With luck, you will also make a few new friends, connect with some old ones, make a helpful suggestion to a fellow participant, and have some fun in Montréal.
Generic microformats for coverage, comprehensiveness, and adaptability
Wendell Piez (Balisage 2011, Montréal)
The major descriptive XML formats for publishing applications all have an Achilles' heel: their means of achieving breadth of coverage and adapting to local requirements. Many projects avoid schema extensibility mechanisms, which fork the local application from the core tag set, complicating implementation, maintenance, and document interchange and thus undermining many of the advantages of using a standard. Yet the easy alternative, creatively reusing and abusing available elements and attributes, is even worse: it introduces signal disguised as noise, degrades the semantics of repurposed elements and hides the interchange problem without solving it. This dilemma follows from the way we conceive of our models for text. If designing an encoding format for one work must compromise its fitness for any other, we will always be our own worst enemies. Reconsidering our approach to descriptive encoding, we can see a solution: supplement our current mechanisms with abstract generic elements designed specifically to support extensibility not in the schema but in the document instance, providing for bottom-up development, as microformats, of new semantic types.
2010
Fitting the Journal Publishing 3.0 Preview Stylesheets to Your Needs: Capabilities and Customizations
Wendell Piez (JATS-Con, November 2010, Bethesda, MD)
An introduction to the NCBI/NLM Journal Publishing 3.0 Preview XSLT stylesheets, which provide for basic styled display of Journal Publishing 3.0 data, in HTML and PDF, with an emphasis on features enabling extension and customization. With demonstrations. [Link to the paper]
Why Create a Subset of a Public Tag Set
Debbie Lapeyre (JATS-Con, November 2010, Bethesda, MD)
The Journal Article Tag Sets were designed as translation targets; they are permissive, descriptive rather than prescriptive, and use escape hatches to preserve as many semantics as possible in born-digital XML content that originates in another tag set. This means that the Tag Sets — which can describe “almost” anything for “almost” anybody — can be used right out of the box, and many users do just that. But for a publisher (particularly a publisher looking to move XML to earlier stages in a workflow) or for an archive with requirements to regularize archival content, the advantages to subsetting can be substantial. The benefits of subsetting the Tag Sets are discussed, e.g., the ability to leave documents valid to one of the original NLM Tag Sets while at the same time enabling business-specific reporting, Quality Assurance, and XML tool use. [Link to the paper]
XML Summer School (September 2010, St Edmund Hall, University of Oxford)
Debbie Lapeyre
Hands-on Introduction to XML: Through the Looking Glass (Lesson 2.2, an introduction to XSLT and XPath)
An introduction to transformation, the third key component of XML technology (the other two being the XML Language itself and XML Schemas). Following explanations of XSLT’s basic concepts and the XSLT processing model, attendees learn how to (1) use XSLT to transform source XML structures to produce XML and non-XML output and (2) use the XPath standard with XSLT to locate nodes in the XML source.
Hands-on Introduction to XML: Transformers, transform! (Lesson 2.3, transformations in action)
An overview of the more advanced features of XSLT (including some introduced in XSLT 2.0). Following a discussion of XPath location paths, attendees learn how to create XSLT stylesheets using “push” and “pull” approaches. Topics include those relevant to programmers, such as copying nodes from the tree; conditional processing; and variables, parameters, and named templates.
The high cost of risk aversion
B. Tommie Usdin (Balisage 2010, Montréal)
Avoiding risk is not always the way to minimize risk.
Creation of File Formats (in “Simplifying Digital Content: Standards from Creation to Distribution and Access”)
B. Tommie Usdin and Jeff Beck (NISO Update at 2010 ALA Annual Conference, Washington DC)
The goal of the Standardized Markup for Journal Articles Working Group (see www.niso.org/workrooms/journalmarkup/) is to take the currently existing National Library of Medicine (NLM) Journal Archiving and Interchange Tag Suite version 3.0, the three journal article schemas, and the documentation and shepherd them through the NISO standardization process. In April, the group finished working on updates to version 3.0 and began moving to authoring the standard itself. Participants learned more about their work, as well as about discussions that had taken place in NISO about next steps, including potential work to develop a Book DTD.
2009
The National Library of Medicine Tag Suite for Journal Articles: Taking Over the World of XML Journal Publishing
Debbie Lapeyre (XML-in-Practice 2009, Washington, D.C.)
The National Library of Medicine’s publicly available “NLM Journal Article Archiving and Interchange Tag Suite” has taken over the world of XML journal publishing. The journal tag sets made from the Suite are better than DocBook for extensive references; more targeted than DITA and TEI to journal articles; flexible enough for content beyond the STM world; and easily adaptable for eBooks. It’s free. It’s customizable. What do the National Library of Medicine, Library of Congress, the British National Library, JSTOR (Portico), and numerous journal publishers know that you should know? What is the Tag Suite and why is it the de facto journal article XML worldwide? [Downloadable .zip of PDF]
XML Summer School (September 2009, St Edmund Hall, University of Oxford)
Debbie Lapeyre
Hands-on Introduction to XML: Through the Looking Glass (Lesson 2.2, an introduction to XSLT and XPath)
An introduction to transformation, the third key component of XML technology (the other two being the XML Language itself and XML Schemas). Following explanations of XSLT’s basic concepts and the XSLT processing model, attendees learn how to (1) use XSLT to transform source XML structures to produce XML and non-XML output and (2) use the XPath standard with XSLT to locate nodes in the XML source.
Hands-on Introduction to XML: Transformers, transform! (Lesson 2.3, transformations in action)
An overview of the more advanced features of XSLT (including some introduced in XSLT 2.0). Following a discussion of XPath location paths, attendees learn how to create XSLT stylesheets using “push” and “pull” approaches. Topics include those relevant to programmers, such as copying nodes from the tree; conditional processing; and variables, parameters, and named templates.
Standards considered harmful
B. Tommie Usdin (Balisage 2009, Montréal)
Standards and shared specifications allow us to share data, build general purpose tools, and significantly reduce training and customization costs and startup time. That is, the use of appropriate specifications can help us reduce costs, reduce startup time, and increase quality, usability, and reusability of content. Some vigorous standards proponents insist that the more standards used the better. To them I say “mind your own business and let me mind my own store”. They argue that using standards is always the right thing to do, because it enables re-use and interchange. Maybe so. But adoption of a standard that supports an activity that is not central to your mission is a distraction, an unwarranted expense, a bad idea.
How to Play XML: Markup Technologies as Nomic Game
Wendell Piez (Balisage 2009, Montréal)
Projects involving markup technologies are game-like: they have players (teams and individuals), equipment, rules, victories, and defeats. In many of the markup games we play, the making of the game’s rules is part of the game itself. When the playing of a game involves the modification of the game’s own rules, it is said to be a “nomic game”. The process of legislation, for example — including the collaborative development of markup vocabularies and other markup standards — is a nomic game. This meditation considers how the experiences of earlier nomic games are influencing today’s contests, the far-reaching influence today’s nomic games will exert on those to be played later, and things to consider as we engage each other in the nomic games of markup theory and practice.
Summer XML 2009 Conference (July 2009, Raleigh, North Carolina)
Debbie Lapeyre
Introduction to Schematron
Schematron is a small, powerful, and lightweight fact-checker for XML documents. It offers the best error messages in the world — you write them yourself. Whether you are using XSD, DTD, or RELAX NG, there are some validations that those grammar-based schema languages just can’t express, or which, for practical or business reasons, you do not want to build into your basic XML models. Schematron offers a practical way to reach into these corners. Schematron can supplement your schema validation with targeted reporting on elements and attributes, testing their presence, absence, values or value ranges, checking co-constraints and other tricky situations, and warning about suspect occurrences that require further examination. To express its rules, Schematron relies on XPath, the industry-standard query syntax for data retrieval and linking within and among XML documents. This makes it a natural fit with other applications in the XML family of technologies, including XSLT and XQuery; eases development and maintenance; and rewards your organization’s investment in XML expertise with a higher quality product. This session is a presentation, discussion, and demonstration using real-world data, suitable for newcomers to XML-based document production as well as for editors, production staff, and technologists more experienced with XML.
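The rule-plus-message idea described above can be sketched in a few lines. This is not Schematron itself: it is a minimal Python stand-in that uses the limited XPath subset in the standard library's ElementTree, and the sample document, its element names (fig, xref), and its attribute names (id, rid) are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; every element and attribute name is an assumption.
SAMPLE = """<article>
  <body>
    <fig id="f1"><caption>First figure</caption></fig>
    <fig><caption>Second figure</caption></fig>
    <xref rid="f9"/>
  </body>
</article>"""


def check(root):
    """Return one author-written message for each failed assertion."""
    messages = []
    # Collect every id value in the document for the cross-reference check.
    ids = {el.get("id") for el in root.iter() if el.get("id")}

    # Rule 1: every fig must carry an id attribute.
    for fig in root.findall(".//fig"):
        if fig.get("id") is None:
            caption = fig.findtext("caption", default="(no caption)")
            messages.append(f"Figure '{caption}' has no id, so nothing can link to it.")

    # Rule 2 (a co-constraint): every xref must point at an id that exists.
    for xref in root.findall(".//xref"):
        if xref.get("rid") not in ids:
            messages.append(f"Cross-reference to '{xref.get('rid')}' has no matching target.")

    return messages


report = check(ET.fromstring(SAMPLE))
for line in report:
    print(line)
```

A grammar-based schema could require the id attribute, but the second rule, which compares values found in different parts of the document, is the kind of check that is usually left to a rules-based layer.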
Introduction to XSLT Concepts
You keep hearing that XML is exciting; that once you have your content in XML you can do anything with it; that XML is powerful and flexible. Then you look at an XML file and don’t see what the fuss is all about! XSLT (the XML transformation language) is the language that makes XML powerful and flexible. Using XSLT you can: convert XML into display formats (HTML, PDF, etc.); make XML into tool-specific formats (such as typesetting languages); and automatically add numbering, cross-references, tables of contents, and generated text to your pages. You can also use XSLT to convert documents tagged according to your DTD/schema into documents tagged according to someone else’s tag set! XSLT changes the way you’ll think about XML. This introduction covers the principles of XSLT, its processing model, what it can and can’t do, and how it is being used in real environments. In this brief hands-on tutorial you will run, and then modify, sample XSLT Transforms to illustrate the power of XSLT.
2008
Introductory Schematron
Wendell A. Piez (XML-in-Practice 2008)
This tutorial discusses Schematron, a rules-based validation/reporting language that works by making assertions about patterns found in XML documents and reporting back messages about the truth (or otherwise) of those assertions. While Schematron can work with many tree-querying languages, the tutorial illustrates Schematron as it is most commonly used, with XPath, the tree-walking and expression language used with XQuery and XSLT.
LMNL in Miniature
Wendell Piez (Amsterdam Overlap Workshop, December 2008)
An Introduction to Schematron
Wendell Piez and Debbie Lapeyre (Philadelphia XML Users Group, November 2008)
A short version of Mulberry’s popular Introduction to Schematron, a small, powerful, easy to learn fact-checker for XML documents. Schematron can provide the best error/reporting messages in the world (you craft them for your specific situation) and can be really useful in editing and checking XML. Whether you are using XSD, DTD, or RELAX NG, there are some validations that those grammar-based schema languages just can’t express or that, for practical or business reasons, you do not want to build into your basic XML models. [Link to the presentation]
Cool or Useful
B. Tommie Usdin (Balisage 2008, Montréal)
True versus Useful, or True versus Likely-to-be-useful, are trade-offs we find ourselves making in document modeling and many other markup-related situations all the time. But Cool versus Useful is a far more difficult trade-off, especially since our world now includes a number of very cool techniques, tools, and specifications. Cool toys can have a lot of gravitational pull attracting attention, users, projects, and funding. Unfortunately, there is sometimes a disconnect between the appeal of a particular tool/technology and its applicability in a particular circumstance.
A Non-backwards-compatible Update: A Difficult Decision
Deborah A. Lapeyre (International Symposium on Versioning XML Documents and Vocabularies, August 2008, Montréal)
The U.S. National Library of Medicine (NLM) Journal/Book Tag Sets have been widely adopted by libraries, archives, and commercial publishers. The users are widely distributed, generally unknown to each other, and in many cases unknown to the Tag Set advisory group, owners, and secretariat. The first five revisions to the Tag Sets were backwards compatible, but the most recent is not. The decision to make a non-backwards-compatible revision was not taken lightly. It was made based on several factors, including a decision to favor the needs of future users over the convenience of current users.
Introductory Schematron
Deborah A. Lapeyre and Wendell A. Piez (DC XML Users Group, January 2008, Washington, D.C.)
This tutorial discusses Schematron, a rules-based validation/reporting language that works by making assertions about patterns found in XML documents and reporting back messages about the truth (or otherwise) of those assertions. While Schematron can work with many tree-querying languages, the tutorial illustrates Schematron as it is most commonly used, with XPath, the tree-walking and expression language used with XQuery and XSLT. [Link to our Schematron page, which includes this tutorial’s slides]
2007
Interview with B. Tommie Usdin, President, Mulberry Technologies
(reprinted from Silverchair’s newsletter, Context Matters, December 2007)
Silverchair interviews Tommie Usdin about her experience with markup languages, the work of Mulberry Technologies, and the use of XML in publishing.
Separating Mapping from Coding in Transformation Tasks
Tommie Usdin and Wendell Piez (XML 2007, Boston)
Creating XML transformations in two separate tasks, Mapping and Coding, not only maximizes the skills of various team members, but also reduces development time and cost, and increases correctness of the finished code. [Link to the presentation slides or our example mapping specification]
Introductory Schematron
Deborah A. Lapeyre and Wendell A. Piez (XML 2007, Boston)
This tutorial discusses Schematron, a rules-based validation/reporting language that works by making assertions about patterns found in XML documents and reporting back messages about the truth (or otherwise) of those assertions. While Schematron can work with many tree-querying languages, the tutorial illustrates Schematron as it is most commonly used, with XPath, the tree-walking and expression language used with XQuery and XSLT. [Link to our Schematron page, which includes this tutorial’s slides]
TEI at 20: Congratulations! The Next 20 Will Tell the Tale
B. Tommie Usdin (Text Encoding Initiative Consortium Members’ Meeting, November 2007, College Park, MD)
At the Text Encoding Initiative Consortium Members’ Meeting (University of Maryland, College Park), B. Tommie Usdin delivers a Keynote presentation discussing the TEI’s accomplishments and influence on the computing world over the last 20 years and posing questions, the answers to which will define the TEI’s goals for the future. [Link to the text of the Keynote]
Riding the Wave, Riding for a Fall, or Just Along for the Ride?
B. Tommie Usdin (Extreme Markup Languages® 2007, Montréal)
Tommie Usdin discusses the implications of XML’s success: whether the work is over (or just starting), whether XML is (or should be) going underground, and whether the markup community has misconceptions about its role in XML’s success.
LMNL (Layered Markup and Annotation Language)
Wendell Piez (International Workshop on Markup of Overlapping Structures, August 2007, Montréal)
As part of a panel discussion, Wendell Piez explores the potential of LMNL as a way to handle overlapping markup.
Form and Format: Towards a Semiotics of Digital Text Encoding (p.153)
Wendell Piez (Digital Humanities 2007, University of Illinois, Urbana-Champaign)
2006
XSLT for Quality Checking in the Publication Workflow
Wendell Piez (Mulberry’s Seminar Series)
[Link to the seminar slides. Those wishing to download the sample stylesheets demonstrated at the seminar may find the link in the seminar’s penultimate slide (#38).]
The Layered Markup and Annotation Language (LMNL)
John Cowan, Jeni Tennison, and Wendell Piez (Extreme Markup Languages® 2006, Montréal)
A brief report on some design decisions recently made by the Ad Hoc LMNL Group about the LMNL (Layered Markup and aNnotation Language) syntax and design model. A simplified version of layers is presented, along with a review of LMNL that includes previously unpublished material on non-character atoms and namespaces. (Although this paper is not represented in the conference proceedings, an author package is available as part of the proceedings.)
What Is XML and Why Should You Care?
B. Tommie Usdin and Debbie Lapeyre (XPlor Mid Atlantic, April 2006, Miami Beach)
More and more organizations are moving their content to XML. Some are asking for XML as well as pages from their printers; some are sending XML to their printers. This presentation discusses who is moving to XML and what they hope to get from it, as well as how XML works and how participants should approach it. The basic vocabulary needed to talk about XML and an overview of the logical components of an XML application are provided. [Link to the HTML presentation copy]
EXPLOR Global (February 2006, Miami Beach)
Tommie Usdin and Debbie Lapeyre
Why XML for Print?
Should your organization be making print publications from XML? The current XML hype is focused on web portals, XML-service-oriented architectures, and e-business applications. But while the use of XML in traditional print publishing may be less trendy and newsworthy, it is equally powerful. Working with XML can help publishers improve quality and timeliness, as well as allowing them to repurpose, reuse, and reformat content from a single source. XML allows publishers to create high-quality print publications using source data that can also support electronic publication, electronic archives, enhanced search and retrieval, and new product opportunities. [Link to the HTML Slides for “Why XML for Print?” or PDF of Handouts for “Why XML for Print?”]
XML in Print Production
Most of the ways of adding XML to print production come down to a variation on one of three themes: making pages then XML, introducing XML during composition, and working with XML from as far in front of composition as you can manage. What are the implications and advantages of each style? Why would you prefer one to another? If you do make XML early in the production cycle, how do you get from XML to pages? There are many methods, each with its own set of pros and cons, that can be used in combination for multiple content reuse. [Link to the HTML Slides for “XML in Print Production” or PDF of Handouts for “XML in Print Production”]
What Is XML and Why Should You Care?
XML is a data format that manages text and content as named objects. XML documents with their “tags” can be part of cost-effective solutions for content reuse, repurposing, internationalization, and more. This session provides the vocabulary you need to talk about XML, a look at how XML works, some real world examples, and a glimpse at the logical components of an XML application. [Link to the HTML Slides for “What Is XML and Why Should You Care?” or PDF of Handouts for “What Is XML and Why Should You Care?”]
How and Why Are Companies Using XML?
More and more organizations are moving their content to XML. Some are asking for XML as well as pages from their printers; some are sending XML to their printers. Who is moving to XML, and what do they hope to get from it? How can designers and printers serve their XML customers? If you understand why your customer wants XML and what they want to do with it, you can help them meet their goals, and thus increase your value as a supplier! [Link to the HTML Slides for “How and Why Are Companies Using XML?” or PDF of Handouts for “How and Why Are Companies Using XML?”]
Moving to XML: The Investment
XML has many benefits, but no one ever said it came for free. Moving to XML will change the way you work, the flow of content through your organization, staffing skills (and possibly staffing levels), and the opportunities you have. Where do tags come from and when? What does your staff need to know? Does added value mean added work? How can XML help in QA? What are some of the known problems and pitfalls you might avoid? [Link to the HTML Slides for “Moving to XML: The Investment” or PDF of Handouts for “Moving to XML: The Investment”]
Introduction to XSLT Concepts
You keep hearing that XML is exciting; that once you have your content in XML you can do anything with it; that XML is powerful and flexible. Then you look at an XML file and don’t see what the fuss is all about! XSLT (the XML transformation language) is the language that makes XML powerful and flexible. Using XSLT you can: convert XML into display formats (HTML, PDF, etc.); make XML into tool-specific formats (such as typesetting languages); and automatically add numbering, cross-references, tables of contents, and generated text to your pages. You can also use XSLT to convert documents tagged according to your DTD/schema into documents tagged according to someone else’s tag set! XSLT changes the way you’ll think about XML. This introduction covers the principles of XSLT, its processing model, what it can and can’t do, and how it is being used in real environments. This is a concept course, showing “just enough” syntax. [Link to the HTML Slides for “Introduction to XSLT Concepts” or PDF of Handouts for “Introduction to XSLT Concepts”]
Introduction to XSL-FO Concepts (Printing Directly from XML)
XSL-FO (Extensible Stylesheet Language – Formatting Objects) is a specification for formatting XML documents for print or web display. Publishers, catalog producers, and financial institutions (among many others) are using XSL-FO to go directly from XML into PDF, PostScript, PCL, etc. This conceptual introduction introduces XSL-FO, what it is, how it works, how it can be used, and what it is capable of producing (and what it can’t!). Using a stylesheet-in-development, we illustrate the logical components of an XSL-FO formatting system, how the page geometry works, and show you the basic vocabulary of “formatting objects”: blocks, wrappers, Cascading-Stylesheet-like attributes, and pages. Why isn’t everyone using XSL-FO? Should your company consider it? [Link to the HTML Slides for “Introduction to XSL-FO Concepts” or PDF of Handouts for “Introduction to XSL-FO Concepts”]
Introduction to XPath 2.0
Wendell Piez and Debbie Lapeyre (DC XML Users Group, January 2006, Washington, D.C.)
An introduction to the concepts and syntax of XPath 2.0 (the new XML query and tree-traversal language — now in Working Draft — from W3C) and the differences between XPath 1.0 and XPath 2.0. The data model has changed; there are powerful new functions and operators; and XPath 2.0 is closer to a programming language than ever. [Link to the HTML presentation copy]
2005
XSLT Throughout the Document Lifecycle
Wendell Piez (XML 2005, Atlanta)
XSLT can be applied to a range of tasks besides generating final output formats, including the automation and semi-automation of editorial and copy-editing chores, extra-schema validation, data aggregation, filtering, indexing, file management, and more.
W3C XML Schema, RELAX NG, Schematron, or DTD: How’s a User To Choose?
B. Tommie Usdin (XML 2005, Atlanta)
XML DTDs and schemas are used to specify what tagging is allowed in a set of XML documents. Originally, XML had only one way to express these rules; now there are many, each of which reflects not only a different conception of the functional requirements for constraint languages, but also a different approach to meeting those requirements. This talk provides a clear look at the nature and strengths of each of the major schema languages (XML DTD, W3C XML Schema (XSD), RELAX NG, and Schematron), without hype and without advocating any of them. After discussing the uses of XML schemas in general, each language is examined, highlighting its major features; what sorts of constraints (rules) it can, and cannot, express; and the environments in which it is most popular. The talk ends with factors in selecting appropriate schema language(s) and a discussion of ways in which many organizations are using multiple schema languages in the same projects to do different tasks. [HTML Slides]
Introduction to XPath 2.0
Wendell Piez (XML 2005, Atlanta)
A tutorial introducing the proposed new W3C XML document query and traversal language. With this well-received tutorial, we provide some sample files for participants to play with. [Downloadable samples (link to a 9Kb .zip file)]
In Praise of the Edge Case
B. Tommie Usdin (Extreme Markup Languages® 2005, Montréal)
The Extreme Markup Languages® conference, the organizers are sometimes told, devotes too much time to edge cases. This complaint inspires reflection on the value of exploring, learning about, and learning from the technological edge. Remember: today’s mainstream application was yesterday’s edge case.
Format and Content: Can They Be Separated? Should They Be?
Wendell Piez (Extreme Markup Languages® 2005, Montréal)
An examination of a practical and theoretical question in markup language design, using as a counter-example the unorthodox “Web Graphic Layout Language” project of the author.
2004
Way Beyond Powerpoint
Wendell Piez (XML 2004, Washington, D.C.)
Microsoft PowerPoint is ubiquitous, and therefore controversial. Most critiques, both of the software and of its widespread adoption in educational settings, express concerns that are not particular to PowerPoint alone, but apply to “slideware” presentations generally. The reliance on sequences and hierarchies of bullet points (a poor means of presenting some kinds of complex information), the foregrounding of visual gimmicks over content, the displacement of attention from the speaker and her message onto summary arguments presented dumbly on screen: far from being necessary features of presentation technology, these (according to the critics) prove to be shortcomings that interfere with, rather than enhance, a presenter’s ability to communicate.
This paper presents an alternative to slideware, in the form of SVG graphics used for presentation. Why SVG? It meets all our functional requirements of a presentation technology, but even more importantly, as an XML-based format, Scalable Vector Graphics is well-suited to an XML-based production framework. Going far beyond sequences of bullet points, SVG supports open-ended, innovative uses of visual media in presentation. This becomes practical because the complexities of SVG coding can be relegated to a processing layer, following the classic design pattern of XML publishing. [This paper won the Best Speakers Award at XML 2004. An early version was presented at ALLC/ACH 2004 (Gothenburg, Sweden).]
Half-steps toward LMNL
Wendell Piez (Extreme Markup Languages® 2004, Montréal)
Overlap in markup occurs where some markup structures do not nest, such as where the sentence and phrase boundaries of a poem and the metrical line structure describe different hierarchies. LMNL (Layered Markup and Annotation Language) is a model for representing textual data, designed to recognize and account for layer separation and markup overlap. LMNL is specified as a data model, not as a syntax — but without a syntax and an API, it’s very difficult to experiment with the model. The author demonstrates a subset of LMNL using an XML syntax and some severe restrictions on LMNL (thus “half-LMNL”).
Authoring Scholarly Articles: TEI or Not TEI?
Wendell Piez (ALLC/ACH 2004, Gothenburg, Sweden)
The TEI has grown and matured greatly in recent years, both in the number and breadth of its applications, and in their sophistication. It can be taken as a sign of the success and state of health of TEI to see persistent efforts to push its boundaries. The author discusses one area that is repeatedly cited as where the TEI “should” provide a competitive alternative, but apparently does not: the realm of authoring or original composition by scholars and writers.
2003 – 2002 – 2001
NLM’s Public-domain DTDs: A 9-Month Update
Debbie Lapeyre and Jeff Beck (XML 2003, Philadelphia)
In March 2003, the National Library of Medicine (NLM) released into the public domain a suite of DTD modules for describing journal literature, books, and many kinds of textual material. The full suite was developed by the National Center for Biotechnology Information (NCBI) and the XML consulting firms Inera, Inc. (funded by the Andrew W. Mellon Foundation) and Mulberry Technologies, Inc. (funded by NCBI). Also in March the first two public DTDs developed from this suite were released: the Journal Archiving and Interchange DTD, and the Journal Publishing DTD which defines a common format for the creation of journal content in XML. This presentation discusses use of the DTDs so far, future plans, and the work of the advisory board.
XSL-FO Chefs’ Tools Exhibition
Tommie Usdin, maestro (XML 2003, Philadelphia)
In this technical exhibition of XSL-FO tools, each product representative provided a sample and rendered the samples provided by the other participants. As far as we know, this was the first public demonstration of interchange of typesetting files. The participants received only XSL-FO instances, without any guidance on what the formatted document should look like, and each formatted as many of the samples as they could. None of the tools succeeded with all of the samples, and some of them required manipulation of the documents before they could be rendered at all. At the end of a very exciting demonstration of XSL-FO rendering tools, the conclusions were: XSL-FO rendering is practical for many applications; there are a variety of high-quality XSL-FO tools available; each tool has strengths and weaknesses; and none is clearly superior to all others for all uses.
XSLT for Quality Checking in a Publication Workflow
Wendell Piez (XML 2003, Philadelphia)
Editorial work will always require the judgement of informed and sensitive human beings. Nonetheless, XML-based applications, even at a small scale, can support and complement, rather than detract from, the work of human beings in providing the kind of care and attention to information through the publishing process that is, ultimately, the only thing that can assure the quality of published works. This paper examines, in concrete detail (using the XML behind an XML 2003 conference paper as an example test bed), how one particular XML technology, XSLT, can be brought to bear in such applications.
When “It doesn’t matter” Means “It matters!”
B. Tommie Usdin (Extreme Markup Languages 2002, Montréal)
Few classes of narrative document can be as tightly specified as most business documents can. But many can usefully be specified more tightly than they are. This talk illustrates the costs of underspecifying content models. It is important to recognize the difference between “it doesn’t matter; there is no information here” and “it can’t be specified because the content creator will supply it”.
Human and Machine Sign Systems
Wendell Piez (Extreme Markup Languages 2002, Montréal)
A schema’s role is to mediate and adjudicate between human and machine semantics; recognizing this can help us manage our schemas better. Some practitioners work solely with an operational semantics, according to which the meaning of a tag is what we want it to cause the processing software to do with the data. A better understanding is reached if we adopt the structuralist view that a sign is the (arbitrary) relation between a signifier and a signified. In metalanguages (including schema languages) the signified is itself a sign; in some languages the signifier may likewise be a sign. Proper understanding of the relationship among sign, signifier, signified, metalanguage, and connotative system will allow us to layer our systems more effectively and to obtain useful results even in fluid systems where our understanding of the underlying reality cannot, or should not, be fixed.
The Layered Markup and Annotation Language (LMNL)
Jeni Tennison and Wendell Piez (Extreme Markup Languages 2002, Montréal)
Representing multiple hierarchies within a single document has always been a problem for XML. To try to address the problems of representing multiple hierarchies and of annotating existing tree structures with type information (as in the PSVI), we have developed a layered data model based on the Core Range Algebra presented at Extreme 2002 by Gavin Nicol. This data model views documents as strings over which span a number of named ranges, each of which can itself have associated metaranges with their own internal structure. To aid experimentation with this data model, we developed a markup notation to reflect it, the Layered Markup and Annotation Language (LMNL), and have constructed several prototype applications to facilitate the extraction of single views, as XML structures, from LMNL documents. (Although this paper is not represented in the conference proceedings, an author package is available as part of the proceedings.)
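The notion of a document as a string spanned by named ranges can be illustrated with a small sketch. It is neither LMNL syntax nor the LMNL data model as specified: the Range class, the span and overlaps helpers, and the sample text are all hypothetical, and metaranges are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A named span over the document string (hypothetical, not the LMNL API)."""
    name: str
    start: int  # offset of the first character covered
    end: int    # offset one past the last character covered

    def text(self, doc: str) -> str:
        return doc[self.start:self.end]


def span(doc: str, name: str, fragment: str) -> Range:
    """Build a range by locating the first occurrence of a fragment in the document."""
    start = doc.index(fragment)
    return Range(name, start, start + len(fragment))


def overlaps(a: Range, b: Range) -> bool:
    """True when two ranges share characters but neither contains the other."""
    shares = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return shares and not (a_in_b or b_in_a)


# Invented two-line "poem": the second sentence runs across the line break.
doc = "The sun sets. The moon rises slowly."
ranges = [
    span(doc, "sentence", "The sun sets."),
    span(doc, "sentence", "The moon rises slowly."),
    span(doc, "line", "The sun sets. The moon"),
    span(doc, "line", "rises slowly."),
]

# Every pair of ranges that cannot both be expressed in a single tree.
pairs = [(a, b) for i, a in enumerate(ranges) for b in ranges[i + 1:] if overlaps(a, b)]
for a, b in pairs:
    print(f"{a.name} {a.text(doc)!r} overlaps {b.name} {b.text(doc)!r}")
```

Because each range only records offsets into the shared string, the sentence and line structures coexist without either having to nest inside the other, which is the situation a single XML hierarchy cannot represent directly.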
XML and Print
Debbie Lapeyre (Seybold New York, 2002; other locations previous years)
This tutorial explores the relevance of XML as a data format for creating high-quality print publications that can later support electronic publication, electronic archives, and enhanced search and retrieval. It highlights XML’s ability to assist in managing multi-author publications, revisions, and approvals, as well as its potential for fast reuse and repurposing of content. [.zip of 2001 version, in PDF]
XML for Publishing Managers
Debbie Lapeyre (Seybold New York, 2002; other locations previous years)
A 3-hour tutorial that starts by defining XML and goes on to explain the benefits of XML applications, the use of XML in multimedia publishing, application integration, information repositories, and database publication. The impact of XML on workflow and staffing is also discussed, as well as the staff skills needed for XML-based data distribution. [.zip of August 2001 version, in PDF]
Document Analysis for DTD or Schema Development
Debbie Lapeyre and Tonya Gaylord (XML 2001, Orlando)
A tutorial on the principles of information analysis. An interactive sample document analysis is used to demonstrate basic concepts of structured markup, the distinction between “useful” and “possible” information, and the relationships between information components. [Handout in PDF]
From HTML to XML
Wendell Piez (XML 2001, Orlando)
Migrating data from a web format (HTML) into a more versatile and manageable XML format involves a range of decisions based on what shape the source code is in, what kinds of functions and operations the new XML-encoded data needs to be able to support, and design trade-offs between the power and versatility of markup on the one hand, and the expense of tagging and maintenance of strong data on the other. [HTML slides]
Beyond the “Descriptive vs. Procedural” Distinction
Wendell Piez (Extreme Markup Languages 2001, Montréal)
A paper considering markup design strategies from a theoretical point of view. Sometimes “semantic opacity” is a feature, not a bug. Because they sometimes work to mask even while they communicate, markup languages can be usefully considered as a species of rhetoric.
Previous years
A Manager’s Introduction to XML
Wendell Piez (XML 2000, Washington, D.C.)
A tutorial providing a non-technical introduction to XML, including its historical origins and its business application. Also discussed is the XML “family” of standards, i.e., those standards (XSL, XSLT, XSL-FO) related to XML. [.zip of PDF]
XSL: Characteristics, Status and Potentials for the Humanities
Wendell Piez (2000 Joint Conference of the Association for Computing in the Humanities and the Association for Literary and Linguistic Computing, Glasgow)
A conference paper providing an overview of XSL with reference to applications in Humanities disciplines, particularly as concerns digital text encoding projects (such as digital libraries) and Humanities-oriented analytical text processing. [Downloadable .zip of HTML]
Practical Guide to SGML/XML Filters
Introduction
Debbie Lapeyre
Introduction for Norman E. Smith’s book on SGML/XML Filters (Plano, TX: Wordware Publishing, Inc., 1997 [1st edition], 1998 [2d edition]), noting the value of SGML when combined with translation programs for output across various media, e.g., print, voice synthesis. As a prelude to the book’s discussion of several languages for SGML manipulation, the importance of such filters in the authoring context to enable creation of SGML from diverse sources, such as desktop publishing tools or spreadsheets, is likewise highlighted.
XML for SGMLers
Tommie Usdin and Debbie Lapeyre (XML’98, Chicago)
A one-day tutorial on the details of XML syntax and the differences between XML and SGML, highlighting the features, functionality, and “funkiness” XML excludes. Hands-on instruction includes emphasis on the changes necessary to convert SGML documents into well-formed XML, the conversion of SGML DTDs, and the trade-offs in various conversion approaches. [Downloadable .zip of PDF]
XML: Not a Silver Bullet, but a Great Pipe Wrench
Tommie Usdin and Tony Graham (ACM StandardView 6(3):125-132, 1998)
An article discussing the potential uses and benefits of XML, while questioning whether the excitement surrounding it has been fully merited.
Washington Technologies White Papers
Several early statements (1997) on the business case for SGML/XML. [Downloadable .zip of HTML]