Journal Archiving and Interchange Tag Suite
Questions and Answers

This document contains Tommie’s answers to questions we at Mulberry have recently been asked about the NLM Journal Archiving and Interchange Tag Suite. Many of these questions arose when publishers learned that there was a new version (version 3.0) and that it, unlike all previous versions of the Tag Suite, contained changes that were not backward-compatible with previous versions.

This is not a formal FAQ – there is one of those at: http://dtd.nlm.nih.gov/faq.html – nor is it the policy of any archive, library, or other user of the Tag Suite. This is advice my colleagues at Mulberry and I have given several of our clients.

Which Form of the Model Should We Use: the DTD, XSD, or RNG?

It depends, mostly on your computing environment. If your tools work best with one form of the model, you will probably be best served by that form. If your staff understands one form better than the others, that form might be the best choice for you.

The three expressions are, to the extent possible, equivalent. Documents produced according to one form of the model should be valid according to all three.

FYI: In a recent survey of Tag Set users, we found that most current users are using the DTD form but that many are thinking about moving to one of the others “soon”.

Which Model Should We Use: Archiving, Publishing, Authoring, or Maybe Book?

It depends. It depends on which model most closely meets your needs. The Archiving Tag Set is most appropriate for users who are converting journal articles from other formats and who want to preserve as much of the “flavor” of the original tagging as possible. The Publishing Tag Set is most appropriate for users who are creating new journals or who want to regularize journal articles they are converting from other formats. The Authoring Tag Set is most appropriate for people who are creating new journal articles in XML and who want a stripped-down tag set that will enable XML editors, especially those with context-sensitive menus, to work effectively. The Book Tag Set is the model used by the NCBI Bookshelf and is an effective starting place for development of publisher-specific book models.

Which Is Better: <mixed-citation> or <element-citation>?

It depends. It depends on what you want to do with the tagged citations and on what information you have to start with.

If your starting data includes all of the punctuation you want to see in your displayed citations, and especially if you have invested editing effort to make the complex ones meet your preferred style, you would be best advised to use <mixed-citation> and retain all of the punctuation and spacing. If your starting data does not include reliable punctuation, or if you want to display the citations in a style different from the one in which they were created, it seems to me that you are best advised to use <element-citation> and generate the formatting and punctuation at display time.
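As a rough illustration of the second option, the sketch below uses Python's standard xml.etree.ElementTree module to generate display punctuation from an <element-citation>. The sample citation, the function name format_citation, and the punctuation pattern are all invented for this example; they are not a recommended style and not part of the Tag Suite.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <element-citation> itself carries no punctuation.
SAMPLE = """
<element-citation publication-type="journal">
  <person-group person-group-type="author">
    <name><surname>Doe</surname><given-names>J</given-names></name>
    <name><surname>Roe</surname><given-names>R</given-names></name>
  </person-group>
  <article-title>An example article</article-title>
  <source>Journal of Examples</source>
  <year>2008</year>
  <volume>12</volume>
  <fpage>34</fpage>
  <lpage>56</lpage>
</element-citation>
"""

def format_citation(xml_text):
    """Build a display string; all punctuation is supplied here, not by the XML."""
    cit = ET.fromstring(xml_text.strip())
    # Join every author as "Surname Initials", separated by commas.
    authors = ", ".join(
        f"{name.findtext('surname')} {name.findtext('given-names')}"
        for name in cit.iter("name")
    )
    pages = f"{cit.findtext('fpage')}-{cit.findtext('lpage')}"
    return (
        f"{authors}. {cit.findtext('article-title')}. "
        f"{cit.findtext('source')}. "
        f"{cit.findtext('year')};{cit.findtext('volume')}:{pages}."
    )

print(format_citation(SAMPLE))
# prints: Doe J, Roe R. An example article. Journal of Examples. 2008;12:34-56.
```

Switching to another display style would mean changing only the formatting function, which is the main attraction of <element-citation> when the source punctuation cannot be trusted.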

I know of a publisher who pays copy editors to perfect author-provided citations, creates print from that, pays a vendor to strip out all of the punctuation and spacing, formats citations for online display from <element-citation>s automatically, pays editors to check for anomalies in this display, and hand re-punctuates the citations that look wrong online. This process seems wasteful – of time and money – to me.

If you plan on submitting your tagged XML files to an archive or providing them to a publisher or library, the recipient may have policies relating to how it prefers citations to be tagged.

Can We Use <mixed-citation> and/or <element-citation> If We Have Our Own Citation Style?

Yes! Both <mixed-citation> and <element-citation> allow their content to occur in any sequence, so you can use any of the citation structures in the sequence that your citation style requires. In addition, you can use <named-content> to tag anything else in the citations that is important in your environment but for which the Tag Sets do not provide a specific element. Either citation model can work for you if you use the NLM citation style, or APA, Chicago, MLA, Turabian, or some other citation style.
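The sketch below, in Python with the standard xml.etree.ElementTree module, shows one invented citation tagged as <mixed-citation> in two different element orders, the second with an extra <named-content>. The two punctuation patterns, the content-type value "grant-number", and the helper names display_text and key_fields are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical citation, punctuated in one style...
STYLE_A = (
    '<mixed-citation publication-type="journal">'
    '<string-name><surname>Doe</surname> <given-names>J</given-names></string-name>. '
    '<article-title>An example article</article-title>. '
    '<source>Journal of Examples</source>. <year>2008</year>;'
    '<volume>12</volume>:<fpage>34</fpage>.'
    '</mixed-citation>'
)

# ...and the same citation with the year moved forward and an extra
# <named-content> for something the Tag Sets have no specific element for.
STYLE_B = (
    '<mixed-citation publication-type="journal">'
    '<string-name><surname>Doe</surname>, <given-names>J.</given-names></string-name> '
    '(<year>2008</year>). <article-title>An example article</article-title>. '
    '<source>Journal of Examples</source>, <volume>12</volume>, <fpage>34</fpage>. '
    '<named-content content-type="grant-number">X-000</named-content>'
    '</mixed-citation>'
)

def display_text(xml_text):
    """Return the citation exactly as tagged, punctuation and spacing included."""
    return "".join(ET.fromstring(xml_text).itertext())

def key_fields(xml_text):
    """Pull out a few elements by name; their order in the citation is irrelevant."""
    cit = ET.fromstring(xml_text)
    return {tag: cit.findtext(f".//{tag}") for tag in ("source", "year", "volume")}

print(display_text(STYLE_A))
print(display_text(STYLE_B))
print(key_fields(STYLE_A) == key_fields(STYLE_B))
```

The displayed strings differ because each keeps its own punctuation and sequence, while a lookup by element name finds the same source, year, and volume in both.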

To illustrate this point, we have tagged a citation for a journal article in several citation styles (see the linked examples).

Are All of the “Rules” We Need to Follow in the Tag Library?

It depends. Many users need to interchange content with a particular partner or archive. Many libraries, content resellers, and archives (including PubMed Central) have adopted one (or more) of the NLM Tag Sets but have added rules and best-practice guidance that are particular to their institutions and are described separately from the Tag Suite documentation.

Since the Tag Sets are intended to be conversion targets, they frequently have a loose or generic model or set of values while the rules of a specific publisher, library, aggregator, or archive may be much more specific (while still completely valid to the Tag Suite structures). For example, some archives have a limited list of graphics types (expressed as mimetype and mime-subtype) they allow, and others have lists of article-type values they expect in the documents they accept.
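A partner-specific rule of this kind can be checked separately from validation against the Tag Set. The Python sketch below assumes two invented allow-lists and a trimmed document fragment (not a complete, valid article); a real archive publishes its own lists, and the names ALLOWED_ARTICLE_TYPES, ALLOWED_GRAPHICS, and check_partner_rules are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical partner rules; substitute the lists your archive actually publishes.
ALLOWED_ARTICLE_TYPES = {"research-article", "review-article"}
ALLOWED_GRAPHICS = {("image", "tiff"), ("image", "jpeg")}

# Trimmed fragment for illustration only (front matter and hrefs omitted).
DOC = """<article article-type="editorial">
  <body><fig><graphic mimetype="image" mime-subtype="gif"/></fig></body>
</article>"""

def check_partner_rules(xml_text):
    """Return a list of messages for values the (hypothetical) partner rejects."""
    root = ET.fromstring(xml_text)
    problems = []
    article_type = root.get("article-type")
    if article_type not in ALLOWED_ARTICLE_TYPES:
        problems.append(f"article-type {article_type!r} not accepted")
    for graphic in root.iter("graphic"):
        kind = (graphic.get("mimetype"), graphic.get("mime-subtype"))
        if kind not in ALLOWED_GRAPHICS:
            problems.append(f"graphic type {kind[0]}/{kind[1]} not accepted")
    return problems

for message in check_partner_rules(DOC):
    print(message)
```

A document can pass validation against the Tag Set and still produce messages here, which is exactly the gap between the generic model and one recipient's rules.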

Should We Switch to Version 3.0?

It depends. Of course, you should switch if any of the new features or structures will make working with your content easier. Among the things that will be easier with version 3.0 than with previous versions:

Similarly, if you want to send your content to a partner who prefers to receive 3.0, then it is reasonable to switch.

Should We Convert Our Backfile to Version 3.0?

It depends. Some users will convert, and some will not. If you want to use 3.0 for your new content and you have a significant backfile valid to earlier version(s) of one of the Tag Sets, you can either maintain the legacy files in the older version and create new content in version 3.0 or convert the backfile. If you want to search and manipulate the entire collection using one set of tools, you might be better off converting your legacy documents. If your legacy documents are being stored “just in case”, and you have no current plans to process them, you are probably better off leaving them in their current version.

Will It Be Difficult/Costly to Convert Our Backfile to Version 3.0?

It depends. Most documents can be transformed from earlier versions of the Tag Sets into 3.0 with a fairly straightforward conversion, which can be written in XSLT or the programming language of your choice. The most challenging part will be the conversion of <citation>s into either <mixed-citation> or <element-citation>. Specifically, converting the values of the single citation-type attribute in previous versions to the trio of publication-type, publication-format, and publisher-type is challenging when the original data does not make clear what was meant.

For those old <citation>s where the meaning is clear, this is not difficult. For example, if the old value was <citation citation-type="journal">, this transforms easily to publication-type="journal". But if the old value was <citation citation-type="government">, it may not be clear whether this should be <element-citation publisher-type="government">, <element-citation publication-type="government">, or <element-citation publisher-type="government" publication-type="government">. That is, is this a document published by the government, is it a government report, or is it both?
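One way to handle the attribute problem is to convert the values whose meaning is clear and flag the rest for a person to review, rather than guessing. The Python sketch below covers only that attribute mapping, not a full conversion; the CLEAR_VALUES table, the function name convert_citation, and the choice of <mixed-citation> as the default target are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of old citation-type values whose meaning is unambiguous.
CLEAR_VALUES = {
    "journal": {"publication-type": "journal"},
    "book": {"publication-type": "book"},
}

def convert_citation(old, target="mixed-citation"):
    """Return (new_element, needs_review) for one old <citation> element."""
    new = ET.Element(target)
    new.text = old.text        # keep any leading text and punctuation
    new.extend(list(old))      # reuse child elements (their tails come along)
    citation_type = old.get("citation-type")
    if citation_type in CLEAR_VALUES:
        for name, value in CLEAR_VALUES[citation_type].items():
            new.set(name, value)
        return new, False
    # Unknown or ambiguous value (e.g. "government"): set nothing and
    # flag the citation so that an editor can decide what was meant.
    return new, citation_type is not None

old = ET.fromstring(
    '<citation citation-type="government">Example Agency. '
    '<source>Example report</source>.</citation>'
)
new, needs_review = convert_citation(old)
print(ET.tostring(new, encoding="unicode"), needs_review)
```

Keeping a list of the flagged citations also gives an early estimate of how much manual effort the backfile will need.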

Will Version 2.3 (and the Previous Versions) Go Away, or Continue to Be Supported?

It depends. It depends on what you mean by “go away” and “be supported”. NLM has said that they will continue to host all versions of the Tag Sets on their server indefinitely. So in that sense, they won’t go away. On the other hand, suppliers and consumers of documents will make their own decisions on what they will create and/or accept. Mulberry’s opinion is that it is likely that quite a few publishers will stick with version 2.3 until they feel a need for some of the features in 3.0 (or later versions), which may be years from now. We expect new users to start with the latest version available. We think it is likely that conversion vendors and authoring tools will support both version 2 and version 3 for at least the next few years. (After that, the crystal ball is cloudy!)