We examine the issue of digital formats for document encoding, archiving and
publishing, through the specific example of "born-digital" scholarly journal
articles. We will begin by looking at the traditional workflow of journal
editing and publication, and how these practices have made the transition into
the online domain. We will examine the range of different file formats in which
electronic articles are currently stored and published. We will argue strongly
that, despite the prevalence of binary and proprietary formats such as PDF and
MS Word, XML is a far superior encoding choice for journal articles. Next, we
look at the range of XML document structures (DTDs, Schemas) which are in
common use for encoding journal articles, and consider some of their strengths
and weaknesses. We will suggest that, despite the existence of specialized
schemas intended specifically for journal articles (such as NLM), and more
broadly-used publication-oriented schemas such as DocBook, there are strong
arguments in favour of developing a subset or customization of the Text
Encoding Initiative (TEI) schema for the purpose of journal-article encoding;
TEI is already in use in a number of journal publication projects, and the
scale and precision of the TEI tagset makes it particularly appropriate for
encoding scholarly articles. We will outline the document structure of a
TEI-encoded journal article, and look in detail at suggested markup patterns
for specific features of journal articles