Towards structured, block-based PDF
The Portable Document Format (PDF), defined by Adobe Systems Inc. as the basis of its Acrobat product range, is discussed in some detail. Particular emphasis is given to its flexible object-oriented structure, which has yet to be fully exploited. PDF is currently used to represent not logical structure but simply a series of pages and associated resources. A definition of an Encapsulated PDF (EPDF) is presented, in which EPDF blocks carry with them their own resource requirements, together with geometrical and logical information. A block formatter called Juggler is described which can lay out EPDF blocks from various sources onto new pages. Future revisions of PDF supporting uniquely-named EPDF blocks tagged with semantic information would assist in composite page makeup and could even lead to fully revisable PDF
Creating Structured PDF Files Using XML Templates
This paper describes a tool for recombining the logical structure from an XML document with the typeset appearance of the corresponding PDF document. The tool uses the XML representation as a template for the insertion of the logical structure into the existing PDF document, thereby creating a Structured/Tagged PDF. The addition of logical structure adds value to the PDF in three ways: the accessibility is improved (PDF screen readers for visually impaired users perform better), media options are enhanced (the ability to reflow PDF documents, using structure as a guide, makes PDF viable for use on hand-held devices) and the re-usability of the PDF documents benefits greatly from the presence of an XML-like structure tree to guide the process of text retrieval in reading order (e.g. when interfacing to XML applications and databases)
Mapping and Displaying Structural Transformations between XML and PDF
Documents are often marked up in XML-based tagsets to delineate major structural components such as headings, paragraphs, figure captions and so on, without much regard to their eventual displayed appearance. And yet these same abstract documents, after many transformations and 'typesetting' processes, often emerge in the popular format of Adobe PDF, either for dissemination or archiving.
Until recently PDF has been a totally display-based document representation, relying on the underlying PostScript semantics of PDF. Early versions of PDF had no mechanism for retaining any form of abstract document structure, but recent releases have introduced an internal structure tree to create the so-called 'Tagged PDF'.
This paper describes the development of a plugin for Adobe Acrobat which creates a two-window display. In one window is shown an XML document original and in the other its Tagged PDF counterpart is seen, with an internal structure tree that, in some sense, matches the one seen in XML. If a component is highlighted in either window then the corresponding structured item, with any attendant text, is also highlighted in the other window.
Important applications of correctly Tagged PDF include making PDF documents reflow intelligently on small screen devices and enabling them to be read out in correct reading order, via speech synthesiser software, for the visually impaired. By tracing structure transformation from source document to destination one can implement the repair of damaged PDF structure or the adaptation of an existing structure tree to an incrementally updated document
Adobe's Acrobat -- the Electronic Journal Catalyst?
Adobe's Acrobat software, released in June 1993, is based around a new Portable Document Format (PDF) which offers the possibility of being able to view and exchange electronic documents, independent of the originating software, across a wide variety of supported hardware platforms (PC, Macintosh, Sun UNIX etc.).
The principal features of Acrobat are reviewed and its importance for libraries discussed in the context of experience already gained from the CAJUN project (CD-ROM Acrobat Journals Using Networks). This two-year project, funded by two well-known journal publishers, is investigating the use of Acrobat software for the electronic dissemination of journals, on CD-ROM and over networks
`Electronic Publishing' -- Practice and Experience
Electronic Publishing -- Origination, Dissemination and Design (EP-odd) is an academic journal which publishes refereed papers in the subject area of electronic publishing. The authors of the present paper are, respectively, editor-in-chief, system software consultant and senior production manager for the journal. EP-odd's policy is that editors, authors, referees and production staff will work closely together using electronic mail. Authors are also encouraged to originate their papers using one of the approved text-processing packages together with the appropriate set of macros which enforce the layout style for the journal. This same software will then be used by the publisher in the production phase. Our experiences with these strategies are presented, and two recently developed suites of software are described: one of these makes the macro sets available over electronic mail and the other automates the flow of papers through the refereeing process. The decision to produce EP-odd in this way means that the publisher has to adopt production procedures which differ markedly from those employed for a conventional journal
In-house Preparation of Examination Papers using troff, tbl, and eqn.
Starting in December 1982 the University of Nottingham decided to phototypeset almost all of its examination papers `in house' using the troff, tbl and eqn programs running under UNIX. This tutorial lecture highlights the features of the three programs with particular reference to their strengths and weaknesses in a production environment. The following issues are particularly addressed:
Standards -- all three software packages require the embedding of commands and the invocation of pre-written macros, rather than `what you see is what you get'. This can help to enforce standards, in the absence of traditional compositor skills.
Hardware and Software -- the requirements are analysed for an inexpensive preview facility and a low-level interface to the phototypesetter.
Mathematical and Technical papers -- the fine-tuning of eqn to impose a standard house style.
Staff skills and training -- systems of this kind do not require the operators to have had previous experience of phototypesetting. Of much greater importance is willingness and flexibility in learning how to use computer systems
Electronic Publishing : the evolution and economics of a hybrid journal.
The technical, social and economic issues of electronic publishing are examined by using as a case study the evolution of the journal Electronic Publishing Origination, Dissemination and Design (EP-odd) which is published by John Wiley Ltd. The journal is a `hybrid' one, in the sense that it appears in both electronic and paper form, and is now in its ninth year of publication. The author of this paper is the journal's Editor-in-Chief. The first eight volumes of EP-odd have been distributed via the conventional subscription method but a new method, from volume 9 onwards, is now under discussion whereby accepted papers will first be published on the EP-odd web site, with the printed version appearing later as a once-per-volume operation. Later sections of the paper lead on from the particular experiences with EP-odd into a more general discussion of peer review and the acceptability of e-journals in universities, the changing role of libraries, the sustainability of traditional subscription pricing and the prospects for `per paper' sales as micro-payment technologies become available
Separable Hyperstructure and Delayed Link Binding
As the amount of material on the World Wide Web continues to grow, users are discovering that the Web's embedded, hard-coded links are difficult to maintain and update. Hyperlinks need a degree of abstraction in the way they are specified, together with a sound underlying document structure and the property of separability from the documents they are linking. The case is made by studying the advantages of program/data separation in computer system architectures and also by re-examining some selected hypermedia systems that have already implemented separability. The prospects for introducing more abstract links into future versions of HTML and PDF, via emerging standards such as XPath, XPointer, XLink and URN, are briefly discussed
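The idea of separable links with delayed binding can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Link` record, the anchor names, the `urn:example:` identifiers and the `example.org` addresses are hypothetical and are not taken from the paper or from any of the standards it mentions. Links are held in a linkbase apart from the documents, and abstract target names are turned into concrete addresses only when the links are bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_anchor: str  # abstract name of the link's starting point
    target_name: str    # abstract name of the destination, not a location


# Hypothetical linkbase, stored separately from the documents it links.
LINKBASE = [
    Link("intro/para2", "urn:example:glossary-pdf"),
    Link("intro/para5", "urn:example:paper-42"),
    Link("intro/para9", "urn:example:not-yet-published"),
]

# Hypothetical resolver table: the only place that knows real locations.
RESOLVER = {
    "urn:example:glossary-pdf": "https://example.org/glossary.html#pdf",
    "urn:example:paper-42": "https://example.org/papers/42.pdf",
}


def bind_links(linkbase, resolver):
    """Bind abstract targets to concrete addresses at display time."""
    bound = {}
    for link in linkbase:
        address = resolver.get(link.target_name)
        if address is not None:  # unresolvable links are left unbound
            bound[link.source_anchor] = address
    return bound
```

Under these assumptions, moving a target document means updating one resolver entry; neither the linkbase nor the linked documents need to be edited.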
Experience with the use of Acrobat in the CAJUN publishing project
Adobe's Acrobat software, released in June 1993, is based around a new Portable Document Format (PDF) which offers the possibility of being able to view and exchange electronic documents, independent of the originating software, across a wide variety of supported hardware platforms (PC, Macintosh, Sun UNIX etc.). The fact that Acrobat's imageable objects are rendered with full use of Level 2 PostScript means that the most demanding requirements can be met in terms of high-quality typography and device-independent colour. These qualities will be very desirable components in future multimedia and hypermedia systems. The current capabilities of Acrobat and PDF are described; in particular the presence of hypertext links, bookmarks, and yellow sticker annotations (in release 1.0) together with article threads and multi-media plugins in version 2.0. This article also describes the CAJUN project (CD-ROM Acrobat Journals Using Networks) which has been investigating the automated placement of PDF hypertextual features from various front-end text processing systems. CAJUN has also been experimenting with the dissemination of PDF over e-mail, via World Wide Web and on CD-ROM
Automated conversion of Web-based marriage register data into a printed format with predefined layout
The Phillimore Marriage Registers for England were published in the period 1896 to 1922 and have defined a standard layout format for the typesetting of marriage data. However, not all English parish churches had their marriage registers analysed and printed by the Phillimore organisation within this time period.
This paper tells the story of Wirksworth, a town in Derbyshire with a large church, licensed for marriages, yet whose marriage data was not released to the Phillimore organisation. Hence there is no printed Phillimore Marriages volume for Wirksworth. However, in recent years, a Wirksworth web site, created by John Palmer, has become famous as being probably the most comprehensive record of a parish's activities anywhere on the Web.
Within a total of 120 MB of data on the web site, covering events in Wirksworth from medieval times to the present, is a set of data recording births, marriages and deaths transcribed from the original hand-written church register volumes.
The work described here covers the software tools and techniques that were used in creating a set of awk scripts to extract all the marriage records from the Wirksworth web site data. The extracted material was then automatically re-processed, typeset and indexed to form an entirely new Phillimore-style volume for Wirksworth marriages
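The extract-then-index step can be pictured with a short sketch, written here in Python rather than the awk used in the paper. The record layout is purely an assumption: the pipe-separated `date|groom|bride` lines, the sample names and both function names are hypothetical, since the abstract does not describe how the web site data is actually laid out.

```python
from collections import defaultdict

# Hypothetical transcription lines; the real site's layout is not given.
SAMPLE = """\
1802-01-14|John SMITH|Mary BROWN
(note: entry partly illegible in the register)
1803-06-02|William BROWN|Ann TAYLOR
"""


def parse_marriages(text):
    """Keep only lines that look like date|groom|bride records."""
    records = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3 or not all(fields):
            continue  # skip notes, blank lines and malformed entries
        date, groom, bride = fields
        records.append({"date": date, "groom": groom, "bride": bride})
    return records


def surname_index(records):
    """Map each surname to the sorted dates of the marriages it appears in."""
    index = defaultdict(list)
    for rec in records:
        for person in (rec["groom"], rec["bride"]):
            surname = person.split()[-1]  # assumes the surname comes last
            index[surname].append(rec["date"])
    return {name: sorted(dates) for name, dates in sorted(index.items())}
```

Calling `surname_index(parse_marriages(SAMPLE))` yields an alphabetical surname index of the kind a typesetting stage could consume. The typesetting itself is outside the scope of this sketch.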