Search CORE

678 research outputs found

GATE -- an Environment to Support Research and Development in Natural Language Engineering

Author: Cunningham Hamish
Gaizauskas Robert
Humphreys Kevin
Rodgers Peter
Wilks Yorick
Publication venue: IEEE Computer Society
Publication date: 01/01/1996
Field of study

We describe a software environment to support research and development in natural language (NL) engineering. This environment -- GATE (General Architecture for Text Engineering) -- aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available

CiteSeerX

Kent Academic Repository

Ellogon: A New Text Engineering Platform

Author: Androutsopoulos Ion
Karkaletsis Vangelis
Paliouras Georgios
Petasis Georgios
Spyropoulos Constantine D.
Publication venue
Publication date: 01/01/2002
Field of study

This paper presents Ellogon, a multi-lingual, cross-platform, general-purpose text engineering environment. Ellogon was designed in order to aid both researchers in natural language processing, as well as companies that produce language engineering systems for the end-user. Ellogon provides a powerful TIPSTER-based infrastructure for managing, storing and exchanging textual data, embedding and managing text processing components as well as visualising textual data and their associated linguistic information. Among its key features are full Unicode support, an extensive multi-lingual graphical user interface, its modular architecture and the reduced hardware requirements.Comment: 7 pages, 9 figures. Will be presented to the Third International Conference on Language Resources and Evaluation - LREC 200

arXiv.org e-Print Archive

CiteSeerX

Software Infrastructure for Natural Language Processing

Author: Cunningham Hamish
Gaizauskas Robert
Humphreys Kevin
Wilks Yorick
Publication venue
Publication date: 01/01/1997
Field of study

We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems. The task is motivated by a discussion of current trends in the field of NLP and Language Engineering. We describe a system called GATE (a General Architecture for Text Engineering) that provides a software infrastructure on top of which heterogeneous NLP processing modules may be evaluated and refined individually, or may be combined into larger application systems. GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE promotes reuse of component technology, permits specialisation and collaboration in large-scale projects, and allows for the comparison and evaluation of alternative technologies. The first release of GATE is now available - see http://www.dcs.shef.ac.uk/research/groups/nlp/gate/Comment: LaTeX, uses aclap.sty, 8 page

arXiv.org e-Print Archive

CiteSeerX

Towards structured, block-based PDF

Author: Brailsford David F.
Smith Philip N.
Publication venue: John Wiley Ltd
Publication date: 01/01/1995
Field of study

The Portable Document Format (PDF), defined by Adobe Systems Inc. as the basis of its Acrobat product range, is discussed in some detail. Particular emphasis is given to its flexible object-oriented structure, which has yet to be fully exploited. It is currently used to represent not logical structure but simply a series of pages and associated resources. A definition of an Encapsulated PDF (EPDF) is presented, in which EPDF blocks carry with them their own resource requirements, together with geometrical and logical information. A block formatter called Juggler is described which can lay out EPDF blocks from various sources onto new pages. Future revisions of PDF supporting uniquely-named EPDF blocks tagged with semantic information would assist in composite-pagemakeup and could even lead to fully revisable PDF

Nottingham ePrints

Nottingham eTheses

Repository@Nottingham

EAD - enabling armchair delivery : approaches to encoding finding aids at the University of Liverpool

Author: Allinson Julie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/1998
Field of study

EAD is increasingly being selected as the primary data format for constructing archival finding aids in the British Archive Community as the new technologies and know-how required to encode lists are being embraced in many repositories. One major problem facing archivists, though, is how to convert finding aids held in a variety of formats (including databases, word processed documents and paper lists with no machine readable form) into EAD. This article will discuss the methods used in Special Collections and Archives at the University of Liverpool Library in converting finding aids into EAD. Two main examples will be discussed: firstly, designing database output styles which automatically generate EAD tags to wrap around database fields using the ProCite bibliographic database and secondly, offshore keying of paper lists with the addition of basic EAD tags following a rigorous template designed by Special Collections and Archives staff. Both methods have proved effective and have facilitated the generation of EAD encoded lists for a number of our largest collections. Finally, there will be a brief discussion of our use of native EAD generation using AdeptEdit software and our continuing use of conversion methods

White Rose Research Online

Mapping and Displaying Structural Transformations between XML and PDF

Author: Brailsford David F.
Hardy Matthew R. B.
Publication venue: ACM Press
Publication date: 01/01/2002
Field of study

Documents are often marked up in XML-based tagsets to delineate major structural components such as headings, paragraphs, figure captions and so on, without much regard to their eventual displayed appearance. And yet these same abstract documents, after many transformations and 'typesetting' processes, often emerge in the popular format of Adobe PDF, either for dissemination or archiving. Until recently PDF has been a totally display-based document representation, relying on the underlying PostScript semantics of PDF. Early versions of PDF had no mechanism for retaining any form of abstract document structure but recent releases have now introduced an internal structure tree to create the so called 'Tagged PDF'. This paper describes the development of a plugin for Adobe Acrobat which creates a two-window display. In one window is shown an XML document original and in the other its Tagged PDF counterpart is seen, with an internal structure tree that, in some sense, matches the one seen in XML. If a component is highlighted in either window then the corresponding structured item, with any attendant text, is also highlighted in the other window. Important applications of correctly Tagged PDF include making PDF documents reflow intelligently on small screen devices and enabling them to be read out in correct reading order, via speech synthesiser software, for the visually impaired. By tracing structure transformation from source document to destination one can implement the repair of damaged PDF structure or the adaptation of an existing structure tree to an incrementally updated document

Just-in-time hypermedia

Author: Zhang Li
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2005
Field of study

Many analytical applications, especially legacy systems, create documents and display screens in response to user queries dynamically or in real time . These documents and displays do not exist in advance, and thus hypermedia must be generated \u27just in time -automatically and dynamically. This dissertation details the idea of \u27just-in-time hypermedia and discusses challenges encountered in this research area. A fully detailed literature review about the research issues and related research work is given. A framework for the \u27just-in-time hypermedia compares virtual documents with static documents, as well as dynamic with static hypermedia functionality. Conceptual \u27just-in-time hypermedia architecture is proposed in terms of requirements and logical components. The \u27just-in-time hypermedia engine is described in terms of architecture, functional components, information flow, and implementation details. Then test results are described and evaluated. Lastly, contributions, limitations, and future work are discussed

Digital Commons @ New Jersey Institute of Technology (NJIT)

Experience with the use of Acrobat in the CAJUN publishing project

Author: Brailsford David F.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/1994
Field of study

Adobe's Acrobat software, released in June 1993, is based around a new Portable Document Format (PDF) which offers the possibility of being able to view and exchange electronic documents, independent of the originating software, across a wide variety of supported hardware platforms (PC, Macintosh, Sun UNIX etc.). The fact that Acrobat's imageable objects are rendered with full use of Level 2 PostScript means that the most demanding requirements can be met in terms of high-quality typography and device-independent colour. These qualities will be very desirable components in future multimedia and hypermedia systems. The current capabilities of Acrobat and PDF are described; in particular the presence of hypertext links, bookmarks, and yellow sticker annotations (in release 1.0) together with article threads and multi-media plugins in version 2.0, This article also describes the CAJUN project (CD-ROM Acrobat Journals Using Networks) which has been investigating the automated placement of PDF hypertextual features from various front-end text processing systems. CAJUN has also been experimenting with the dissemination of PDF over e-mail, via World Wide Web and on CDROM

Nottingham eTheses

Image management applied to the medical field

Author: Dhont Eric
Nysten Pascal
Publication venue
Publication date: 01/01/2000
Field of study

Repository of the University of Namur