678 research outputs found

    GATE -- an Environment to Support Research and Development in Natural Language Engineering

    Get PDF
    We describe a software environment to support research and development in natural language (NL) engineering. This environment -- GATE (General Architecture for Text Engineering) -- aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available

    Ellogon: A New Text Engineering Platform

    Full text link
    This paper presents Ellogon, a multi-lingual, cross-platform, general-purpose text engineering environment. Ellogon was designed in order to aid both researchers in natural language processing, as well as companies that produce language engineering systems for the end-user. Ellogon provides a powerful TIPSTER-based infrastructure for managing, storing and exchanging textual data, embedding and managing text processing components as well as visualising textual data and their associated linguistic information. Among its key features are full Unicode support, an extensive multi-lingual graphical user interface, its modular architecture and the reduced hardware requirements.Comment: 7 pages, 9 figures. Will be presented to the Third International Conference on Language Resources and Evaluation - LREC 200

    Software Infrastructure for Natural Language Processing

    Full text link
    We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems. The task is motivated by a discussion of current trends in the field of NLP and Language Engineering. We describe a system called GATE (a General Architecture for Text Engineering) that provides a software infrastructure on top of which heterogeneous NLP processing modules may be evaluated and refined individually, or may be combined into larger application systems. GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE promotes reuse of component technology, permits specialisation and collaboration in large-scale projects, and allows for the comparison and evaluation of alternative technologies. The first release of GATE is now available - see http://www.dcs.shef.ac.uk/research/groups/nlp/gate/Comment: LaTeX, uses aclap.sty, 8 page

    Towards structured, block-based PDF

    Get PDF
    The Portable Document Format (PDF), defined by Adobe Systems Inc. as the basis of its Acrobat product range, is discussed in some detail. Particular emphasis is given to its flexible object-oriented structure, which has yet to be fully exploited. It is currently used to represent not logical structure but simply a series of pages and associated resources. A definition of an Encapsulated PDF (EPDF) is presented, in which EPDF blocks carry with them their own resource requirements, together with geometrical and logical information. A block formatter called Juggler is described which can lay out EPDF blocks from various sources onto new pages. Future revisions of PDF supporting uniquely-named EPDF blocks tagged with semantic information would assist in composite-pagemakeup and could even lead to fully revisable PDF

    EAD - enabling armchair delivery : approaches to encoding finding aids at the University of Liverpool

    Get PDF
    EAD is increasingly being selected as the primary data format for constructing archival finding aids in the British Archive Community as the new technologies and know-how required to encode lists are being embraced in many repositories. One major problem facing archivists, though, is how to convert finding aids held in a variety of formats (including databases, word processed documents and paper lists with no machine readable form) into EAD. This article will discuss the methods used in Special Collections and Archives at the University of Liverpool Library in converting finding aids into EAD. Two main examples will be discussed: firstly, designing database output styles which automatically generate EAD tags to wrap around database fields using the ProCite bibliographic database and secondly, offshore keying of paper lists with the addition of basic EAD tags following a rigorous template designed by Special Collections and Archives staff. Both methods have proved effective and have facilitated the generation of EAD encoded lists for a number of our largest collections. Finally, there will be a brief discussion of our use of native EAD generation using AdeptEdit software and our continuing use of conversion methods

    Mapping and Displaying Structural Transformations between XML and PDF

    Get PDF
    Documents are often marked up in XML-based tagsets to delineate major structural components such as headings, paragraphs, figure captions and so on, without much regard to their eventual displayed appearance. And yet these same abstract documents, after many transformations and 'typesetting' processes, often emerge in the popular format of Adobe PDF, either for dissemination or archiving. Until recently PDF has been a totally display-based document representation, relying on the underlying PostScript semantics of PDF. Early versions of PDF had no mechanism for retaining any form of abstract document structure but recent releases have now introduced an internal structure tree to create the so called 'Tagged PDF'. This paper describes the development of a plugin for Adobe Acrobat which creates a two-window display. In one window is shown an XML document original and in the other its Tagged PDF counterpart is seen, with an internal structure tree that, in some sense, matches the one seen in XML. If a component is highlighted in either window then the corresponding structured item, with any attendant text, is also highlighted in the other window. Important applications of correctly Tagged PDF include making PDF documents reflow intelligently on small screen devices and enabling them to be read out in correct reading order, via speech synthesiser software, for the visually impaired. By tracing structure transformation from source document to destination one can implement the repair of damaged PDF structure or the adaptation of an existing structure tree to an incrementally updated document

    Just-in-time hypermedia

    Get PDF
    Many analytical applications, especially legacy systems, create documents and display screens in response to user queries dynamically or in real time . These documents and displays do not exist in advance, and thus hypermedia must be generated \u27just in time -automatically and dynamically. This dissertation details the idea of \u27just-in-time hypermedia and discusses challenges encountered in this research area. A fully detailed literature review about the research issues and related research work is given. A framework for the \u27just-in-time hypermedia compares virtual documents with static documents, as well as dynamic with static hypermedia functionality. Conceptual \u27just-in-time hypermedia architecture is proposed in terms of requirements and logical components. The \u27just-in-time hypermedia engine is described in terms of architecture, functional components, information flow, and implementation details. Then test results are described and evaluated. Lastly, contributions, limitations, and future work are discussed

    Experience with the use of Acrobat in the CAJUN publishing project

    Get PDF
    Adobe's Acrobat software, released in June 1993, is based around a new Portable Document Format (PDF) which offers the possibility of being able to view and exchange electronic documents, independent of the originating software, across a wide variety of supported hardware platforms (PC, Macintosh, Sun UNIX etc.). The fact that Acrobat's imageable objects are rendered with full use of Level 2 PostScript means that the most demanding requirements can be met in terms of high-quality typography and device-independent colour. These qualities will be very desirable components in future multimedia and hypermedia systems. The current capabilities of Acrobat and PDF are described; in particular the presence of hypertext links, bookmarks, and yellow sticker annotations (in release 1.0) together with article threads and multi-media plugins in version 2.0, This article also describes the CAJUN project (CD-ROM Acrobat Journals Using Networks) which has been investigating the automated placement of PDF hypertextual features from various front-end text processing systems. CAJUN has also been experimenting with the dissemination of PDF over e-mail, via World Wide Web and on CDROM
    corecore