67,100 research outputs found

    GATE -- an Environment to Support Research and Development in Natural Language Engineering

    Get PDF
    We describe a software environment to support research and development in natural language (NL) engineering. This environment -- GATE (General Architecture for Text Engineering) -- aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available

    Software Infrastructure for Natural Language Processing

    Full text link
    We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems. The task is motivated by a discussion of current trends in the field of NLP and Language Engineering. We describe a system called GATE (a General Architecture for Text Engineering) that provides a software infrastructure on top of which heterogeneous NLP processing modules may be evaluated and refined individually, or may be combined into larger application systems. GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE promotes reuse of component technology, permits specialisation and collaboration in large-scale projects, and allows for the comparison and evaluation of alternative technologies. The first release of GATE is now available - see http://www.dcs.shef.ac.uk/research/groups/nlp/gate/Comment: LaTeX, uses aclap.sty, 8 page

    Ontology Population for Open-Source Intelligence

    Get PDF
    We present an approach based on GATE (General Architecture for Text Engineering) for the automatic population of ontologies from text documents. We describe some experimental results, which are encouraging in terms of extracted correct instances of the ontology. We then focus on a phase of our pipeline and discuss a variant thereof, which aims at reducing the manual effort needed to generate pre-defined dictionaries used in document annotation. Our additional experiments show promising results also in this case

    New Methods, Current Trends and Software Infrastructure for NLP

    Full text link
    The increasing use of `new methods' in NLP, which the NeMLaP conference series exemplifies, occurs in the context of a wider shift in the nature and concerns of the discipline. This paper begins with a short review of this context and significant trends in the field. The review motivates and leads to a set of requirements for support software of general utility for NLP research and development workers. A freely-available system designed to meet these requirements is described (called GATE - a General Architecture for Text Engineering). Information Extraction (IE), in the sense defined by the Message Understanding Conferences (ARPA \cite{Arp95}), is an NLP application in which many of the new methods have found a home (Hobbs \cite{Hob93}; Jacobs ed. \cite{Jac92}). An IE system based on GATE is also available for research purposes, and this is described. Lastly we review related work.Comment: 12 pages, LaTeX, uses nemlap.sty (included

    Tagging and Morphological Processing in the SVENSK System

    Get PDF
    This thesis describes the work of providing separate morphological processing and part-of-speech tagging modules in the SVENSK system by integrating the Uppsala Chart Processor (UCP) and a Brill tagger into the system. SVENSK employs GATE (General Architecture for Text Engineering) as the platform in which the components are to be integrated. Two pre-processing modules, a tokeniser and a sentence splitter for Swedish, were developed in order to facilitate the preparation of the texts to be analysed by UCP and the Brill tagger. These four components were then integrated in GATE together with a newly developed viewer for displaying the results produced by UCP. The thesis introduces the reader to the SVENSK project, the GATE system and its underlying parts, especially the database architecture which is based on the TIPSTER annotation model. Further, the issues in connection with the development and design of the tokeniser and the sentence splitter for Swedish are elaborated on. The mechanisms behind transformation-based error-driven learning methods as employed by the Brill tagger are introduced as well as the principles of chart processing in general and UCP in particular. The greater part of the thesis is devoted to the process of integrating the natural language (NL) modules in GATE using the Tcl/Tk application programmers interface (API) and a so-called loose coupling. The results of the integration of the NL modules are very encouraging: it is possible to mix modules written in programming languages from completely different paradigms (in this case the languages are Common LISP, Perl and C) and to have them interact with each other, thus maintaining a high degree of reuse of algorithmical resources. However, the use of Tcl/Tk and the associated API for processing structurally relatively complex data, i.e. the output from UCP, is time consuming and considerably slows the processing in GATE

    Using dialogue corpora to extend information extraction patterns for natural language understanding of dialogue

    Get PDF
    This work was funded by the Companions project (www.companions-project.org) sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-034434.This paper examines how Natural Language Process (NLP) resources and online dialogue corpora can be used to extend coverage of Information Extraction (IE) templates in a Spoken Dialogue system. IE templates are used as part of a Natural Language Understanding module for identifying meaning in a user utterance. The use of NLP tools in Dialogue systems is a difficult task given spoken dialogue is often not well-formed and 2) there is a serious lack of dialogue data. In spite of that, we have devised a method for extending IE patterns using standard NLP tools and available dialogue corpora found on the web. In this paper, we explain our method which includes using a set of NLP modules developed using GATE (a General Architecture for Text Engineering), as well as a general purpose editing tool that we built to facilitate the IE rule creation process. Lastly, we present directions for future work in this area.peer-reviewe
    corecore