A lightweight, pattern-based approach to identification and formalisation of TimeML expressions in clinical narratives

Author: Gooch P.
Publication date: 01/01/2012

General Architecture for Text Engineering (GATE) components for identifying clinical events and temporal expressions are developed and evaluated against a corpus of 120 discharge summaries.

City Research Online

GATE -- an Environment to Support Research and Development in Natural Language Engineering

Author: Cunningham Hamish
Gaizauskas Robert
Humphreys Kevin
Rodgers Peter
Wilks Yorick
Publication venue: IEEE Computer Society
Publication date: 01/01/1996

We describe a software environment to support research and development in natural language (NL) engineering. This environment, GATE (General Architecture for Text Engineering), aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available.

CiteSeerX

Kent Academic Repository

Software Infrastructure for Natural Language Processing

Author: Cunningham Hamish
Gaizauskas Robert
Humphreys Kevin
Wilks Yorick
Publication date: 01/01/1997

We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems. The task is motivated by a discussion of current trends in the field of NLP and Language Engineering. We describe a system called GATE (a General Architecture for Text Engineering) that provides a software infrastructure on top of which heterogeneous NLP processing modules may be evaluated and refined individually, or may be combined into larger application systems. GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE promotes reuse of component technology, permits specialisation and collaboration in large-scale projects, and allows for the comparison and evaluation of alternative technologies. The first release of GATE is now available - see http://www.dcs.shef.ac.uk/research/groups/nlp/gate/

Comment: LaTeX, uses aclap.sty, 8 pages

arXiv.org e-Print Archive

CiteSeerX

Ontology Population for Open-Source Intelligence

Author: Ganino G.
Lembo D.
Mecella M.
Scafoglieri F.
Publication venue: CEUR-WS
Publication date: 01/01/2018

We present an approach based on GATE (General Architecture for Text Engineering) for the automatic population of ontologies from text documents. We describe some experimental results, which are encouraging in terms of correctly extracted instances of the ontology. We then focus on one phase of our pipeline and discuss a variant of it, which aims at reducing the manual effort needed to generate the pre-defined dictionaries used in document annotation. Our additional experiments show promising results in this case as well.

Archivio della ricerca- Università di Roma La Sapienza

BADREX: In situ expansion and coreference of biomedical abbreviations using dynamic regular expressions

Author: Gooch P.
Publication venue: City University London

BADREX uses dynamically generated regular expressions to annotate term definition–term abbreviation pairs, and corefers unpaired acronyms and abbreviations back to their initial definition in the text. Against the Medstract corpus BADREX achieves precision and recall of 98% and 97%, and against a much larger corpus, 90% and 85%, respectively. BADREX yields improved performance over previous approaches, requires no training data and allows runtime customisation of its input parameters. BADREX is freely available from https://github.com/philgooch/BADREX-Biomedical-Abbreviation-Expander as a plugin for the General Architecture for Text Engineering (GATE) framework and is licensed under the GPLv3.

City Research Online
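
The idea of generating a regular expression at run time from each candidate abbreviation can be illustrated with a minimal Python sketch. This is not BADREX's actual algorithm or its GATE plugin code: the function names, the "one word per letter" heuristic, and the parenthesised short-form convention are all assumptions made for illustration.

```python
import re


def build_long_form_pattern(abbrev: str) -> re.Pattern:
    """Build, at run time, a regex matching words whose initials spell `abbrev`.

    Hypothetical helper: assumes one long-form word per abbreviation letter.
    """
    words = [re.escape(ch) + r"\w*" for ch in abbrev]
    # Words may be separated by whitespace or hyphens; the long form must
    # end exactly where the parenthesised abbreviation begins.
    return re.compile(r"\b" + r"[\s-]+".join(words) + r"$", re.IGNORECASE)


def find_definition_pairs(text: str) -> list[tuple[str, str]]:
    """Return (long form, abbreviation) pairs found in `text`."""
    pairs = []
    # Candidate short forms: 2 to 10 capital letters inside parentheses.
    for match in re.finditer(r"\(([A-Z]{2,10})\)", text):
        abbrev = match.group(1)
        preceding = text[:match.start()].rstrip()
        long_form = build_long_form_pattern(abbrev).search(preceding)
        if long_form:
            pairs.append((long_form.group(0), abbrev))
    return pairs


if __name__ == "__main__":
    sample = ("The patient had a myocardial infarction (MI) and later "
              "chronic obstructive pulmonary disease (COPD).")
    for long_form, abbrev in find_definition_pairs(sample):
        print(f"{abbrev}: {long_form}")
```

Because the pattern is derived from the abbreviation itself, no training data is needed; a real system would also have to handle skipped function words, digits, and letters taken from inside words, which this sketch ignores.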

New Methods, Current Trends and Software Infrastructure for NLP

Author: Cunningham Hamish
Gaizauskas Robert J.
Wilks Yorick
Publication date: 01/01/1996

The increasing use of 'new methods' in NLP, which the NeMLaP conference series exemplifies, occurs in the context of a wider shift in the nature and concerns of the discipline. This paper begins with a short review of this context and significant trends in the field. The review motivates and leads to a set of requirements for support software of general utility for NLP research and development workers. A freely-available system designed to meet these requirements is described (called GATE - a General Architecture for Text Engineering). Information Extraction (IE), in the sense defined by the Message Understanding Conferences (ARPA [Arp95]), is an NLP application in which many of the new methods have found a home (Hobbs [Hob93]; Jacobs ed. [Jac92]). An IE system based on GATE is also available for research purposes, and this is described. Lastly we review related work.

Comment: 12 pages, LaTeX, uses nemlap.sty (included)

arXiv.org e-Print Archive

CiteSeerX

Tagging and Morphological Processing in the SVENSK System

Author: Olsson Fredrik
Publication venue: Swedish Institute of Computer Science
Publication date: 01/03/1998

This thesis describes the work of providing separate morphological processing and part-of-speech tagging modules in the SVENSK system by integrating the Uppsala Chart Processor (UCP) and a Brill tagger into the system. SVENSK employs GATE (General Architecture for Text Engineering) as the platform in which the components are to be integrated. Two pre-processing modules, a tokeniser and a sentence splitter for Swedish, were developed in order to facilitate the preparation of the texts to be analysed by UCP and the Brill tagger. These four components were then integrated in GATE together with a newly developed viewer for displaying the results produced by UCP. The thesis introduces the reader to the SVENSK project, the GATE system and its underlying parts, especially the database architecture, which is based on the TIPSTER annotation model. Further, the issues in connection with the development and design of the tokeniser and the sentence splitter for Swedish are elaborated on. The mechanisms behind transformation-based error-driven learning methods as employed by the Brill tagger are introduced, as well as the principles of chart processing in general and UCP in particular. The greater part of the thesis is devoted to the process of integrating the natural language (NL) modules in GATE using the Tcl/Tk application programmer's interface (API) and a so-called loose coupling. The results of the integration of the NL modules are very encouraging: it is possible to mix modules written in programming languages from completely different paradigms (in this case the languages are Common LISP, Perl and C) and to have them interact with each other, thus maintaining a high degree of reuse of algorithmic resources. However, the use of Tcl/Tk and the associated API for processing structurally relatively complex data, i.e. the output from UCP, is time-consuming and considerably slows the processing in GATE.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive
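
The combination of a tokeniser, a sentence splitter and a shared annotation store described in the abstract above can be sketched in Python as follows. This is only an illustration of standoff annotation in the general spirit of the TIPSTER model (annotations carry a type, character offsets and attributes, and never modify the text); it is not the GATE or SVENSK API, and every class and function name here is hypothetical. The splitting rule (a sentence ends at ".", "!" or "?") is likewise a simplifying assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A standoff annotation: it points into the text by character offsets."""
    type: str
    start: int
    end: int
    attributes: dict = field(default_factory=dict)


@dataclass
class Document:
    """The text plus a shared store that independent modules append to."""
    text: str
    annotations: list = field(default_factory=list)

    def spans(self, ann_type: str) -> list:
        """Return the text covered by each annotation of the given type."""
        return [self.text[a.start:a.end]
                for a in self.annotations if a.type == ann_type]


def tokenise(doc: Document) -> None:
    """Module 1: add one 'token' annotation per word or punctuation mark."""
    for m in re.finditer(r"\w+|[^\w\s]", doc.text):
        doc.annotations.append(Annotation("token", m.start(), m.end()))


def split_sentences(doc: Document) -> None:
    """Module 2: read 'token' annotations and add 'sentence' annotations."""
    tokens = [a for a in doc.annotations if a.type == "token"]
    start = None
    for tok in tokens:
        if start is None:
            start = tok.start
        if doc.text[tok.start:tok.end] in {".", "!", "?"}:
            doc.annotations.append(Annotation("sentence", start, tok.end))
            start = None
    if start is not None and tokens:
        # Trailing text without a final terminator still forms a sentence.
        doc.annotations.append(Annotation("sentence", start, tokens[-1].end))


if __name__ == "__main__":
    doc = Document("Jag heter Anna. Hon bor i Kista.")
    tokenise(doc)
    split_sentences(doc)
    print(doc.spans("sentence"))
```

The two modules communicate only through the annotation store, so either could in principle be replaced by a component written in another language, which is the kind of loose coupling the thesis evaluates.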

Using dialogue corpora to extend information extraction patterns for natural language understanding of dialogue

Author: Catizone Roberta
Dingli Alexiei
Gaizauskas Robert
Publication venue: Language Resources and Evaluation Conference (LREC 2010), European Language Resources Association (ELRA)
Publication date: 01/01/2010

This work was funded by the Companions project (www.companions-project.org) sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-034434. This paper examines how Natural Language Processing (NLP) resources and online dialogue corpora can be used to extend coverage of Information Extraction (IE) templates in a Spoken Dialogue system. IE templates are used as part of a Natural Language Understanding module for identifying meaning in a user utterance. The use of NLP tools in Dialogue systems is a difficult task given that 1) spoken dialogue is often not well-formed and 2) there is a serious lack of dialogue data. In spite of that, we have devised a method for extending IE patterns using standard NLP tools and available dialogue corpora found on the web. In this paper, we explain our method, which includes using a set of NLP modules developed using GATE (a General Architecture for Text Engineering), as well as a general purpose editing tool that we built to facilitate the IE rule creation process. Lastly, we present directions for future work in this area.

Peer-reviewed

OAR@UM