GATE -- an Environment to Support Research and Development in Natural Language Engineering
We describe a software environment to support research and development in natural language (NL) engineering. This environment -- GATE (General Architecture for Text Engineering) -- aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available
New Methods, Current Trends and Software Infrastructure for NLP
The increasing use of `new methods' in NLP, which the NeMLaP conference
series exemplifies, occurs in the context of a wider shift in the nature and
concerns of the discipline. This paper begins with a short review of this
context and significant trends in the field. The review motivates and leads to
a set of requirements for support software of general utility for NLP research
and development workers. A freely-available system designed to meet these
requirements is described (called GATE - a General Architecture for Text
Engineering). Information Extraction (IE), in the sense defined by the Message
Understanding Conferences (ARPA \cite{Arp95}), is an NLP application in which
many of the new methods have found a home (Hobbs \cite{Hob93}; Jacobs ed.
\cite{Jac92}). An IE system based on GATE is also available for research
purposes, and this is described. Lastly we review related work.
Comment: 12 pages, LaTeX, uses nemlap.sty (included)
Software Infrastructure for Natural Language Processing
We classify and review current approaches to software infrastructure for
research, development and delivery of NLP systems. The task is motivated by a
discussion of current trends in the field of NLP and Language Engineering. We
describe a system called GATE (a General Architecture for Text Engineering)
that provides a software infrastructure on top of which heterogeneous NLP
processing modules may be evaluated and refined individually, or may be
combined into larger application systems. GATE aims to support both researchers
and developers working on component technologies (e.g. parsing, tagging,
morphological analysis) and those working on developing end-user applications
(e.g. information extraction, text summarisation, document generation, machine
translation, and second language learning). GATE promotes reuse of component
technology, permits specialisation and collaboration in large-scale projects,
and allows for the comparison and evaluation of alternative technologies. The
first release of GATE is now available - see
http://www.dcs.shef.ac.uk/research/groups/nlp/gate/
Comment: LaTeX, uses aclap.sty, 8 pages
SWI-Prolog and the Web
Where Prolog is commonly seen as a component in a Web application that is
either embedded or communicates using a proprietary protocol, we propose an
architecture where Prolog communicates with other components in a Web application
using the standard HTTP protocol. By avoiding embedding in external Web servers,
development and deployment become much easier. To support this architecture, in
addition to the transfer protocol, we must also support parsing, representing
and generating the key Web document types such as HTML, XML and RDF.
This paper motivates the design decisions in the libraries and extensions to
Prolog for handling Web documents and protocols. The design has been guided by
the requirement to handle large documents efficiently. The described libraries
support a wide range of Web applications ranging from HTML and XML documents to
Semantic Web RDF processing.
To appear in Theory and Practice of Logic Programming (TPLP).
Comment: 31 pages, 24 figures and 2 tables.
Design issues in the production of hyper-books and visual-books
This paper describes an ongoing research project in the area of electronic books. After a brief overview of the state of the art in this field, two new forms of electronic book are presented: hyper-books and visual-books. A flexible environment allows them to be produced in a semi-automatic way starting from different sources: electronic texts (as input for hyper-books) and paper books (as input for visual-books). The translation process is driven by the philosophy of preserving the book metaphor in order to guarantee that electronic information is presented in a familiar way. Another important feature of our research is that hyper-books and visual-books are conceived not as isolated objects but as entities within an electronic library, which inherits most of the features of a paper-based library but introduces a number of new properties resulting from its non-physical nature
EAD - enabling armchair delivery : approaches to encoding finding aids at the University of Liverpool
EAD is increasingly being selected as the primary data format for constructing archival finding aids in the British Archive Community as the new technologies and know-how required to encode lists are being embraced in many repositories. One major problem facing archivists, though, is how to convert finding aids held in a variety of formats (including databases, word processed documents and paper lists with no machine readable form) into EAD. This article will discuss the methods used in Special Collections and Archives at the University of Liverpool Library in converting finding aids into EAD. Two main examples will be discussed: firstly, designing database output styles which automatically generate EAD tags to wrap around database fields using the ProCite bibliographic database and secondly, offshore keying of paper lists with the addition of basic EAD tags following a rigorous template designed by Special Collections and Archives staff. Both methods have proved effective and have facilitated the generation of EAD encoded lists for a number of our largest collections. Finally, there will be a brief discussion of our use of native EAD generation using AdeptEdit software and our continuing use of conversion methods