Search CORE

1,642 research outputs found

New Methods, Current Trends and Software Infrastructure for NLP

Author: Cunningham Hamish
Gaizauskas Robert J.
Wilks Yorick
Publication venue
Publication date: 01/01/1996
Field of study

The increasing use of `new methods' in NLP, which the NeMLaP conference series exemplifies, occurs in the context of a wider shift in the nature and concerns of the discipline. This paper begins with a short review of this context and significant trends in the field. The review motivates and leads to a set of requirements for support software of general utility for NLP research and development workers. A freely-available system designed to meet these requirements is described (called GATE - a General Architecture for Text Engineering). Information Extraction (IE), in the sense defined by the Message Understanding Conferences (ARPA \cite{Arp95}), is an NLP application in which many of the new methods have found a home (Hobbs \cite{Hob93}; Jacobs ed. \cite{Jac92}). An IE system based on GATE is also available for research purposes, and this is described. Lastly we review related work.Comment: 12 pages, LaTeX, uses nemlap.sty (included

arXiv.org e-Print Archive

CiteSeerX

Design issues in the production of hyper‐books and visual‐books

Author: Catenazzi Nadia
Gibb Forbes
Landoni Monica
Publication venue: 'Informa UK Limited'
Publication date: 01/01/1993
Field of study

This paper describes an ongoing research project in the area of electronic books. After a brief overview of the state of the art in this field, two new forms of electronic book are presented: hyper‐books and visual‐books. A flexible environment allows them to be produced in a semi‐automatic way starting from different sources: electronic texts (as input for hyper‐books) and paper books (as input for visual‐books). The translation process is driven by the philosophy of preserving the book metaphor in order to guarantee that electronic information is presented in a familiar way. Another important feature of our research is that hyper‐books and visual‐books are conceived not as isolated objects but as entities within an electronic library, which inherits most of the features of a paper‐based library but introduces a number of new properties resulting from its non‐physical nature

Crossref

ALT Open Access Repository

Directory of Open Access Journals

A multi-layered Bayesian network model for structured document retrieval

Author: Crestani Fabio
de Campos Luis M.
Fernandez Luna Juan M.
Huete Juan F.
Publication venue
Publication date: 07/04/2004
Field of study

New standards in document representation, like for example SGML, XML, and MPEG-7, compel Information Retrieval to design and implement models and tools to index, retrieve and present documents according to the given document structure. The paper presents the design of an Information Retrieval system for multimedia structured documents, like for example journal articles, e-books, and MPEG-7 videos. The system is based on Bayesian Networks, since this class of mathematical models enable to represent and quantify the relations between the structural components of the document. Some preliminary results on the system implementation are also presented

University of Strathclyde Institutional Repository

A multi-layered Bayesian network model for structured document retrieval

Author: F. Crestani
G. Bordogna
G. Salton
H.R. Turtle
J. Vegas
L.M. Campos de
M. Lalmas
S. Acid
T. Roelleke
Y. Chiaramella
Publication venue
Publication date: 01/01/2003
Field of study

Crossref

University of Strathclyde Institutional Repository

Thick 2D Relations for Document Understanding

Author: Aiello Marco
Smeulders Arnold M.W.
Publication venue
Publication date: 01/01/2002
Field of study

We use a propositional language of qualitative rectangle relations to detect the reading order from document images. To this end, we define the notion of a document encoding rule and we analyze possible formalisms to express document encoding rules such as LATEX and SGML. Document encoding rules expressed in the propositional language of rectangles are used to build a reading order detector for document images. In order to achieve robustness and avoid brittleness when applying the system to real life document images, the notion of a thick boundary interpretation for a qualitative relation is introduced. The framework is tested on a collection of heterogeneous document images showing recall rates up to 89%

Unitn-eprints Research

Software Infrastructure for Natural Language Processing

Author: Cunningham Hamish
Gaizauskas Robert
Humphreys Kevin
Wilks Yorick
Publication venue
Publication date: 01/01/1997
Field of study

We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems. The task is motivated by a discussion of current trends in the field of NLP and Language Engineering. We describe a system called GATE (a General Architecture for Text Engineering) that provides a software infrastructure on top of which heterogeneous NLP processing modules may be evaluated and refined individually, or may be combined into larger application systems. GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE promotes reuse of component technology, permits specialisation and collaboration in large-scale projects, and allows for the comparison and evaluation of alternative technologies. The first release of GATE is now available - see http://www.dcs.shef.ac.uk/research/groups/nlp/gate/Comment: LaTeX, uses aclap.sty, 8 page

arXiv.org e-Print Archive

CiteSeerX

SWI-Prolog and the Web

Author: Gras
Huang
JAN WIELEMAKER
LOURENS VAN DER MEIJ
Mäkelä
Ramakrishnan
Wielemaker
Wielemaker
Wielemaker
ZHISHENG HUANG
Publication venue
Publication date: 06/11/2007
Field of study

Where Prolog is commonly seen as a component in a Web application that is either embedded or communicates using a proprietary protocol, we propose an architecture where Prolog communicates to other components in a Web application using the standard HTTP protocol. By avoiding embedding in external Web servers development and deployment become much easier. To support this architecture, in addition to the transfer protocol, we must also support parsing, representing and generating the key Web document types such as HTML, XML and RDF. This paper motivates the design decisions in the libraries and extensions to Prolog for handling Web documents and protocols. The design has been guided by the requirement to handle large documents efficiently. The described libraries support a wide range of Web applications ranging from HTML and XML documents to Semantic Web RDF processing. To appear in Theory and Practice of Logic Programming (TPLP)Comment: 31 pages, 24 figures and 2 tables. To appear in Theory and Practice of Logic Programming (TPLP

arXiv.org e-Print Archive

VU Research Portal

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Ranking structured documents using utility theory in the Bayesian network retrieval model

Author: F. Crestani
G. Bordogna
G. Kazai
L.M. Campos de
M. Lalmas
R. Baeza-Yates
R.D. Shachter
S. Acid
S. French
T. Roelleke
Y. Chiaramella
Publication venue
Publication date: 01/01/2003
Field of study

In this paper a new method based on Utility and Decision theory is presented to deal with structured documents. The aim of the application of these methodologies is to refine a first ranking of structural units, generated by means of an Information Retrieval Model based on Bayesian Networks. Units are newly arranged in the new ranking by combining their posterior probabilities, obtained in the first stage, with the expected utility of retrieving them. The experimental work has been developed using the Shakespeare structured collection and the results show an improvement of the effectiveness of this new approach

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

GATE -- an Environment to Support Research and Development in Natural Language Engineering

Author: Cunningham Hamish
Gaizauskas Robert
Humphreys Kevin
Rodgers Peter
Wilks Yorick
Publication venue: IEEE Computer Society
Publication date: 01/01/1996
Field of study

We describe a software environment to support research and development in natural language (NL) engineering. This environment -- GATE (General Architecture for Text Engineering) -- aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available

CiteSeerX

Kent Academic Repository