Search CORE

23,862 research outputs found

Rapid Development of Morphological Descriptions for Full Language Processing Systems

Author: Carter David
Publication venue
Publication date: 01/01/1995
Field of study

I describe a compiler and development environment for feature-augmented two-level morphology rules integrated into a full NLP system. The compiler is optimized for a class of languages including many or most European ones, and for rapid development and debugging of descriptions of new languages. The key design decision is to compose morphophonological and morphosyntactic information, but not the lexicon, when compiling the description. This results in typical compilation times of about a minute, and has allowed a reasonably full, feature-based description of French inflectional morphology to be developed in about a month by a linguist new to the system.Comment: 8 pages, LaTeX (2.09 preferred); eaclap.sty; Procs of Euro ACL-9

arXiv.org e-Print Archive

CiteSeerX

Crossref

Applying Machine Translation to Two-Stage Cross-Language Information Retrieval

Author: Fujii Atsushi
Ishikawa Tetsuya
Publication venue
Publication date: 01/01/2000
Field of study

Cross-language information retrieval (CLIR), where queries and documents are in different languages, needs a translation of queries and/or documents, so as to standardize both of them into a common representation. For this purpose, the use of machine translation is an effective approach. However, computational cost is prohibitive in translating large-scale document collections. To resolve this problem, we propose a two-stage CLIR method. First, we translate a given query into the document language, and retrieve a limited number of foreign documents. Second, we machine translate only those documents into the user language, and re-rank them based on the translation result. We also show the effectiveness of our method by way of experiments using Japanese queries and English technical documents.Comment: 13 pages, 1 Postscript figur

arXiv.org e-Print Archive

CiteSeerX

Recycling Lingware in a Multilingual MT System

Author: Bretan Ivan
Carter David
Eklund Robert
Hansen Steffen Leo
Kirchmeier-Andersen Sabine
Philp Christina
Rayner Manny
Sorensen Finn
Thomsen Hanne Erdman
Wiren Mats
Publication venue
Publication date: 01/01/1997
Field of study

We describe two methods relevant to multi-lingual machine translation systems, which can be used to port linguistic data (grammars, lexicons and transfer rules) between systems used for processing related languages. The methods are fully implemented within the Spoken Language Translator system, and were used to create versions of the system for two new language pairs using only a month of expert effort.Comment: 6 pages, needs aclap.sty. To appear in "From Research to Commercial Applications" workshop at ACL-97, see also http://www.cam.sri.co

arXiv.org e-Print Archive

CiteSeerX

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Pattern Matching and Discourse Processing in Information Extraction from Japanese Text

Author: Eriguchi Y.
Hara M.
Kitani T.
Publication venue
Publication date: 01/01/1994
Field of study

Information extraction is the task of automatically picking up information of interest from an unconstrained text. Information of interest is usually extracted in two steps. First, sentence level processing locates relevant pieces of information scattered throughout the text; second, discourse processing merges coreferential information to generate the output. In the first step, pieces of information are locally identified without recognizing any relationships among them. A key word search or simple pattern search can achieve this purpose. The second step requires deeper knowledge in order to understand relationships among separately identified pieces of information. Previous information extraction systems focused on the first step, partly because they were not required to link up each piece of information with other pieces. To link the extracted pieces of information and map them onto a structured output format, complex discourse processing is essential. This paper reports on a Japanese information extraction system that merges information using a pattern matcher and discourse processor. Evaluation results show a high level of system performance which approaches human performance.Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

Simple identification tools in FishBase

Author: Atanacio Rachek
Bailly Nicolas
Froese Rainer
Reyes Jr. Rodolfo
Publication venue: EUT - Edizioni Università di Trieste
Publication date: 01/01/2010
Field of study

Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and characters like fin ray meristics. Soon pictures and drawings were added as a further help, similar to a field guide. Later came the computerization of existing dichotomous keys, again in combination with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computeraided strategy

OceanRep

OpenstarTs

AFRILEX 2002: 7th international conference of the African Association for Lexicography: Culture and dictionaries: programme and abstracts

Author: de Schryver Gilles-Maurice
Publication venue: (SF)2 Press
Publication date: 01/01/2002
Field of study

Ghent University Academic Bibliography