Rapid Development of Morphological Descriptions for Full Language Processing Systems
I describe a compiler and development environment for feature-augmented
two-level morphology rules integrated into a full NLP system. The compiler is
optimized for a class of languages including many or most European ones, and
for rapid development and debugging of descriptions of new languages. The key
design decision is to compose morphophonological and morphosyntactic
information, but not the lexicon, when compiling the description. This results
in typical compilation times of about a minute, and has allowed a reasonably
full, feature-based description of French inflectional morphology to be
developed in about a month by a linguist new to the system. Comment: 8 pages, LaTeX (2.09 preferred); eaclap.sty; Procs of Euro ACL-9
Applying Machine Translation to Two-Stage Cross-Language Information Retrieval
Cross-language information retrieval (CLIR), where queries and documents are
in different languages, requires translating queries and/or documents so that
both share a common representation. Machine translation is an effective
approach for this purpose. However, the computational cost of translating
large-scale document collections is prohibitive. To resolve this problem, we
propose a two-stage CLIR method. First, we translate a given query into the
document language and retrieve a limited number of foreign documents. Second,
we machine-translate only those documents into the user language and re-rank
them based on the translation result. We demonstrate the effectiveness of our
method through experiments using Japanese queries and English technical
documents. Comment: 13 pages, 1 Postscript figure
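The two-stage method above can be sketched in a few lines. This is a minimal illustration, not the authors' system: it assumes a toy word-for-word dictionary (hypothetical `JA_EN`, romanized Japanese) as a stand-in for real machine translation, and a simple term-overlap score (hypothetical `overlap_score`) as the retrieval model. The point it shows is that only the `k` documents retrieved in stage 1 are ever translated.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Tokens = List[str]

def overlap_score(query: Tokens, doc: Tokens) -> int:
    """Toy relevance score: how often the distinct query terms occur in the document."""
    counts = Counter(doc)
    return sum(counts[term] for term in set(query))

def two_stage_clir(
    query: Tokens,
    docs: Dict[str, Tokens],
    translate_query: Callable[[Tokens], Tokens],
    translate_doc: Callable[[Tokens], Tokens],
    k: int,
) -> List[Tuple[str, int]]:
    # Stage 1: translate the query into the document language and keep only the top-k hits.
    foreign_query = translate_query(query)
    ranked = sorted(docs, key=lambda d: overlap_score(foreign_query, docs[d]), reverse=True)
    shortlist = ranked[:k]
    # Stage 2: translate just the shortlisted documents into the user language
    # and re-rank them against the original query.
    rescored = [(d, overlap_score(query, translate_doc(docs[d]))) for d in shortlist]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Hypothetical word-for-word dictionary standing in for an MT system.
JA_EN = {"kikai": "machine", "honyaku": "translation", "kensaku": "retrieval"}
EN_JA = {en: ja for ja, en in JA_EN.items()}

def dict_translate(table: Dict[str, str]) -> Callable[[Tokens], Tokens]:
    # Words missing from the table pass through unchanged.
    return lambda tokens: [table.get(t, t) for t in tokens]

DOCS = {
    "d1": ["machine", "translation", "system"],
    "d2": ["information", "retrieval", "evaluation"],
    "d3": ["machine", "learning"],
}

results = two_stage_clir(["kikai", "honyaku"], DOCS, dict_translate(JA_EN), dict_translate(EN_JA), k=2)
print(results)  # [('d1', 2), ('d3', 1)]
```

Here `d2` is never translated, which is where the savings come from on a large collection. In a real system the stage-2 translation would be full MT, so the re-ranking could reorder the shortlist rather than merely confirm it.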
Recycling Lingware in a Multilingual MT System
We describe two methods relevant to multi-lingual machine translation
systems, which can be used to port linguistic data (grammars, lexicons and
transfer rules) between systems used for processing related languages. The
methods are fully implemented within the Spoken Language Translator system, and
were used to create versions of the system for two new language pairs using
only a month of expert effort. Comment: 6 pages, needs aclap.sty. To appear in "From Research to Commercial
Applications" workshop at ACL-97, see also http://www.cam.sri.co
Pattern Matching and Discourse Processing in Information Extraction from Japanese Text
Information extraction is the task of automatically identifying information of
interest in unconstrained text. Information of interest is usually
extracted in two steps. First, sentence level processing locates relevant
pieces of information scattered throughout the text; second, discourse
processing merges coreferential information to generate the output. In the
first step, pieces of information are locally identified without recognizing
any relationships among them. A keyword search or simple pattern search can
achieve this purpose. The second step requires deeper knowledge in order to
understand relationships among separately identified pieces of information.
Previous information extraction systems focused on the first step, partly
because they were not required to link up each piece of information with other
pieces. To link the extracted pieces of information and map them onto a
structured output format, complex discourse processing is essential. This paper
reports on a Japanese information extraction system that merges information
using a pattern matcher and a discourse processor. Evaluation results show a high
level of system performance that approaches human performance. Comment: See http://www.jair.org/ for any accompanying file
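The two-step pipeline described above (local pattern matching, then discourse-level merging of coreferential pieces) can be illustrated with a deliberately small sketch. Everything here is an assumption for illustration: English sentences stand in for Japanese text, the regular expressions in `PATTERNS` are hypothetical, and the discourse step uses a naive "most recently named organization" rule for the anaphor `The company`, far simpler than the paper's discourse processor.

```python
import re
from typing import Dict, List, Optional

ANAPHOR = "The company"

# Hypothetical sentence-level patterns. Each identifies one piece of information
# locally, without knowing how it relates to pieces found in other sentences.
PATTERNS = [
    re.compile(r"(?P<org>[A-Z]\w+) formed a joint venture with (?P<partner>[A-Z]\w+)"),
    re.compile(r"(?P<org>The company|[A-Z]\w+) will invest (?P<amount>\d+ million \w+)"),
    re.compile(r"(?P<org>The company|[A-Z]\w+) is based in (?P<location>[A-Z]\w+)"),
]

def sentence_level(text: str) -> List[Dict[str, str]]:
    """Step 1: split into sentences and collect every local pattern match as a fragment."""
    fragments = []
    for sentence in (s.strip() for s in text.split(".")):
        for pattern in PATTERNS:
            match = pattern.fullmatch(sentence)
            if match:
                fragments.append(match.groupdict())
    return fragments

def discourse_merge(fragments: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Step 2: resolve the anaphor to the most recently named organization, then merge."""
    records: Dict[str, Dict[str, str]] = {}
    last_org: Optional[str] = None
    for fragment in fragments:
        slots = dict(fragment)
        org = slots.pop("org")
        if org == ANAPHOR:
            if last_org is None:
                continue  # no antecedent yet: drop the fragment
            org = last_org
        else:
            last_org = org
        records.setdefault(org, {}).update(slots)
    return records

TEXT = ("Acme formed a joint venture with Bolt. "
        "The company will invest 5 million dollars. "
        "The company is based in Osaka.")
print(discourse_merge(sentence_level(TEXT)))
# {'Acme': {'partner': 'Bolt', 'amount': '5 million dollars', 'location': 'Osaka'}}
```

Step 1 alone yields three disconnected fragments, two of which only mention "The company"; it is the merge step that links them into one structured record, which mirrors the abstract's argument that discourse processing is essential for structured output.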
Simple identification tools in FishBase
Simple identification tools for fish species were included in the FishBase information system from its inception. Early tools made use of the relational model and of characters such as fin ray meristics. Pictures and drawings were soon added as a further aid, similar to a field guide. Later came the computerization of existing dichotomous keys, again combined with pictures and other information, and the ability to restrict possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy
- …
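A computerized dichotomous key combined with a country restriction, as mentioned in the FishBase abstract, can be sketched as a binary tree of yes/no couplets. This is an illustrative sketch only, not FishBase's implementation: the key, the species ("Species A" to "Species C"), the countries, and the helper names `identify` and `restrict` are all hypothetical. Restricting by country is modeled here as pruning the key so that couplets which no longer separate any local species are skipped.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Union

@dataclass
class Couplet:
    """One step of a dichotomous key: a yes/no question with two branches."""
    question: str
    yes: "Node"
    no: "Node"

Node = Union[Couplet, str]  # a leaf is a species name

def identify(node: Node, answers: Dict[str, bool]) -> str:
    """Walk the key from the root, following the user's answers, until a species is reached."""
    while isinstance(node, Couplet):
        node = node.yes if answers[node.question] else node.no
    return node

def restrict(node: Node, allowed: Set[str]) -> Optional[Node]:
    """Drop species not in `allowed`; collapse couplets left with only one branch."""
    if isinstance(node, str):
        return node if node in allowed else None
    yes, no = restrict(node.yes, allowed), restrict(node.no, allowed)
    if yes is None:
        return no
    if no is None:
        return yes
    return Couplet(node.question, yes, no)

# Hypothetical key and occurrence data, for illustration only.
KEY = Couplet(
    "Adipose fin present?",
    yes="Species A",
    no=Couplet("More than 10 dorsal fin rays?", yes="Species B", no="Species C"),
)
OCCURRENCE = {"Species A": {"Country X"}, "Species B": {"Country X", "Country Y"}, "Species C": {"Country Y"}}

in_country_y = {sp for sp, countries in OCCURRENCE.items() if "Country Y" in countries}
local_key = restrict(KEY, in_country_y)
assert local_key is not None
print(identify(local_key, {"More than 10 dorsal fin rays?": False}))  # Species C
```

Because "Species A" is not recorded in Country Y, the pruned key skips the adipose-fin question entirely, so the user answers fewer questions; this is one plausible way such a restriction can shorten identification.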