Filling Knowledge Gaps in a Broad-Coverage Machine Translation System
Knowledge-based machine translation (KBMT) techniques yield high quality in
domains with detailed semantic models, limited vocabulary, and controlled input
grammar. Scaling up along these dimensions means acquiring large knowledge
resources. It also means behaving reasonably when definitive knowledge is not
yet available. This paper describes how we can fill various KBMT knowledge
gaps, often using robust statistical techniques. We describe quantitative and
qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT
system. Comment: 7 pages, compressed and uuencoded postscript. To appear: IJCAI-9
Capturing translational divergences with a statistical tree-to-tree aligner
Parallel treebanks, which comprise paired source-target parse trees aligned at sub-sentential level, could be useful
for many applications, particularly data-driven machine translation. In this paper, we focus on how translational
divergences are captured within a parallel treebank using a fully automatic statistical tree-to-tree aligner. We
observe that while the algorithm performs well at the phrase level, performance on lexical-level alignments
is compromised by an inappropriate bias towards coverage rather than precision. This preference for high precision
rather than broad coverage in terms of expressing translational divergences through tree-alignment stands in
direct opposition to the situation for SMT word-alignment models. We suggest that this has implications not only
for tree-alignment itself but also for the broader area of induction of syntax-aware models for SMT.
Building a large ontology for machine translation
This paper describes efforts underway to construct a large-scale ontology to support semantic processing in the PANGLOSS knowledge-based machine translation system. Because we are aiming at broad semantic coverage, we are focusing on automatic and semi-automatic methods of knowledge acquisition. Here we report on algorithms for merging complementary online resources, in particular the LDOCE and WordNet dictionaries. We discuss empirical results, and how these results have been incorporated into the PANGLOSS ontology.
Automatic evaluation of generation and parsing for machine translation with automatically acquired transfer rules
This paper presents a new method of evaluation for generation and parsing components of transfer-based MT systems where the transfer rules have been automatically
acquired from parsed sentence-aligned bitext corpora. The method provides a means of quantifying the upper bound imposed on the MT system by the quality of the parsing
and generation technologies for the target language. We include experiments to calculate this upper bound for both handcrafted and automatically induced parsing and generation technologies currently in use by transfer-based MT systems.
Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of
Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special
issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment
analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing. Postprint (published version)
JACY - a grammar for annotating syntax, semantics and pragmatics of written and spoken Japanese for NLP application purposes
In this text, we describe the development of a broad-coverage grammar for Japanese that has been built for and used in different application contexts. The grammar is based on work done in the Verbmobil project (Siegel 2000) on machine translation of spoken dialogues in the domain of travel planning. The second application for JACY was the automatic email response task. Grammar development was described in Oepen et al. (2002a). Third, it was applied to the task of understanding material on mobile phones available on the internet, while embedded in the project DeepThought (Callmeier et al. 2004, Uszkoreit et al. 2004). Currently, it is being used for treebanking and ontology extraction from dictionary definition sentences by the Japanese company NTT (Bond et al. 2004).