19,501 research outputs found

    Filling Knowledge Gaps in a Broad-Coverage Machine Translation System

    Full text link
    Knowledge-based machine translation (KBMT) techniques yield high quality in domains with detailed semantic models, limited vocabulary, and controlled input grammar. Scaling up along these dimensions means acquiring large knowledge resources. It also means behaving reasonably when definitive knowledge is not yet available. This paper describes how we can fill various KBMT knowledge gaps, often using robust statistical techniques. We describe quantitative and qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT system.Comment: 7 pages, Compressed and uuencoded postscript. To appear: IJCAI-9

    Using WordNet for Building WordNets

    Full text link
    This paper summarises a set of methodologies and techniques for the fast construction of multilingual WordNets. The English WordNet is used in this approach as a backbone for Catalan and Spanish WordNets and as a lexical knowledge resource for several subtasks.Comment: 8 pages, postscript file. In workshop on Usage of WordNet in NL

    Some Issues on Ontology Integration

    Get PDF
    The word integration has been used with different meanings in the ontology field. This article aims at clarifying the meaning of the word “integration” and presenting some of the relevant work done in integration. We identify three meanings of ontology “integration”: when building a new ontology reusing (by assembling, extending, specializing or adapting) other ontologies already available; when building an ontology by merging several ontologies into a single one that unifies all of them; when building an application using one or more ontologies. We discuss the different meanings of “integration”, identify the main characteristics of the three different processes and proposethree words to distinguish among those meanings:integration, merge and use

    Learning Semantic Correspondences in Technical Documentation

    Full text link
    We consider the problem of translating high-level textual descriptions to formal representations in technical documentation as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning translational correspondences between text descriptions and grounded representations in the target documentation, such as formal representation of functions or code templates. Our approach exploits the parallel nature of such documentation, or the tight coupling between high-level text and the low-level representations we aim to learn. Data is collected by mining technical documents for such parallel text-representation pairs, which we use to train a simple semantic parsing model. We report new baseline results on sixteen novel datasets, including the standard library documentation for nine popular programming languages across seven natural languages, and a small collection of Unix utility manuals.Comment: accepted to ACL-201

    The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe

    Get PDF
    Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna, at the Austrian Academy of Science, on 12-13 February 2009

    Unification-Based Glossing

    Full text link
    We present an approach to syntax-based machine translation that combines unification-style interpretation with statistical processing. This approach enables us to translate any Japanese newspaper article into English, with quality far better than a word-for-word translation. Novel ideas include the use of feature structures to encode word lattices and the use of unification to compose and manipulate lattices. Unification also allows us to specify abstract features that delay target-language synthesis until enough source-language information is assembled. Our statistical component enables us to search efficiently among competing translations and locate those with high English fluency.Comment: 8 pages, Compressed and uuencoded postscript. To appear: IJCAI-9
    • …
    corecore