
    Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings

    We present an approach to learning bilingual n-gram correspondences from relevance rankings of English documents for Japanese queries. We show that directly optimizing cross-lingual rankings rivals and complements machine translation-based cross-language information retrieval (CLIR). We propose an efficient boosting algorithm that deals with very large cross-product spaces of word correspondences. We show in an experimental evaluation on patent prior art search that our approach, and in particular a consensus-based combination of boosting and translation-based approaches, yields substantial improvements in CLIR performance. Our training and test data are made publicly available.

    The RWTH Aachen system for NTCIR-9 PatentMT

    This paper describes the statistical machine translation (SMT) systems developed by RWTH Aachen University for the Patent Translation task of the 9th NTCIR Workshop. Both phrase-based and hierarchical SMT systems were trained for the constrained Japanese-English and Chinese-English tasks. Experiments were conducted to compare different training data sets, training methods and optimization criteria as well as additional models for syntax and phrase reordering. Further, for the Chinese-English subtask we applied a system combination technique to create a consensus hypothesis from several different systems.