    The NICT Translation System for IWSLT 2012

    Abstract: This paper describes NICT's participation in the IWSLT 2012 evaluation campaign for the TED speech translation Russian-English shared task. Our approach was based on a phrase-based statistical machine translation system augmented with transliteration mining techniques. The basic premise was to use sub-word-level alignments to guide the word-level alignment process used to learn the phrase table. We did this by first mining a corpus of Russian-English transliteration pairs and cognates from a set of interlanguage link titles from Wikipedia. This corpus was then used to build a many-to-many nonparametric Bayesian bilingual alignment model that could identify occurrences of transliterations and cognates in the training corpus itself. Alignment counts for these mined pairs were increased in the training corpus to raise the likelihood that the pairs would align in training. Our experiments on the test sets from the 2010 and 2011 shared tasks showed that encouraging the alignment of cognates and transliterations during word alignment can improve translation performance as measured by BLEU score.
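The abstract does not say how the alignment counts were increased. One common way to bias a word aligner toward known pairs is to append each pair to the parallel training corpus as an extra one-word sentence pair, repeated several times. The sketch below illustrates that idea only, and it is an assumption rather than the authors' procedure: the function name `boost_mined_pairs`, the `weight` parameter, and the whitespace tokenization are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

SentencePair = Tuple[str, str]


def boost_mined_pairs(
    corpus: List[SentencePair],
    mined_pairs: Dict[str, Set[str]],
    weight: int = 3,
) -> List[SentencePair]:
    """Return the corpus plus extra one-word pairs for mined matches.

    Assumption: repeating a (source, target) word pair `weight` times
    raises its co-occurrence count when a word aligner is trained on
    the augmented corpus. This is an illustrative stand-in, not the
    method described in the paper.
    """
    augmented = list(corpus)
    for source, target in corpus:
        target_tokens = set(target.split())
        for source_token in source.split():
            # sorted() keeps the output order deterministic
            for candidate in sorted(mined_pairs.get(source_token, ())):
                # Only boost pairs that co-occur in this sentence pair
                if candidate in target_tokens:
                    augmented.extend([(source_token, candidate)] * weight)
    return augmented


if __name__ == "__main__":
    corpus = [("москва большой город", "moscow is a big city")]
    mined = {"москва": {"moscow"}}
    for pair in boost_mined_pairs(corpus, mined, weight=2):
        print(pair)
```

The augmented corpus would then be passed to the word aligner in place of the original. A larger `weight` pushes the aligner harder toward the mined pairs, at the risk of over-trusting noisy matches.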