26,429 research outputs found

    Learning Bilingual Word Representations by Marginalizing Alignments

    Full text link
    We present a probabilistic model that simultaneously learns alignments and distributed representations for bilingual data. By marginalizing over word alignments the model captures a larger semantic context than prior work relying on hard alignments. The advantage of this approach is demonstrated in a cross-lingual classification task, where we outperform the prior published state of the art.Comment: Proceedings of ACL 2014 (Short Papers

    Tuning syntactically enhanced word alignment for statistical machine translation

    Get PDF
    We introduce a syntactically enhanced word alignment model that is more flexible than state-of-the-art generative word alignment models and can be tuned according to different end tasks. First of all, this model takes the advantages of both unsupervised and supervised word alignment approaches by obtaining anchor alignments from unsupervised generative models and seeding the anchor alignments into a supervised discriminative model. Second, this model offers the flexibility of tuning the alignment according to different optimisation criteria. Our experiments show that using our word alignment in a Phrase-Based Statistical Machine Translation system yields a 5.38% relative increase on IWSLT 2007 task in terms of BLEU score

    Tracking relevant alignment characteristics for machine translation

    Get PDF
    In most statistical machine translation (SMT) systems, bilingual segments are extracted via word alignment. In this paper we compare alignments tuned directly according to alignment F-score and BLEU score in order to investigate the alignment characteristics that are helpful in translation. We report results for two different SMT systems (a phrase-based and an n-gram-based system) on Chinese to English IWSLT data, and Spanish to English European Parliament data. We give alignment hints to improve BLEU score, depending on the SMT system used and the type of corpus

    The impact of morphological errors in phrase-based statistical machine translation from German and English into Swedish

    Get PDF
    We have investigated the potential for improvement in target language morphology when translating into Swedish from English and German, by measuring the errors made by a state of the art phrase-based statistical machine translation system. Our results show that there is indeed a performance gap to be filled by better modelling of inflectional morphology and compounding; and that the gap is not filled by simply feeding the translation system with more training data
    corecore