
    The TALP-UPC phrase-based translation system for EACL-WMT 2009

    This study presents the TALP-UPC submission to the EACL Fourth Workshop on Statistical Machine Translation 2009 evaluation campaign. It outlines the architecture and configuration of the 2009 phrase-based statistical machine translation (SMT) system, emphasizing this year's major novelty: the combination of SMT systems that implement different word reordering algorithms. As in previous years, we concentrated on the Spanish-to-English and English-to-Spanish News Commentary translation tasks.
    Postprint (published version)

    The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation

    This paper describes the UPC participation in the WMT12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. Their variants adopt several improvement techniques, such as morphology simplification and generation, and domain adaptation. Morphology simplification addresses the data sparsity problem when translating into morphologically rich languages such as Spanish: the system first translates into a morphology-simplified version of the target language and then leaves morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from the alignment between the MT output and the reference. Results show improvements in TER, METEOR, NIST and BLEU scores over our baseline system; on the official test set, domain adaptation brings larger gains than the morphological generalization method.
    Peer reviewed. Postprint (published version)
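The two-step idea behind morphology simplification (translate into a simplified target, then restore inflection with a separate classifier) can be sketched as below. This is a minimal illustration under stated assumptions, not the UPC implementation: the abstract does not say which features are removed, so the sketch assumes EAGLES-style Spanish tags and masks only the person and number slots of verbs. `Token`, `simplify`, `generate`, `toy_classifier` and the lexicon are all hypothetical names, and `toy_classifier` is a trivial stand-in for a trained classifier.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Token(NamedTuple):
    surface: str  # inflected word form, e.g. "cantamos"
    lemma: str    # base form, e.g. "cantar"
    tag: str      # assumed EAGLES-style PoS tag, e.g. "VMIP1P0"


MASK = "**"  # placeholder for the removed person/number slots


def simplify(tokens: List[Token]) -> List[str]:
    """Map a tagged target sentence to a morphology-simplified form.

    Assumption: verbs become "lemma|tag" with the person and number slots
    (0-based positions 4 and 5 of an EAGLES verb tag) masked; every other
    word is kept unchanged.
    """
    simplified = []
    for tok in tokens:
        if tok.tag.startswith("V") and len(tok.tag) >= 6:
            masked_tag = tok.tag[:4] + MASK + tok.tag[6:]
            simplified.append(f"{tok.lemma}|{masked_tag}")
        else:
            simplified.append(tok.surface)
    return simplified


def generate(
    simplified: List[str],
    predict_features: Callable[[List[str], int], str],
    lexicon: Dict[Tuple[str, str], str],
) -> List[str]:
    """Restore inflected forms: a classifier predicts the masked features,
    and a (lemma, full tag) -> surface lexicon produces the word form."""
    output = []
    for i, item in enumerate(simplified):
        if "|" in item:
            lemma, masked_tag = item.split("|", 1)
            features = predict_features(simplified, i)  # e.g. "1P"
            full_tag = masked_tag.replace(MASK, features, 1)
            # Fall back to the lemma if the lexicon lacks this form.
            output.append(lexicon.get((lemma, full_tag), lemma))
        else:
            output.append(item)
    return output


def toy_classifier(context: List[str], i: int) -> str:
    """Hypothetical stand-in for the trained classifier: predict first person
    plural after "nosotros", third person singular otherwise."""
    return "1P" if i > 0 and context[i - 1] == "nosotros" else "3S"


if __name__ == "__main__":
    sentence = [Token("nosotros", "nosotros", "PP1MP000"),
                Token("cantamos", "cantar", "VMIP1P0")]
    simple = simplify(sentence)  # ["nosotros", "cantar|VMIP**0"]
    lexicon = {("cantar", "VMIP1P0"): "cantamos"}
    print(generate(simple, toy_classifier, lexicon))  # ['nosotros', 'cantamos']
```

Separating the two steps means the translation model only has to learn the simplified verb forms, while the inflection decision becomes an ordinary classification problem over the target context.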

    The TALP on-line Spanish-Catalan machine-translation system

    This paper describes the Catalan-Spanish statistical machine translation (SMT) system developed at the TALP Research Center (UPC), together with its web demonstration.
    Postprint (published version)

    The TALP & I2R SMT Systems for IWSLT 2008

    This paper describes the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present N-gram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the architecture of the 2008 systems and outlines the translation schemes we used, focusing mainly on the new techniques aimed at improving speech-to-speech translation quality. The novelties we introduced are an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks.
    Postprint (published version)

    Identifying useful human correction feedback from an on-line machine translation service

    Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual users is very noisy and must be filtered automatically to identify the potentially useful cases. We present a study on automatic feedback filtering in a real web log collected from Reverso.net. We extend and re-annotate a training corpus, define an extended set of simple features and approach the problem as a binary classification task, experimenting with linear and kernel-based classifiers and feature selection. Results on the feedback filtering task show a significant improvement over the majority-class baseline, but also a precision ceiling of around 70-80%. This reflects the inherent difficulty of the problem and indicates that shallow features cannot fully capture its semantic nature. Despite the modest results on the filtering task, the classifiers prove effective in an application-based evaluation: incorporating a filtered set of feedback instances selected from a larger corpus significantly improves the performance of a phrase-based SMT system, according to a set of standard evaluation metrics.
    Peer reviewed
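Framing feedback filtering as binary classification over simple features can be sketched as follows. This is an illustration under loud assumptions, not the study's setup: the three shallow features (token edit-distance ratio between MT output and user edit, length ratio, and the share of edited tokens copied from the source) are hypothetical choices, and a plain perceptron is swapped in as one example of a linear classifier; the kernel-based classifiers and feature selection mentioned in the abstract are not shown.

```python
from typing import List


def token_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over tokens (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def features(source: str, mt_output: str, user_edit: str) -> List[float]:
    """Hypothetical shallow features for one (source, MT, edit) triple."""
    s, m, e = source.split(), mt_output.split(), user_edit.split()
    m_len = max(len(m), 1)
    return [
        1.0,                                      # bias term
        token_edit_distance(m, e) / m_len,        # how much the user changed
        len(e) / m_len,                           # length ratio edit / MT
        sum(tok in s for tok in e) / max(len(e), 1),  # tokens copied from source
    ]


def train_perceptron(X: List[List[float]], y: List[int], epochs: int = 20) -> List[float]:
    """Classic perceptron; labels are +1 (useful) or -1 (noise)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x))
            if label * score <= 0:  # misclassified or on the boundary
                w = [wi + label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], x: List[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


if __name__ == "__main__":
    # An edit identical to the MT output, and an edit that just copies the source.
    print(features("la casa", "the house", "the house"))  # [1.0, 0.0, 1.0, 0.0]
    print(features("la casa", "the house", "la casa"))    # [1.0, 1.0, 1.0, 1.0]
```

A linear model keeps each feature's contribution inspectable, which is convenient when the feature set itself is being explored; a shallow feature set like this one cannot capture whether an edit is semantically correct, which is consistent with the precision ceiling the abstract reports.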

    Identifying useful human feedback from an on-line translation service

    Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual users is very noisy and must be filtered automatically to identify the potentially useful cases. We present a study on automatic feedback filtering in a real web log collected from Reverso.net. We extend and re-annotate a training corpus, define an extended set of simple features and approach the problem as a binary classification task, experimenting with linear and kernel-based classifiers and feature selection. Results on the feedback filtering task show a significant improvement over the majority-class baseline, but also a precision ceiling of around 70-80%. This reflects the inherent difficulty of the problem and indicates that shallow features cannot fully capture its semantic nature. Despite the modest results on the filtering task, the classifiers prove effective in an application-based evaluation: incorporating a filtered set of feedback instances selected from a larger corpus significantly improves the performance of a phrase-based SMT system, according to a set of standard evaluation metrics.
    Peer reviewed. Postprint (published version)