Search CORE

13 research outputs found

The TALP-UPC phrase-based translation system for EACL-WMT 2009

Author: Banchs Martínez Rafael Enrique
Henríquez Quintana Carlos Alberto
Hernández Adolfo
Khalilov Maxim
Rodríguez Fonollosa José Adrián
Ruiz Costa-Jussà Marta
Publication date: 30/03/2009

This study presents the TALP-UPC submission to the EACL Fourth Workshop on Statistical Machine Translation 2009 evaluation campaign. It outlines the architecture and configuration of the 2009 phrase-based statistical machine translation (SMT) system, putting emphasis on the major novelty of this year: the combination of SMT systems implementing different word reordering algorithms. Traditionally, we have concentrated on the Spanish-to-English and English-to-Spanish News Commentary translation tasks. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation

Author: Formiga Fanals Lluís
Henríquez Quintana Carlos Alberto
Hernández Huerta Adolfo
Mariño Acebal José Bernardo
Monte Moreno Enrique
Rodríguez Fonollosa José Adrián
Publication date: 01/01/2012

This paper describes the UPC participation in the WMT 12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. Variations adopted several improvement techniques such as morphology simplification and generation and domain adaptation. The morphology simplification overcomes the data sparsity problem when translating into morphologically-rich languages such as Spanish by translating first to a morphology-simplified language and then leaving the morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from MT-output and reference alignment. Results show an improvement on TER, METEOR, NIST and BLEU scores compared to our baseline system, obtaining on the official test set more benefits from the domain adaptation approach than from the morphological generalization method. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

The TALP on-line Spanish-Catalan machine-translation system

Author: Farrús Cabeceran Mireia
Henríquez Quintana Carlos Alberto
Hernández Adolfo
Mariño Acebal José Bernardo
Poch M
Rodríguez Fonollosa José Adrián
Ruiz Costa-Jussà Marta
Publication date: 01/01/2009

This paper describes the statistical machine translator (SMT) between Catalan and Spanish developed at the TALP research center (UPC), together with its web demonstration. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

UPF Digital Repository

The TALP & I2R SMT Systems for IWSLT 2008

Author: Aw A.
Banchs Martínez Rafael Enrique
Chen B.
Henríquez Quintana Carlos Alberto
Hernández A.
Khalilov Maxim
Li H.
Mariño Acebal José Bernardo
Rodríguez Fonollosa José Adrián
Ruiz Costa-Jussà Marta
Zhang M.
Publication venue: NICT/ATR
Publication date: 31/10/2008

This paper gives a description of the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT'08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the 2008 systems' architecture and outlines the translation schemes we have used, mainly focusing on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are: an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Identifying useful human correction feedback from an on-line machine translation service

Author: Barrón-Cedeño Alberto
Formiga Fanals Lluís
Henríquez Quintana Carlos Alberto
May Jonathan
Màrquez Villodre Lluís
Romero Merino Enrique
Publication date: 01/01/2013

Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual users is very noisy, and must be automatically filtered in order to identify the potentially useful cases. We present a study on automatic feedback filtering in a real weblog collected from Reverso.net. We extend and re-annotate a training corpus, define an extended set of simple features and approach the problem as a binary classification task, experimenting with linear and kernel-based classifiers and feature selection. Results on the feedback filtering task show a significant improvement over the majority class, but also a precision ceiling around 70-80%. This reflects the inherent difficulty of the problem and indicates that shallow features cannot fully capture the semantic nature of the problem. Despite the modest results on the filtering task, the classifiers are proven effective in an application-based evaluation. The incorporation of a filtered set of feedback instances selected from a larger corpus significantly improves the performance of a phrase-based SMT system, according to a set of standard evaluation metrics. Peer reviewed.

Identifying useful human feedback from an on-line translation service

Author: Barrón-Cedeño Alberto
Formiga Fanals Lluís
Henríquez Quintana Carlos Alberto
May Jonathan
Màrquez Villodre Lluís
Romero Merino Enrique
Publication date: 01/01/2013

Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual users is very noisy, and must be automatically filtered in order to identify the potentially useful cases. We present a study on automatic feedback filtering in a real weblog collected from Reverso.net. We extend and re-annotate a training corpus, define an extended set of simple features and approach the problem as a binary classification task, experimenting with linear and kernel-based classifiers and feature selection. Results on the feedback filtering task show a significant improvement over the majority class, but also a precision ceiling around 70-80%. This reflects the inherent difficulty of the problem and indicates that shallow features cannot fully capture the semantic nature of the problem. Despite the modest results on the filtering task, the classifiers are proven effective in an application-based evaluation. The incorporation of a filtered set of feedback instances selected from a larger corpus significantly improves the performance of a phrase-based SMT system, according to a set of standard evaluation metrics. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Identifying useful human correction feedback from an on-line machine translation service

Author: Barrón-Cedeño Alberto
Formiga Fanals Lluís
Henríquez Quintana Carlos Alberto
May Jonathan
Màrquez Villodre Lluís
Romero Merino Enrique

RECERCAT

Identifying useful human feedback from an on-line translation service

Author: Barrón-Cedeño Alberto
Formiga Fanals Lluís
Henríquez Quintana Carlos Alberto
May Jonathan
Màrquez Villodre Lluís
Romero Merino Enrique

Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual users is very noisy, and must be automatically filtered in order to identify the potentially useful cases. We present a study on automatic feedback filtering in a real weblog collected from Reverso.net. We extend and re-annotate a training corpus, define an extended set of simple features and approach the problem as a binary classification task, experimenting with linear and kernel-based classifiers and feature selection. Results on the feedback filtering task show a significant improvement over the majority class, but also a precision ceiling around 70-80%. This reflects the inherent difficulty of the problem and indicates that shallow features cannot fully capture the semantic nature of the problem. Despite the modest results on the filtering task, the classifiers are proven effective in an application-based evaluation. The incorporation of a filtered set of feedback instances selected from a larger corpus significantly improves the performance of a phrase-based SMT system, according to a set of standard evaluation metrics. Peer reviewed.

RECERCAT

The TALP-UPC phrase-based translation system for EACL-WMT 2009

Author: Banchs Martínez Rafael Enrique
Henríquez Quintana Carlos Alberto
Hernández Adolfo
Khalilov Maxim
Rodríguez Fonollosa José Adrián
Ruiz Costa-Jussà Marta

RECERCAT

The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation

Author: Formiga Fanals Lluís
Henríquez Quintana Carlos Alberto
Hernández Huerta Adolfo
Mariño Acebal José Bernardo
Monte Moreno Enrique
Rodríguez Fonollosa José Adrián

RECERCAT