13 research outputs found
The TALP-UPC phrase-based translation system for EACL-WMT 2009
This study presents the TALP-UPC submission
to the EACL Fourth Worskhop on Statistical Machine Translation 2009 evaluation campaign. It outlines the architecture and configuration of the 2009 phrase-based statistical machine translation (SMT) system, putting emphasis on the major novelty of this year: combination of SMT systems implementing different word reordering algorithms. Traditionally, we have concentrated on the Spanish-to-English and English-to-Spanish News Commentary translation tasks.Postprint (published version
The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation
This paper describes the UPC participation in
the WMT 12 evaluation campaign. All sys-
tems presented are based on standard phrase-
based Moses systems. Variations adopted sev-
eral improvement techniques such as mor-
phology simplification and generation and do-
main adaptation. The morphology simpli-
fication overcomes the data sparsity prob-
lem when translating into morphologically-
rich languages such as Spanish by translat-
ing first to a morphology-simplified language
and secondly leave the morphology gener-
ation to an independent classification task.
The domain adaptation approach improves the
SMT system by adding new translation units
learned from MT-output and reference align-
ment. Results depict an improvement on TER,
METEOR, NIST and BLEU scores compared
to our baseline system, obtaining on the of-
ficial test set more benefits from the domain
adaptation approach than from the morpho-
logical generalization method.Peer ReviewedPostprint (published version
The TALP on-line Spanish-Catalan machine-translation system
In this paper the statistical machine translator (SMT) between Catalan and Spanish developed at the TALP research center (UPC) and its web demonstration are described.Postprint (published version
The TALP & I2R SMT Systems for IWSLT 2008
This paper gives a description of the statistical machine
translation (SMT) systems developed at the TALP Research
Center of the UPC (Universitat Polit`ecnica de Catalunya)
for our participation in the IWSLT’08 evaluation campaign.
We present Ngram-based (TALPtuples) and phrase-based
(TALPphrases) SMT systems. The paper explains the 2008
systems’ architecture and outlines translation schemes we
have used, mainly focusing on the new techniques that are
challenged to improve speech-to-speech translation quality.
The novelties we have introduced are: improved reordering
method, linear combination of translation and reordering
models and new technique dealing with punctuation marks
insertion for a phrase-based SMT system.
This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation
tasks.Postprint (published version
Identifying useful human correction feedback from an on-line machine translation service
Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual users is very noisy, and must be automatically filtered in order to identify the potentially useful cases. We present a study on automatic feedback filtering in a real weblog collected from Reverso.net. We extend and re-annotate a training corpus, define an extended set of simple features and approach the problem as a binary classification task, experimenting with linear and kernelbased classifiers and feature selection. Results on the feedback filtering task show a significant improvement over the majority class, but also a precision ceiling around 70-80%. This reflects the inherent difficulty of the problem and indicates that shallow features cannot fully capture the semantic nature of the problem. Despite the modest results on the filtering task, the classifiers are proven effective in an application-based evaluation. The incorporation of a filtered set of feedback instances selected from a larger corpus significantly improves the performance of a phrase-based SMT system, according to a set of standard evaluation metrics.Peer Reviewe
Identifying useful human feedback from an on-line translation service
Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine
translation (SMT) systems. However, feedback
provided by casual users is very noisy, and must be automatically filtered in order to identify the potentially useful cases. We present a study on automatic feedback filtering in a real weblog collected from
Reverso.net. We extend and re-annotate a training corpus, define an extended set of simple features and approach the problem as a binary classification task, experimenting with linear and kernelbased classifiers and feature selection. Results on the feedback filtering task show a significant improvement
over the majority class, but also a precision
ceiling around 70-80%. This reflects the inherent difficulty of the problemand indicates that shallow features cannot fully capture the semantic nature of the problem. Despite the modest results on the filtering task, the classifiers are proven effective
in an application-based evaluation. The incorporation of a filtered set of feedback instances selected from a larger corpus significantly improves the performance of a phrase-based SMT system, according to a set of standard evaluation metrics.Peer ReviewedPostprint (published version
Identifying useful human correction feedback from an on-line machine translation service
Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual users is very noisy, and must be automatically filtered in order to identify the potentially useful cases. We present a study on automatic feedback filtering in a real weblog collected from Reverso.net. We extend and re-annotate a training corpus, define an extended set of simple features and approach the problem as a binary classification task, experimenting with linear and kernelbased classifiers and feature selection. Results on the feedback filtering task show a significant improvement over the majority class, but also a precision ceiling around 70-80%. This reflects the inherent difficulty of the problem and indicates that shallow features cannot fully capture the semantic nature of the problem. Despite the modest results on the filtering task, the classifiers are proven effective in an application-based evaluation. The incorporation of a filtered set of feedback instances selected from a larger corpus significantly improves the performance of a phrase-based SMT system, according to a set of standard evaluation metrics.Peer Reviewe
Identifying useful human feedback from an on-line translation service
Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine
translation (SMT) systems. However, feedback
provided by casual users is very noisy, and must be automatically filtered in order to identify the potentially useful cases. We present a study on automatic feedback filtering in a real weblog collected from
Reverso.net. We extend and re-annotate a training corpus, define an extended set of simple features and approach the problem as a binary classification task, experimenting with linear and kernelbased classifiers and feature selection. Results on the feedback filtering task show a significant improvement
over the majority class, but also a precision
ceiling around 70-80%. This reflects the inherent difficulty of the problemand indicates that shallow features cannot fully capture the semantic nature of the problem. Despite the modest results on the filtering task, the classifiers are proven effective
in an application-based evaluation. The incorporation of a filtered set of feedback instances selected from a larger corpus significantly improves the performance of a phrase-based SMT system, according to a set of standard evaluation metrics.Peer Reviewe
The TALP-UPC phrase-based translation system for EACL-WMT 2009
This study presents the TALP-UPC submission
to the EACL Fourth Worskhop on Statistical Machine Translation 2009 evaluation campaign. It outlines the architecture and configuration of the 2009 phrase-based statistical machine translation (SMT) system, putting emphasis on the major novelty of this year: combination of SMT systems implementing different word reordering algorithms. Traditionally, we have concentrated on the Spanish-to-English and English-to-Spanish News Commentary translation tasks
The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation
This paper describes the UPC participation in
the WMT 12 evaluation campaign. All sys-
tems presented are based on standard phrase-
based Moses systems. Variations adopted sev-
eral improvement techniques such as mor-
phology simplification and generation and do-
main adaptation. The morphology simpli-
fication overcomes the data sparsity prob-
lem when translating into morphologically-
rich languages such as Spanish by translat-
ing first to a morphology-simplified language
and secondly leave the morphology gener-
ation to an independent classification task.
The domain adaptation approach improves the
SMT system by adding new translation units
learned from MT-output and reference align-
ment. Results depict an improvement on TER,
METEOR, NIST and BLEU scores compared
to our baseline system, obtaining on the of-
ficial test set more benefits from the domain
adaptation approach than from the morpho-
logical generalization method.Peer Reviewe