Search CORE

317 research outputs found

Results of the WMT16 Metrics Shared Task

Author: Bojar Ondřej
Graham Yvette
Kamran Amir
Stanojević Miloš
Publication venue
Publication date: 01/01/2016
Field of study

This paper presents the results of the WMT16 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT16 Shared Translation Task. We collected scores of 16 metrics from 9 research groups. In addition to that, we computed scores of 9 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The collected scores were evaluated in terms of system-level correlation (how well each metric’s scores correlate with WMT16 official manual ranking of systems) and in terms of segment level correlation (how often a metric agrees with humans in comparing two translations of a particular sentence). This year there are several additions to the setup: large number of language pairs (18 in total), datasets from different domains (news, IT and medical), and different kinds of judgments: relative ranking (RR), direct assessment (DA) and HUME manual semantic judgments. Finally, generation of large number of hybrid systems was trialed for provision of more conclusive system-level metric rankings

Crossref

Biblio at Institute of Formal and Applied Linguistics

A Shared Task on Bandit Learning for Machine Translation

Author: Danchenko Pavel
Fürstenau Hagen
Kreutzer Julia
Riezler Stefan
Sokolov Artem
Sunderland Kellen
Szymaniak Witold
Publication venue
Publication date: 01/01/2017
Field of study

We introduce and describe the results of a novel shared task on bandit learning for machine translation. The task was organized jointly by Amazon and Heidelberg University for the first time at the Second Conference on Machine Translation (WMT 2017). The goal of the task is to encourage research on learning machine translation from weak user feedback instead of human references or post-edits. On each of a sequence of rounds, a machine translation system is required to propose a translation for an input, and receives a real-valued estimate of the quality of the proposed translation for learning. This paper describes the shared task's learning and evaluation setup, using services hosted on Amazon Web Services (AWS), the data and evaluation metrics, and the results of various machine translation architectures and learning protocols.Comment: Conference on Machine Translation (WMT) 201

arXiv.org e-Print Archive

Crossref

LIUM-CVC Submissions for WMT17 Multimodal Translation Task

Author: Aransa Walid
Bardet Adrien
Barrault Loïc
Bougares Fethi
Caglayan Ozan
García-Martínez Mercedes
Herranz Luis
Masana Marc
van de Weijer Joost
Publication venue
Publication date: 01/01/2017
Field of study

This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs according to the automatic evaluation metrics METEOR and BLEU.Comment: MMT System Description Paper for WMT1

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

Results of the WMT16 Tuning Shared Task

Author: Bojar Ondřej
Jawaid Bushra
Kamran Amir
Stanojević Miloš
Publication venue
Publication date: 01/01/2016
Field of study

This paper presents the results of the WMT16 Tuning Shared Task. We provided the participants of this task with a complete machine translation system and asked them to tune its internal parameters (feature weights). The tuned systems were used to translate the test set and the outputs were manually ranked for translation quality. We received 4 submissions in the Czech-English and 8 in the English-Czech translation direction. In addition, we ran 2 baseline setups, tuning the parameters with standard optimizers for BLEU score. In contrast to previous years, the tuned systems in 2016 rely on large data

Crossref

Biblio at Institute of Formal and Applied Linguistics

Does Multimodality Help Human and Machine for Translation and Image Captioning?

Author: Aransa Walid
Barrault Loïc
Bougares Fethi
Caglayan Ozan
García-Martínez Mercedes
Masana Marc
van de Weijer Joost
Wang Yaxing
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2016
Field of study

This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.Comment: 7 pages, 2 figures, v4: Small clarification in section 4 title and conten

arXiv.org e-Print Archive

Crossref

Findings of the 2017 Conference on Machine Translation

Author: Bojar Ondřej
Chatterjee Rajen
Federmann Christian
Graham Yvette
Haddow Barry
Huang Shujian
Huck Matthias
Koehn Philipp
Liu Qun
Logacheva Varvara
Monz Christof
Negri Matteo
Post Matt
Rubino Raphael
Specia Lucia
Turchi Marco
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017
Field of study

This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Irish Universities

Edinburgh Research Explorer

Biblio at Institute of Formal and Applied Linguistics

DCU Online Research Access Service

Exploring Prediction Uncertainty in Machine Translation Quality Estimation

Author: Beck Daniel
Cohn Trevor
Specia Lucia
Publication venue
Publication date: 01/01/2016
Field of study

Machine Translation Quality Estimation is a notoriously difficult task, which lessens its usefulness in real-world translation environments. Such scenarios can be improved if quality predictions are accompanied by a measure of uncertainty. However, models in this task are traditionally evaluated only in terms of point estimate metrics, which do not take prediction uncertainty into account. We investigate probabilistic methods for Quality Estimation that can provide well-calibrated uncertainty estimates and evaluate them in terms of their full posterior predictive distributions. We also show how this posterior information can be useful in an asymmetric risk scenario, which aims to capture typical situations in translation workflows.Comment: Proceedings of CoNLL 201

arXiv.org e-Print Archive

Crossref

Publikationsserver der RWTH Aachen University

Aligning Neural Machine Translation Models: Human Feedback in Training and Inference

Author: Farinhas António
Fernandes Patrick
Martins André F. T.
Ramos Miguel Moura
Publication venue
Publication date: 15/11/2023
Field of study

Reinforcement learning from human feedback (RLHF) is a recent technique to improve the quality of the text generated by a language model, making it closer to what humans would generate. A core ingredient in RLHF's success in aligning and improving large language models (LLMs) is its reward model, trained using human feedback on model outputs. In machine translation (MT), where metrics trained from human annotations can readily be used as reward models, recent methods using minimum Bayes risk decoding and reranking have succeeded in improving the final quality of translation. In this study, we comprehensively explore and compare techniques for integrating quality metrics as reward models into the MT pipeline. This includes using the reward model for data filtering, during the training phase through RL, and at inference time by employing reranking techniques, and we assess the effects of combining these in a unified approach. Our experimental results, conducted across multiple translation tasks, underscore the crucial role of effective data filtering, based on estimated quality, in harnessing the full potential of RL in enhancing MT quality. Furthermore, our findings demonstrate the effectiveness of combining RL training with reranking techniques, showcasing substantial improvements in translation quality.Comment: 14 pages, work-in-progres

arXiv.org e-Print Archive