    Findings of the 2019 Conference on Machine Translation (WMT19)

    This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.

    Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation

    We propose a two-stage approach for training a single NMT model to translate unseen languages both to and from English. In the first stage, we initialize an encoder-decoder model with pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data from 40 languages into English. We find this model can generalize to zero-shot translations on unseen languages. In the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then train with successive rounds of bidirectional back-translation. We term our approach EcXTra (English-centric Crosslingual (X) Transfer). Our approach is conceptually simple, using only a standard cross-entropy objective throughout, and is also data-driven, sequentially leveraging auxiliary parallel data and monolingual data. We evaluate our unsupervised NMT results on 7 low-resource languages and find that each round of back-translation training further refines bidirectional performance. Our final single EcXTra-trained model achieves competitive translation performance in all translation directions, notably establishing a new state of the art for English-to-Kazakh (22.9 vs. 10.4 BLEU).
    Comment: LoResMT @ EACL 202
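
    The second stage reduces to a simple loop: translate monolingual text with the current model, treat the output as synthetic bitext, and retrain in both directions. The sketch below is a hypothetical illustration of one such round, assuming a placeholder `model` object with `translate` and `train_step` methods; it is not the authors' code.

```python
# Hypothetical sketch of one round of bidirectional back-translation in the
# spirit of EcXTra. `model.translate` and `model.train_step` are assumed
# placeholder APIs, not the actual implementation.

def back_translation_round(model, mono_foreign, mono_english):
    """Synthesize parallel data in both directions, then train on it."""
    # Real foreign sentences -> synthetic English: training pairs for the
    # English->foreign direction (synthetic source, real target).
    en_to_xx = [(model.translate(x, tgt="en"), x) for x in mono_foreign]

    # Real English sentences -> synthetic foreign: training pairs for the
    # foreign->English direction.
    xx_to_en = [(model.translate(y, tgt="xx"), y) for y in mono_english]

    # One pass of standard cross-entropy training on the synthetic bitext;
    # successive rounds repeat this with the improved model.
    for src, tgt in en_to_xx + xx_to_en:
        model.train_step(src=src, tgt=tgt)
    return model
```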

    MuLER: Detailed and Scalable Reference-based Evaluation

    We propose a novel methodology (namely, MuLER) that transforms any reference-based evaluation metric for text generation, such as machine translation (MT), into a fine-grained analysis tool. Given a system and a metric, MuLER quantifies how much the chosen metric penalizes specific error types (e.g., errors in translating names of locations). MuLER thus enables a detailed error analysis which can lead to targeted improvement efforts for specific phenomena. We perform experiments in both synthetic and naturalistic settings to support MuLER's validity and showcase its usability in MT evaluation and other tasks, such as summarization. Analyzing all submissions to WMT in 2014-2020, we find consistent trends. For example, nouns and verbs are among the most frequent POS tags, yet they are among the hardest to translate. Performance on most POS tags improves with overall system performance, but a few do not follow this trend (and which ones they are varies from language to language). Preliminary experiments with summarization reveal similar trends.
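
    One way to picture the core idea (a hypothetical sketch, not necessarily the paper's exact procedure): repair a single error type in each hypothesis and measure how much the metric score recovers; that margin approximates the penalty the metric attributes to the error type. Here `fix_error_type` is an assumed user-supplied function, and sentence-level BLEU from sacrebleu stands in for an arbitrary reference-based metric.

```python
# Hypothetical illustration of attributing metric penalty to one error type.
from sacrebleu import sentence_bleu  # any reference-based metric would do

def error_type_penalty(hyps, refs, fix_error_type):
    """`fix_error_type(hyp, ref)` returns `hyp` with only that error type
    repaired (e.g., location names replaced by the reference's spelling)."""
    base = sum(sentence_bleu(h, [r]).score for h, r in zip(hyps, refs))
    fixed = sum(sentence_bleu(fix_error_type(h, r), [r]).score
                for h, r in zip(hyps, refs))
    # The larger the delta, the more of the metric's penalty is attributable
    # to this error type.
    return (fixed - base) / len(hyps)
```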

    Survey of Low-Resource Machine Translation

    We present a survey covering the state of the art in low-resource machine translation (MT) research. There are currently around 7,000 languages spoken in the world, and almost all language pairs lack significant resources for training machine translation models. There has been increasing interest in research addressing the challenge of producing useful translation models when very little translated training data is available. We present a summary of this topical research field and provide a description of the techniques evaluated by researchers in several recent shared tasks in low-resource MT.

    Findings of the 2022 Conference on Machine Translation (WMT22)

    This paper presents the results of the General Machine Translation Task organised as part of the Conference on Machine Translation (WMT) 2022. In the general MT task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting of four different domains. We evaluate system outputs with human annotators using two different techniques: reference-based direct assessment (DA) and a combination of DA and scalar quality metric (DA+SQM).
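
    In WMT-style human evaluation, raw DA scores are typically standardized per annotator before systems are averaged and compared, so that strict and lenient annotators become comparable. The sketch below illustrates that standard z-normalization step in general terms; it is a minimal illustration, not the WMT22 pipeline.

```python
# General illustration of per-annotator z-normalization of DA scores
# (not the WMT22 evaluation code).
from collections import defaultdict
from statistics import mean, stdev

def z_normalized_system_scores(judgments):
    """judgments: list of (annotator_id, system_id, raw_score in 0..100)."""
    by_annotator = defaultdict(list)
    for annotator, _, raw in judgments:
        by_annotator[annotator].append(raw)

    # Per-annotator mean and standard deviation; fall back to 1.0 when an
    # annotator's scores have no spread, to avoid division by zero.
    stats = {}
    for annotator, raws in by_annotator.items():
        sigma = stdev(raws) if len(raws) > 1 else 0.0
        stats[annotator] = (mean(raws), sigma or 1.0)

    by_system = defaultdict(list)
    for annotator, system, raw in judgments:
        mu, sigma = stats[annotator]
        by_system[system].append((raw - mu) / sigma)
    return {system: mean(zs) for system, zs in by_system.items()}
```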