Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites that probe specific aspects of translation.
Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation
We propose a two-stage approach for training a single NMT model to translate unseen languages both to and from English. In the first stage, we initialize an encoder-decoder model with pretrained XLM-R and RoBERTa weights, then perform multilingual fine-tuning on parallel data from 40 languages into English. We find this model can generalize to zero-shot translation of unseen languages. In the second stage, we leverage this generalization ability to generate synthetic parallel data from monolingual datasets, then train with successive rounds of bidirectional back-translation.
We term our approach EcXTra (English-centric Crosslingual (X) Transfer). Our approach is conceptually simple, using only a standard cross-entropy objective throughout, and is also data-driven, sequentially leveraging auxiliary parallel data and monolingual data. We evaluate our unsupervised NMT results on 7 low-resource languages and find that each round of back-translation training further refines bidirectional performance. Our final single EcXTra-trained model achieves competitive translation performance in all translation directions, notably establishing a new state of the art for English-to-Kazakh (22.9 > 10.4 BLEU).
Comment: LoResMT @ EACL 202
MuLER: Detailed and Scalable Reference-based Evaluation
We propose a novel methodology, MuLER, that transforms any reference-based evaluation metric for text generation, such as machine translation (MT), into a fine-grained analysis tool. Given a system and a metric, MuLER quantifies how much the chosen metric penalizes specific error types (e.g., errors in translating names of locations). MuLER thus enables a detailed error analysis that can lead to targeted improvement efforts for specific phenomena.
We perform experiments in both synthetic and naturalistic settings to support MuLER's validity and to showcase its usability in MT evaluation and in other tasks, such as summarization. Analyzing all submissions to WMT in 2014-2020, we find consistent trends. For example, nouns and verbs are among the most frequent POS tags, yet they are among the hardest to translate. Performance on most POS tags improves with overall system performance, but a few are not correlated with it (and their identity changes from language to language). Preliminary experiments with summarization reveal similar trends.
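The core idea of restricting a reference-based score to one error category can be sketched as follows. This is a deliberate simplification of my own, not the released MuLER tool: the metric here is plain token-level recall, and the tag categories stand in for whatever phenomenon (POS tag, entity type) one wants to isolate.

```python
def recall(reference, hypothesis):
    """Fraction of reference tokens that appear in the hypothesis."""
    hyp = set(hypothesis)
    return sum(tok in hyp for tok in reference) / len(reference)

def category_gap(reference, hypothesis, tags, category):
    """Overall recall minus recall restricted to tokens of one tag category.
    A large positive gap means this category drags the metric down."""
    subset = [tok for tok, tag in zip(reference, tags) if tag == category]
    if not subset:
        return 0.0
    return recall(reference, hypothesis) - recall(subset, hypothesis)
```

As a usage example, for the reference "the train to Paris departs" against the hypothesis "the train to Lyon departs", overall recall is 0.8 but recall on the single LOC token is 0.0, giving a gap of 0.8 and flagging location names as the weak spot, which is the kind of fine-grained signal the abstract describes.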
Survey of Low-Resource Machine Translation
We present a survey covering the state of the art in low-resource machine translation (MT) research. There are currently around 7,000 languages spoken in the world, and almost all language pairs lack significant resources for training machine translation models. There has been increasing interest in research addressing the challenge of producing useful translation models when very little translated training data is available. We present a summary of this topical research field and provide a description of the techniques evaluated by researchers in several recent shared tasks in low-resource MT.
Findings of the 2022 Conference on Machine Translation (WMT22)
This paper presents the results of the General Machine Translation Task organised as part of the Conference on Machine Translation (WMT) 2022. In the general MT task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting of four different domains. We evaluate system outputs with human annotators using two different techniques: reference-based direct assessment (DA) and a combination of DA and scalar quality metrics (DA+SQM).