Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.
Evaluating conjunction disambiguation on English-to-German and French-to-German WMT 2019 translation hypotheses
We present a test set for evaluating an MT system's capability to translate ambiguous conjunctions depending on the sentence structure. We concentrate on the English conjunction "but" and its French equivalent "mais", which can be translated into two different German conjunctions. We evaluate all English-to-German and French-to-German submissions to the WMT 2019 shared translation task. The evaluation is done mainly automatically, with additional fast manual inspection of unclear cases. All systems almost perfectly recognise the target conjunction "aber", whereas accuracies for the other target conjunction "sondern" range from 78% to 97%, and the errors are mostly caused by replacing it with the alternative conjunction "aber". The best performing system for both language pairs is a multilingual Transformer TartuNLP system trained on all WMT2019 language pairs which use the Latin script, indicating that the multilingual approach is beneficial for conjunction disambiguation. As for other system features, such as using synthetic back-translated data, context-aware, hybrid, etc., no particular (dis)advantages can be observed. Qualitative manual inspection of translation hypotheses showed that highly ranked systems generally produce translations with high adequacy and fluency, meaning that these systems do not merely capture the right conjunction while the rest of the translation hypothesis is poor. On the other hand, the low ranked systems generally exhibit lower fluency and poor adequacy.
Findings of the 2018 Conference on Machine Translation (WMT18)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018. Participants were asked to build machine translation systems for any of 7 language pairs in both directions, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. This year, we also opened up the task to additional test suites to probe specific aspects of translation.
Findings of the WMT 2022 Biomedical Translation Shared Task: Monolingual Clinical Case Reports
In the seventh edition of the WMT Biomedical Task, we addressed a total of seven language pairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, English/Italian. This year's test sets covered three types of biomedical text genre. In addition to scientific abstracts and terminology items used in previous editions, we released test sets of clinical cases. The evaluation of clinical case translations was given special attention by involving clinicians in the preparation of reference translations and manual evaluation. For the main MEDLINE test sets, we received a total of 609 submissions from 37 teams. For the ClinSpEn sub-task, we had the participation of five teams.