Findings of the WMT 2018 Shared Task on Automatic Post-Editing
We present the results from the fourth round of the WMT shared task on MT Automatic Post-Editing. The task consists in automatically correcting the output of a “black-box” machine translation system by learning from human corrections. Keeping the same general evaluation setting of the three previous rounds, this year we focused on one language pair (English-German) and on domain-specific data (Information Technology), with MT outputs produced by two different paradigms: phrase-based (PBSMT) and neural (NMT). Five teams submitted 11 runs for the PBSMT subtask and 10 runs for the NMT subtask. In the former subtask, characterized by original translations of lower quality, top results achieved impressive improvements, up to -6.24 TER and +9.53 BLEU points over the baseline “do-nothing” system. The NMT subtask proved to be more challenging due to the higher quality of the original translations and the availability of less training data. In this case, top results show smaller improvements, up to -0.38 TER and +0.8 BLEU points.
MS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing
This paper describes the Microsoft and University of Edinburgh submission to
the Automatic Post-editing shared task at WMT2018. Based on training data and
systems from the WMT2017 shared task, we re-implement our own models from the
last shared task and introduce improvements based on extensive parameter
sharing. Next we experiment with our implementation of dual-source transformer
models and data selection for the IT domain. Our submission decisively wins
the SMT post-editing sub-task, establishing the new state of the art, and is a
very close second (or equal, 16.46 vs 16.50 TER) in the NMT sub-task. Based on
the rather weak results in the NMT sub-task, we hypothesize that
neural-on-neural APE might not actually be useful.
Comment: Winning submissions for WMT2018 APE shared task
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation