91 research outputs found
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation
Findings of the WMT 2020 shared task on quality estimation
© 2020 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence.
The published version can be accessed at the following link on the publisher’s website; https://www.aclweb.org/anthology/2020.wmt-1.79We report the results of the WMT20 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word, sentence and document levels. This edition included new data with open domain texts, direct assessment annotations, and multiple language pairs: English-German, English-Chinese, Russian-English, Romanian-English, Estonian-English, Sinhala-English and Nepali-English data for the sentence-level subtasks, English-German and English-Chinese for the word-level subtask, and English-French data for the document-level subtask. In addition, we made neural machine translation models available to participants. 19 participating teams from 27 institutions submitted altogether 1374 systems to different task variants and language pairs
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project
Findings of the IWSLT 2022 Evaluation Campaign.
The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation. A total of 27 teams participated in at least one of the shared tasks. This paper details, for each shared task, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received and the results that were achieved
Findings of the 2022 Conference on Machine Translation (WMT22)
International audienceThis paper presents the results of the General Machine Translation Task organised as part of the Conference on Machine Translation (WMT) 2022. In the general MT task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting of four different domains. We evaluate system outputs with human annotators using two different techniques: reference-based direct assessment and (DA) and a combination of DA and scalar quality metric (DA+SQM)
Recommended from our members
Domain adaptation for neural machine translation
The development of deep learning techniques has allowed Neural Machine Translation (NMT) models to become extremely powerful, given sufficient training data and training time. However, such translation models struggle when translating text of a specific domain. A domain may consist of text on a well-defined topic, or text of unknown provenance with an identifiable vocabulary distribution, or language with some other stylometric feature. While NMT models can achieve good translation performance on domain-specific data via simple tuning on a representative training corpus, such data-centric approaches have negative side-effects. These include over-fitting, brittleness, and `catastrophic forgetting' of previous training examples.
In this thesis we instead explore more robust approaches to domain adaptation for NMT. We consider the case where a system is adapted to a specified domain of interest, but may also need to accommodate new language, or domain-mismatched sentences. We explore techniques relating to data selection and curriculum, model parameter adaptation procedure, and inference procedure. We show that iterative fine-tuning can achieve strong performance over multiple related domains, and that Elastic Weight Consolidation can be used to mitigate catastrophic forgetting in NMT domain adaptation across multiple sequential domains. We develop a robust variant of Minimum Risk Training which allows more beneficial use of small, highly domain-specific tuning sets than simple cross-entropy fine-tuning, and can mitigate exposure bias resulting from domain over-fitting. We extend Bayesian Interpolation inference schemes to Neural Machine Translation, allowing adaptive weighting of NMT ensembles to translate text from an unknown domain.
Finally we demonstrate the benefit of multi-domain adaptation approaches for other lines of NMT research. We show that NMT systems using multiple forms of data representation can benefit from multi-domain inference approaches. We also demonstrate a series of domain adaptation approaches to mitigating the effects of gender bias in machine translation
- …