
    Results of the WMT17 Neural MT Training Task

    This paper presents the results of the WMT17 Neural MT Training Task. The objective of this task is to explore methods of training a fixed neural architecture, aiming primarily at the best translation quality and, as a secondary goal, shorter training time. Task participants were provided with a complete neural machine translation system, fixed training data and the configuration of the network. Translation was performed in the English-to-Czech direction, and the task was divided into two subtasks with different configurations: one scaled to fit on a 4 GB GPU card and another on an 8 GB GPU card. We received 3 submissions for the 4 GB variant and 1 submission for the 8 GB variant; we also provided our own run for each of the sizes and two baselines. We translated the test set with the trained models and evaluated the outputs using several automatic metrics. We also report results of the human evaluation of the submitted systems.

    Findings of the 2017 Conference on Machine Translation

    This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task.

    Findings of the 2019 Conference on Machine Translation (WMT19)

    This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.

    Findings of the 2018 Conference on Machine Translation (WMT18)

    This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018. Participants were asked to build machine translation systems for any of 7 language pairs in both directions, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. This year, we also opened up the task to additional test suites to probe specific aspects of translation.

    Extracting correctly aligned segments from unclean parallel data using character n-gram matching

    Training of Neural Machine Translation systems is a time- and resource-demanding task, especially when large amounts of parallel texts are used. In addition, it is sensitive to unclean parallel data. In this work, we explore a data cleaning method based on character n-gram matching. The method is particularly convenient for closely related languages, since the n-gram matching scores can be calculated directly between the source and the target parts of the training corpus. For more distant languages, a translation step is needed, and the MT output is then compared with the corresponding original part. We show that the proposed method not only reduces the size of the training corpus, but can also increase the system's performance.
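    The filtering idea described in this abstract can be sketched as follows. This is a minimal illustration of character n-gram matching for sentence-pair cleaning, not the authors' implementation: the scoring function here is an F1-style overlap over character 4-gram multisets, and the function names and the threshold value are illustrative assumptions.

    ```python
    # Hedged sketch of character n-gram matching for parallel-data cleaning.
    # All names (ngram_match_score, filter_corpus) and the 0.3 threshold are
    # illustrative choices, not taken from the paper.
    from collections import Counter


    def char_ngrams(text, n=4):
        """Multiset of character n-grams of `text` (spaces ignored)."""
        text = text.replace(" ", "")
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))


    def ngram_match_score(src, tgt, n=4):
        """F1-style overlap between the character n-gram multisets of two segments."""
        src_ng, tgt_ng = char_ngrams(src, n), char_ngrams(tgt, n)
        if not src_ng or not tgt_ng:
            return 0.0
        overlap = sum((src_ng & tgt_ng).values())  # multiset intersection size
        prec = overlap / sum(tgt_ng.values())
        rec = overlap / sum(src_ng.values())
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0


    def filter_corpus(pairs, threshold=0.3, n=4):
        """Keep only sentence pairs whose match score reaches the threshold."""
        return [(s, t) for s, t in pairs if ngram_match_score(s, t, n) >= threshold]
    ```

    For closely related languages this score is computed directly on the source and target sides, as the abstract notes; for distant language pairs one side would first be machine-translated, and the score computed between the MT output and the original.
    
    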

    ParaCrawl: Web-Scale Acquisition of Parallel Corpora

    We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software. We empirically compare alternative methods and publish benchmark data sets for sentence alignment and sentence pair filtering. We also describe the parallel corpora released and evaluate their quality and their usefulness for creating machine translation systems.