Search CORE

205 research outputs found

Findings of the 2019 Conference on Machine Translation (WMT19)

Author: Barrault Loïc
Bojar Ondřej
Costa-Jussà Marta R.
Federmann Christian
Fishel Mark
Graham Yvette
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/08/2019
Field of study

This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation

Irish Universities

DCU Online Research Access Service

Results of the WMT19 metrics shared task: segment-level and strong MT systems pose big challenges

Author: Bojar Ondřej
Graham Yvette
Ma Qingsong
Wei Johnny Tian-Zheng
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/08/2019
Field of study

This paper presents the results of the WMT19 Metrics Shared Task. Participants were asked to score the outputs of the translations systems competing in the WMT19 News Translation Task with automatic metrics. 13 research groups submitted 24 metrics, 10 of which are reference-less "metrics" and constitute submissions to the joint task with WMT19 Quality Estimation Task, "QE as a Metric". In addition, we computed 11 baseline metrics, with 8 commonly applied baselines (BLEU, SentBLEU, NIST, WER, PER, TER, CDER, and chrF) and 3 reimplementations (chrF+, sacreBLEU-BLEU, and sacreBLEU-chrF). Metrics were evaluated on the system level, how well a given metric correlates with the WMT19 official manual ranking, and segment level, how well the metric correlates with human judgements of segment quality. This year, we use direct assessment (DA) as our only form of manual evaluation

Irish Universities

DCU Online Research Access Service

IITP-MT System for Gujarati-English News Translation Task at WMT 2019

Author: Bhattacharyya Pushpak
Ekbal Asif
Gupta Kamal Kumar
Sen Sukanta
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019
Field of study

Crossref

Edinburgh Research Explorer

Parallel Corpus Filtering Based on Fuzzy String Matching

Author: Bhattacharyya Pushpak
Ekbal Asif
Sen Sukanta
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019
Field of study

Crossref

Edinburgh Research Explorer

Automatic Discrimination of Human and Neural Machine Translation:A Study with Multiple Pre-Trained Models and Longer Context

Author: Toral Antonio
van der Werff Tobias
van Noord Rik
Publication venue: European Association for Machine Translation
Publication date: 01/06/2022
Field of study

ARTS repository - University of Groningen

Automatic Discrimination of Human and Neural Machine Translation:A Study with Multiple Pre-Trained Models and Longer Context

Author: Toral Antonio
van der Werff Tobias
van Noord Rik
Publication venue: European Association for Machine Translation
Publication date: 01/06/2022
Field of study

We address the task of automatically distinguishing between human-translated (HT) and machine translated (MT) texts. Following recent work, we fine-tune pre-trained language models (LMs) to perform this task. Our work differs in that we use state-of-the-art pre-trained LMs, as well as the test sets of the WMT news shared tasks as training data, to ensure the sentences were not seen during training of the MT system itself. Moreover, we analyse performance for a number of different experimental setups, such as adding translationese data, going beyond the sentence-level and normalizing punctuation. We show that (i) choosing a state-of-the-art LM can make quite a difference: our best baseline system (DeBERTa) outperforms both BERT and RoBERTa by over 3% accuracy, (ii) adding translationese data is only beneficial if there is not much data available, (iii) considerable improvements can be obtained by classifying at the document-level and (iv) normalizing punctuation and thus avoiding (some) shortcuts has no impact on model performance

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen