11 research outputs found
Beyond Sentence-Level End-to-End Speech Translation: Context Helps
Document-level contextual information has shown benefits for text-based machine translation, but whether and how context helps end-to-end (E2E) speech translation (ST) is still under-studied. We fill this gap through extensive experiments using a simple concatenation-based context-aware ST model, paired with adaptive feature selection on speech encodings for computational efficiency. We investigate several decoding approaches, and introduce in-model ensemble decoding, which jointly performs document- and sentence-level translation using the same model. Our results on the MuST-C benchmark with Transformer demonstrate the effectiveness of context for E2E ST. Compared to sentence-level ST, context-aware ST obtains better translation quality (+0.18-2.61 BLEU), improves pronoun and homophone translation, shows better robustness to (artificial) audio segmentation errors, and reduces latency and flicker to deliver higher quality for simultaneous translation.
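The concatenation-based approach the abstract describes can be illustrated on the text side with a minimal sketch: the current sentence is prepended with a window of preceding sentences, joined by a separator token. The `<SEP>` token and the window size here are illustrative assumptions, not details taken from the paper.

```python
def build_context_input(sentences, index, context_size=2, sep="<SEP>"):
    """Concatenate up to `context_size` preceding sentences with the
    current one, joined by a separator token (assumed convention).
    Sketch of concatenation-based context-aware input construction."""
    start = max(0, index - context_size)
    context = sentences[start:index]
    return f" {sep} ".join(context + [sentences[index]])
```

For speech translation the same idea applies to concatenated speech encodings rather than text, which is why the paper pairs it with adaptive feature selection to keep the longer inputs tractable.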
Arabic and English Relative Clauses and Machine Translation Challenges
The study aims at performing an error analysis as well as providing an evaluation of the quality of neural machine translation (NMT), represented by Google Translate, when translating relative clauses. The study uses two test suites composed of sentences that contain relative clauses. The first test suite consists of 108 sentence pairs translated from English into Arabic, whereas the second consists of 72 Arabic sentences translated into English. Error annotation is performed by six professional annotators. The study presents a list of the annotated errors, divided into accuracy and fluency errors based on MQM. Manual evaluation is also performed by the six professionals, along with automatic BLEU evaluation using the Tilde MT platform. The results show that fluency errors are more frequent than accuracy errors. They also show that MT quality when translating from English into Arabic is lower than when translating from Arabic into English. Based on the performed error analysis and both manual and automatic evaluation, it is pointed out that the gap between MT and professional human translation is still large.
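The BLEU scores mentioned above come from an automatic evaluation platform; conceptually, sentence-level BLEU combines clipped n-gram precisions (up to 4-grams) with a brevity penalty. A minimal self-contained sketch of standard unsmoothed BLEU, not the platform's exact implementation:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty. Returns 0.0 if any precision is zero
    (no smoothing applied)."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(count, r[g]) for g, count in c.items())
        total = max(sum(c.values()), 1)
        if overlap == 0:
            return 0.0
        log_prec += math.log(overlap / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec / max_n)
```

Production toolkits additionally handle tokenization, smoothing, and corpus-level aggregation, which is why scores from different platforms are not directly comparable.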
DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues
Recently, the Machine Translation (MT) community has become more interested in document-level evaluation especially in light of reactions to claims of "human parity", since examining the quality at the level of the document rather than at the sentence level allows for the assessment of suprasentential context, providing a more reliable evaluation.
This paper presents a document-level corpus annotated in English with context-aware issues that arise when translating from English into Brazilian Portuguese, namely ellipsis, gender, lexical ambiguity, number, reference, and terminology, across six different domains. The corpus can be used as a challenge test set for evaluation, as a training/testing corpus for MT, and for deep linguistic analysis of context issues. To the best of our knowledge, this is the first corpus of its kind.
Linguistic evaluation of German-English Machine Translation using a Test Suite
We present the results of applying a grammatical test suite for German-English MT to the systems submitted at WMT19, with a detailed analysis of 107 phenomena organized in 14 categories. On average, the systems still translate one out of four test items incorrectly. Performance is low for idioms, modals, pseudo-clefts, multi-word expressions and verb valency. Compared to last year, there has been an improvement in the translation of function words, non-verbal agreement and punctuation. More detailed conclusions about particular systems and phenomena are also presented.
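Test-suite evaluation of this kind reduces to aggregating pass/fail judgements per phenomenon category. A minimal sketch of that aggregation (the item format is an illustrative assumption, not the suite's actual data layout):

```python
from collections import defaultdict


def category_accuracy(results):
    """Compute per-category pass rates from (category, passed) pairs,
    as a grammatical test-suite evaluation might report them."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        if passed:
            passes[category] += 1
    return {c: passes[c] / totals[c] for c in totals}
```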
ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics
As machine translation (MT) metrics improve their correlation with human
judgement every year, it is crucial to understand the limitations of such
metrics at the segment level. Specifically, it is important to investigate
metric behaviour when facing accuracy errors in MT because these can have
dangerous consequences in certain contexts (e.g., legal, medical). We curate
ACES, a translation accuracy challenge set, consisting of 68 phenomena ranging
from simple perturbations at the word/character level to more complex errors
based on discourse and real-world knowledge. We use ACES to evaluate a wide
range of MT metrics including the submissions to the WMT 2022 metrics shared
task and perform several analyses leading to general recommendations for metric
developers. We recommend: a) combining metrics with different strengths, b)
developing metrics that give more weight to the source and less to
surface-level overlap with the reference and c) explicitly modelling additional
language-specific information beyond what is available via multilingual
embeddings.
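A challenge set of this kind is typically scored by checking whether a metric ranks the good translation above the adversarially perturbed one. A toy sketch with a unigram-overlap metric; the metric and the item format here are illustrative, not ACES's actual interface:

```python
def unigram_f1(hyp, ref):
    """Toy surface-overlap metric: unigram F1 against the reference."""
    h, r = set(hyp.split()), set(ref.split())
    common = len(h & r)
    if common == 0:
        return 0.0
    precision, recall = common / len(h), common / len(r)
    return 2 * precision * recall / (precision + recall)


def challenge_accuracy(items, metric):
    """Fraction of (reference, good, incorrect) items where the metric
    scores the good translation above the perturbed one."""
    wins = sum(metric(good, ref) > metric(bad, ref) for ref, good, bad in items)
    return wins / len(items)
```

Note that in this toy example the negation-flipped hypothesis "he did leave" still scores high on surface overlap against "he did not leave", which illustrates the recommendation above to weight the source more and surface-level reference overlap less.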