11 research outputs found
Beyond Sentence-Level End-to-End Speech Translation: Context Helps
Document-level contextual information has shown benefits for text-based machine translation, but whether and how context helps end-to-end (E2E) speech translation (ST) is still under-studied. We fill this gap through extensive experiments using a simple concatenation-based context-aware ST model, paired with adaptive feature selection on speech encodings for computational efficiency. We investigate several decoding approaches, and introduce in-model ensemble decoding, which jointly performs document- and sentence-level translation using the same model. Our results on the MuST-C benchmark with Transformer demonstrate the effectiveness of context for E2E ST. Compared to sentence-level ST, context-aware ST obtains better translation quality (+0.18-2.61 BLEU), improves pronoun and homophone translation, shows better robustness to (artificial) audio segmentation errors, and reduces latency and flicker to deliver higher quality for simultaneous translation.
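The concatenation-based approach the abstract describes can be illustrated on the text side with a minimal sketch: the current sentence is prepended with a window of preceding sentences, joined by a separator token. The `<SEP>` token and the window size here are illustrative assumptions, not details taken from the paper.

```python
def build_context_input(sentences, index, context_size=2, sep="<SEP>"):
    """Concatenate up to `context_size` preceding sentences with the
    current one, joined by a separator token (assumed convention).
    Sketch of concatenation-based context-aware input construction."""
    start = max(0, index - context_size)
    context = sentences[start:index]
    return f" {sep} ".join(context + [sentences[index]])
```

For speech translation the same idea applies to concatenated speech encodings rather than text, which is why the paper pairs it with adaptive feature selection to keep the longer inputs tractable.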
Arabic and English Relative Clauses and Machine Translation Challenges
The study aims at performing an error analysis as well as providing an evaluation of the quality of neural machine translation (NMT), represented by Google Translate, when translating relative clauses. The study uses two test suites composed of sentences that contain relative clauses. The first test suite consists of 108 sentence pairs translated from English into Arabic, whereas the second consists of 72 Arabic sentences translated into English. Error annotation is performed by six professional annotators. The study presents a list of the annotated errors, divided into accuracy and fluency errors based on MQM. Manual evaluation is also performed by the six professionals, along with automatic BLEU evaluation using the Tilde MT platform. The results show that fluency errors are more frequent than accuracy errors. They also show that MT quality when translating from English into Arabic is lower than when translating from Arabic into English. Based on the performed error analysis and both manual and automatic evaluation, it is pointed out that the gap between MT and professional human translation is still large.
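The BLEU scores mentioned above come from an automatic evaluation platform; conceptually, sentence-level BLEU combines clipped n-gram precisions (up to 4-grams) with a brevity penalty. A minimal self-contained sketch of standard unsmoothed BLEU, not the platform's exact implementation:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty. Returns 0.0 if any precision is zero
    (no smoothing applied)."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(count, r[g]) for g, count in c.items())
        total = max(sum(c.values()), 1)
        if overlap == 0:
            return 0.0
        log_prec += math.log(overlap / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec / max_n)
```

Production toolkits additionally handle tokenization, smoothing, and corpus-level aggregation, which is why scores from different platforms are not directly comparable.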
DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues
Recently, the Machine Translation (MT) community has become more interested in document-level evaluation especially in light of reactions to claims of "human parity", since examining the quality at the level of the document rather than at the sentence level allows for the assessment of suprasentential context, providing a more reliable evaluation.
This paper presents a document-level corpus annotated in English with context-aware issues that arise when translating from English into Brazilian Portuguese, namely ellipsis, gender, lexical ambiguity, number, reference, and terminology, across six different domains. The corpus can be used as a challenge test set for evaluation, as a training/testing corpus for MT, and for deep linguistic analysis of context issues. To the best of our knowledge, this is the first corpus of its kind.
Linguistic evaluation of German-English Machine Translation using a Test Suite
We present the results of applying a grammatical test suite for German-English MT to the systems submitted at WMT19, with a detailed analysis of 107 phenomena organized in 14 categories. On average, the systems still translate one out of four test items incorrectly. Performance is low for idioms, modals, pseudo-clefts, multi-word expressions and verb valency. Compared to last year, there has been an improvement in the translation of function words, non-verbal agreement and punctuation. More detailed conclusions about particular systems and phenomena are also presented.
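Test-suite evaluation of this kind reduces to aggregating pass/fail judgements per phenomenon category. A minimal sketch of that aggregation (the item format is an illustrative assumption, not the suite's actual data layout):

```python
from collections import defaultdict


def category_accuracy(results):
    """Compute per-category pass rates from (category, passed) pairs,
    as a grammatical test-suite evaluation might report them."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        if passed:
            passes[category] += 1
    return {c: passes[c] / totals[c] for c in totals}
```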
ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics
As machine translation (MT) metrics improve their correlation with human
judgement every year, it is crucial to understand the limitations of such
metrics at the segment level. Specifically, it is important to investigate
metric behaviour when facing accuracy errors in MT because these can have
dangerous consequences in certain contexts (e.g., legal, medical). We curate
ACES, a translation accuracy challenge set, consisting of 68 phenomena ranging
from simple perturbations at the word/character level to more complex errors
based on discourse and real-world knowledge. We use ACES to evaluate a wide
range of MT metrics including the submissions to the WMT 2022 metrics shared
task and perform several analyses leading to general recommendations for metric
developers. We recommend: a) combining metrics with different strengths, b)
developing metrics that give more weight to the source and less to
surface-level overlap with the reference and c) explicitly modelling additional
language-specific information beyond what is available via multilingual
embeddings.
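A challenge set of this kind is typically scored by checking whether a metric ranks the good translation above the adversarially perturbed one. A toy sketch with a unigram-overlap metric; the metric and the item format here are illustrative, not ACES's actual interface:

```python
def unigram_f1(hyp, ref):
    """Toy surface-overlap metric: unigram F1 against the reference."""
    h, r = set(hyp.split()), set(ref.split())
    common = len(h & r)
    if common == 0:
        return 0.0
    precision, recall = common / len(h), common / len(r)
    return 2 * precision * recall / (precision + recall)


def challenge_accuracy(items, metric):
    """Fraction of (reference, good, incorrect) items where the metric
    scores the good translation above the perturbed one."""
    wins = sum(metric(good, ref) > metric(bad, ref) for ref, good, bad in items)
    return wins / len(items)
```

Note that in this toy example the negation-flipped hypothesis "he did leave" still scores high on surface overlap against "he did not leave", which illustrates the recommendation above to weight the source more and surface-level reference overlap less.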