Linguistic evaluation of German-English Machine Translation using a Test Suite
We present the results of applying a grammatical test suite for
German–English MT to the systems submitted at WMT19, with a
detailed analysis of 107 phenomena organized in 14 categories. The systems
still translate one out of four test items incorrectly, on average. Performance is
low for idioms, modals, pseudo-clefts, multi-word expressions and verb
valency. Compared to last year, there has been an improvement in the translation
of function words, non-verbal agreement and punctuation. More detailed conclusions about
particular systems and phenomena are also presented.
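The abstract reports pass rates aggregated per phenomenon category. A minimal sketch of that aggregation step, assuming a simple (category, passed) record per test item (the field names and demo data are illustrative, not from the paper):

```python
# Sketch: per-category pass rates for a grammatical test suite.
# Each result is a (category, passed) pair; names are hypothetical.
from collections import defaultdict

def category_accuracy(results):
    """Return {category: fraction of test items passed}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for category, ok in results:
        totals[category] += 1
        if ok:
            passed[category] += 1
    return {c: passed[c] / totals[c] for c in totals}

demo = [("idioms", False), ("idioms", True), ("punctuation", True)]
print(category_accuracy(demo))  # {'idioms': 0.5, 'punctuation': 1.0}
```

A systems-level "one out of four items wrong" figure would then correspond to an overall pass rate of roughly 0.75 across all categories.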
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation
Identification of Fertile Translations in Medical Comparable Corpora: a Morpho-Compositional Approach
This paper defines a method for compiling a bilingual lexicon in the biomedical domain from
comparable corpora. The method is based on compositional translation and
exploits morpheme-level translation equivalences. It can generate translations
for a large variety of morphologically constructed words and can also generate
'fertile' translations. We show that fertile translations increase the overall
quality of the extracted lexicon for English to French translation
TermEval: an automatic metric for evaluating terminology translation in MT
Terminology translation plays a crucial role in domain-specific machine translation (MT). Preservation of domain knowledge from source to target is arguably the most important factor for customers in the translation industry, especially in critical domains such as medical, transportation, military, legal and aerospace. However, evaluation of terminology translation, despite its importance to the translation industry, has been a less examined area in MT research. Term translation quality in MT is usually assessed by domain experts, either in academia or industry. To the best of our knowledge, as of yet there is no publicly available solution to automatically evaluate terminology translation in MT. In particular, manual intervention is often needed to evaluate terminology translation in MT, which, by nature, is a time-consuming and highly expensive task. In fact, this is unworkable in an industrial setting where customised MT systems often need to be updated for many reasons (e.g. availability of new training data or better MT techniques). Hence, there is a genuine need for a faster and less expensive solution to this problem,
which could aid end-users in instantly identifying term translation problems in MT.
In this study, we propose an automatic evaluation metric, TermEval, for evaluating terminology translation in MT. To the best of our knowledge, there is no gold-standard dataset available for measuring terminology translation quality in MT. In the absence of a gold-standard evaluation test set, we semi-automatically create one from an English--Hindi judicial-domain parallel corpus.
We trained state-of-the-art phrase-based SMT (PB-SMT) and neural MT (NMT) models on two translation directions, English-to-Hindi and Hindi-to-English, and use TermEval to evaluate their performance on terminology translation over the created gold-standard test set. In order to measure the correlation between TermEval scores and human judgements, the translation of each source term in the gold-standard test set is validated by a human evaluator. The high correlation between TermEval and human judgements demonstrates the effectiveness of the proposed terminology translation evaluation metric. We also carry out a comprehensive manual evaluation of terminology translation and present our observations.
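The abstract does not give TermEval's exact formulation, but the core idea of an automatic terminology check can be sketched as follows: for each test sentence with a known gold target term, test whether that term surfaces in the MT output. The function name and data layout below are assumptions for illustration only:

```python
# Hedged sketch of an automatic term-translation check in the spirit of
# TermEval; the real metric's definition is not given in the abstract.

def term_accuracy(hypotheses, term_pairs):
    """hypotheses: list of MT output strings, one per test item.
    term_pairs: list of (source_term, gold_target_term), aligned to hypotheses.
    Returns the fraction of items whose gold target term appears in the output."""
    hits = 0
    for hyp, (_, gold_target) in zip(hypotheses, term_pairs):
        # Case-insensitive substring match; a real metric would need
        # lemmatisation and word-boundary handling for morphologically rich
        # languages such as Hindi.
        if gold_target.lower() in hyp.lower():
            hits += 1
    return hits / len(hypotheses) if hypotheses else 0.0
```

Such a surface-match score is cheap to compute on every retraining of a customised MT system, which is precisely the industrial use case the abstract motivates.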
Are ambiguous conjunctions problematic for machine translation?
The translation of ambiguous words still poses challenges for machine translation.
In this work, we carry out a systematic quantitative analysis of the ability of different machine translation systems to disambiguate the source language conjunctions "but" and "and". We evaluate specialised test sets focused on the translation of these two conjunctions. The test sets contain source languages that do not distinguish different variants of the given conjunction, whereas the target languages do. In total, we evaluate the conjunction "but" on 20 translation outputs, and the conjunction "and" on 10. All machine translation systems almost perfectly recognise one variant of the target conjunction, especially for the source conjunction
"but". The other target variant, however, represents a challenge for machine translation systems, with accuracy varying from 50% to 95% for "but" and from 20% to 57% for "and". The major error for all systems is replacing the correct target variant with the opposite one.
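The evaluation described above can be sketched as a contrastive check: for each test item, see which target-language variant of the ambiguous conjunction the MT output contains and compare it with the expected one. The function, data and the German 'aber'/'sondern' example (two target variants of English "but") are illustrative assumptions, not taken from the paper:

```python
# Illustrative sketch: scoring a contrastive conjunction test set.

def conjunction_accuracy(outputs, expected_variants, variants):
    """outputs: MT output sentences.
    expected_variants: gold target conjunction per item.
    variants: set of possible target conjunctions (e.g. {'aber', 'sondern'})."""
    correct = 0
    scored = 0
    for out, gold in zip(outputs, expected_variants):
        # Find variants present as whole tokens (crude whitespace match).
        found = [v for v in variants if f" {v} " in f" {out} "]
        if len(found) == 1:  # score only items with exactly one variant present
            scored += 1
            if found[0] == gold:
                correct += 1
    return correct / scored if scored else 0.0
```

The reported error pattern, replacing the correct variant with the opposite one, corresponds here to items where `found[0] != gold`.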