228 research outputs found
On the Similarities Between Native, Non-native and Translated Texts
We present a computational analysis of three language varieties: native,
advanced non-native, and translation. Our goal is to investigate the
similarities and differences between non-native language productions and
translations, contrasting both with native language. Using a collection of
computational methods we establish three main results: (1) the three types of
texts are easily distinguishable; (2) non-native language and translations are
closer to each other than each of them is to native language; and (3) some of
these characteristics depend on the source or native language, while others do
not, reflecting, perhaps, unified principles that similarly affect translations
and non-native language.Comment: ACL2016, 12 page
Translationese and post-editese : how comparable is comparable quality?
Whereas post-edited texts have been shown to be either of comparable quality to human translations or better, one study shows that people still seem to prefer human-translated texts. The idea of texts being inherently different despite being of high quality is not new. Translated texts, for example,are also different from original texts, a phenomenon referred to as ‘Translationese’. Research into Translationese has shown that, whereas humans cannot distinguish between translated and original text,computers have been trained to detect Translationesesuccessfully. It remains to be seen whether the same can be done for what we call Post-editese. We first establish whether humans are capable of distinguishing post-edited texts from human translations, and then establish whether it is possible to build a supervised machine-learning model that can distinguish between translated and post-edited text
Automatic classification of human translation and machine translation : a study from the perspective of lexical diversity
By using a trigram model and fine-tuning a pretrained BERT model for sequence classification, we show that machine translation and human translation can be classified with an accuracy above chance level, which suggests that machine translation and human translation are different in a systematic way. The classification accuracy of machine translation is much higher than of human translation. We show that this may be explained by the difference in lexical diversity between machine translation and human translation. If machine translation has independent patterns from human translation, automatic metrics which measure the deviation of machine translation from human translation may conflate difference with quality. Our experiment with two different types of automatic metrics shows correlation with the result of the classification task. Therefore, we suggest the difference in lexical diversity between machine translation and human translation be given more attention in machine translation evaluation.Publisher PD
On the differences between human translations
Many studies have confirmed that translated texts exhibit different features than texts originally written in the given language. This work explores texts translated by different translators taking into account expertise and native language.
A set of computational analyses was conducted on three language pairs, English-Croatian, German-French and English-Finnish, and the results show that each of the factors has certain influence on the features of the translated texts, especially on sentence length and lexical richness.
The results also indicate that for translations used for machine translation evaluation, it is important to specify these factors, especially when comparing machine translation quality with human translation quality
- …