644 research outputs found
On the Similarities Between Native, Non-native and Translated Texts
We present a computational analysis of three language varieties: native,
advanced non-native, and translation. Our goal is to investigate the
similarities and differences between non-native language productions and
translations, contrasting both with native language. Using a collection of
computational methods we establish three main results: (1) the three types of
texts are easily distinguishable; (2) non-native language and translations are
closer to each other than each of them is to native language; and (3) some of
these characteristics depend on the source or native language, while others do
not, reflecting, perhaps, unified principles that similarly affect translations
and non-native language. Comment: ACL2016, 12 pages
Translationese and post-editese: how comparable is comparable quality?
Whereas post-edited texts have been shown to be of comparable quality to human translations, or better, one study shows that people still seem to prefer human-translated texts. The idea that texts can be inherently different despite being of high quality is not new. Translated texts, for example, also differ from original texts, a phenomenon referred to as ‘Translationese’. Research into Translationese has shown that, whereas humans cannot distinguish between translated and original text, computers have been trained to detect Translationese successfully. It remains to be seen whether the same can be done for what we call Post-editese. We first establish whether humans are capable of distinguishing post-edited texts from human translations, and then establish whether it is possible to build a supervised machine-learning model that can distinguish between translated and post-edited text.
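As a rough illustration of what a supervised model for this kind of distinction could look like, the sketch below trains a nearest-centroid classifier on relative function-word frequencies. This is not the method used in the paper: the feature list `FUNCTION_WORDS`, the helper names, and the labels are all hypothetical, chosen only to keep the example small.

```python
from collections import Counter

# Hypothetical feature set: relative frequencies of a few English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "is", "it"]


def features(text):
    """Map a text to a vector of relative function-word frequencies."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def train_centroids(labeled_texts):
    """Average the feature vectors of each class (label -> centroid)."""
    sums = {}
    n = Counter()
    for text, label in labeled_texts:
        acc = sums.setdefault(label, [0.0] * len(FUNCTION_WORDS))
        for i, x in enumerate(features(text)):
            acc[i] += x
        n[label] += 1
    return {label: [x / n[label] for x in acc] for label, acc in sums.items()}


def predict(centroids, text):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = features(text)

    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: dist(centroids[label]))
```

In practice such a model would be trained on many labeled documents per class and evaluated on held-out data; accuracy near chance would suggest the two text types are not separable with the chosen features.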
Translationese in Japanese Literary Translation
Translationese in Japanese, despite its distinct characteristics when compared to natural Japanese, has so far been systematically studied by only one researcher (Furuno, 2005). In addition to this general lack of scholarly interest, the translational situation in Japan is not well known in the West. In this paper, the notions of translationese in Japan are investigated from the perspective of Translation Studies and of Kokugogaku (studies of Japanese language). In addition, this study provides reasons for conducting systematic studies of translationese in Japan, where Translation Studies is still in its initial stages. Finally, the results of a preliminary examination of small comparable corpora using a translation and a non-translation are presented.
The Actres Project: Using corpora to assess English-Spanish translation
Assessing translation quality is generally seen as a difficult and elusive task because of the inadequacy of the tools available. The aim of this paper is to demonstrate the usefulness of a corpus-based contrastive methodology developed at the University of León (Spain) for identifying instances of translationese. The ACTRES project functional framework draws on the work by Bondarko (1991) and Chesterman (1998) and has been designed for translation-oriented cross-linguistic analysis (Rabadán et al. 2004). The long-term study focuses on those semantic areas that are typically problematic for our language pair (modality, quantification, modification, aspectuality, etc.). The contrast features a two-step procedure: contrasting typical ways of expressing similar meanings in English and Spanish, and spotting differences between original Spanish and translated Spanish. First, empirical data are extracted from two monolingual ‘comparable’ corpora, the Bank of English and the Corpus de Referencia del Español Actual (CREA); second, these results are compared with data from a custom-made translation corpus containing English original texts and their corresponding Spanish translations. The three sets of results provide different types of useful information: i) the resources available (or absent) in each of the languages to express a given meaning and their relative centrality, ii) the solutions favored by translators to bridge the cross-linguistic disparities and/or gaps, and iii) the erroneous or non-existent uses and structures transferred from the source language into the target language. Translation practice, translator training and translation quality assessment (TQA) are the main areas that can benefit from this type of research.
Translating away Translationese without Parallel Data
Translated texts exhibit systematic linguistic differences compared to
original texts in the same language, and these differences are referred to as
translationese. Translationese has effects on various cross-lingual natural
language processing tasks, potentially leading to biased results. In this
paper, we explore a novel approach to reduce translationese in translated
texts: translation-based style transfer. As there are no parallel
human-translated and original data in the same language, we use a
self-supervised approach that can learn from comparable (rather than parallel)
mono-lingual original and translated data. However, even this self-supervised
approach requires some parallel data for validation. We show how we can
eliminate the need for parallel validation data by combining the
self-supervised loss with an unsupervised loss. This unsupervised loss
leverages the original language model loss over the style-transferred output
and a semantic similarity loss between the input and style-transferred output.
We evaluate our approach in terms of original vs. translationese binary
classification in addition to measuring content preservation and target-style
fluency. The results show that our approach is able to reduce translationese
classifier accuracy to a level of a random classifier after style transfer
while adequately preserving the content and fluency in the target original
style. Comment: Accepted at EMNLP 2023, Main Conference
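The combined objective described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formulation, so the weights `lm_weight` and `sim_weight`, the use of mean negative log-likelihood as the language model loss, and the use of one minus cosine similarity as the semantic similarity loss are all hypothetical choices.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def lm_loss(token_probs):
    """Mean negative log-likelihood of the output tokens under an
    original-language model (probabilities are assumed to be given)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def combined_loss(self_supervised_loss, output_token_probs,
                  input_embedding, output_embedding,
                  lm_weight=1.0, sim_weight=1.0):
    """Self-supervised loss plus an unsupervised term made of
    (a) the language model loss over the style-transferred output and
    (b) a semantic dissimilarity between input and output embeddings."""
    similarity_loss = 1.0 - cosine_similarity(input_embedding, output_embedding)
    unsupervised = lm_weight * lm_loss(output_token_probs) + sim_weight * similarity_loss
    return self_supervised_loss + unsupervised
```

The unsupervised term needs no parallel references: the language model term rewards output that looks like original text, while the similarity term penalizes drifting away from the input's meaning.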
- …