    On the Similarities Between Native, Non-native and Translated Texts

    We present a computational analysis of three language varieties: native, advanced non-native, and translation. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods, we establish three main results: (1) the three types of texts are easily distinguishable; (2) non-native language and translations are closer to each other than either is to native language; and (3) some of these characteristics depend on the source or native language, while others do not, possibly reflecting unified principles that affect translations and non-native language in similar ways. Comment: ACL 2016, 12 pages.
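
    One simple way to make "closer to each other" concrete is to represent each variety by a frequency profile and compare the profiles with a distance measure. The abstract does not name the methods used, so the sketch below is only an illustration under our own assumptions: the short function-word list (FUNCTION_WORDS), whitespace tokenization, and cosine distance are all hypothetical choices, not the paper's setup.

```python
import math
from collections import Counter

# Hypothetical, illustrative function-word list (not the paper's feature set).
FUNCTION_WORDS = ["the", "a", "of", "in", "to", "and", "that", "it", "is", "was"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word among all whitespace tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def pairwise_distances(corpora: dict[str, str]) -> dict[tuple[str, str], float]:
    """Distance between every pair of named corpora, keys in sorted name order."""
    names = sorted(corpora)
    profiles = {name: profile(corpora[name]) for name in names}
    return {
        (a, b): cosine_distance(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

    With real native, non-native, and translated corpora passed in as the dictionary values, a smaller distance for the non-native/translated pair than for either pair involving native text would be one way to express result (2); the toy inputs one might test with carry no evidential weight.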

    Translationese and post-editese : how comparable is comparable quality?

    Whereas post-edited texts have been shown to be either of comparable quality to human translations or better, one study shows that people still seem to prefer human-translated texts. The idea that texts can be inherently different despite being of high quality is not new. Translated texts, for example, also differ from original texts, a phenomenon referred to as ‘Translationese’. Research into Translationese has shown that, whereas humans cannot distinguish between translated and original text, computers have been trained to detect Translationese successfully. It remains to be seen whether the same can be done for what we call Post-editese. We first establish whether humans are capable of distinguishing post-edited texts from human translations, and then whether it is possible to build a supervised machine-learning model that can distinguish between translated and post-edited text.
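
    To illustrate what a supervised model for this task involves, here is a minimal sketch of a binary text classifier trained on labelled examples. The abstract does not say which features or learner the study used; the multinomial Naive Bayes over whitespace tokens below, the class name NaiveBayesTextClassifier, and the labels "human" and "post-edited" are our own hypothetical stand-ins.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing.

    Illustrative stand-in only; not the model used in the study.
    """

    def fit(self, texts, labels):
        # Number of training documents per label (used for the prior).
        self.class_counts = Counter(labels)
        # Token frequencies per label.
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.lower().split())
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.totals = {lbl: sum(c.values()) for lbl, c in self.token_counts.items()}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.class_counts:
            # Log prior plus the sum of add-one-smoothed token log-likelihoods.
            score = math.log(self.class_counts[label] / n_docs)
            for tok in tokens:
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

    In practice such a model would be trained on one set of human-translated and post-edited texts and evaluated on held-out texts; accuracy well above chance would suggest that Post-editese is detectable, in the same way that Translationese classifiers are reported to be.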

    Translationese in Japanese Literary Translation

    Translationese in Japanese, despite its distinct characteristics when compared to natural Japanese, has so far been systematically studied by only one researcher (Furuno, 2005). In addition to this general lack of scholarly interest, the translation situation in Japan is not well known in the West. In this paper, the notions of translationese in Japan are investigated from the perspective of Translation Studies and of Kokugogaku (studies of the Japanese language). This study also provides reasons for conducting systematic studies of translationese in Japan, where Translation Studies is still in its initial stages. Finally, the results of a preliminary examination of small comparable corpora, consisting of a translation and a non-translation, are presented.

    The Actres Project: Using corpora to assess English-Spanish translation

    Assessing translation quality is generally seen as a difficult and elusive task because of the inadequacy of the available tools. The aim of this paper is to demonstrate the usefulness of a corpus-based contrastive methodology, developed at the University of León (Spain), for identifying instances of translationese. The functional framework of the ACTRES project draws on the work of Bondarko (1991) and Chesterman (1998) and has been designed for translation-oriented cross-linguistic analysis (Rabadán et al. 2004). The long-term study focuses on those semantic areas that are typically problematic for our language pair (modality, quantification, modification, aspectuality, etc.). The contrast follows a two-step procedure: contrasting typical ways of expressing similar meanings in English and Spanish, and spotting differences between original Spanish and translated Spanish. First, empirical data are extracted from two monolingual ‘comparable’ corpora, the Bank of English and the Corpus de Referencia del Español Actual (CREA); second, these results are compared with data from a custom-made translation corpus containing English original texts and their corresponding Spanish translations. The three sets of results provide different types of useful information: (i) the resources available (or absent) in each of the languages to express a given meaning, and their relative centrality; (ii) the solutions favored by translators to bridge cross-linguistic disparities and/or gaps; and (iii) the erroneous or non-existent uses and structures transferred from the source language into the target language. Translation practice, translator training and translation quality assessment (TQA) are the main areas that can benefit from this type of research.
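
    The second step, spotting differences between original and translated Spanish, comes down to comparing how often an expression occurs in two corpora of different sizes. The abstract does not specify a statistic, so the sketch below swaps in two common corpus-linguistics measures as an assumption: frequency normalized per million words, and Dunning's log-likelihood (G2) for a single item observed in two corpora. The function names are hypothetical.

```python
import math


def per_million(freq: int, corpus_size: int) -> float:
    """Normalize a raw frequency to occurrences per million words."""
    return freq * 1_000_000 / corpus_size


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Dunning's log-likelihood (G2) for one item observed in two corpora.

    freq_a/freq_b are the item's counts; size_a/size_b are corpus sizes in words.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the item were equally frequent in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

    A G2 value above 3.84 corresponds to p < 0.05 with one degree of freedom, so an expression whose score exceeds that threshold, and whose per-million frequency is higher in translated than in original Spanish, would be a candidate for over-use in translation (and the reverse for under-use).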

    Translating away Translationese without Parallel Data

    Translated texts exhibit systematic linguistic differences compared to original texts in the same language, and these differences are referred to as translationese. Translationese affects various cross-lingual natural language processing tasks, potentially leading to biased results. In this paper, we explore a novel approach to reducing translationese in translated texts: translation-based style transfer. As there are no parallel human-translated and original data in the same language, we use a self-supervised approach that can learn from comparable (rather than parallel) monolingual original and translated data. However, even this self-supervised approach requires some parallel data for validation. We show how we can eliminate the need for parallel validation data by combining the self-supervised loss with an unsupervised loss. This unsupervised loss combines the original-language language model loss over the style-transferred output with a semantic similarity loss between the input and the style-transferred output. We evaluate our approach in terms of original vs. translationese binary classification, in addition to measuring content preservation and target-style fluency. The results show that our approach reduces translationese classifier accuracy to that of a random classifier after style transfer, while adequately preserving content and fluency in the target original style. Comment: Accepted at EMNLP 2023, Main Conference.
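
    The loss combination described above can be sketched with plain numbers to show how the terms fit together. This is not the paper's formulation: the additive combination, the weights w_lm and w_sim, the use of mean negative log-likelihood for the language model term, and 1 minus cosine similarity for the semantic term are all our own assumptions, and a real system would compute these quantities on model outputs inside an autodiff framework.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def lm_loss(token_logprobs: list[float]) -> float:
    """Assumed LM term: mean negative log-probability of the output tokens
    under a language model trained on original (non-translated) text."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def semantic_similarity_loss(input_emb: list[float], output_emb: list[float]) -> float:
    """Assumed semantic term: 1 - cosine similarity of sentence embeddings."""
    return 1.0 - cosine_similarity(input_emb, output_emb)


def combined_loss(self_supervised: float, token_logprobs: list[float],
                  input_emb: list[float], output_emb: list[float],
                  w_lm: float = 1.0, w_sim: float = 1.0) -> float:
    """Self-supervised loss plus a weighted unsupervised loss (hypothetical weights)."""
    unsupervised = (w_lm * lm_loss(token_logprobs)
                    + w_sim * semantic_similarity_loss(input_emb, output_emb))
    return self_supervised + unsupervised
```

    The design point the sketch illustrates is that neither unsupervised term needs a parallel reference: the LM term only scores the output, and the similarity term only compares the output with its own input, which is why validation no longer depends on parallel data.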