9 research outputs found

    Translationese and post-editese : how comparable is comparable quality?

    Whereas post-edited texts have been shown to be of comparable quality to human translations, or even better, one study shows that people still seem to prefer human-translated texts. The idea of texts being inherently different despite being of high quality is not new. Translated texts, for example, are also different from original texts, a phenomenon referred to as ‘Translationese’. Research into Translationese has shown that, whereas humans cannot distinguish between translated and original text, computers have been trained to detect Translationese successfully. It remains to be seen whether the same can be done for what we call Post-editese. We first establish whether humans are capable of distinguishing post-edited texts from human translations, and then establish whether it is possible to build a supervised machine-learning model that can distinguish between translated and post-edited text.

    Post Editese als verschärfte Form der Translationese?: Eine Korpusanalyse zu Simplification und Interference in posteditierten Texten

    Promising moderate to sharp increases in productivity while achieving quality that is at least on par with human translation, post-editing has increasingly gained in importance in recent years and has become an integral part of the translation landscape. Aside from countless studies on productivity and quality, there have been isolated efforts to examine possible linguistic differences between postedited texts and human translations (“post-editese” according to Daems et al., 2017). Toral (2019) observed that postedited texts not only exhibit certain characteristics of translated texts (“translationese” according to Gellerstam, 1986), but that they exhibit them to a significantly higher degree than human translations. This prompted Toral to describe post-editese as an “exacerbated translationese”. However, research into post-editese has received relatively little attention, and most corpus studies have focused on newspaper articles or so-called general language, despite the fact that most professional translation takes place in the domain of Language for Specific Purposes. To address this desideratum, this work builds on Toral’s research. A corpus of specialised texts is used to examine whether his thesis can be confirmed for simplification and interference, two well-known translationese characteristics, on the basis of text excerpts from the technical and medical domains and their corresponding translations. The corpus consists of 75 texts in total for the language pair English-German: 6 English source texts and 69 corresponding German translations produced via MT, LPE, FPE and HT. The analysis relies on an adapted form of Toral’s (2019) and Lapshinova-Koltunski’s (2013) methods and makes use of the text analysis software Sketch Engine. Lexical density and lexical variety (type-token ratio, TTR) are used to identify simplification, whereas the nominality/verbality ratio and the sentence length ratio between source and target text are interpreted as evidence of interference. The results are inconclusive for both simplification and interference. While the TTR is, as per the hypothesis, lower in the postedited texts, i.e. the postedited texts are less lexically varied than the human translations, the lexical density of the postedited texts is higher than that of the human translations, which contradicts the hypothesis. The sentence length ratios between source and target texts are more similar in the postedited texts than in the human translations, which, in line with the hypothesis, points to more interference from the source text in the postedited texts. The results for the nominality/verbality ratio, however, could not be interpreted in any meaningful way, as distortions due to differences between the language systems with respect to compounding likely play a role. Given the small corpus size and the short length of the corpus texts, even isolated, idiosyncratic decisions at the individual text level, e.g. with regard to sentence segmentation or terminology, have a major impact on the overall result. This shows that peculiarities at the individual text level need to be considered more carefully in future quantitative corpus studies, in particular when dealing with larger corpora. Consequently, Toral’s thesis of post-editese being an exacerbated form of translationese could not be definitively confirmed within this work.
    Contents: 1 Introduction 1.1 Rationale 1.2 Aim of the work 1.3 Structure of the work 2 Background 2.1 Post-editing 2.1.1 Definition 2.1.2 Types of post-editing 2.1.3 Productivity and cost advantages 2.1.4 Quality analyses 2.1.5 Factors influencing the quality of MT output 2.1.6 Preference and acceptance studies 2.1.7 Attitudes of translators, language service providers and clients 2.1.8 Further research directions 2.2 Translationese 2.2.1 Definition 2.2.2 Translation universals according to Baker 2.2.3 Interference according to Toury and Teich 2.2.4 State of research 2.2.5 Recent research directions 2.3 Post-editese 2.3.1 Definition 2.3.2 State of research 2.4 Interim conclusion and research desideratum 3 Data and methods 3.1 Corpus 3.1.1 Description of the corpus texts 3.1.2 Compilation of the corpus 3.1.3 Rationale for the choice of corpus 3.1.4 Preparation of the corpus texts for automated analysis 3.2 Methods of analysis 3.2.1 Simplification 3.2.2 Interference 4 Results of the corpus analysis and their interpretation 4.1 Simplification 4.1.1 Lexical diversity 4.1.2 Lexical density 4.1.3 Interpretation of the results on simplification 4.2 Interference 4.2.1 Ratio of the nominal-to-verbal part-of-speech ratios between source and target text 4.2.2 Ratio of sentence lengths between source and target text 4.2.3 Interpretation of the results on interference 5 Discussion 6 Conclusion and outlook 7 References
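    The measures named in this abstract (type-token ratio, lexical density, and the sentence length ratio between source and target text) can be illustrated with a minimal sketch. It is not the thesis's implementation, which relied on Sketch Engine. The regex tokenizer, the naive sentence splitter, and the tiny `FUNCTION_WORDS` list are all assumptions made here for illustration; lexical density is taken as the share of content words among all tokens, and a real study would identify content words from part-of-speech tags.

```python
import re

# Hypothetical, deliberately tiny German function-word list (an assumption for
# this sketch); real analyses would use part-of-speech tags instead.
FUNCTION_WORDS = {"der", "die", "das", "und", "ist", "in", "zu",
                  "ein", "eine", "mit", "den", "von"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive regex tokenizer)."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens):
    """Lexical diversity: number of distinct tokens divided by number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Share of tokens that are not in the function-word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)


def mean_sentence_length(text):
    """Average number of tokens per sentence, splitting naively on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def sentence_length_ratio(source_text, target_text):
    """Mean target sentence length divided by mean source sentence length."""
    source_mean = mean_sentence_length(source_text)
    return mean_sentence_length(target_text) / source_mean if source_mean else 0.0
```

    Under these assumptions, a lower type-token ratio in postedited texts than in human translations would count as evidence of simplification. Note that the raw type-token ratio depends on text length, so it is only comparable across texts of similar size.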

    Retranslating 1984: the effects of linguistic and cultural changes on three Italian translations of the English literary classic

    More than seventy years after the publication of Nineteen Eighty-Four, a wave of new translations of this classic of English literature has reached the Italian publishing market. Following the expiry of copyright protection in 2021, updated versions of what can be called the ultimate dystopian fantasy testify that the novel remains relevant enough to justify more up-to-date reinterpretations of Orwell’s book. The goal of the study is to investigate the phenomenon of retranslation, that is, the act of translating a work that has previously been translated into the same language, in an attempt to establish the extent to which renditions subsequent to extant translations reflect linguistic and cultural changes occurring in the receiving system. For this purpose, three different Italian translations of the English classic, published in 1950, 2000 and 2021, are compared. The analysis takes into account some of the most influential theoretical writings on the activity of retranslation, focusing on the strategies used to adapt the lexical, syntactic and stylistic characteristics of the original to the ever-changing norms of the Italian literary system. According to the present study, literary retranslation is to be understood as capable of influencing the process of re-standardization taking place in the target language. Tellingly, innovations introduced in the Italian of the translations are one of the many reasons behind the production of alternative translations of George Orwell’s literary masterpiece, which helps to challenge the prevailing view of retranslation as an exclusively restorative phenomenon.

    Translationese indicators for human translation quality estimation (based on English-to-Russian translation of mass-media texts)

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Human translation quality estimation is a relatively new and challenging area of research, because human translation quality is notoriously more subtle and subjective than machine translation quality, which attracts much more attention and effort from the research community. At the same time, human translation is routinely assessed by education and certification institutions, as well as at translation competitions. Do the quality labels and scores generated from real-life quality judgments align well with objective properties of translations? This thesis puts this question to the test using machine learning methods. Conceptually, this research is built around the hypothesis that linguistic properties characteristic of translations, as a specific form of communication, can correlate with translation quality. This assumption is often made in translation studies but has never been put to a rigorous empirical test. Exploring translationese features in a quality estimation task can help identify quality-related trends in translational behaviour and provide data-driven insights into professionalism to improve training. Using translationese for quality estimation fits well with the concept of quality in translation studies, because it is essentially a document-level property. Linguistically motivated translationese features are also more interpretable than popular distributed representations and can explain linguistic differences between quality categories in human translation. We investigated (i) an extended set of Universal Dependencies-based morphosyntactic features as well as two lexical feature sets capturing (ii) collocational properties of translations, and (iii) ratios of vocabulary items in various frequency bands along with entropy scores from n-gram models. 
To compare the performance of our feature sets in translationese classification and in quality estimation tasks against other representations, the experiments were also run on tf-idf features, QuEst++ features and on contextualised embeddings from a range of pre-trained language models, including the state-of-the-art multilingual solution for machine translation quality estimation. Our major focus was on document-level prediction; however, where the labels and features allowed, the experiments were extended to the sentence level. The corpus used in this research includes English-to-Russian parallel subcorpora of student and professional translations of mass-media texts, and a register-comparable corpus of non-translations in the target language. Quality labels for various subsets of student translations come from a number of real-life settings: translation competitions, graded student translations, error annotations and direct assessment. We review approaches to benchmarking quality in translation and provide a detailed description of our own annotation experiments. Of the three proposed translationese feature sets, morphosyntactic features returned the best results on all tasks. In many settings they were second only to contextualised embeddings. At the same time, performance on various representations was contingent on the type of quality captured by quality labels/scores. Using the outcomes of machine learning experiments and feature analysis, we established that translationese properties of translations were not equally reflected by various labels and scores. For example, professionalism was much less related to translationese than expected. Labels from document-level holistic assessment demonstrated maximum support for our hypothesis: lower-ranking translations clearly exhibited more translationese. 
They bore more traces of mechanical translational behaviours associated with following source language patterns whenever possible, which led to inflated frequencies of analytical passives, modal predicates and verbal forms, especially copula verbs and verbs in the finite form. As expected, lower-ranking translations were more repetitive and had longer, more complex sentences. Higher-ranking translations were indicative of greater skill in recognising and counteracting translationese tendencies. For document-level holistic labels as an approach to capturing quality, translationese indicators might provide a valuable contribution to an effective quality estimation pipeline. However, error-based scores, and especially scores from sentence-level direct assessment, proved to be much less correlated with translationese and with fluency issues in general. This was confirmed by relatively low regression results across all representations that had access only to the target language side of the dataset, by feature analysis and by correlation between error-based scores and scores from direct assessment.

    Unsupervised Identification of Translationese

    No full text