9 research outputs found
Translationese and post-editese: how comparable is comparable quality?
Whereas post-edited texts have been shown to be either of comparable quality to human translations or better, one study shows that people still seem to prefer human-translated texts. The idea of texts being inherently different despite being of high quality is not new. Translated texts, for example, are also different from original texts, a phenomenon referred to as “Translationese”. Research into Translationese has shown that, whereas humans cannot distinguish between translated and original text, computers have been trained to detect Translationese successfully. It remains to be seen whether the same can be done for what we call Post-editese. We first establish whether humans are capable of distinguishing post-edited texts from human translations, and then establish whether it is possible to build a supervised machine-learning model that can distinguish between translated and post-edited text.
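Post Editese als verschärfte Form der Translationese?: Eine Korpusanalyse zu Simplification und Interference in posteditierten Texten [Post-editese as an exacerbated form of translationese? A corpus analysis of simplification and interference in post-edited texts]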
Promising moderate to sharp increases in productivity while achieving quality that is at least on par with human translation, post-editing has increasingly gained in importance in recent years and has become an integral part of the translation landscape. Aside from countless studies on productivity and quality, there have been isolated efforts to examine possible linguistic differences between postedited texts and human translations (“post-editese” according to Daems et al., 2017). Toral (2019) observed that postedited texts not only exhibit certain characteristics of translated texts (“translationese” according to Gellerstam, 1986), but that they exhibit them to a significantly higher degree compared with human translations. This prompted Toral to describe post-editese as an “exacerbated translationese”.
However, research into post-editese has received relatively little attention, and most corpus studies have focused on newspaper articles or so-called general language, despite the fact that most professional translation takes place in the domain of Language for Specific Purposes.
In order to address this desideratum, this work builds on Toral’s research. A corpus of technical texts is used to examine whether his thesis can be confirmed for simplification and interference, two well-known translationese characteristics, when relying on text excerpts from the technical and medical domain and their corresponding translations.
The corpus consists of 75 texts in total, made up of 6 English source texts and 69 corresponding German translations produced via machine translation (MT), light post-editing (LPE), full post-editing (FPE) and human translation (HT). The analysis relies on an adapted form of Toral’s (2019) and Lapshinova-Koltunski’s (2013) methods and makes use of the text analysis software Sketch Engine. The parameters lexical density and lexical variety (type-token ratio, TTR) are used to identify simplification, whereas the nominality/verbality ratio and the sentence length ratio between source and target text are interpreted as evidence for interference.
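The measures named in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's Sketch Engine workflow: the regex tokenizer, the tiny `FUNCTION_WORDS` list and the definition of the sentence length ratio (mean target sentence length divided by mean source sentence length) are all assumptions made for illustration.

```python
import re

# Hypothetical, tiny list of German function words. A real study would rely on
# part-of-speech tags to separate content words from function words.
FUNCTION_WORDS = {"der", "die", "das", "und", "ist", "in", "mit", "ein", "eine", "zu"}


def tokenize(text):
    """Return lowercase word tokens; punctuation and digits are dropped."""
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(tokens):
    """Lexical variety: distinct word forms divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Share of tokens that are not in the function-word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)


def mean_sentence_length(text):
    """Average number of tokens per sentence, splitting naively on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def sentence_length_ratio(source_text, target_text):
    """Assumed definition: mean target sentence length / mean source sentence length."""
    src = mean_sentence_length(source_text)
    return mean_sentence_length(target_text) / src if src else 0.0
```

TTR depends strongly on text length, so values are only comparable across texts of similar size; with short corpus texts, a single repeated term can shift the figure noticeably.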
The results are inconclusive for both simplification and interference. While the TTR is, as per the hypothesis, lower in the postedited texts, i.e. the postedited texts are less lexically varied than the human translations, the lexical density of the postedited texts is higher than that of the human translations, which contradicts the hypothesis. The sentence length ratios between source and target texts are more similar in the postedited texts than in the human translations, which, in line with the hypothesis, points to more interference from the source text in the postedited texts. The results for the nominality/verbality ratio, however, could not be interpreted in any meaningful way, as distortions due to differences between the language systems with respect to compounding likely play a role. Given the small corpus size and the short length of the corpus texts, even isolated, idiosyncratic decisions at the individual text level, e.g. with regard to sentence segmentation and terminology, seem to have a major impact on the overall result. This shows that peculiarities at the individual text level need to be considered more carefully in future quantitative corpus studies, in particular when dealing with larger corpora. Consequently, Toral’s thesis of post-editese being an exacerbated form of translationese could not be definitively confirmed within this work.
1 Introduction
1.1 Rationale
1.2 Aim of the thesis
1.3 Structure of the thesis
2 Background
2.1 Post-editing
2.1.1 Definition
2.1.2 Types of post-editing
2.1.3 Productivity and cost benefits
2.1.4 Quality analyses
2.1.5 Factors influencing the quality of MT output
2.1.6 Preference and acceptance studies
2.1.7 Attitudes of translators, language service providers and clients
2.1.8 Further research directions
2.2 Translationese
2.2.1 Definition
2.2.2 Translation universals according to Baker
2.2.3 Interference according to Toury and Teich
2.2.4 State of research
2.2.5 Recent research directions
2.3 Post-editese
2.3.1 Definition
2.3.2 State of research
2.4 Interim conclusion and research desideratum
3 Data and methods
3.1 Corpus
3.1.1 Description of the corpus texts
3.1.2 Compilation of the corpus
3.1.3 Rationale for the choice of corpus
3.1.4 Preparation of the corpus texts for automated analysis
3.2 Methods of analysis
3.2.1 Simplification
3.2.2 Interference
4 Results of the corpus analysis and their interpretation
4.1 Simplification
4.1.1 Lexical diversity
4.1.2 Lexical density
4.1.3 Interpretation of the simplification results
4.2 Interference
4.2.1 Ratio of the nominal-to-verbal part-of-speech ratios between source and target text
4.2.2 Ratio of sentence lengths between source and target text
4.2.3 Interpretation of the interference results
5 Discussion
6 Conclusion and outlook
7 References
Retranslating 1984: the effects of linguistic and cultural changes on three Italian translations of the English literary classic
More than seventy years after the publication of Nineteen Eighty-Four, a wave of new translations of the classic work of English literature has captured the Italian publishing market. As a consequence of copyright protection coming to an end in 2021, refreshed versions of what can be referred to as the ultimate dystopian fantasy testify to the fact that the contemporary relevance of the novel justifies more up-to-date reinterpretations of Orwell’s book. The goal of the study is to investigate the phenomenon of retranslation, that is, the act of translating a work that has previously been translated into the same language, in an attempt to establish the extent to which alternative renditions subsequent to extant translations reflect linguistic and cultural changes occurring in the receiving system. For this purpose, a comparison of three different Italian translations of the English classic, published in 1950, 2000 and 2021, is conducted. The analysis takes into account some of the most influential theoretical writings regarding the activity of retranslation, focusing on the different strategies implemented in order to adapt the lexical, syntactic and stylistic characteristics of the original to the ever-changing translation norms of the Italian literary system. According to the present study, literary retranslation is to be understood as capable of influencing the process of re-standardization taking place in the target language.
However, innovations introduced in the Italian language of translations are only one of the many reasons behind the production of alternative translations of George Orwell’s literary masterpiece, which helps to challenge the prevailing view of retranslation as an exclusively restorative phenomenon.
Translationese indicators for human translation quality estimation (based on English-to-Russian translation of mass-media texts)
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
Human translation quality estimation is a relatively new and challenging area of research,
because human translation quality is notoriously more subtle and subjective than machine
translation, which attracts much more attention and effort of the research community. At
the same time, human translation is routinely assessed by education and certification institutions,
as well as at translation competitions. Do the quality labels and scores generated
from real-life quality judgments align well with objective properties of translations? This
thesis puts this question to a test using machine learning methods.
Conceptually, this research is built around a hypothesis that linguistic properties characteristic
of translations, as a specific form of communication, can correlate with translation
quality. This assumption is often made in translation studies but has never been put to
a rigorous empirical test. Exploring translationese features in a quality estimation task
can help identify quality-related trends in translational behaviour and provide data-driven
insights into professionalism to improve training. Using translationese for quality estimation
fits well with the concept of quality in translation studies, because it is essentially a
document-level property. Linguistically-motivated translationese features are also more interpretable
than popular distributed representations and can explain linguistic differences
between quality categories in human translation.
We investigated (i) an extended set of Universal Dependencies-based morphosyntactic
features as well as two lexical feature sets capturing (ii) collocational properties of translations,
and (iii) ratios of vocabulary items in various frequency bands along with entropy
scores from n-gram models. To compare the performance of our feature sets in translationese
classifications and in quality estimation tasks against other representations, the
experiments were also run on tf-idf features, QuEst++ features and on contextualised
embeddings from a range of pre-trained language models, including the state-of-the-art
multilingual solution for machine translation quality estimation. Our major focus was on
document-level prediction; however, where the labels and features allowed, the experiments
were extended to the sentence level.
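As a rough illustration of the third feature set, the sketch below computes the share of a document's tokens in frequency bands and a per-token entropy score. It is a simplification under stated assumptions, not the thesis's feature extraction: the band cut-offs in `band_edges` are hypothetical, `reference_counts` stands for word counts from some reference corpus, and the entropy uses an add-one-smoothed unigram model as the simplest case of an n-gram model.

```python
import math
from collections import Counter


def frequency_band_ratios(tokens, reference_counts, band_edges=(2, 10)):
    """Share of tokens whose reference-corpus count falls in each band.

    band_edges are hypothetical cut-offs: a count below the first edge is
    'low', below the second is 'mid', anything else is 'high'.
    """
    if not tokens:
        return {}
    bands = Counter()
    for tok in tokens:
        freq = reference_counts.get(tok, 0)
        if freq < band_edges[0]:
            bands["low"] += 1
        elif freq < band_edges[1]:
            bands["mid"] += 1
        else:
            bands["high"] += 1
    return {name: bands[name] / len(tokens) for name in ("low", "mid", "high")}


def unigram_cross_entropy(tokens, reference_counts):
    """Average negative log2 probability per token.

    Uses add-one smoothing with a single extra slot shared by all unseen words.
    """
    if not tokens:
        return 0.0
    vocab_size = len(reference_counts) + 1  # +1 for the unknown-word slot
    total = sum(reference_counts.values()) + vocab_size
    log_probs = (
        math.log2((reference_counts.get(tok, 0) + 1) / total) for tok in tokens
    )
    return -sum(log_probs) / len(tokens)
```

Each document then contributes a small, interpretable feature vector (three band ratios plus one entropy score) that could be passed to any standard classifier or regressor.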
The corpus used in this research includes English-to-Russian parallel subcorpora of student
and professional translations of mass-media texts, and a register-comparable corpus of
non-translations in the target language. Quality labels for various subsets of student translations
come from a number of real-life settings: translation competitions, graded student
translations, error annotations and direct assessment. We overview approaches to benchmarking
quality in translation and provide a detailed description of our own annotation
experiments.
Of the three proposed translationese feature sets, morphosyntactic features returned
the best results on all tasks. In many settings they were secondary only to contextualised
embeddings. At the same time, performance on various representations was contingent
on the type of quality captured by quality labels/scores. Using the outcomes of machine
learning experiments and feature analysis, we established that translationese properties of
translations were not equally reflected by various labels and scores. For example, professionalism
was much less related to translationese than expected. Labels from document-level
holistic assessment demonstrated maximum support for our hypothesis: lower-ranking
translations clearly exhibited more translationese. They bore more traces of mechanical
translational behaviours associated with following source language patterns whenever possible,
which led to the inflated frequencies of analytical passives, modal predicates, verbal
forms, especially copula verbs and verbs in the finite form. As expected, lower-ranking
translations were more repetitive and had longer, more complex sentences. Higher-ranking
translations were indicative of greater skill in recognising and counteracting translationese
tendencies. For document-level holistic labels as an approach to capture quality, translationese
indicators might provide a valuable contribution to an effective quality estimation
pipeline.
However, error-based scores, and especially scores from sentence-level direct assessment,
proved to be much less correlated with translationese and fluency issues in general. This was
confirmed by relatively low regression results across all representations that had access only
to the target language side of the dataset, by feature analysis and by correlation between
error-based scores and scores from direct assessment.