1,784 research outputs found

Speaking while listening: Language processing in speech shadowing and translation

Author: Van Paridon J.
Publication venue: Radboud University Nijmegen
Publication date: 01/01/2021

Contains fulltext: 233349.pdf (Publisher’s version) (Open Access). Radboud University, 25 May 2021. Supervisors: Meyer, A.S., Roelofs, A.P.A. 199 p.

Radboud Repository

MPG.PuRe

Crossroads between contrastive linguistics, translation studies and machine translation

Contrastive Linguistics (CL), Translation Studies (TS) and Machine Translation (MT) share common ground: they all work at the crossroads where two or more languages meet. Despite their inherent relatedness, methodological exchange between the three disciplines is rare. This special issue touches upon areas where the three fields converge. It results directly from a workshop at the 2011 German Association for Language Technology and Computational Linguistics (GSCL) conference in Hamburg, where researchers from the three fields presented and discussed their interdisciplinary work. While the studies contained in this volume draw on a wide variety of objectives and methods, and address various areas of overlap between CL, TS and MT, the volume is by no means exhaustive with regard to this topic. Further cross-fertilisation is not only desirable but almost mandatory in order to tackle future tasks and endeavours, and this volume is committed to bringing these three fields even closer together.

OAPEN Library

Complexity in Translation. An English-Norwegian Study of Two Text Types

Author: Thunes Martha
Publication venue: The University of Bergen
Publication date: 01/01/2011

The present study discusses two primary research questions. Firstly, we have tried to investigate to what extent it is possible to compute the actual translation relation found in a selection of English-Norwegian parallel texts. By this we understand the generation of translations with no human intervention, and we assume an approach to machine translation (MT) based on linguistic knowledge. In order to answer this question, a measurement of translational complexity is applied to the parallel texts. Secondly, we have tried to find out if there is a difference in the degree of translational complexity between the two text types, law and fiction, included in the empirical material. The study is a strictly product-oriented approach to complexity in translation: it disregards aspects related to translation methods, and to the cognitive processes behind translation. What we have analysed are intersubjectively available relations between source texts and existing translations. The degree of translational complexity in a given translation task is determined by the types and amounts of information needed to solve it, as well as by the accessibility of these information sources, and the effort required when they are processed. For the purpose of measuring the complexity of the relation between a source text unit and its target correspondent, we apply a set of four correspondence types, organised in a hierarchy reflecting divisions between different linguistic levels, along with a gradual increase in the degree of translational complexity. In type 1, the least complex type, the corresponding strings are pragmatically, semantically, and syntactically equivalent, down to the level of the sequence of word forms. In type 2, source and target string are pragmatically and semantically equivalent, and equivalent with respect to syntactic functions, but there is at least one mismatch in the sequence of constituents or in the use of grammatical form words. 
Within type 3, source and target string are pragmatically and semantically equivalent, but there is at least one structural difference violating syntactic functional equivalence between the strings. In type 4, there is at least one linguistically non-predictable, semantic discrepancy between source and target string. The correspondence type hierarchy, ranging from 1 to 4, is characterised by an increase with respect to linguistic divergence between source and target string, an increase in the need for information and in the amount of effort required to translate, and a decrease in the extent to which there exist implications between relations of source-target equivalence at different linguistic levels. We assume that there is a translational relation between the inventories of simple and complex linguistic signs in two languages which is predictable, and hence computable, from information about source and target language systems, and about how the systems correspond. Thus, computable translations are predictable from the linguistic information coded in the source text, together with given, general information about the two languages and their interrelations. Further, we regard non-computable translations to be correspondences where it is not possible to predict the target expression from the information encoded in the source expression, together with given, general information about SL and TL and their interrelations. Non-computable translations require access to additional information sources, such as various kinds of general or task-specific extra-linguistic information, or task-specific linguistic information from the context surrounding the source expression. In our approach, correspondences of types 1–3 constitute the domain of linguistically predictable, or computable, translations, whereas type 4 correspondences belong to the non-predictable, or non-computable, domain, where semantic equivalence is not fulfilled. 
The empirical method involves extracting translationally corresponding strings from parallel texts, and assigning one of the types defined by the correspondence hierarchy to each recorded string pair. The analysis is applied to running text, omitting no parts of it. Thus, the distribution of the four types of translational correspondence within a set of data provides a measurement of the degree of translational complexity in the parallel texts that the data are extracted from. The complexity measurements of this study are meant to show to what extent we assume that an ideal, rule-based MT system could simulate the given translations, and for this reason the finite clause is chosen as the primary unit of analysis. The work of extracting and classifying translational correspondences is done manually, as it requires a bilingually competent human analyst. In the present study, the recorded data cover about 68,000 words. They are compiled from six different text pairs: two of them are law texts, and the remaining four are fiction texts. Comparable amounts of text are included for each text type, and both directions of translation are covered. Since the scope of the investigation is limited, we cannot, on the basis of our analysis, generalise about the degree of translational complexity in the chosen text types and in the language pair English-Norwegian. Calculated in terms of string lengths, the complexity measurement across the entire collection of data shows that as little as 44.8% of all recorded string pairs are classified as computable translational correspondences, i.e. as type 1, 2, or 3, and non-computable string pairs of type 4 constitute a majority (55.2%) of the compiled data. On average, the proportion of computable correspondences is 50.2% in the law data, and 39.6% in fiction.
In relation to the question whether it would be fruitful to apply automatic translation to the selected texts, we have considered the workload potentially involved in correcting machine output, and in this respect the difference in restrictedness between the two text types is relevant. Within the non-computable correspondences, the frequency of cases exhibiting only one minimal semantic deviation between source and target string is considerably higher among the data extracted from the law texts than among those recorded from fiction. For this reason we tentatively regard the investigated pairs of law texts as representing a text type where tools for automatic translation may be helpful, if the effort required by post-editing is smaller than that of manual translation. This is possibly the case in one of the law text pairs, where 60.9% of the data involve computable translation tasks. In the other pair of law texts the corresponding figure is merely 38.8%, and the potential helpfulness of automatisation would be even more strongly determined by the edit cost. That text might be a task for computer-aided translation, rather than for MT. As regards the investigated fiction texts, it is our view that post-editing of automatically generated translations would be laborious and not cost-effective, even in the case of one text pair showing a relatively low degree of translational complexity. Hence, we concur with the common view that the translation of fiction is not a task for MT.
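The length-weighted measurement described in this abstract can be illustrated with a minimal sketch. It assumes each recorded string pair is reduced to a correspondence type (1 to 4) and a length; the abstract does not say how string length was measured, so the unit here is an assumption, and the function names and toy figures are hypothetical rather than taken from the study.

```python
from collections import defaultdict

# Per the abstract: types 1-3 are computable, type 4 is non-computable.
COMPUTABLE_TYPES = (1, 2, 3)
ALL_TYPES = (1, 2, 3, 4)


def complexity_profile(string_pairs):
    """Return each correspondence type's share of the total string length.

    string_pairs: list of (correspondence_type, length) tuples.
    The length unit (words or characters) is an assumption of this sketch.
    """
    totals = defaultdict(int)
    for ctype, length in string_pairs:
        if ctype not in ALL_TYPES:
            raise ValueError(f"unknown correspondence type: {ctype}")
        totals[ctype] += length
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no string pairs with non-zero length")
    return {t: totals[t] / grand_total for t in ALL_TYPES}


def computable_share(string_pairs):
    """Share of the data (by length) classified as type 1, 2, or 3."""
    profile = complexity_profile(string_pairs)
    return sum(profile[t] for t in COMPUTABLE_TYPES)


# Toy data, invented for illustration only.
toy_pairs = [(1, 10), (2, 20), (3, 10), (4, 60)]
print(f"computable share: {computable_share(toy_pairs):.1%}")
```

Weighting by length rather than counting pairs means a long type 4 clause weighs more than a short type 1 clause, which matches the abstract's phrase "calculated in terms of string lengths".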

University of Bergen

NORA - Norwegian Open Research Archives

Syntactic difficulties in translation

Author: Vanroy Bram
Publication venue: Universiteit Gent. Faculteit Letteren en Wijsbegeerte
Publication date: 01/01/2021

Even though machine translation (MT) systems such as Google Translate and DeepL have improved significantly in recent years, a continuous rise in globalisation and linguistic diversity requires increasing amounts of professional, error-free translation. One can imagine, for instance, that mistakes in medical leaflets can lead to disastrous consequences. Less catastrophic, but equally significant, is the lack of a consistent and creative style in MT output for literary genres. In such cases, a human translation is preferred. Translating a text is a complex procedure that involves a variety of mental processes such as understanding the original message and its context, finding a fitting translation, and verifying that the translation is grammatical, contextually sound, and generally adequate and acceptable. From an educational perspective, it would be helpful if the translation difficulty of a given text could be predicted, for instance to ensure that texts of objectively appropriate difficulty levels are used in exams and assignments for translators. It may also prove useful in the translation industry, for example to direct more difficult texts to more experienced translators. During this PhD project, my coauthors and I investigated which linguistic properties contribute to such difficulties. Specifically, we focused on syntactic differences between a source text and its translation, that is to say their (dis)similarities in terms of linguistic structure. To this end we developed new measures that can quantify such differences and made the implementation publicly available for other researchers to use. These metrics include word (group) movement (how does the order in the original text differ from that in a given translation), changes in the linguistic properties of words, and a comparison of the underlying abstract structure of a sentence and a translation. Translation difficulty cannot be directly measured, but process information can help.
In particular, keystroke logging and eye-tracking data can be recorded during translation and used as a proxy for the required cognitive effort. An example: the longer a translator looks at a word, the more time and effort they likely need to process it. We investigated the effect that specific measures of syntactic similarity have on these behavioural processing features to get an indication of their effect on translation difficulty. In short: how does the syntactic (dis)similarity between a source text and a possible translation impact the translation difficulty? In our experiments, we show that different syntactic properties indeed have an effect, and that differences in syntax between a source text and its translation affect the cognitive effort required to translate that text. These effects are not identical between syntactic properties, though, suggesting that individual syntactic properties affect the translation process in different ways and that not all syntactic dissimilarities contribute to translation difficulty equally.
The quality of machine translation (MT) systems such as Google Translate and DeepL has improved considerably in recent years. However, ever-increasing globalisation and linguistic diversity mean that error-free professional translations are needed more than ever. In certain forms of communication, translation errors could have disastrous consequences, for example in medical leaflets. Even in less life-threatening situations we still prefer human translations, for example where a creative and consistent style is essential, as in books and poetry. Translating a text is a complex task involving several mental processes.
The source text must be read and understood, a translation must be found, and throughout the process the translation must be checked continuously to ensure, first, that it is accurate and, second, that it is grammatically correct in the target language. From a pedagogical standpoint it would be useful to predict the translation difficulty of a text, so that assignments and exams for translation students are of an objectively determined difficulty level. Such a system would also be useful in the translation industry: more difficult texts could be assigned to the most experienced translators. Together with my coauthors, I investigated during this doctoral project which properties of a text contribute to translation difficulty. We focused on linguistic, structural differences between the source text and its translation, and developed several metrics to measure such syntactic differences. For example, differences in word (group) order can be quantified, differences in linguistic labels can be counted, and the abstract underlying structures of a source sentence and a translation can be compared. We made the implementation of these metrics publicly available. The translation difficulty of a text cannot be measured directly, but a translator's behavioural data give a good idea of the difficulties they faced. The translator's eye movements and fixation points and their keystrokes can be recorded and later used in an experimental analysis. They provide useful information and can even serve as an approximation of the effort required during the translation process. They also point us to the elements (words, word groups) the translator found difficult.
If a translator looks at a word for a long time, we can assume that processing it takes considerable effort. We can therefore use these behavioural data as a measure of difficulty. In our research we were mainly interested in the effect of syntactic differences between a source sentence and a target sentence on such behavioural data. Our results show that the proposed metrics do have an effect and that linguistic differences between a source and target text lead to a higher cognitive load during translation. These effects differ per metric, which points to the importance of (research into) individual syntactic metrics; not every metric contributes equally to translation difficulty.
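To make the idea of a word-order measure concrete, here is a minimal sketch that counts crossing links in a word alignment. It is only an illustration of one way reordering between a source sentence and its translation could be quantified; it is not the implementation released with the thesis, and the function name and the alignment format (pairs of zero-based word indices) are assumptions.

```python
from itertools import combinations


def count_crossings(alignment):
    """Count pairs of alignment links that cross each other.

    alignment: list of (source_index, target_index) pairs, one per
    aligned word. Two links cross when their source order and target
    order disagree, i.e. the aligned words have swapped relative position.
    """
    crossings = 0
    for (s1, t1), (s2, t2) in combinations(alignment, 2):
        # Opposite signs mean the order differs between source and target.
        if (s1 - s2) * (t1 - t2) < 0:
            crossings += 1
    return crossings


# Same word order in source and target: no crossings.
print(count_crossings([(0, 0), (1, 1), (2, 2)]))
# Fully reversed order of three words: every pair of links crosses.
print(count_crossings([(0, 2), (1, 1), (2, 0)]))
```

A count of zero corresponds to a monotone translation; higher counts indicate more reordering, which could then be related to behavioural measures such as gaze duration.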

Ghent University Academic Bibliography

Full Issue

Publication venue: Clemson University Libraries
Publication date: 19/07/2021

Clemson University: TigerPrints

Inductive Bias and Modular Design for Sample-Efficient Neural Language Learning

Author: Ponti Edoardo
Publication venue: University of Cambridge
Publication date: 30/07/2020

Most of the world's languages suffer from the paucity of annotated data. This curbs the effectiveness of supervised learning, the most widespread approach to modelling language. Instead, an alternative paradigm could take inspiration from the propensity of children to acquire language from limited stimuli, in order to enable machines to learn any new language from a few examples. The abstract mechanisms underpinning this ability include 1) a set of in-born inductive biases and 2) the deep entrenchment of language in other perceptual and cognitive faculties, combined with the ability to transfer and recombine knowledge across these domains. The main contribution of my thesis is giving concrete form to both these intuitions. Firstly, I argue that endowing a neural network with the correct inductive biases is equivalent to constructing a prior distribution over its weights and its architecture (including connectivity patterns and non-linear activations). This prior is inferred by "reverse-engineering" a representative set of observed languages and harnessing typological features documented by linguists. Thus, I provide a unified framework for cross-lingual transfer and architecture search by recasting them as hierarchical Bayesian neural models. Secondly, the skills relevant to different language varieties and different tasks in natural language processing are deeply intertwined. Hence, the neural weights modelling the data for each of their combinations can be imagined as lying in a structured space. I introduce a Bayesian generative model of this space, which is factorised into latent variables representing each language and each task. By virtue of this modular design, predictions can generalise to unseen combinations by extrapolating from the data of observed combinations. 
The proposed models are empirically validated on a spectrum of language-related tasks (character-level language modelling, part-of-speech tagging, named entity recognition, and common-sense reasoning) and a typologically diverse sample of about a hundred languages. Compared to a series of competitive baselines, they achieve better performance in new languages in zero-shot and few-shot learning settings. In general, they hold promise to extend state-of-the-art language technology to under-resourced languages by means of sample efficiency and robustness to cross-lingual variation.
ERC (Consolidator Grant 648909) Lexical Google Research Faculty Award 201

Apollo (Cambridge)

TC3-II

Publication date: 01/01/2017

Institutional Repository of the Freie Universität Berlin

Translatorische und außertranslatorische Automatizität: Eine integrative Darstellung

Author: Mikołaj Deckert
Publication venue: Faculty of Humanities and Social Sciences Osijek
Publication date: 01/01/2019

The paper empirically examines a subset of cognitive processes in trainee translators with the objective of gaining insight into their decision-making. Specifically, we are interested in the nature and role of automated processing: above all, how pronounced it can be and how it influences the quality of decisions. A further objective is to arrive at an integrative view of the relationship between translatorial automaticity and cognitive automaticity in general, viz. automaticity not associated with translation. This could help us better capture some of the characteristics of translator behaviour and supplement our understanding of translation competence. Results from experiments with trainees reported in the paper show no correlation between the two dimensions of automated processing, and indicate that translatorial automaticity could be harder to override than its more general counterpart.
The purpose of this paper is the empirical investigation of a subset of cognitive processes in trainee translators, with the aim of gaining insight into their decision-making. Of particular interest are the nature and function of automated processing, above all how pronounced it can be and how it influences the quality of decisions. A further aim of the paper is to arrive at an integrative view of the relationship between translatorial automaticity and cognitive automaticity in general, i.e. automaticity unrelated to translation. This could help to capture certain characteristics of translator behaviour more fully and improve our understanding of translation competence.
The results from experiments with trainee translators discussed in the paper point to no relationship between the two dimensions of automated processing and show that switching off translatorial automaticity could prove more difficult than switching off its more general counterpart.

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Empirical modelling of translation and interpreting

Empirical research is carried out in a cyclic way: approaching a research area bottom-up, data lead to interpretations and ideally to the abstraction of laws, on the basis of which a theory can be derived. Deductive research is based on a theory, on the basis of which hypotheses can be formulated and tested against the background of empirical data. Looking at the state of the art in translation studies, either theories and models are designed or empirical data are collected and interpreted. However, the final step is still lacking: so far, empirical data have not led to the formulation of theories or models, and existing theories and models have not yet been comprehensively tested with empirical methods. This publication addresses these issues from several perspectives: multi-method product-based as well as process-based research may yield insights into translation and interpreting phenomena. These phenomena may include cognitive and organizational processes, procedures and strategies, competence and performance, translation properties and universals, etc. Empirical findings about the deeper structures of translation and interpreting will reduce the gap between translation and interpreting practice on the one hand and model and theory building on the other. Furthermore, the availability of more large-scale empirical testing triggers the development of models and theories concerning translation and interpreting phenomena and behavior based on quantifiable, replicable and transparent data.

OAPEN Library

Proceedings of the workshop on language technology for normalisation of less-resourced languages (SaLTMiL 8 - AfLaT 2012)

Author: De Pauw Guy; de Schryver Gilles-Maurice; Forcada Mike L; Sarasola Kepa; Tyers Francis M; Wagacha Peter W
Publication venue: European Language Resources Association
Publication date: 01/01/2012

Ghent University Academic Bibliography