Domain adaptation for statistical machine translation of corporate and user-generated content
The growing popularity of Statistical Machine Translation (SMT) techniques in recent years has led to the development of multiple domain-specific resources and adaptation scenarios. In this thesis we address two important and industrially relevant adaptation scenarios, each suited to a different kind of content.
Initially focussing on professionally edited 'enterprise-quality' corporate content, we address a specific scenario of translating data drawn from a mixture of different domains, where domain-specific training data is available for each domain. We utilise an automatic classifier to combine multiple domain-specific models and empirically show that such a configuration results in better translation quality compared to both traditional and state-of-the-art techniques for handling mixed-domain translation.
In the second phase of our research we shift our focus to the translation of possibly 'noisy' user-generated content in web forums created around the products and services of a multinational company. Using professionally edited translation memory (TM) data for training, we apply different normalisation and data selection techniques to adapt SMT models to noisy forum content. In this scenario, we also study the effect of mixture adaptation using a combination of in-domain and out-of-domain data at different component levels of an SMT system. Finally, we focus on the task of selecting optimal supplementary training data from out-of-domain corpora, using a novel incremental model merging mechanism to adapt TM-based models and improve forum-content translation quality.
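The classifier-based combination described above can be illustrated with a minimal sketch: an automatic domain classifier routes each input sentence to the most likely domain-specific model. The keyword-overlap classifier and all names below are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch: route each sentence of a mixed-domain batch to the
# domain-specific model chosen by an automatic classifier.

def classify_domain(sentence, keyword_sets):
    """Pick the domain whose keyword set overlaps most with the sentence."""
    tokens = set(sentence.lower().split())
    scores = {dom: len(tokens & kws) for dom, kws in keyword_sets.items()}
    return max(scores, key=scores.get)

def translate_mixed(sentences, models, keyword_sets):
    """Translate each sentence with the model of its predicted domain."""
    return [models[classify_domain(s, keyword_sets)](s) for s in sentences]
```

A real system would replace the keyword classifier with a trained text classifier and the model table with full domain-adapted SMT engines; the routing logic stays the same.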
Eyetracking and Applied Linguistics
Eyetracking has become a powerful tool in scientific research and has finally found its way into disciplines such as applied linguistics and translation studies, paving the way for new insights and challenges in these fields. The aim of the first International Conference on Eyetracking and Applied Linguistics (ICEAL) was to bring together researchers who use eyetracking to empirically answer their research questions. It was intended to bridge the gaps between applied linguistics, translation studies, cognitive science and computational linguistics on the one hand, and to further encourage innovative research methodologies and data triangulation on the other. These challenges are also addressed in this proceedings volume: while the studies described in the volume deal with a wide range of topics, they all agree on eyetracking as an appropriate methodology in empirical research.
Multi-modal post-editing of machine translation
As machine translation (MT) quality continues to improve, more and more translators switch from traditional translation from scratch to post-editing (PE) of MT output, which has been shown to save time and reduce errors. Instead of mainly generating text, translators are now asked to correct errors within otherwise helpful translation proposals, where repetitive MT errors make the process tiresome, while hard-to-spot errors make PE a cognitively demanding activity. Our contribution is three-fold: first, we explore whether interaction modalities other than mouse and keyboard could better support PE by creating and testing the MMPE translation environment. MMPE allows translators to cross out or hand-write text, drag and drop words for reordering, use spoken commands or hand gestures to manipulate text, or combine any of these input modalities. Second, our interviews revealed that translators see value in automatically receiving additional translation support when high cognitive load (CL) is detected during PE. We therefore developed a sensor framework using a wide range of physiological and behavioral data to estimate perceived CL and tested it in three studies, showing that multi-modal eye, heart, and skin measures can be used to make translation environments cognition-aware. Third, we present two multi-encoder Transformer architectures for automatic post-editing (APE) and discuss how these can adapt MT output to a domain and thereby avoid correcting repetitive MT errors.
Deutsche Forschungsgemeinschaft (DFG), Projekt MMP
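The cognition-aware idea in this abstract can be sketched very roughly: normalize several physiological and behavioral signals and fuse them into a single load index per segment. The signal names and the equal-weight averaging here are assumptions for illustration, not the MMPE sensor framework itself.

```python
# Hedged illustration of multi-modal fusion for cognitive-load estimation:
# z-score each signal over the session, then average the normalized signals
# per segment (equal weighting is an assumption, not the actual framework).
from statistics import mean, pstdev

def zscores(values):
    mu, sigma = mean(values), pstdev(values) or 1.0  # guard against zero spread
    return [(v - mu) / sigma for v in values]

def cognitive_load_index(signals):
    """signals: dict of signal name -> list of per-segment readings."""
    normalized = {name: zscores(vals) for name, vals in signals.items()}
    n_segments = len(next(iter(normalized.values())))
    return [mean(normalized[name][i] for name in normalized)
            for i in range(n_segments)]
```

A deployed system would of course learn weights from annotated data rather than average equally, but the fusion step has this general shape.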
Technical vs Ideological Manipulation of MENA Political Narratives via Subtitling
MENA political conflicts have inculcated controversial narratives, giving rise to deep-seated political tensions and combat, locally and globally. Political media can accentuate or contest such narratives and, sometimes, even create new ones. Narratives dwell in their source text until they are relocated to the target text through the translation process, in which they can often be subject to multi-level manipulation in proportion to the ideological constraints of translators and their institutions. Subtitling, in particular, also has its own technical constraints that can require textual manipulation. This variation of constraints motivated the study to investigate whether manipulation is technically necessitated or ideologically driven. The ultimate purpose is to raise awareness of the commonly unrecognised role of ideology in manipulating the subtitling of political narratives under the pretext of technicality. Focusing on the Arabic-English subtitling of MENA political narratives produced by Monitor Mideast, Palestinian Media Watch, and the Middle East Media Research Institute, the investigation starts with a first phase, where a micro-analysis drawing on Gottlieb's (1992) subtitling strategies differentiates between the subtitlers' technical and ideological choices. The second phase of the investigation comprises a macro-analysis (comprehensive framework) drawing on Baker's (2006a) narrative account, which interprets the subtitlers' ideological choices for the text in association with broader patterns of manipulation in the paratext and context. The study discussed concrete examples where ideology, rather than technicality, manifested in textual choices. Furthermore, the narrative distortion shown was coherently woven: it was not limited to the text but also extended to the paratext and context.
Besides paratextual verbal manipulation (e.g., using different titles), there were also higher-level patterns of non-verbal manipulation that included reconfiguring the original narrative features. These multi-level manipulation patterns have ultimately led to the source text narratives being reframed in the target text.
Analyzing Text Complexity and Text Simplification: Connecting Linguistics, Processing and Educational Applications
Reading plays an important role in the process of learning and knowledge acquisition
for both children and adults. However, not all texts are accessible to every
prospective reader. Reading difficulties can arise when there is a mismatch between
a reader's language proficiency and the linguistic complexity of the text
they read. In such cases, simplifying the text in its linguistic form while retaining
all the content could aid reader comprehension. In this thesis, we study text
complexity and simplification from a computational linguistic perspective.
We propose a new approach to automatically predict the text complexity using
a wide range of word level and syntactic features of the text. We show that this
approach results in accurate, generalizable models of text readability that work
across multiple corpora, genres and reading scales. Moving from documents to
sentences, we show that our text complexity features also accurately distinguish
different versions of the same sentence in terms of the degree of simplification
performed. This is useful in evaluating the quality of simplification performed by
a human expert or a machine-generated output and for choosing targets to simplify
in a difficult text. We also experimentally show the effect of text complexity on
readers' performance outcomes and cognitive processing through an eye-tracking
experiment.
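The feature-based prediction described above can be sketched in miniature: extract word-level and syntactic proxies from a text and combine them in a linear model. The two features and the hand-set weights below are hypothetical stand-ins for the thesis's much richer feature set and trained models.

```python
# Illustrative sketch of feature-based readability scoring (features and
# weights are assumptions for the example, not the thesis's actual model).

def extract_features(text):
    words = text.split()
    avg_word_len = sum(len(w) for w in words) / len(words)  # word-level proxy
    n_sents = max(text.count("."), 1)
    avg_sent_len = len(words) / n_sents                     # syntactic proxy
    return {"avg_word_len": avg_word_len, "avg_sent_len": avg_sent_len}

def readability_score(text, weights):
    """Higher score = more complex text, under this toy linear model."""
    feats = extract_features(text)
    return sum(weights[name] * value for name, value in feats.items())
```

In the actual work the weights would be learned from graded corpora, which is what lets the models generalize across corpora, genres and reading scales.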
Turning from analyzing text complexity and identifying sentential simplifications
to generating simplified text, one can view automatic text simplification as a
process of translation from English to simple English. In this thesis, we propose
a statistical machine translation based approach for text simplification, exploring
the role of focused training data and language models in the process.
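The "translation from English to simple English" framing can be illustrated with a toy analogue of a learned phrase table: complex phrases are mapped to simpler equivalents. The table below is a hand-written stand-in for one that an SMT system would induce from parallel English/Simple-English data.

```python
# Toy sketch of phrase-based simplification: apply complex->simple
# substitutions, longest phrase first (the table itself is illustrative).
PHRASE_TABLE = {
    "utilize": "use",
    "in the event that": "if",
    "commence": "start",
}

def simplify(sentence, table=PHRASE_TABLE):
    # Longest phrases first so multi-word entries win over sub-phrases.
    for phrase in sorted(table, key=len, reverse=True):
        sentence = sentence.replace(phrase, table[phrase])
    return sentence
```

A real SMT simplifier additionally scores candidate outputs with a language model over simple English, which is where the focused training data and language models mentioned above come in.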
Exploring the linguistic complexity analysis further, we show that our text
complexity features can be useful in assessing the language proficiency of English
learners. Finally, we analyze German school textbooks in terms of their
linguistic complexity, across various grade levels, school types and among different
publishers by applying a pre-existing set of text complexity features developed
for German.
- …