12 research outputs found
Predicting Post-Editor Profiles from the Translation Process
The purpose of the current investigation is to predict post-editor profiles based on user be-
haviour and demographics using machine learning techniques to gain a better understanding of
post-editor styles. Our study extracts process unit features from the CasMaCat LS14 database
from the CRITT Translation Process Research Database (TPR-DB). The analysis has two main
research goals: We create n-gram models based on user activity and part-of-speech sequences
to automatically cluster post-editors, and we use discriminative classifier models to character-
ize post-editors based on a diverse range of translation process features. The classification and
clustering of participants resulting from our study suggest this type of exploration could be
used as a tool to develop new translation tool features or customization possibilities
Recognition of Translator Expertise Using Sequences of Fixations and Keystrokes
Professional human translation is necessary to meet high quality standards in industry and governmental agencies. Translators engage in multiple activities during their task, and there is a need to model their behavior, with the objective to understand and optimize the translation process. In recent years, user interfaces enabled us to record user events such as eye-movements or keystrokes. Although there have been insightful descriptive analysis of the translation process, there are multiple advantages in enabling quantitative inference. We present methods to classify sequences of fixations and keystrokes into activities and model translation sessions with the objective to recognize translator expertise. We show significant error reductions in the task of recognizing certified translators and their years of experience, and analyze the characterizing patterns
Final Report on User Interface Studies, Cognitive and User Modelling
D1.3 marks the final CASMACAT report on user interface studies, cognitive and user modelling covering
the completion of tasks T1.5 (Cognitive Modelling) and T1.6 (User Modelling) as part of Work
Package 1.
Within tasks T1.1 to T1.4, a series of experiments have established a solid understanding of human
behaviour in computer-aided translation, focusing on the use of visualization options, different translation
modalities, individual differences in translation production, translator types and translation/postediting
styles. Additionally, the bulk of this experimental data has been released as a publicly available
database under a creative common license and further details on this can be found in D1.4. In parallel
to these more holistic studies, a second set of experiments aimed to examine some of these factors in a
constrained laboratory setting. These focused on the underlying psycholinguistic processing and cognitive
modelling of translators’ activity to capture reading difficulty, verification and perplexity during
translation and post-editing.
This deliverable combines these earlier empirical findings with experiments conducted in Year 3
of the project and grounds translation within a broader theoretical framework associated with human
sentence processing and communication. As well as broadening our general understanding of bilingual
cognitive processing, there were two major objectives behind the experimental investigations in Year 3.
The first was to evaluate the utility of providing translators with Source-Target word alignment information
through spatially-direct visual cues. The second was to determine what, if any, differences arise
from expertise by comparing the results between a group of bilinguals and a group of professionally
trained translators on the same translation-related tasks
読み方の定量的分析に基づく個人およびテキストの特徴認識
学位の種別:課程博士University of Tokyo(東京大学
Problem solving activities in post-editing and translation from scratch: A multi-method study
Companies and organisations are increasingly using machine translation to improve efficiency and cost-effectiveness, and then edit the machine translated output to create a fluent text that adheres to given text conventions. This procedure is known as post-editing.
Translation and post-editing can often be categorised as problem-solving activities. When the translation of a source text unit is not immediately obvious to the translator, or in other words, if there is a hurdle between the source item and the target item, the translation process can be considered problematic. Conversely, if there is no hurdle between the source and target texts, the translation process can be considered a task-solving activity and not a problem-solving activity.
This study investigates whether machine translated output influences problem-solving effort in internet research, syntax, and other problem indicators and whether the effort can be linked to expertise. A total of 24 translators (twelve professionals and twelve semi-professionals) produced translations from scratch from English into German, and (monolingually) post-edited machine translation output for this study. The study is part of the CRITT TPR-DB database. The translation and (monolingual) post-editing sessions were recorded with an eye-tracker and a keylogging program. The participants were all given the same six texts (two texts per task).
Different approaches were used to identify problematic translation units. First, internet research behaviour was considered as research is a distinct indicator of problematic translation units. Then, the focus was placed on syntactical structures in the MT output that do not adhere to the rules of the target language, as I assumed that they would cause problems in the (monolingual) post-editing tasks that would not occur in the translation from scratch task. Finally, problem indicators were identified via different parameters like Munit, which indicates how often the participants created and modified one translation unit, or the inefficiency (InEff) value of translation units, i.e. the number of produced and deleted tokens divided by the final length of the translation. Finally, the study highlights how these parameters can be used to identify problems in the translation process data using mere keylogging data
Problem solving activities in post-editing and translation from scratch: A multi-method study
Companies and organisations are increasingly using machine translation to improve efficiency and cost-effectiveness, and then edit the machine translated output to create a fluent text that adheres to given text conventions. This procedure is known as post-editing.
Translation and post-editing can often be categorised as problem-solving activities. When the translation of a source text unit is not immediately obvious to the translator, or in other words, if there is a hurdle between the source item and the target item, the translation process can be considered problematic. Conversely, if there is no hurdle between the source and target texts, the translation process can be considered a task-solving activity and not a problem-solving activity.
This study investigates whether machine translated output influences problem-solving effort in internet research, syntax, and other problem indicators and whether the effort can be linked to expertise. A total of 24 translators (twelve professionals and twelve semi-professionals) produced translations from scratch from English into German, and (monolingually) post-edited machine translation output for this study. The study is part of the CRITT TPR-DB database. The translation and (monolingual) post-editing sessions were recorded with an eye-tracker and a keylogging program. The participants were all given the same six texts (two texts per task).
Different approaches were used to identify problematic translation units. First, internet research behaviour was considered as research is a distinct indicator of problematic translation units. Then, the focus was placed on syntactical structures in the MT output that do not adhere to the rules of the target language, as I assumed that they would cause problems in the (monolingual) post-editing tasks that would not occur in the translation from scratch task. Finally, problem indicators were identified via different parameters like Munit, which indicates how often the participants created and modified one translation unit, or the inefficiency (InEff) value of translation units, i.e. the number of produced and deleted tokens divided by the final length of the translation. Finally, the study highlights how these parameters can be used to identify problems in the translation process data using mere keylogging data
Problem solving activities in post-editing and translation from scratch: A multi-method study
Companies and organisations are increasingly using machine translation to improve efficiency and cost-effectiveness, and then edit the machine translated output to create a fluent text that adheres to given text conventions. This procedure is known as post-editing.
Translation and post-editing can often be categorised as problem-solving activities. When the translation of a source text unit is not immediately obvious to the translator, or in other words, if there is a hurdle between the source item and the target item, the translation process can be considered problematic. Conversely, if there is no hurdle between the source and target texts, the translation process can be considered a task-solving activity and not a problem-solving activity.
This study investigates whether machine translated output influences problem-solving effort in internet research, syntax, and other problem indicators and whether the effort can be linked to expertise. A total of 24 translators (twelve professionals and twelve semi-professionals) produced translations from scratch from English into German, and (monolingually) post-edited machine translation output for this study. The study is part of the CRITT TPR-DB database. The translation and (monolingual) post-editing sessions were recorded with an eye-tracker and a keylogging program. The participants were all given the same six texts (two texts per task).
Different approaches were used to identify problematic translation units. First, internet research behaviour was considered as research is a distinct indicator of problematic translation units. Then, the focus was placed on syntactical structures in the MT output that do not adhere to the rules of the target language, as I assumed that they would cause problems in the (monolingual) post-editing tasks that would not occur in the translation from scratch task. Finally, problem indicators were identified via different parameters like Munit, which indicates how often the participants created and modified one translation unit, or the inefficiency (InEff) value of translation units, i.e. the number of produced and deleted tokens divided by the final length of the translation. Finally, the study highlights how these parameters can be used to identify problems in the translation process data using mere keylogging data
Problem solving activities in post-editing and translation from scratch: A multi-method study
Companies and organisations are increasingly using machine translation to improve efficiency and cost-effectiveness, and then edit the machine translated output to create a fluent text that adheres to given text conventions. This procedure is known as post-editing.
Translation and post-editing can often be categorised as problem-solving activities. When the translation of a source text unit is not immediately obvious to the translator, or in other words, if there is a hurdle between the source item and the target item, the translation process can be considered problematic. Conversely, if there is no hurdle between the source and target texts, the translation process can be considered a task-solving activity and not a problem-solving activity.
This study investigates whether machine translated output influences problem-solving effort in internet research, syntax, and other problem indicators and whether the effort can be linked to expertise. A total of 24 translators (twelve professionals and twelve semi-professionals) produced translations from scratch from English into German, and (monolingually) post-edited machine translation output for this study. The study is part of the CRITT TPR-DB database. The translation and (monolingual) post-editing sessions were recorded with an eye-tracker and a keylogging program. The participants were all given the same six texts (two texts per task).
Different approaches were used to identify problematic translation units. First, internet research behaviour was considered as research is a distinct indicator of problematic translation units. Then, the focus was placed on syntactical structures in the MT output that do not adhere to the rules of the target language, as I assumed that they would cause problems in the (monolingual) post-editing tasks that would not occur in the translation from scratch task. Finally, problem indicators were identified via different parameters like Munit, which indicates how often the participants created and modified one translation unit, or the inefficiency (InEff) value of translation units, i.e. the number of produced and deleted tokens divided by the final length of the translation. Finally, the study highlights how these parameters can be used to identify problems in the translation process data using mere keylogging data
Problem solving activities in post-editing and translation from scratch: A multi-method study
Companies and organisations are increasingly using machine translation to improve efficiency and cost-effectiveness, and then edit the machine translated output to create a fluent text that adheres to given text conventions. This procedure is known as post-editing.
Translation and post-editing can often be categorised as problem-solving activities. When the translation of a source text unit is not immediately obvious to the translator, or in other words, if there is a hurdle between the source item and the target item, the translation process can be considered problematic. Conversely, if there is no hurdle between the source and target texts, the translation process can be considered a task-solving activity and not a problem-solving activity.
This study investigates whether machine translated output influences problem-solving effort in internet research, syntax, and other problem indicators and whether the effort can be linked to expertise. A total of 24 translators (twelve professionals and twelve semi-professionals) produced translations from scratch from English into German, and (monolingually) post-edited machine translation output for this study. The study is part of the CRITT TPR-DB database. The translation and (monolingual) post-editing sessions were recorded with an eye-tracker and a keylogging program. The participants were all given the same six texts (two texts per task).
Different approaches were used to identify problematic translation units. First, internet research behaviour was considered as research is a distinct indicator of problematic translation units. Then, the focus was placed on syntactical structures in the MT output that do not adhere to the rules of the target language, as I assumed that they would cause problems in the (monolingual) post-editing tasks that would not occur in the translation from scratch task. Finally, problem indicators were identified via different parameters like Munit, which indicates how often the participants created and modified one translation unit, or the inefficiency (InEff) value of translation units, i.e. the number of produced and deleted tokens divided by the final length of the translation. Finally, the study highlights how these parameters can be used to identify problems in the translation process data using mere keylogging data
Problem solving activities in post-editing and translation from scratch: A multi-method study
Companies and organisations are increasingly using machine translation to improve efficiency and cost-effectiveness, and then edit the machine translated output to create a fluent text that adheres to given text conventions. This procedure is known as post-editing.
Translation and post-editing can often be categorised as problem-solving activities. When the translation of a source text unit is not immediately obvious to the translator, or in other words, if there is a hurdle between the source item and the target item, the translation process can be considered problematic. Conversely, if there is no hurdle between the source and target texts, the translation process can be considered a task-solving activity and not a problem-solving activity.
This study investigates whether machine translated output influences problem-solving effort in internet research, syntax, and other problem indicators and whether the effort can be linked to expertise. A total of 24 translators (twelve professionals and twelve semi-professionals) produced translations from scratch from English into German, and (monolingually) post-edited machine translation output for this study. The study is part of the CRITT TPR-DB database. The translation and (monolingual) post-editing sessions were recorded with an eye-tracker and a keylogging program. The participants were all given the same six texts (two texts per task).
Different approaches were used to identify problematic translation units. First, internet research behaviour was considered as research is a distinct indicator of problematic translation units. Then, the focus was placed on syntactical structures in the MT output that do not adhere to the rules of the target language, as I assumed that they would cause problems in the (monolingual) post-editing tasks that would not occur in the translation from scratch task. Finally, problem indicators were identified via different parameters like Munit, which indicates how often the participants created and modified one translation unit, or the inefficiency (InEff) value of translation units, i.e. the number of produced and deleted tokens divided by the final length of the translation. Finally, the study highlights how these parameters can be used to identify problems in the translation process data using mere keylogging data