4 research outputs found

    Automatic Construction of Discourse Corpora for Dialogue Translation

    In this paper, a novel approach is proposed to automatically construct a parallel discourse corpus for dialogue machine translation. First, parallel subtitle data and the corresponding monolingual movie script data are crawled and collected from the Internet. Then tags such as speaker and discourse boundary are projected from the script data onto the subtitle data via an information retrieval approach, in order to map monolingual discourse to bilingual texts. We not only evaluate the mapping results, but also integrate speaker information into the translation. Experiments show that our proposed method achieves 81.79% and 98.64% accuracy on speaker and dialogue boundary annotation, respectively, and that speaker-based language model adaptation yields an improvement of around 0.5 BLEU points in translation quality. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue boundary annotation.
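To make the projection step in this abstract concrete, here is a minimal sketch of how speaker tags from a script might be mapped onto subtitle lines. It is not the authors' method: the abstract names only "an information retrieval approach", so the bag-of-words cosine similarity, the `project_speakers` function, the `threshold` value, and the sample lines are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def project_speakers(script, subtitles, threshold=0.5):
    """Attach to each subtitle the speaker of its most similar script line.

    `script` is a list of (speaker, utterance) pairs and `subtitles` a list of
    strings. Subtitles whose best match scores below `threshold` (an assumed
    cut-off, not a value from the paper) are left unlabelled (None).
    """
    script_vecs = [(speaker, Counter(tokens(line))) for speaker, line in script]
    labelled = []
    for sub in subtitles:
        vec = Counter(tokens(sub))
        best_speaker, best_score = None, 0.0
        for speaker, script_vec in script_vecs:
            score = cosine(vec, script_vec)
            if score > best_score:
                best_speaker, best_score = speaker, score
        labelled.append((best_speaker if best_score >= threshold else None, sub))
    return labelled


# Hypothetical sample data, for illustration only.
script = [
    ("RICK", "Here's looking at you, kid."),
    ("ILSA", "I wish I didn't love you so much."),
]
subtitles = [
    "Here's looking at you kid",
    "I wish I did not love you so much",
    "Thank you for watching",
]
labelled = project_speakers(script, subtitles)
```

In this sketch the third subtitle has no close counterpart in the script, so it stays unlabelled instead of being forced onto the nearest speaker. A real pipeline would also need to handle the bilingual side and discourse boundaries, which the sketch leaves out.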

    A reception study of machine translated subtitles for MOOCs

    As access has grown to online courses in the form of MOOCs (Massive Open Online Courses), the language barrier has become an important issue for users worldwide. Machine translation (MT) appears to offer an alternative or complementary solution to existing forms of MOOC translation. Very little attention has been paid, however, to the use and utility of MT for MOOC content. The main goal of this research is to test the impact machine-translated subtitles have on Chinese viewers’ reception of MOOC content. We are interested in whether there is any difference between viewers’ reception of raw machine-translated subtitles as opposed to fully post-edited machine-translated (PEMT) subtitles and human-translated (HT) subtitles. Based on an eye-tracking experiment conducted at two Chinese universities and survey methods, we show that participants who were offered full PEMT subtitles scored better overall on our reception metrics than those who were offered raw MT subtitles. HT subtitles, on the other hand, did not necessarily lead to better reception as expected; on the contrary, the participants who were offered HT subtitles performed the worst in some of our reception metrics.

    A reception study of machine translated subtitles for MOOCs

    As MOOCs (Massive Open Online Courses) grow rapidly around the world, the language barrier is becoming a serious issue. Removing this obstacle by creating translated subtitles is an indispensable part of developing MOOCs and improving accessibility. Given the large quantity of MOOCs available worldwide and the considerable demand for them, machine translation (MT) appears to offer an alternative or complementary translation solution, thus providing the motivation for this research. The main goal of this research is to test the impact machine-translated subtitles have on Chinese viewers’ reception of MOOC content. More specifically, the author is interested in whether there is any difference between viewers’ reception of raw machine-translated subtitles as opposed to fully post-edited machine-translated subtitles and human-translated subtitles. Reception is operationalized by adapting Gambier's (2007) model, which divides ‘reception’ into ‘the three Rs’: (i) response, (ii) reaction and (iii) repercussion. Response refers to the initial physical response of a viewer to an audio-visual stimulus, in this case the subtitle and the rest of the image. Reaction involves the cognitive follow-on from the initial response, and is linked to how much effort is involved in processing the subtitling stimulus and what is understood by the viewer. Repercussion refers to attitudinal and sociocultural dimensions of AVT consumption. The research comprises a pilot study and a main experiment. Mixed methods of eye-tracking, questionnaires, translation quality assessment and frequency analysis were adopted. Over 60 native Chinese speakers were recruited as participants for this research. They were divided into three groups: those who read subtitles created by raw MT, by post-edited MT (PE), or by human translation (HT). Results show that most participants had a positive attitude towards the subtitles regardless of their type. Participants who were offered PE subtitles scored the best overall on the selected reception metrics. Participants who were offered HT subtitles performed the worst in some of the selected reception metrics.