Embracing the threat: machine translation as a solution for subtitling
Recent decades have brought significant changes in the subtitling industry, both in terms of workflow and in the context of the market for audiovisual translation. Machine translation (MT), whilst in regular use in the traditional localisation industry, has not seen a significant uptake in the subtitling arena. The SUMAT project, an EU-funded project which ran from 2011 to 2014, had as its aim the building and evaluation of viable MT solutions for the subtitling industry in nine bidirectional language pairs. As part of the project, a year-long large-scale evaluation of the output of the resulting MT engines was carried out by trained subtitlers. This paper reports on the impetus behind the investigation of MT for subtitling and on previous work in this field, and discusses some of the results of this evaluation, in particular an attempt to measure the extent of productivity gain or loss for subtitlers using machine translation as opposed to working in the traditional way. The paper examines opportunities and limitations of MT as a viable option for work of this nature and makes recommendations for the training of subtitle post-editors.
Evaluating MT for massive open online courses: a multifaceted comparison between PBSMT and NMT systems
This article reports a multifaceted comparison between statistical
and neural machine translation (MT) systems that were developed for translation of data from Massive Open Online Courses (MOOCs). The study uses four
language pairs: English to German, Greek, Portuguese, and Russian. Translation quality is evaluated using automatic metrics and human evaluation, carried out by professional translators. Results show that neural MT is preferred
in side-by-side ranking, and is found to contain fewer overall errors. Results
are less clear-cut for some error categories, and for temporal and technical
post-editing effort. In addition, results are reported based on sentence length,
showing advantages and disadvantages depending on the particular language
pair and MT paradigm.
Evaluating MT for massive open online courses
This article reports a multifaceted comparison between statistical and neural
machine translation (MT) systems that were developed for translation of data from
massive open online courses (MOOCs). The study uses four language pairs: English to
German, Greek, Portuguese, and Russian. Translation quality is evaluated using automatic
metrics and human evaluation, carried out by professional translators. Results
show that neural MT is preferred in side-by-side ranking, and is found to contain fewer
overall errors. Results are less clear-cut for some error categories, and for temporal
and technical post-editing effort. In addition, results are reported based on sentence
length, showing advantages and disadvantages depending on the particular language
pair and MT paradigm.
Translation crowdsourcing: creating a multilingual corpus of online educational content
The present work describes a multilingual corpus of online content in the educational domain, i.e. Massive Open Online Course
material, ranging from course forum text to subtitles of online video lectures, that has been developed via large-scale crowdsourcing.
The English source text is manually translated into 11 European and BRIC languages using the CrowdFlower platform. During the
process several challenges arose which mainly involved the in-domain text genre, the large text volume, the idiosyncrasies of each
target language, the limitations of the crowdsourcing platform, as well as the quality assurance and workflow issues of the
crowdsourcing process. The corpus constitutes a product of the EU-funded TraMOOC project and is utilised in the project in order to
train, tune and test machine translation engines.
Improving Machine Translation of Educational Content via Crowdsourcing
The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation
models. Professional outsourcing of bilingual data collections is costly and often not feasible. In this paper we analyze the influence of
using crowdsourcing as a scalable way to obtain translations of target in-domain data, bearing in mind that the translations can be of a
lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain
by collecting translations of texts from MOOCs from English to eleven languages, which we then use to fine-tune neural machine
translation models previously trained on general-domain data. The results from our research indicate that crowdsourced data collected
with proper quality controls consistently yields performance gains over general-domain baseline systems, and systems fine-tuned with
pre-existing in-domain corpora.
Audio-description reloaded: an analysis of visual scenes in 2012 and Hero
This article explores whether the so-called new "cinema of attractions", with its supposed focus on visual effects to the detriment of storytelling, requires a specific approach to audio-description (AD). After some thoughts on film narrative in this type of cinema and the way in which it incorporates special effects, selected scenes with AD from two feature films, 2012 (directed by Emmerich) and Hero (directed by Zhang Yimou), are analysed. 2012 is a disaster movie aiming to thrill the audience with action. Hero is an equally visual movie but its imagery has an aesthetic purpose. The analysis investigates how space, time and action are treated in the films and the ADs, and how the information is presented in terms of focalization, timing and phrasing. The results suggest that effect-driven narratives require carefully timed and phrased ADs that devote much attention to the prosody of the AD script, its interaction with sounds and the use of metapho
Reduction levels in subtitling: DVD subtitling: a compromise of trends
SIGLE. Available from British Library Document Supply Centre (BLDSC), DSC:DXN065351. GB, United Kingdom.