O Auto da Compadecida - A Dog's Will: an analysis of the translation of idioms in the English subtitle
Undergraduate thesis (TCC) - Universidade Federal de Santa Catarina. Centro de Comunicação e Expressão. Letras Inglês. The objective of this study is to analyse the English subtitles of the Brazilian film O Auto da Compadecida – A Dog's Will (Arraes, 2000) in order to verify which strategies were used in the translation of idioms and fixed expressions. The choices of translation strategies are described based on concepts discussed within Translation Studies (TS). The results reveal that despite the technical constraints of subtitling, especially with regard to time and space, the translator used varied strategies that reproduced the idioms in the subtitles, showing an understanding of the important role these elements play in the mood of the film.
Adapting End-to-End Speech Recognition for Readable Subtitles
Automatic speech recognition (ASR) systems are primarily evaluated on
transcription accuracy. However, in some use cases such as subtitling, verbatim
transcription would reduce output readability given limited screen size and
reading time. Therefore, this work focuses on ASR with output compression, a
task challenging for supervised approaches due to the scarcity of training
data. We first investigate a cascaded system, where an unsupervised compression
model is used to post-edit the transcribed speech. We then compare several
methods of end-to-end speech recognition under output length constraints. The
experiments show that, with far less data than is needed to train a
model from scratch, we can adapt a Transformer-based ASR model to incorporate
both transcription and compression capabilities. Furthermore, the best
performance in terms of WER and ROUGE scores is achieved by explicitly modeling
the length constraints within the end-to-end ASR system.
Comment: IWSLT 2020
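To make this concrete, here is a minimal sketch, in Python, of one common way to expose an output-length constraint to a sequence-to-sequence model: prepending a compression-ratio bucket token to each training target, so the decoder can be conditioned on the desired verbosity at inference time. The token names, bucket edges, and example pair are illustrative assumptions, not the paper's exact scheme.

```python
# Hedged sketch: condition a seq2seq ASR model on output length by
# prepending a compression-ratio bucket token to each target transcript.
# Token names and bucket edges are illustrative assumptions.

def ratio_token(verbatim: str, compressed: str) -> str:
    """Map the character-level compression ratio to a coarse bucket token."""
    ratio = len(compressed) / max(len(verbatim), 1)
    if ratio > 0.9:
        return "<len:full>"
    if ratio > 0.7:
        return "<len:short>"
    return "<len:xshort>"

def make_target(verbatim: str, compressed: str) -> str:
    """Training target: bucket token followed by the compressed transcript."""
    return f"{ratio_token(verbatim, compressed)} {compressed}"

pair = ("so basically what we wanted to do was to measure it",
        "we wanted to measure it")
print(make_target(*pair))
# -> "<len:xshort> we wanted to measure it"
```

At inference time, the same tokens could then be used to request a verbatim or a compressed, subtitle-friendly output from a single model.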
MuST-Cinema: a Speech-to-Subtitles corpus
Growing needs in localising audiovisual content in multiple languages through
subtitles call for the development of automatic solutions for human subtitling.
Neural Machine Translation (NMT) can contribute to the automatisation of
subtitling, facilitating the work of human subtitlers and reducing turn-around
times and related costs. NMT requires high-quality, large, task-specific
training data. The existing subtitling corpora, however, are missing both
alignments to the source language audio and important information about
subtitle breaks. This poses a significant limitation for developing efficient
automatic approaches for subtitling, since the length and form of a subtitle
directly depends on the duration of the utterance. In this work, we present
MuST-Cinema, a multilingual speech translation corpus built from TED subtitles.
The corpus comprises (audio, transcription, translation) triplets.
Subtitle breaks are preserved by inserting special symbols. We show that the
corpus can be used to build models that efficiently segment sentences into
subtitles, and propose a method for annotating existing subtitling corpora with
subtitle breaks that conforms to the length constraint.
Comment: Accepted at LREC 2020
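As an illustration of the break annotation, the sketch below uses the corpus's two special symbols, <eol> for a line break inside a subtitle block and <eob> for the end of a block; reconstructing the on-screen subtitles from an annotated string is then a simple operation. The example sentence is invented.

```python
# Minimal sketch: subtitle breaks are kept in the text as special symbols,
# <eol> (line break within a block) and <eob> (end of a subtitle block).

annotated = ("We present MuST-Cinema, <eol> a multilingual corpus <eob> "
             "built from TED subtitles. <eob>")

def to_subtitles(text: str) -> list[list[str]]:
    """Split <eob>-delimited blocks, then <eol>-delimited lines."""
    blocks = [b.strip() for b in text.split("<eob>") if b.strip()]
    return [[line.strip() for line in b.split("<eol>")] for b in blocks]

for block in to_subtitles(annotated):
    print(" / ".join(block))
# We present MuST-Cinema, / a multilingual corpus
# built from TED subtitles.
```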
Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?
Subtitling is becoming increasingly important for disseminating information,
given the enormous amounts of audiovisual content becoming available daily.
Although Neural Machine Translation (NMT) can speed up the process of
translating audiovisual content, large manual effort is still required for
transcribing the source language, and for spotting and segmenting the text into
proper subtitles. Creating proper subtitles in terms of timing and segmentation
depends heavily on information present in the audio (utterance duration, natural
pauses). In this work, we explore two methods for applying Speech Translation
(ST) to subtitling: a) a direct end-to-end and b) a classical cascade approach.
We discuss the benefit of having access to the source language speech for
improving the conformity of the generated subtitles to the spatial and temporal
subtitling constraints and show that length is not the answer to everything in
the case of subtitling-oriented ST.
Comment: Accepted at IWSLT 2020
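The 42 of the title is the widely used 42-characters-per-line subtitle limit. Below is a minimal sketch of the kind of spatial and temporal conformity check discussed here; the two-line maximum and the 21-characters-per-second reading speed are typical guideline values assumed for illustration, not figures taken from the paper.

```python
# Hedged sketch of spatial and temporal subtitle conformity checks.
# MAX_CPL is the 42-characters-per-line limit the title alludes to;
# the other two limits are assumed typical guideline values.

MAX_CPL = 42      # characters per line (spatial)
MAX_LINES = 2     # lines per subtitle block (spatial)
MAX_CPS = 21.0    # characters per second, reading speed (temporal)

def conforms(lines: list[str], duration_s: float) -> bool:
    """True if a subtitle block respects all three constraints."""
    chars = sum(len(line) for line in lines)
    return (len(lines) <= MAX_LINES
            and all(len(line) <= MAX_CPL for line in lines)
            and chars / duration_s <= MAX_CPS)

print(conforms(["Subtitling is becoming increasingly", "important."], 2.5))
# -> True (35 and 10 chars per line, 45 chars / 2.5 s = 18 cps)
```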
Dodging the Data Bottleneck: Automatic Subtitling with Automatically Segmented ST Corpora
Speech translation for subtitling (SubST) is the task of automatically translating speech data into well-formed subtitles by inserting subtitle breaks compliant with specific display guidelines. As in speech translation (ST), model training requires parallel data comprising audio inputs paired with their textual translations. In SubST, however, the text also has to be annotated with subtitle breaks. So far, this requirement has represented a bottleneck for system development, as confirmed by the dearth of publicly available SubST corpora. To fill this gap, we propose a method to convert existing ST corpora into SubST resources without human intervention. We build a segmenter model that automatically segments texts into proper subtitles by exploiting audio and text in a multimodal fashion, achieving high segmentation quality in zero-shot conditions. Comparative experiments with SubST systems trained on manual and automatic segmentations, respectively, yield similar performance, showing the effectiveness of our approach.
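The segmenter described above is multimodal (audio and text). As a rough stand-in for illustration only, the greedy text-only baseline below inserts <eob> breaks so that no subtitle block exceeds a fixed character budget, turning a plain ST target sentence into a SubST-style one; the budget value is an assumption.

```python
# Hedged, text-only stand-in for the paper's multimodal segmenter:
# greedily pack words into blocks of at most BUDGET characters and
# mark each block boundary with <eob>. BUDGET is an assumed value
# (two 42-character lines per block).

BUDGET = 42 * 2

def auto_segment(sentence: str) -> str:
    """Insert <eob> breaks so no subtitle block exceeds BUDGET characters."""
    blocks, cur = [], ""
    for word in sentence.split():
        if cur and len(cur) + 1 + len(word) > BUDGET:
            blocks.append(cur)
            cur = word
        else:
            cur = f"{cur} {word}".strip()
    blocks.append(cur)
    return " <eob> ".join(blocks) + " <eob>"

st_target = ("Speech translation for subtitling is the task of automatically "
             "translating speech data into well-formed subtitles.")
print(auto_segment(st_target))
```

Applied over a whole ST corpus, a segmenter of this kind (ideally the multimodal one the paper trains) yields SubST-style training targets without any manual annotation.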