O Auto da Compadecida - A Dog's Will: an analysis of the translation of idioms in the English subtitle
Undergraduate thesis (TCC) - Universidade Federal de Santa Catarina. Centro de Comunicação e Expressão. Letras Inglês. The objective of this study is to analyse the English subtitles of the Brazilian film O Auto da Compadecida – A Dog's Will (Arraes, 2000) in order to verify which strategies were used in the translation of idioms and fixed expressions. The choices of translation strategies are described based on concepts discussed within Translation Studies (TS). The results reveal that despite the technical constraints of subtitling, especially with regard to time and space, the translator used varied strategies that reproduced the idioms in the subtitles, showing an understanding of the important role these elements play in the mood of the film.
Adapting End-to-End Speech Recognition for Readable Subtitles
Automatic speech recognition (ASR) systems are primarily evaluated on
transcription accuracy. However, in some use cases such as subtitling, verbatim
transcription would reduce output readability given limited screen size and
reading time. Therefore, this work focuses on ASR with output compression, a
task challenging for supervised approaches due to the scarcity of training
data. We first investigate a cascaded system, where an unsupervised compression
model is used to post-edit the transcribed speech. We then compare several
methods of end-to-end speech recognition under output length constraints. The
experiments show that, with far less data than is needed to train a
model from scratch, we can adapt a Transformer-based ASR model to incorporate
both transcription and compression capabilities. Furthermore, the best
performance in terms of WER and ROUGE scores is achieved by explicitly modeling
the length constraints within the end-to-end ASR system.
Comment: IWSLT 2020
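To make this concrete, here is a minimal sketch, in Python, of one common way to expose an output-length constraint to a sequence-to-sequence model: prepending a compression-ratio bucket token to each training target, so the decoder can be conditioned on the desired verbosity at inference time. The token names, bucket edges, and example pair are illustrative assumptions, not the paper's exact scheme.

```python
# Hedged sketch: condition a seq2seq ASR model on output length by
# prepending a compression-ratio bucket token to each target transcript.
# Token names and bucket edges are illustrative assumptions.

def ratio_token(verbatim: str, compressed: str) -> str:
    """Map the character-level compression ratio to a coarse bucket token."""
    ratio = len(compressed) / max(len(verbatim), 1)
    if ratio > 0.9:
        return "<len:full>"
    if ratio > 0.7:
        return "<len:short>"
    return "<len:xshort>"

def make_target(verbatim: str, compressed: str) -> str:
    """Training target: bucket token followed by the compressed transcript."""
    return f"{ratio_token(verbatim, compressed)} {compressed}"

pair = ("so basically what we wanted to do was to measure it",
        "we wanted to measure it")
print(make_target(*pair))
# -> "<len:xshort> we wanted to measure it"
```

At inference time, the same tokens could then be used to request a verbatim or a compressed, subtitle-friendly output from a single model.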
MuST-Cinema: a Speech-to-Subtitles corpus
Growing needs in localising audiovisual content in multiple languages through
subtitles call for the development of automatic solutions for human subtitling.
Neural Machine Translation (NMT) can contribute to the automatisation of
subtitling, facilitating the work of human subtitlers and reducing turn-around
times and related costs. NMT requires high-quality, large, task-specific
training data. The existing subtitling corpora, however, are missing both
alignments to the source language audio and important information about
subtitle breaks. This poses a significant limitation for developing efficient
automatic approaches for subtitling, since the length and form of a subtitle
directly depends on the duration of the utterance. In this work, we present
MuST-Cinema, a multilingual speech translation corpus built from TED subtitles.
The corpus comprises (audio, transcription, translation) triplets.
Subtitle breaks are preserved by inserting special symbols. We show that the
corpus can be used to build models that efficiently segment sentences into
subtitles, and propose a method for annotating existing subtitling corpora with
subtitle breaks that conforms to the length constraint.
Comment: Accepted at LREC 2020
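As an illustration of the break annotation, the sketch below uses the corpus's two special symbols, <eol> for a line break inside a subtitle block and <eob> for the end of a block; reconstructing the on-screen subtitles from an annotated string is then a simple operation. The example sentence is invented.

```python
# Minimal sketch: subtitle breaks are kept in the text as special symbols,
# <eol> (line break within a block) and <eob> (end of a subtitle block).

annotated = ("We present MuST-Cinema, <eol> a multilingual corpus <eob> "
             "built from TED subtitles. <eob>")

def to_subtitles(text: str) -> list[list[str]]:
    """Split <eob>-delimited blocks, then <eol>-delimited lines."""
    blocks = [b.strip() for b in text.split("<eob>") if b.strip()]
    return [[line.strip() for line in b.split("<eol>")] for b in blocks]

for block in to_subtitles(annotated):
    print(" / ".join(block))
# We present MuST-Cinema, / a multilingual corpus
# built from TED subtitles.
```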
Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?
Subtitling is becoming increasingly important for disseminating information,
given the enormous amounts of audiovisual content becoming available daily.
Although Neural Machine Translation (NMT) can speed up the process of
translating audiovisual content, large manual effort is still required for
transcribing the source language, and for spotting and segmenting the text into
proper subtitles. Creating proper subtitles in terms of timing and segmentation
depends heavily on information present in the audio (utterance duration, natural
pauses). In this work, we explore two methods for applying Speech Translation
(ST) to subtitling: a) a direct end-to-end and b) a classical cascade approach.
We discuss the benefit of having access to the source language speech for
improving the conformity of the generated subtitles to the spatial and temporal
subtitling constraints and show that length is not the answer to everything in
the case of subtitling-oriented ST.
Comment: Accepted at IWSLT 2020
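The 42 of the title is the widely used 42-characters-per-line subtitle limit. Below is a minimal sketch of the kind of spatial and temporal conformity check discussed here; the two-line maximum and the 21-characters-per-second reading speed are typical guideline values assumed for illustration, not figures taken from the paper.

```python
# Hedged sketch of spatial and temporal subtitle conformity checks.
# MAX_CPL is the 42-characters-per-line limit the title alludes to;
# the other two limits are assumed typical guideline values.

MAX_CPL = 42      # characters per line (spatial)
MAX_LINES = 2     # lines per subtitle block (spatial)
MAX_CPS = 21.0    # characters per second, reading speed (temporal)

def conforms(lines: list[str], duration_s: float) -> bool:
    """True if a subtitle block respects all three constraints."""
    chars = sum(len(line) for line in lines)
    return (len(lines) <= MAX_LINES
            and all(len(line) <= MAX_CPL for line in lines)
            and chars / duration_s <= MAX_CPS)

print(conforms(["Subtitling is becoming increasingly", "important."], 2.5))
# -> True (35 and 10 chars per line, 45 chars / 2.5 s = 18 cps)
```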
Dodging the Data Bottleneck: Automatic Subtitling with Automatically Segmented ST Corpora
Speech translation for subtitling (SubST) is the task of automatically translating speech data into well-formed subtitles by inserting subtitle breaks compliant with specific display guidelines. As in speech translation (ST), model training requires parallel data comprising audio inputs paired with their textual translations. In SubST, however, the text also has to be annotated with subtitle breaks. So far, this requirement has represented a bottleneck for system development, as confirmed by the dearth of publicly available SubST corpora. To fill this gap, we propose a method to convert existing ST corpora into SubST resources without human intervention. We build a segmenter model that automatically segments texts into proper subtitles by exploiting audio and text in a multimodal fashion, achieving high segmentation quality in zero-shot conditions. Comparative experiments with SubST systems trained on manual and automatic segmentations, respectively, yield similar performance, showing the effectiveness of our approach.
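The segmenter described above is multimodal (audio and text). As a rough stand-in for illustration only, the greedy text-only baseline below inserts <eob> breaks so that no subtitle block exceeds a fixed character budget, turning a plain ST target sentence into a SubST-style one; the budget value is an assumption.

```python
# Hedged, text-only stand-in for the paper's multimodal segmenter:
# greedily pack words into blocks of at most BUDGET characters and
# mark each block boundary with <eob>. BUDGET is an assumed value
# (two 42-character lines per block).

BUDGET = 42 * 2

def auto_segment(sentence: str) -> str:
    """Insert <eob> breaks so no subtitle block exceeds BUDGET characters."""
    blocks, cur = [], ""
    for word in sentence.split():
        if cur and len(cur) + 1 + len(word) > BUDGET:
            blocks.append(cur)
            cur = word
        else:
            cur = f"{cur} {word}".strip()
    blocks.append(cur)
    return " <eob> ".join(blocks) + " <eob>"

st_target = ("Speech translation for subtitling is the task of automatically "
             "translating speech data into well-formed subtitles.")
print(auto_segment(st_target))
```

Applied over a whole ST corpus, a segmenter of this kind (ideally the multimodal one the paper trains) yields SubST-style training targets without any manual annotation.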