
    Detecting machine-translated subtitles in large parallel corpora

    Parallel corpora extracted from online repositories of movie and TV subtitles are employed in a wide range of NLP applications, from language modelling to machine translation and dialogue systems. However, the subtitles uploaded to such repositories exhibit varying levels of quality. A particularly difficult problem stems from the fact that a substantial number of these subtitles are not written by human subtitlers but are simply generated by online translation engines. This paper investigates whether these machine-generated subtitles can be detected automatically using a combination of linguistic and extra-linguistic features. We show that a feedforward neural network trained on a small dataset of subtitles can detect machine-generated subtitles with an F1-score of 0.64. Furthermore, applying this detection model to an unlabelled sample of subtitles allows us to provide a statistical estimate of the proportion of subtitles in the full corpus that are machine-translated (or are at least of very low quality).
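    As a reminder of what the reported F1-score of 0.64 measures, here is a small, self-contained sketch of the metric. The confusion counts below are invented for illustration and are not taken from the paper:

    ```python
    def f1_score(tp, fp, fn):
        """Harmonic mean of precision and recall for a binary classifier."""
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    # Hypothetical confusion counts for a subtitle classifier
    # (machine-generated = positive class); chosen so F1 lands at 0.64.
    tp, fp, fn = 64, 40, 32
    print(round(f1_score(tp, fp, fn), 2))  # → 0.64
    ```

    Note that F1 ignores true negatives, which matters here: most subtitles in the corpus are human-written, so a metric based only on the positive (machine-generated) class gives a more honest picture than raw accuracy.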

    Audiovisual content analysis in the translation process

    The article presents a comprehensive approach to the process of audiovisual translation that includes multimodal analysis of the semiotic codes present in audiovisual productions. The article describes how the proposed approach can be applied to analysing audiovisual productions for different types of audiovisual translation. Owing to its multimodal nature, an audiovisual production is understood by the authors as an audiovisual text that combines image, sound and verbal means, that is, different modes conveying meaning. The means of conveying meaning in an audiovisual production include visual non-verbal elements, visual verbal elements, and audio non-verbal and verbal elements. The priority of these means of meaning transfer and their interaction in meaning generation differ significantly depending on the genre of the audiovisual production and the specifics of the process of its creation.

    CYBERCIEGE VIDEOS: FROM ENGLISH TO SPANISH; VÍDEOS DE CYBERCIEGE: DEL INGLÉS AL ESPAÑOL

    CyberCIEGE, a video game developed by the Naval Postgraduate School, supports cybersecurity awareness and education. A set of popular educational movies accompanies the game. The CyberCIEGE video collection was initially developed in English, thus limiting the diversity of its user population. In addition, the video format of the original movies (SWF) is being phased out due to security concerns associated with SWF video players. This capstone addresses the need for increased diversity of those familiar with cybersecurity basics by further introducing the Spanish-speaking community to 21st-century cybersecurity concepts. The CyberCIEGE movies were translated into Spanish to reach a broader audience while retaining the technical meaning of the concepts discussed. To contain costs, we demonstrate that it is possible to use open-source tools freely available for the Linux Operating System platform to produce reliable movies in both English and Spanish for web streaming. Recordings were created with Audacity and integrated as separate tracks into each corresponding movie via OpenShot. Afterward, the movies were exported to the MP4 file format for web streaming. In addition, the original English-language movies were directly converted to the MP4 file format via OpenShot. Detailed documentation ensures the repeatability of our processes. This work has increased the longevity of the CyberCIEGE video collection while expanding its viewership to a larger, more diverse audience.
    Chief Petty Officer, United States Navy
    Approved for public release. Distribution is unlimited.

    OpenSubtitles2018: Statistical rescoring of sentence alignments in large, noisy parallel corpora

    Peer reviewed

    Translation and Film: Slang, Dialects, Accents and Multiple Languages


    AN ANALYSIS OF SUBTITLING STRATEGIES USED IN THE EXPRESSIVE UTTERANCES IN THE ‘THE FATHER’ MOVIE

    This research discusses subtitling strategies for the expressive utterances of a movie entitled ‘The Father’ from English into Indonesian. The objectives of this research are to identify the types of expressive utterances and to investigate the subtitling strategies used in translating the expressive utterances of the main character in the ‘The Father’ movie. The researcher uses a qualitative method. The sources of the data are the English and Indonesian original subtitles of the ‘The Father’ movie. The data of this research are expressive utterances produced by the main character in the ‘The Father’ movie. The results show that there are six types of expressive utterances in the ‘The Father’ movie: boasting (22), deploring (20), lamenting (15), thanking (6), apologizing (5), and forgiving (1). In addition, eight subtitling strategies are used: transfer (40), paraphrase (25), condensation (4), expansion (3), deletion (2), imitation (2), decimation (1), and transcription (1). Based on the results of the analysis, the most dominant type of expressive utterance in the movie is boasting (32%), and the most frequently used subtitling strategy is transfer (51%).
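    The reported percentages follow directly from the raw counts in the abstract; a quick sanity check reproduces them:

    ```python
    # Counts of expressive utterance types and subtitling strategies,
    # as reported in the abstract above.
    utterances = {"boasting": 22, "deploring": 20, "lamenting": 15,
                  "thanking": 6, "apologizing": 5, "forgiving": 1}
    strategies = {"transfer": 40, "paraphrase": 25, "condensation": 4,
                  "expansion": 3, "deletion": 2, "imitation": 2,
                  "decimation": 1, "transcription": 1}

    total_u = sum(utterances.values())  # 69 utterances in total
    total_s = sum(strategies.values())  # 78 strategy applications in total
    print(round(100 * utterances["boasting"] / total_u))   # → 32
    print(round(100 * strategies["transfer"] / total_s))   # → 51
    ```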

    Automatic Construction of Discourse Corpora for Dialogue Translation

    In this paper, a novel approach is proposed to automatically construct a parallel discourse corpus for dialogue machine translation. First, parallel subtitle data and the corresponding monolingual movie script data are crawled from the Internet. Then, tags such as speaker and discourse boundary are projected from the script data to the subtitle data via an information retrieval approach, in order to map monolingual discourse onto bilingual texts. We not only evaluate the mapping results, but also integrate speaker information into the translation. Experiments show that our proposed method achieves 81.79% and 98.64% accuracy on speaker and dialogue-boundary annotation, respectively, and that speaker-based language model adaptation obtains around 0.5 BLEU points of improvement in translation quality. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue-boundary annotations.
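    The projection step can be illustrated with a minimal sketch: match each (unlabelled) subtitle line to its most similar script line by token overlap and inherit the speaker tag. This is a hypothetical toy version of the idea, not the authors' retrieval model; the speaker names and example lines are invented:

    ```python
    import re

    def tokens(text):
        """Lowercase word tokens, ignoring punctuation."""
        return set(re.findall(r"[a-z']+", text.lower()))

    def jaccard(a, b):
        """Token-overlap (Jaccard) similarity between two lines."""
        ta, tb = tokens(a), tokens(b)
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def project_speakers(script, subtitles):
        """Assign each subtitle the speaker of its best-matching script line."""
        labelled = []
        for sub in subtitles:
            speaker, _ = max(script, key=lambda line: jaccard(line[1], sub))
            labelled.append((speaker, sub))
        return labelled

    # Invented toy data: a script with speaker tags and unlabelled subtitles.
    script = [("ALICE", "hello there how are you"),
              ("BOB", "fine thanks and you")]
    subtitles = ["Hello there, how are you?", "Fine, thanks!"]
    print(project_speakers(script, subtitles))
    # → [('ALICE', 'Hello there, how are you?'), ('BOB', 'Fine, thanks!')]
    ```

    A real system would also have to handle subtitle lines that merge or split script lines, which is presumably where the paper's information retrieval machinery earns its reported 81.79% speaker accuracy.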

    Why do people subtitle movies? A survey research of the subtitler motivations and practices

    In this paper we investigate the reasons why enthusiasts dedicate time and effort to creating subtitles for third-party videos shared online. Based on the results of a survey of a community of Brazilian subtitlers, we highlight basic features of these enthusiasts as well as their motivations and main objectives. Our observations suggest that this is, after all, a volunteer and collaborative activity.
    CNPq (#312148/2014-3); FAPES (#67927378/2015

    MT for Subtitling: Investigating professional translators’ user experience and feedback

    This paper presents a study of machine translation and post-editing in the field of audiovisual translation. We analyse user experience data collected from post-editing tasks completed by twelve translators in four language pairs. We also present feedback provided by the translators in semi-structured interviews. The results of the user experience survey and the thematic analysis of the interviews show that the translators’ impression of post-editing subtitles was on average neutral to somewhat negative, with the segmentation and timing of subtitles identified as a key factor. Finally, we discuss the implications of the issues arising from the user experience survey and interviews for the future development of automatic subtitle translation.