
    Detecting machine-translated subtitles in large parallel corpora

    Parallel corpora extracted from online repositories of movie and TV subtitles are employed in a wide range of NLP applications, from language modelling to machine translation and dialogue systems. However, the subtitles uploaded to such repositories exhibit varying levels of quality. A particularly difficult problem stems from the fact that a substantial number of these subtitles are not written by human subtitlers but are simply generated by online translation engines. This paper investigates whether these machine-generated subtitles can be detected automatically using a combination of linguistic and extra-linguistic features. We show that a feedforward neural network trained on a small dataset of subtitles can detect machine-generated subtitles with an F1-score of 0.64. Furthermore, applying this detection model to an unlabelled sample of subtitles allows us to provide a statistical estimate of the proportion of subtitles in the full corpus that are machine-translated (or are at least of very low quality).
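    As a reminder of what the reported F1-score of 0.64 measures, here is a small, self-contained sketch of the metric. The confusion counts below are invented for illustration and are not taken from the paper:

    ```python
    def f1_score(tp, fp, fn):
        """Harmonic mean of precision and recall for a binary classifier."""
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    # Hypothetical confusion counts for a subtitle classifier
    # (machine-generated = positive class); chosen so F1 lands at 0.64.
    tp, fp, fn = 64, 40, 32
    print(round(f1_score(tp, fp, fn), 2))  # → 0.64
    ```

    Note that F1 ignores true negatives, which matters here: most subtitles in the corpus are human-written, so a metric based only on the positive (machine-generated) class gives a more honest picture than raw accuracy.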

    Audiovisual content analysis in the translation process

    The article presents a comprehensive approach to the process of audiovisual translation that includes multimodal analysis of the semiotic codes present in audiovisual productions. The article describes how the proposed approach can be applied to analysing audiovisual productions for different types of audiovisual translation. Owing to its multimodal nature, an audiovisual production is understood by the authors as an audiovisual text that combines image, sound and verbal means, that is, different modes conveying meaning. The means of conveying meaning in an audiovisual production include visual non-verbal elements, visual verbal elements, and audio non-verbal and verbal elements. The priority of these means of meaning transfer and their interaction in meaning generation differ significantly depending on the genre of the audiovisual production and the specifics of the process of its creation.

    CYBERCIEGE VIDEOS: FROM ENGLISH TO SPANISH; VÍDEOS DE CYBERCIEGE: DEL INGLÉS AL ESPAÑOL

    CyberCIEGE, a video game developed by the Naval Postgraduate School, supports cybersecurity awareness and education. A set of popular educational movies accompanies the game. The CyberCIEGE video collection was initially developed in English, thus limiting the diversity of its user population. In addition, the video format of the original movies (SWF) is being phased out due to security concerns associated with SWF video players. This capstone addresses the need for increased diversity of those familiar with cybersecurity basics by further introducing the Spanish-speaking community to 21st-century cybersecurity concepts. The CyberCIEGE movies were translated into Spanish to reach a broader audience while retaining the technical meaning of the concepts discussed. To contain costs, we demonstrate that it is possible to use open-source tools freely available for the Linux Operating System platform to produce reliable movies in both English and Spanish for web streaming. Recordings were created with Audacity and integrated as separate tracks into each corresponding movie via OpenShot. Afterward, the movies were exported to the MP4 file format for web streaming. In addition, the original English-language movies were directly converted to the MP4 file format via OpenShot. Detailed documentation ensures the repeatability of our processes. This work has increased the longevity of the CyberCIEGE video collection while expanding its viewership to a larger, more diverse audience.
    Chief Petty Officer, United States Navy
    Approved for public release. Distribution is unlimited.

    OpenSubtitles2018: Statistical rescoring of sentence alignments in large, noisy parallel corpora

    Peer reviewed

    Translation and Film: Slang, Dialects, Accents and Multiple Languages


    AN ANALYSIS OF SUBTITLING STRATEGIES USED IN THE EXPRESSIVE UTTERANCES IN THE ‘THE FATHER’ MOVIE

    This research discusses subtitling strategies for the expressive utterances of a movie entitled ‘The Father’ from English into Indonesian. The objectives of this research are to identify the types of expressive utterances and to investigate the subtitling strategies used in translating the expressive utterances of the main character in the ‘The Father’ movie. The researcher uses a qualitative method. The sources of the data are the English and Indonesian original subtitles of the ‘The Father’ movie. The data of this research are expressive utterances produced by the main character in the ‘The Father’ movie. The results show that there are six types of expressive utterances in the ‘The Father’ movie: boasting (22), deploring (20), lamenting (15), thanking (6), apologizing (5), and forgiving (1). In addition, eight subtitling strategies are used: transfer (40), paraphrase (25), condensation (4), expansion (3), deletion (2), imitation (2), decimation (1), and transcription (1). Based on the results of the analysis, the most dominant type of expressive utterance in the movie is boasting (32%), and the most frequently used subtitling strategy is transfer (51%).
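    The reported percentages follow directly from the raw counts in the abstract; a quick sanity check reproduces them:

    ```python
    # Counts of expressive utterance types and subtitling strategies,
    # as reported in the abstract above.
    utterances = {"boasting": 22, "deploring": 20, "lamenting": 15,
                  "thanking": 6, "apologizing": 5, "forgiving": 1}
    strategies = {"transfer": 40, "paraphrase": 25, "condensation": 4,
                  "expansion": 3, "deletion": 2, "imitation": 2,
                  "decimation": 1, "transcription": 1}

    total_u = sum(utterances.values())  # 69 utterances in total
    total_s = sum(strategies.values())  # 78 strategy applications in total
    print(round(100 * utterances["boasting"] / total_u))   # → 32
    print(round(100 * strategies["transfer"] / total_s))   # → 51
    ```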

    Automatic Construction of Discourse Corpora for Dialogue Translation

    In this paper, a novel approach is proposed to automatically construct a parallel discourse corpus for dialogue machine translation. First, parallel subtitle data and the corresponding monolingual movie script data are crawled from the Internet. Then, tags such as speaker and discourse boundary are projected from the script data to the subtitle data via an information retrieval approach, in order to map monolingual discourse onto bilingual texts. We not only evaluate the mapping results, but also integrate speaker information into the translation. Experiments show that our proposed method achieves 81.79% and 98.64% accuracy on speaker and dialogue-boundary annotation, respectively, and that speaker-based language model adaptation obtains around 0.5 BLEU points of improvement in translation quality. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue-boundary annotations.
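    The projection step can be illustrated with a minimal sketch: match each (unlabelled) subtitle line to its most similar script line by token overlap and inherit the speaker tag. This is a hypothetical toy version of the idea, not the authors' retrieval model; the speaker names and example lines are invented:

    ```python
    import re

    def tokens(text):
        """Lowercase word tokens, ignoring punctuation."""
        return set(re.findall(r"[a-z']+", text.lower()))

    def jaccard(a, b):
        """Token-overlap (Jaccard) similarity between two lines."""
        ta, tb = tokens(a), tokens(b)
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def project_speakers(script, subtitles):
        """Assign each subtitle the speaker of its best-matching script line."""
        labelled = []
        for sub in subtitles:
            speaker, _ = max(script, key=lambda line: jaccard(line[1], sub))
            labelled.append((speaker, sub))
        return labelled

    # Invented toy data: a script with speaker tags and unlabelled subtitles.
    script = [("ALICE", "hello there how are you"),
              ("BOB", "fine thanks and you")]
    subtitles = ["Hello there, how are you?", "Fine, thanks!"]
    print(project_speakers(script, subtitles))
    # → [('ALICE', 'Hello there, how are you?'), ('BOB', 'Fine, thanks!')]
    ```

    A real system would also have to handle subtitle lines that merge or split script lines, which is presumably where the paper's information retrieval machinery earns its reported 81.79% speaker accuracy.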

    Why do people subtitle movies? A survey research of the subtitler motivations and practices

    In this paper we investigate the reasons why enthusiasts dedicate time and effort to creating subtitles for third-party videos shared online. Based on the results of a survey of a community of Brazilian subtitlers, we highlight basic features of these enthusiasts as well as their motivations and main objectives. Our observations suggest that this is, after all, a volunteer and collaborative activity.
    CNPq (#312148/2014-3); FAPES (#67927378/2015

    MT for Subtitling: Investigating professional translators’ user experience and feedback

    This paper presents a study of machine translation and post-editing in the field of audiovisual translation. We analyse user experience data collected from post-editing tasks completed by twelve translators in four language pairs. We also present feedback provided by the translators in semi-structured interviews. The results of the user experience survey and the thematic analysis of the interviews show that the translators’ impression of post-editing subtitles was on average neutral to somewhat negative, with the segmentation and timing of subtitles identified as a key factor. Finally, we discuss the implications of the issues arising from the user experience survey and interviews for the future development of automatic subtitle translation.