7 research outputs found

    Impact of automatic segmentation on the quality, productivity and self-reported post-editing effort of intralingual subtitles

    This paper describes the evaluation methodology followed to measure the impact of using a machine learning algorithm to automatically segment intralingual subtitles. The segmentation quality, productivity and self-reported post-editing effort achieved with this approach are shown to improve on those obtained with the character-counting technique that is currently the main method for automatic subtitle segmentation. The corpus used to train and test the proposed automated segmentation method is also described and shared with the community, in order to foster further research in this area.
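For readers unfamiliar with the character-counting baseline mentioned in this abstract, the following is a minimal sketch of what such a technique might look like. It is an assumption made for illustration, not the implementation evaluated in the paper: the function name `segment_by_char_count` and the 37-character default are hypothetical.

```python
def segment_by_char_count(text: str, max_chars: int = 37) -> list[str]:
    """Greedily pack words into subtitle lines of at most max_chars characters.

    Hypothetical baseline: it looks only at line length and ignores
    syntactic or semantic boundaries.
    """
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else f"{current} {word}"
        # Keep extending the line while it fits; a single overlong word
        # is still placed on a line of its own rather than being split.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines
```

Because a break is placed wherever the character budget runs out, a line can end between an article and its noun, for example. A learned segmentation model is meant to avoid breaks of that kind.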

    Classification of Automatic Subtitling Tools. A Proposal

    This article aims to provide a deeper understanding of the automatic subtitling tools that are currently being used and of their context of application. Based on the assumption that AVT in general, and subtitling in particular, are undergoing fundamental changes, it is our aim to analyse the range of tools that allow AVT translators to enhance their productivity and efficiency. For this purpose, we have analysed 40 different automatic subtitling tools that are currently available and accessible on the Internet. This analysis made it possible to observe the main features of these tools and how they function. Different criteria were then established in order to systematise this extensive inventory, on the basis of which 23 categories of software dedicated to automatic subtitling have been identified. These categories are illustrated with examples. The aim of this study is to provide a more accurate and more systematic understanding of automated subtitling programs. The paper is addressed to AVT professionals as well as to teachers and students with an interest in the state of the art of automated subtitling.

    Improving the automatic segmentation of subtitles through conditional random field

    Automatic segmentation of subtitles is a novel research field that has not been studied extensively to date. However, quality automatic subtitling is a real need for broadcasters, which seek automatic solutions given the demanding European audiovisual legislation. In this article, a method based on Conditional Random Fields is presented to deal with automatic subtitle segmentation. This is a continuation of previous work in the field, which proposed a method based on a Support Vector Machine classifier to generate possible candidates for breaks. For this study, two corpora in Basque and Spanish were used for experiments, and the performance of the current method was tested and compared with the previous solution and two rule-based systems through several evaluation metrics. Finally, an experiment with human evaluators was carried out with the aim of measuring the productivity gain in post-editing automatic subtitles generated with the new method.
    This work was partially supported by the project CoMUN-HaT - TIN2015-70924-C2-1-R (MINECO/FEDER).
    Alvarez, A.; Martínez-Hinarejos, C.; Arzelus, H.; Balenciaga, M.; Del Pozo, A. (2017). Improving the automatic segmentation of subtitles through conditional random field. Speech Communication. 88:83-95. https://doi.org/10.1016/j.specom.2017.01.010

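    SubItalian. Un'applicazione Web per l'apprendimento dell'italiano tramite sottotitolazione e annotazione di audiovisivi. [SubItalian. A web application for learning Italian through subtitling and annotation of audiovisual material.]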

    SubItalian is a web application created to demonstrate how information technologies developed in recent years can facilitate and encourage the learning of a foreign language (in this case Italian) in an alternative and, in all likelihood, very stimulating way. The application provides for two types of users: learners, that is, all those who want to learn Italian, and tutors, that is, all those who are interested in assisting the learners. Every user can upload audiovisual material to SubItalian, where it is transcribed and annotated automatically. Tutors also have access to tools that allow them to revise what the system has generated automatically and to create exercises designed for learners. In addition, a forum is provided through which learners and tutors can communicate.

    Automatic Speech Recognition (ASR) and NMT for Interlingual and Intralingual Communication: Speech to Text Technology for Live Subtitling and Accessibility.

    Considering the increasing demand for institutional translation and the multilingualism of international organizations, the application of Artificial Intelligence (AI) technologies in multilingual communications and for the purposes of accessibility has become an important element in the production of translation and interpreting services (Zetzsche, 2019). In particular, the widespread use of Automatic Speech Recognition (ASR) and Neural Machine Translation (NMT) technology represents a recent development in the attempt to satisfy the increasing demand for interinstitutional, multilingual communications at inter-governmental level (Maslias, 2017). Recently, researchers have been calling for a universalistic view of media and conference accessibility (Greco, 2016). The application of ASR, combined with NMT, may allow for the breaking down of communication barriers at European institutional conferences, where multilingualism represents a fundamental pillar (Jopek Bosiacka, 2013). In addition to representing a so-called disruptive technology (Accipio Consulting, 2006), ASR technology may facilitate communication with non-hearing users (Lewis, 2015). Thanks to ASR, it is possible to guarantee content accessibility for non-hearing audiences via subtitles at institutionally-held conferences or speeches. Hence the need for analysing and evaluating ASR output: a quantitative approach is adopted to evaluate the subtitles, with the objective of assessing their accuracy (Romero-Fresco, 2011). A database of F.A.O.’s and other international institutions’ English-language speeches and conferences on climate change is taken into consideration. The statistical approach is based on WER and NER models (Romero-Fresco, 2016) and on an adapted version. The ASR software solutions used in the study are VoxSigma by Vocapia Research and the Google Speech Recognition engine. After a taxonomic scheme has been defined, Native and Non-Native subtitles are compared to gold standard transcriptions. The intralingual and interlingual output generated by NMT is specifically analysed and evaluated via the NTR model to assess accuracy and accessibility.
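As background for the WER metric mentioned in this abstract, word error rate is conventionally defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the reference transcript into the ASR output, and N is the number of reference words. The sketch below is an illustrative assumption and does not reproduce the study's own tooling, its adapted version, or the NER and NTR models. The function name `word_error_rate` is hypothetical, and tokenisation is simple whitespace splitting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (word-level Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

In a comparison against gold standard transcriptions, the gold transcript would be passed as `reference` and the ASR output as `hypothesis`. Real evaluations usually normalise case and punctuation first, which this sketch leaves out.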