
    Development of a speech recognition system for Spanish broadcast news

    This paper reports on the development process of a speech recognition system for Spanish broadcast news within the MESH FP6 project. The system uses the SONIC recognizer developed at the Center for Spoken Language Research (CSLR), University of Colorado. Acoustic and language models were trained using Hub4 broadcast news data. Experiments and evaluation results are reported

    SCOLA: What it is and How to Get it


    An Illustrated Methodology for Evaluating ASR Systems

    Proceedings of: 9th International Workshop on Adaptive Multimedia Retrieval (AMR 2011), held 18-19 July 2011 in Barcelona, Spain. The event Web site is http://stel.ub.edu/amr2011/. Automatic speech recognition technology can be integrated into an information retrieval process to allow searching of multimedia contents. However, in order to ensure adequate retrieval performance, it is necessary to establish the quality of the recognition phase, especially in speaker-independent and domain-independent environments. This paper introduces a methodology for evaluating different speech recognition systems in several scenarios, considering also the creation of new corpora of different types (broadcast news, interviews, etc.), especially in languages other than English that are not widely addressed in the speech research community. This work has been partially supported by the Spanish Center for Industry Technological Development (CDTI, Ministry of Industry, Tourism and Trade) through the BUSCAMEDIA Project (CEN-20091026), and also by MA2VICMR: Improving the access, analysis and visibility of the multilingual and multimedia information in web for the Region of Madrid (S2009/TIC-1542).
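Evaluation methodologies such as the one described above typically center on word error rate (WER), computed by aligning the recognizer's hypothesis against a reference transcript. As a rough illustration (not code from the paper; the function name is hypothetical), WER can be computed with a word-level edit distance:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / N,
    via a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, recognizing "the cat sat on mat" against the reference "the cat sat on the mat" yields one deletion over six reference words, i.e. a WER of 1/6.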

    Investigating cross-language speech retrieval for a spontaneous conversational speech collection

    Cross-language retrieval of spontaneous speech combines the challenges of working with noisy automated transcription and language translation. The CLEF 2005 Cross-Language Speech Retrieval (CL-SR) task provides a standard test collection to investigate these challenges. We show that we can improve retrieval performance: by careful selection of the term weighting scheme; by decomposing automated transcripts into phonetic substrings to help ameliorate transcription errors; and by combining automatic transcriptions with manually-assigned metadata. We further show that topic translation with online machine translation resources yields effective CL-SR
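One of the techniques mentioned above, decomposing automated transcripts into phonetic substrings, indexes overlapping fragments so that a partly misrecognized word can still share terms with the query. A minimal sketch of the idea, using character n-grams as a stand-in for phone sequences (the helper name and parameters are illustrative, not from the paper):

```python
def ngram_decompose(tokens, n=4):
    """Decompose each token into overlapping n-grams so that a partly
    misrecognized word still shares index terms with the query."""
    terms = []
    for tok in tokens:
        if len(tok) <= n:
            terms.append(tok)  # short tokens are kept whole
        else:
            terms.extend(tok[i:i + n] for i in range(len(tok) - n + 1))
    return terms
```

A transcription error that corrupts only part of a long word leaves most of its n-grams intact, so term-matching degrades gracefully rather than failing outright.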

    Albayzin 2018 Evaluation: The IberSpeech-RTVE Challenge on Speech Technologies for Spanish Broadcast Media

    The IberSpeech-RTVE Challenge presented at IberSpeech 2018 is a new Albayzin evaluation series supported by the Spanish Thematic Network on Speech Technologies (Red Temática en Tecnologías del Habla (RTTH)). The series focused on speech-to-text transcription, speaker diarization, and multimodal diarization of television programs. For this purpose, the Corporación Radio Televisión Española (RTVE), the main public service broadcaster in Spain, and the RTVE Chair at the University of Zaragoza made more than 500 h of broadcast content and subtitles available to scientists. The dataset included about 20 programs of different kinds and topics produced and broadcast by RTVE between 2015 and 2018. The programs presented different challenges from the point of view of speech technologies, such as the diversity of Spanish accents, overlapping speech, spontaneous speech, acoustic variability, background noise, and specific vocabulary. This paper describes the database and the evaluation process and summarizes the results obtained

    Multimode delivery in the classroom

    Because of recent technological advances, subtitling is now easier and more versatile than in the past. There is increasing interest in the use of digitally-recorded audiovisual materials with both soundtrack and subtitles in the same language as a language-learning aid. The full potential of this is not currently attained because of poor-quality subtitling and the use of "caption" or "synopsis" subtitles where "transcription" subtitles would be more appropriate. An adaptation of a format successful over two decades in Europe might be of value for South-East Asian language learners

    Beyond English text: Multilingual and multimedia information retrieval.


    Archaeologies of Sound: Reconstructing Louis MacNeice’s Wartime Radio Publics

    This article approaches the problem of reconstructing the culturally situated audience experience of radio programming through the example of Louis MacNeice's wartime radio broadcasts, notably "Alexander Nevsky" and "Christopher Columbus". The article draws on audience research reports, internal correspondence, and close analysis of the broadcasts themselves in order to triangulate a listening experience that, though it ultimately cannot be recovered, can be better understood through its proximate cultural traces

    An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies

    Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organized as part of the IberSpeech 2022 conference under the ongoing series of Albayzin evaluation campaigns. In the 2022 edition, four challenges were launched: (1) speech-to-text transcription; (2) speaker diarization and identity assignment; (3) text and speech alignment; and (4) search on speech. Different databases that cover different domains (e.g., broadcast news, conference talks, parliament sessions) were released for those challenges. The submitted systems also cover a wide range of speech processing methods, which include hidden Markov model-based approaches, end-to-end neural network-based methods, hybrid approaches, etc. This paper describes the databases, the tasks and the performance metrics used in the four challenges. It also provides the most relevant features of the submitted systems and briefly presents and discusses the obtained results. Despite employing state-of-the-art technology, the relatively poor performance attained in some of the challenges reveals that there is still room for improvement. 
This encourages us to carry on with the Albayzin evaluation campaigns in the coming years. This work was partially supported by Radio Televisión Española through the RTVE Chair at the University of Zaragoza, and by the Red Temática en Tecnologías del Habla (RED2022-134270-T), funded by AEI (Ministerio de Ciencia e Innovación). It was also partially funded by the European Union's Horizon 2020 research and innovation program under Marie Skłodowska-Curie Grant 101007666; in part by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU"/PRTR under Grants PDC2021-120846C41 and PID2021-126061OB-C44; and in part by the Government of Aragon (Grant Group T3623R). It was also partially funded by the Spanish Ministry of Science and Innovation (OPEN-SPEECH project, PID2019-106424RB-I00), by the Basque Government under the general support program for research groups (IT-1704-22), and by projects RTI2018-098091-B-I00 and PID2021-125943OB-I00 (Spanish Ministry of Science and Innovation and ERDF) as well

    Sub-Sync: automatic synchronization of subtitles in the broadcasting of true live programs in Spanish

    Individuals with sensory impairment (hearing or visual) encounter serious communication barriers within society and the world around them. These barriers hinder the communication process and make access to information an obstacle they must overcome on a daily basis. In this context, one of the most common complaints made by television (TV) users with sensory impairment is the lack of synchronism between audio and subtitles in some types of programs. In addition, synchronization remains one of the most significant factors in audience perception of quality in live-originated TV subtitles for the deaf and hard of hearing. This paper introduces the Sub-Sync framework, intended for the automatic synchronization of audiovisual contents and subtitles, taking advantage of current well-known techniques for symbol sequence alignment. In this particular case, the symbol sequences are the subtitles produced by the broadcaster's subtitling system and the word flow generated by an automatic speech recognition procedure. The goal of Sub-Sync is to address the lack of synchronism that occurs in subtitles produced during the broadcast of live TV programs or other programs that have some improvised parts. Furthermore, it also aims to resolve the problematic transition between synchronized and unsynchronized parts of mixed-type programs. In addition, the framework is able to synchronize the subtitles even when they do not correspond literally to the original audio and/or the audio cannot be completely transcribed by an automatic process. Sub-Sync has been successfully tested in different live broadcasts, including mixed programs, in which the synchronized parts (recorded, scripted) are interspersed with desynchronized (improvised) ones
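The core idea described in the abstract, aligning the subtitle word sequence against the ASR word flow and borrowing timestamps from matched words, can be sketched with a standard-library sequence matcher. This is only an illustration of symbol-sequence alignment under assumed inputs (the function name, argument names, and data layout are hypothetical, and the paper's actual alignment technique may differ):

```python
import difflib

def align_subtitles(subtitle_words, asr_words, asr_times):
    """Align broadcaster subtitle words against the ASR word flow and
    borrow timestamps from matched ASR words.

    subtitle_words: list of words from the subtitling system
    asr_words:      list of words from the speech recognizer
    asr_times:      per-word timestamps (seconds), parallel to asr_words
    Returns a dict mapping subtitle word index -> timestamp.
    """
    sm = difflib.SequenceMatcher(a=subtitle_words, b=asr_words, autojunk=False)
    timed = {}
    for block in sm.get_matching_blocks():
        # Each matching block pairs block.size consecutive words.
        for k in range(block.size):
            timed[block.a + k] = asr_times[block.b + k]
    return timed
```

Words the recognizer missed (or that the subtitler paraphrased) simply receive no timestamp here; in practice their timing would be interpolated from neighboring matches, which is consistent with the abstract's claim that alignment works even when subtitles are not literal transcriptions.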