36 research outputs found

Automatic Segmentation of Broadcast News Audio using Self Similarity Matrix

Author: Imran Ahmed
Kopparapu Sunil Kumar
Soni Sapna
Publication date: 26/03/2014

Generally, audio news broadcast on radio is composed of music, commercials, news from correspondents and recorded statements, in addition to the actual news read by the newsreader. When news transcripts are available, automatic segmentation of the audio news broadcast is essential for time-aligning the audio with the text transcription to build frugal speech corpora. We address the problem of identifying the segments of the audio news broadcast that correspond to the news read by the newsreader, so that they can be mapped to the text transcripts. The existing techniques produce sub-optimal solutions when used to extract newsreader-read segments. In this paper, we propose a new technique which is able to identify the acoustic change points reliably using an acoustic Self Similarity Matrix (SSM). We describe the two-pass technique in detail and verify its performance on real audio news broadcasts of All India Radio for different languages. Comment: 4 pages, 5 images

arXiv.org e-Print Archive

Crossref

Smart sound sensor to detect the number of people in a room

Author: Boudy Jerome
Boutamine Sami
Dan Istrate
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 23/07/2019

Ambient sound monitoring is a widely used strategy to follow older adults, which could help them achieve healthy ageing with comfort and security. In a previous work, we developed a smart audio sensor able to recognize everyday life sounds in order to detect activities of daily living (ADL) and distress situations. In this paper, we propose to add a new functionality by analyzing the speech flow to detect the number of people in a room. The proposed algorithms are based on speaker diarization methods. This information can be used to better detect activities of daily living, but also to know when the person is home alone. This functionality can also offer more comfort by adapting light, heating and air conditioning to the number of people in an environment.

Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates

Author: Civera Jorge
Giménez Adrià
Iranzo-Sánchez Javier
Jorge Javier
Juan Alfons
Roselló Nahuel
Sanchis Albert
Silvestre-Cerdà Joan Albert
Publication date: 12/02/2020

Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable. Comment: Accepted by ICASSP2020.

arXiv.org e-Print Archive

RiuNet

Open Source Toolkit for Speech to Text Translation

Author: Müller Markus
Niehues Jan
Pham Ngoc-Quan
Sperber Matthias
Stüker Sebastian
Waibel Alex
Zenkel Thomas
Publication venue: Charles University
Publication date: 20/04/2022

KITopen

EUMSSI team at the MediaEval Person Discovery Challenge 2016

Author: Le Nam
Meignier Sylvain
Odobez Jean-Marc
Publication date: 19/11/2016

We present the results of the EUMSSI team’s participation in the Multimodal Person Discovery task. The goal is to identify all people who simultaneously appear and speak in a video corpus. In the proposed system, besides improving each modality, we emphasize the ranking of multiple results from both the audio stream and the visual stream.

Infoscience - École polytechnique fédérale de Lausanne

KL-HMM Based Speaker Diarization System for Meetings

Author: Bourlard Hervé
Madikeri Srikanth
Publication venue: Idiap
Publication date: 19/06/2015

Infoscience - École polytechnique fédérale de Lausanne

Combining SGMM Speaker Vectors and KL-HMM Approach for Speaker Diarization

Author: Bourlard Hervé
Madikeri Srikanth
Motlicek Petr
Publication venue: Idiap
Publication date: 19/06/2015

Infoscience - École polytechnique fédérale de Lausanne

Crossref