Automatic Segmentation of Broadcast News Audio using Self Similarity Matrix
Radio news broadcasts are generally composed of music, commercials, news from correspondents, and recorded statements in addition to the news read by the newsreader. When news transcripts are available, automatic segmentation of the broadcast audio is essential for time-aligning it with the text transcription and thereby building frugal speech corpora. We address the problem of identifying the segments of the broadcast that correspond to the news read by the newsreader, so that they can be mapped to the text transcripts. Existing techniques produce sub-optimal solutions when used to extract the newsreader's segments. In this paper, we propose a new technique that reliably identifies acoustic change points using an acoustic Self Similarity Matrix (SSM). We describe the two-pass technique in detail and verify its performance on real news broadcasts of All India Radio in different languages.
Comment: 4 pages, 5 images
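The abstract does not spell out the two-pass technique, so the following is only a minimal sketch of the general idea of locating acoustic change points from a self-similarity matrix. It uses Foote's checkerboard-kernel novelty measure, a well-known technique swapped in here for illustration and not necessarily the authors' method. It assumes per-frame feature vectors (for example MFCCs) are already computed; the function names, the kernel half-width `half` and the threshold are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: Foote-style checkerboard novelty on an SSM,
# NOT the paper's two-pass method.


def self_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of frames (rows of `features`)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # avoid division by zero
    return unit @ unit.T


def checkerboard_kernel(half: int) -> np.ndarray:
    """Gaussian-tapered checkerboard kernel of shape (2*half, 2*half).

    Same-segment quadrants are weighted positively, cross-segment quadrants negatively.
    """
    idx = np.arange(-half, half) + 0.5
    signed = np.sign(idx) * np.exp(-0.5 * (idx / (0.5 * half)) ** 2)
    return np.outer(signed, signed)


def novelty_curve(ssm: np.ndarray, half: int) -> np.ndarray:
    """Slide the kernel along the SSM diagonal; high values suggest a change at frame t."""
    n = ssm.shape[0]
    kernel = checkerboard_kernel(half)
    padded = np.pad(ssm, half, mode="constant")
    # padded[t + half] is original frame t, so each window covers frames t-half .. t+half-1
    return np.array([np.sum(padded[t:t + 2 * half, t:t + 2 * half] * kernel) for t in range(n)])


def change_points(novelty: np.ndarray, half: int, threshold: float) -> list[int]:
    """Local maxima above `threshold`, skipping `half` frames at each edge (zero-padding effects)."""
    return [
        t for t in range(half, len(novelty) - half)
        if novelty[t] > threshold and novelty[t] >= novelty[t - 1] and novelty[t] >= novelty[t + 1]
    ]


# Toy input: 50 frames of one "sound" followed by 50 frames of another.
features = np.vstack([np.tile([1.0, 0.0, 0.0], (50, 1)), np.tile([0.0, 1.0, 0.0], (50, 1))])
nov = novelty_curve(self_similarity_matrix(features), half=10)
print(change_points(nov, half=10, threshold=0.5 * nov.max()))  # [50]
```

Because the kernel sums to zero, a window lying entirely inside one homogeneous segment scores close to zero, while a window centred on a boundary scores high; on real broadcast audio the threshold and `half` would need tuning.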
Smart sound sensor to detect the number of people in a room
Ambient sound monitoring is a widely used strategy for following older adults, which could help them age healthily with comfort and security. In previous work, we developed a smart audio sensor able to recognize everyday sounds in order to detect activities of daily living (ADL) and distress situations. In this paper, we propose adding a new functionality that analyzes the speech flow to detect the number of people in a room. The proposed algorithms are based on speaker diarization methods. This information can be used to better detect activities of daily living and also to know when the person is home alone. This functionality can also offer more comfort by adapting lighting, heating and air conditioning to the number of people in an environment.
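The abstract only states that the algorithms are based on speaker diarization. As a hypothetical illustration of how a diarization-style clustering step can yield a head count, the sketch below greedily clusters per-segment speaker embeddings by cosine similarity and reports the number of clusters. The embeddings, the function `count_speakers` and the 0.7 threshold are all assumptions; practical systems extract embeddings with a trained model and use more robust clustering.

```python
import numpy as np

# Hypothetical sketch, not the paper's algorithm: greedy online clustering
# of speech-segment embeddings, with the cluster count as the speaker count.


def count_speakers(embeddings: np.ndarray, threshold: float = 0.7) -> int:
    """Estimate the number of distinct speakers from per-segment embeddings.

    Each embedding joins the most similar existing cluster if its cosine
    similarity to that cluster's centroid exceeds `threshold`; otherwise it
    starts a new cluster.
    """
    centroids: list[np.ndarray] = []
    counts: list[int] = []
    for emb in embeddings:
        v = emb / np.linalg.norm(emb)
        if centroids:
            sims = [float(v @ (c / np.linalg.norm(c))) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] > threshold:
                # Running-mean update of the matched centroid.
                counts[best] += 1
                centroids[best] = centroids[best] + (v - centroids[best]) / counts[best]
                continue
        centroids.append(v)
        counts.append(1)
    return len(centroids)


# Toy embeddings for five speech segments produced by three speakers.
segments = np.array([
    [1.0, 0.05, 0.0],
    [0.0, 1.0, 0.02],
    [0.98, 0.0, 0.03],
    [0.0, 0.0, 1.0],
    [0.02, 0.99, 0.0],
])
print(count_speakers(segments))  # 3
```

Since the input is the speech flow, a sketch like this can only count people who actually speak during the analysed window; a silent occupant would not be detected.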
Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates
Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled from the debates held in the European Parliament between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable.
Comment: Accepted by ICASSP2020.
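As a quick check of the figure in the abstract: if each of the $L = 6$ languages can serve as the source and any of the other five as the target, the number of ordered language pairs is

$$
L\,(L-1) = 6 \times 5 = 30,
$$

which matches the 30 translation directions reported.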
EUMSSI team at the MediaEval Person Discovery Challenge 2016
We present the results of the EUMSSI team’s participation in the Multimodal Person Discovery task. The goal is to identify all people who simultaneously appear and speak in a video corpus. In the proposed system, besides improving each modality, we focus on ranking the multiple results from both the audio and visual streams.