36,694 research outputs found

Detection and Classification of Acoustic Scenes and Events

Author: Benetos E
Giannoulis D
Lagrange M
Plumbley MD
Rossignol M
Stowell D
Publication date: 30/12/2013

The Sheffield Wargames Corpus.

Author: Fox C.W.
Hain T.
Liu Y.
Zwyssig E.
Publication date: 01/01/2013

Recognition of speech in natural environments is a challenging task, even more so if this involves conversations between several speakers. Work on meeting recognition has addressed some of the significant challenges, mostly targeting formal, business style meetings where people are mostly in a static position in a room. Only limited data is available that contains high quality near and far field data from real interactions between participants. In this paper we present a new corpus for research on speech recognition, speaker tracking and diarisation, based on recordings of native speakers of English playing a table-top wargame. The Sheffield Wargames Corpus comprises 7 hours of data from 10 recording sessions, obtained from 96 microphones, 3 video cameras and, most importantly, 3D location data provided by a sensor tracking system. The corpus represents a unique resource, that provides for the first time location tracks (1.3Hz) of speakers that are constantly moving and talking. The corpus is available for research purposes, and includes annotated development and evaluation test sets. Baseline results for close-talking and far field sets are included in this paper.

CiteSeerX

Edinburgh Research Explorer

White Rose Research Online

Foreground-Background Ambient Sound Scene Separation

Author: Gasso Gilles
Olvera Michel
Serizel Romain
Vincent Emmanuel
Publication date: 27/07/2020

Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background. We consider the task of separating these events from the background, which we call foreground-background ambient sound scene separation. We propose a deep learning-based separation framework with a suitable feature normalization scheme and an optional auxiliary network capturing the background statistics, and we investigate its ability to handle the great variety of sound classes encountered in ambient sound scenes, which have often not been seen in training. To do so, we create single-channel foreground-background mixtures using isolated sounds from the DESED and Audioset datasets, and we conduct extensive experiments with mixtures of seen or unseen sound classes at various signal-to-noise ratios. Our experimental findings demonstrate the generalization ability of the proposed approach.
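
The abstract above mentions building foreground-background mixtures at various signal-to-noise ratios. A minimal sketch of one common way to do that is shown here; it is not taken from the paper. It assumes the SNR is the ratio of foreground power to background power in dB, and the function names `power` and `mix_at_snr` are hypothetical.

```python
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(foreground, background, snr_db):
    """Scale the background so that foreground/background power equals snr_db.

    Assumption: SNR = 10 * log10(P_foreground / P_background).
    Returns the mixture and the scaled background.
    """
    target_bg_power = power(foreground) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_bg_power / power(background))
    scaled_bg = [gain * b for b in background]
    mixture = [f + b for f, b in zip(foreground, scaled_bg)]
    return mixture, scaled_bg


# Toy signals standing in for an isolated event and a stationary background.
fg = [math.sin(0.3 * n) for n in range(200)]
bg = [0.5 * math.cos(0.05 * n) for n in range(200)]
mixture, scaled_bg = mix_at_snr(fg, bg, snr_db=6.0)
achieved_snr = 10 * math.log10(power(fg) / power(scaled_bg))
```

Scaling the background rather than the foreground keeps the event at its original level; either choice yields the same ratio.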

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

INRIA a CCSD electronic archive server

Spectro-temporal post-enhancement using MMSE estimation in NMF based single-channel source separation

Author: Erdoğan Hakan
Grais Emad Mounir
Publication venue: ISCA (International Speech Communication Association)
Publication date: 01/01/2013

We propose to use minimum mean squared error (MMSE) estimates to enhance the signals that are separated by nonnegative matrix factorization (NMF). In single channel source separation (SCSS), NMF is used to train a set of basis vectors for each source from their training spectrograms. Then NMF is used to decompose the mixed signal spectrogram as a weighted linear combination of the trained basis vectors from which estimates of each corresponding source can be obtained. In this work, we deal with the spectrogram of each separated signal as a 2D distorted signal that needs to be restored. A multiplicative distortion model is assumed where the logarithm of the true signal distribution is modeled with a Gaussian mixture model (GMM) and the distortion is modeled as having a log-normal distribution. The parameters of the GMM are learned from training data whereas the distortion parameters are learned online from each separated signal. The initial source estimates are improved and replaced with their MMSE estimates under this new probabilistic framework. The experimental results show that using the proposed MMSE estimation technique as a post enhancement after NMF improves the quality of the separated signal
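
The NMF stage described in the abstract above (train basis vectors per source, then decompose the mixture with those bases held fixed) can be sketched as follows. This is an illustration only: it assumes Euclidean multiplicative updates and a ratio-mask reconstruction, neither of which is specified in the abstract, and it does not implement the MMSE post-enhancement. The names `nmf` and `separate` are hypothetical.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def nmf(V, rank, n_iter=200, W=None, seed=0):
    """Factorize a nonnegative matrix V as W @ H with multiplicative updates.

    If W is supplied it is kept fixed and only H is estimated.
    """
    rng = np.random.default_rng(seed)
    fixed_basis = W is not None
    if not fixed_basis:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if not fixed_basis:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V_mix, W1, W2, n_iter=200):
    """Decompose the mixture on the stacked trained bases, then apply ratio masks."""
    W = np.hstack([W1, W2])
    _, H = nmf(V_mix, W.shape[1], n_iter, W=W)
    k = W1.shape[1]
    S1 = W1 @ H[:k]   # contribution explained by source 1 bases
    S2 = W2 @ H[k:]   # contribution explained by source 2 bases
    total = S1 + S2 + EPS
    return V_mix * S1 / total, V_mix * S2 / total


# Random nonnegative matrices stand in for training magnitude spectrograms.
rng = np.random.default_rng(1)
V1 = rng.random((6, 20))
V2 = rng.random((6, 20))
W1, _ = nmf(V1, rank=2)
W2, _ = nmf(V2, rank=2)
V_mix = V1 + V2
E1, E2 = separate(V_mix, W1, W2)
```

With the ratio mask, the two estimates add back up to the mixture spectrogram, which is one reason this reconstruction is a popular choice.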

CiteSeerX

Sabanci University Research Database

Probabilistic Modeling Paradigms for Audio Source Separation

Author: A. P.Dempster
A.Gelman
D. L.Wang
D.FitzGerald
J.Nocedal
J.Winn
M. I.Mandel
R. J.Weiss
R.Mukai
S. T.Roweis
S.Makino
Publication venue: IGI Global
Publication date: 01/01/2010

This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007

Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.

HAL-CentraleSupelec

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Queen Mary Research Online

Surrey Research Insight

HAL-Rennes 1

Monaural speech separation with deep learning using phase modelling and capsule networks

Author: dubey
jansson
kingma
lalonde
muth
raffel
ronneberger
sabour
trabelsi
xi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/11/2019

The removal of background noise from speech audio is a problem with high practical relevance. A variety of deep learning approaches have been applied to it in recent years, most of which operate on a magnitude spectrogram representation of a noisy recording to estimate the isolated speaking voice. This work investigates ways to include phase information, which is commonly discarded, firstly within a convolutional neural network (CNN) architecture, and secondly by applying capsule networks, to our knowledge the first time capsules have been used in source separation. We present a Circular Loss function, which takes into account the periodic nature of phase. Our results show that the inclusion of phase information leads to an improvement in the quality of speech separation. We also find that in our experiments convolutional neural networks outperform capsule networks at speech separation
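
The abstract above refers to a loss that accounts for the periodic nature of phase but does not give its definition. As a rough illustration of the general idea only, and not the paper's Circular Loss, one option is to wrap the phase difference into [-π, π) before squaring it. The names `wrap_phase` and `circular_loss` are hypothetical.

```python
import math


def wrap_phase(delta):
    """Map an angle difference in radians to the interval [-pi, pi)."""
    return (delta + math.pi) % (2 * math.pi) - math.pi


def circular_loss(predicted, target):
    """Mean squared wrapped phase difference (an assumed form, for illustration)."""
    errors = [wrap_phase(p - t) ** 2 for p, t in zip(predicted, target)]
    return sum(errors) / len(errors)


# Two phases on opposite sides of the +/- pi boundary are only 0.2 rad apart.
near_boundary = circular_loss([math.pi - 0.1], [-math.pi + 0.1])
# A plain squared error would instead report (2*pi - 0.2)**2 for the same pair.
naive = (math.pi - 0.1 - (-math.pi + 0.1)) ** 2
```

The wrapped version treats angles that differ by a full turn as identical, which a plain squared error on raw phase values does not.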

City Research Online

Crossref