Search CORE

2 research outputs found

Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

Author: Girin Laurent
Horaud Radu
Leglaive Simon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/04/2019
Field of study

In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network for modeling the speech spectro-temporal content. The parameters of this supervised model are learned using the framework of variational autoencoders. The noisy recording environment is supposed to be unknown, so the noise spectro-temporal modeling remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and we experimentally show that the proposed approach outperforms its NMF-based counterpart, where speech is modeled using supervised NMF.Comment: 5 pages, 2 figures, audio examples and code available online at https://team.inria.fr/perception/icassp-2019-mvae

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise

Author: Carbajal Guillaume
Humbert Eric
Serizel Romain
Vincent Emmanuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/11/2019
Field of study

We consider the problem of simultaneous reduction of acoustic echo, reverberation and noise. In real scenarios, these distortion sources may occur simultaneously and reducing them implies combining the corresponding distortion-specific filters. As these filters interact with each other, they must be jointly optimized. We propose to model the target and residual signals after linear echo cancellation and dereverberation using a multichannel Gaussian modeling framework and to jointly represent their spectra by means of a neural network. We develop an iterative block-coordinate ascent algorithm to update all the filters. We evaluate our system on real recordings of acoustic echo, reverberation and noise acquired with a smart speaker in various situations. The proposed approach outperforms in terms of overall distortion a cascade of the individual approaches and a joint reduction approach which does not rely on a spectral model of the target and residual signals

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1