Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization
In this paper we address speaker-independent multichannel speech enhancement
in unknown noisy environments. Our work is based on a well-established
multichannel local Gaussian modeling framework. We propose to use a neural
network for modeling the speech spectro-temporal content. The parameters of
this supervised model are learned using the framework of variational
autoencoders. The noisy recording environment is assumed to be unknown, so the
noise spectro-temporal modeling remains unsupervised and is based on
non-negative matrix factorization (NMF). We develop a Monte Carlo
expectation-maximization algorithm and show experimentally that the proposed
approach outperforms its NMF-based counterpart, in which speech is modeled using
supervised NMF.
Comment: 5 pages, 2 figures; audio examples and code available online at
https://team.inria.fr/perception/icassp-2019-mvae
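To illustrate the unsupervised noise model mentioned above, here is a minimal sketch, not the authors' code, of NMF with the Itakura-Saito divergence fitted by the standard multiplicative updates, which is a common choice for power spectrograms under a local Gaussian model. It assumes a single-channel power spectrogram `V` as input. The function names `is_nmf`, `is_divergence` and `wiener_gain` are hypothetical. The speech variance passed to `wiener_gain` stands in for what the paper obtains from the VAE decoder; the Monte Carlo EM algorithm itself is not reproduced here.

```python
import numpy as np

def is_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Fit V (F x N, positive) ~ W @ H with Itakura-Saito NMF (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Random nonnegative initialization (eps keeps every entry strictly positive).
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        V_hat = W @ H
        # Update activations H, then spectral patterns W (IS-divergence MU rules).
        H *= (W.T @ (V * V_hat**-2)) / (W.T @ V_hat**-1)
        V_hat = W @ H
        W *= ((V * V_hat**-2) @ H.T) / (V_hat**-1 @ H.T)
    return W, H

def is_divergence(V, V_hat):
    """Itakura-Saito divergence D_IS(V | V_hat), summed over all bins."""
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def wiener_gain(v_speech, v_noise, eps=1e-12):
    """Per-bin Wiener gain v_s / (v_s + v_n) under a zero-mean Gaussian model."""
    return v_speech / (v_speech + v_noise + eps)
```

In this sketch the noise variance would be `W @ H`, and multiplying the noisy STFT by `wiener_gain(v_speech, W @ H)` gives the posterior mean of the speech coefficients in the single-channel case.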
Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise
We consider the problem of simultaneous reduction of acoustic echo,
reverberation and noise. In real scenarios, these distortion sources may occur
simultaneously and reducing them implies combining the corresponding
distortion-specific filters. As these filters interact with each other, they
must be jointly optimized. We propose to model the target and residual signals
after linear echo cancellation and dereverberation using a multichannel
Gaussian modeling framework and to jointly represent their spectra by means of
a neural network. We develop an iterative block-coordinate ascent algorithm to
update all the filters. We evaluate our system on real recordings of acoustic
echo, reverberation and noise acquired with a smart speaker in various
situations. In terms of overall distortion, the proposed approach outperforms both
a cascade of the individual approaches and a joint reduction approach that does
not rely on a spectral model of the target and residual signals.
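Under a multichannel Gaussian model of the kind mentioned above, each signal component j is commonly modeled per time-frequency bin as zero-mean complex Gaussian with covariance v_j(f,n) R_j(f), and the minimum mean-square-error estimate of each component is a multichannel Wiener filter. The sketch below assumes the spectral variances `v` (which the paper obtains from a neural network) and the spatial covariances `R` are already given. It does not implement the echo canceller, the dereverberation filter or the block-coordinate ascent, and the function name `multichannel_wiener` is hypothetical.

```python
import numpy as np

def multichannel_wiener(x, v, R):
    """
    Multichannel Wiener filtering under a local Gaussian model.
    x: (F, N, I) complex mixture STFT with I channels
    v: list of (F, N) nonnegative spectral variances, one per component
    R: list of (F, I, I) Hermitian spatial covariance matrices, one per component
    Returns a list of (F, N, I) component estimates.
    """
    # Per-component covariances Sigma_j[f, n] = v_j[f, n] * R_j[f].
    sigmas = [vj[:, :, None, None] * Rj[:, None, :, :] for vj, Rj in zip(v, R)]
    # Mixture covariance is the sum over components; invert every (I x I) block.
    sigma_x_inv = np.linalg.inv(sum(sigmas))
    estimates = []
    for sigma_j in sigmas:
        gain = sigma_j @ sigma_x_inv  # (F, N, I, I) Wiener gain per bin
        estimates.append(np.einsum("fnij,fnj->fni", gain, x))
    return estimates
```

Because the gains of all components sum to the identity, the estimates add back up to the observed mixture, which is a quick sanity check for any implementation of this filter.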