Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

Girin, Laurent; Horaud, Radu; Leglaive, Simon

Authors: Laurent Girin, Radu Horaud, Simon Leglaive
Publication date: 30 April 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

In this paper, we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network to model the speech spectro-temporal content. The parameters of this supervised model are learned using the framework of variational autoencoders. The noisy recording environment is assumed to be unknown, so the noise spectro-temporal modeling remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and show experimentally that the proposed approach outperforms its NMF-based counterpart, where speech is modeled using supervised NMF.

Comment: 5 pages, 2 figures, audio examples and code available online at https://team.inria.fr/perception/icassp-2019-mvae
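
As a rough illustration of the unsupervised NMF noise model mentioned in the abstract, the sketch below factorizes a non-negative power spectrogram V (frequency by time) into a spectral dictionary W and activations H under the Itakura-Saito divergence, using multiplicative updates with the square-root exponent from majorization-minimization. It is not the authors' algorithm: the paper estimates the noise NMF parameters jointly with a VAE speech model inside a Monte Carlo EM procedure on multichannel data, whereas this example is single-channel and noise-only. The function names `is_nmf` and `is_divergence`, the rank, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def is_divergence(V, V_hat, eps=1e-10):
    """Itakura-Saito divergence between two non-negative matrices."""
    R = (V + eps) / (V_hat + eps)
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, rank, n_iter=100, eps=1e-10, seed=0):
    """Approximate V (F x N) by W @ H with W (F x rank) and H (rank x N).

    Hypothetical helper: multiplicative updates for the Itakura-Saito
    divergence, with exponent 1/2 (majorization-minimization form).
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Strictly positive random initialization.
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        # Update the activations H with the dictionary W held fixed.
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + eps))
        # Update the dictionary W with the activations H held fixed.
        V_hat = W @ H + eps
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + eps))
    return W, H


if __name__ == "__main__":
    # Synthetic "noise power spectrogram" with an exact rank-4 structure.
    rng = np.random.default_rng(1)
    V = (rng.random((64, 4)) + 0.1) @ (rng.random((4, 50)) + 0.1)
    W, H = is_nmf(V, rank=4, n_iter=200)
    print("IS divergence after fitting:", is_divergence(V, W @ H))
```

In a local Gaussian model, the product W @ H would play the role of the time-frequency noise variance; keeping the rank small is what lets the factorization adapt to an unseen noise environment from the noisy recording alone.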