A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders

Authors: Deleforge Antoine, Pariente Manuel, Vincent Emmanuel
Publication date: 14/05/2019

Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms involving either Gibbs sampling or gradient descent at each step, making them computationally expensive. This paper proposes a variational inference method to iteratively estimate the power spectrogram of the clean speech. Our main contribution is the analytical derivation of the variational steps in which the encoder of the pre-learned VAE can be used to estimate the variational approximation of the true posterior distribution, using the very same assumption made to train VAEs. Experiments show that the proposed method produces results on par with the aforementioned iterative methods using sampling, while decreasing the computational cost by a factor of 36 to reach a given performance. Comment: Submitted to INTERSPEECH 201

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

Authors: Duong Ngoc, Essid Slim, Ozerov Alexey, Parekh Sanjeel, Pérez Patrick, Richard Gaël
Publication date: 07/11/2018

We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments over a challenging dataset of music instrument performance videos. We also show encouraging visual object localization results.

arXiv.org e-Print Archive

HAL-Rennes 1

Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

Authors: Girin Laurent, Horaud Radu, Leglaive Simon
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 30/04/2019

In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network for modeling the speech spectro-temporal content. The parameters of this supervised model are learned using the framework of variational autoencoders. The noisy recording environment is assumed to be unknown, so the noise spectro-temporal modeling remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and we experimentally show that the proposed approach outperforms its NMF-based counterpart, where speech is modeled using supervised NMF. Comment: 5 pages, 2 figures, audio examples and code available online at https://team.inria.fr/perception/icassp-2019-mvae

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

A speech enhancement algorithm based on a non-negative hidden Markov model and Kullback-Leibler divergence

Authors: Christensen Mads Græsbøll, Højvang Jesper Lisby, Rasmussen Morten Højfeldt, Shi Liming, Xiang Yang
Publication venue: Springer Science and Business Media LLC
Publication date: 08/09/2022

VBN