73 research outputs found
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders
Recent studies have explored the use of deep generative models of speech
spectra based on variational autoencoders (VAEs), combined with unsupervised
noise models, to perform speech enhancement. These studies developed iterative
algorithms involving either Gibbs sampling or gradient descent at each step,
making them computationally expensive. This paper proposes a variational
inference method to iteratively estimate the power spectrogram of the clean
speech. Our main contribution is the analytical derivation of the variational
steps in which the encoder of the pre-learned VAE can be used to estimate the
variational approximation of the true posterior distribution, using the very
same assumption made to train VAEs. Experiments show that the proposed method
produces results on par with the aforementioned iterative methods using
sampling, while decreasing the computational cost by a factor of 36 to reach a
given performance. Comment: Submitted to INTERSPEECH 201
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders : Supporting Document
Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms involving either Gibbs sampling or gradient descent at each step, making them computationally expensive. This paper proposes a variational inference method to iteratively estimate the power spectrogram of the clean speech. Our main contribution is the analytical derivation of the variational steps in which the encoder of the pre-learned VAE can be used to estimate the variational approximation of the true posterior distribution, using the very same assumption made to train VAEs. Experiments show that the proposed method produces results on par with the aforementioned iterative methods using sampling, while decreasing the computational cost by a factor of 36 to reach a given performance.
Fast and efficient speech enhancement with variational autoencoders
Unsupervised speech enhancement based on variational autoencoders has shown
promising performance compared with the commonly used supervised methods. This
approach involves the use of a pre-trained deep speech prior along with a
parametric noise model, where the noise parameters are learned from the noisy
speech signal with an expectation-maximization (EM)-based method. The E-step
involves an intractable latent posterior distribution. Existing algorithms to
solve this step are either based on computationally heavy Monte Carlo Markov
Chain sampling methods and variational inference, or inefficient
optimization-based methods. In this paper, we propose a new approach based on
Langevin dynamics that generates multiple sequences of samples and comes with a
total variation-based regularization to incorporate temporal correlations of
latent vectors. Our experiments demonstrate that the developed framework makes
an effective compromise between computational efficiency and enhancement
quality, and outperforms existing methods.
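As a rough illustration of the sampling idea named in this abstract, the sketch below runs unadjusted Langevin dynamics on several independent chains at once. It is not the paper's algorithm: the target is a toy Gaussian rather than a VAE latent posterior, the total variation regularizer is omitted, and all names (`langevin_chains`, `grad_log_gaussian`, `mu`, the step size and chain count) are assumptions chosen for this example.

```python
import numpy as np


def langevin_chains(grad_log_p, z0, step=0.05, n_steps=500, seed=0):
    """Run unadjusted Langevin dynamics on several chains in parallel.

    grad_log_p: function returning the gradient of the log target density,
                evaluated row-wise on an array of shape (n_chains, dim).
    z0:         initial states, shape (n_chains, dim); one row per chain.
    """
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        # Gradient step toward high-density regions plus injected Gaussian noise.
        z = z + step * grad_log_p(z) + np.sqrt(2.0 * step) * noise
    return z


# Toy target (an assumption for illustration): unit-variance Gaussian centered at mu.
mu = np.array([1.0, -2.0])


def grad_log_gaussian(z):
    # Gradient of log N(z; mu, I) with respect to z.
    return -(z - mu)


chains = langevin_chains(grad_log_gaussian, np.zeros((2000, 2)))
print(chains.mean(axis=0))  # sample mean across chains, expected to be close to mu
```

Running many chains in parallel is what makes the final states usable as a batch of approximate samples; with a finite step size the chain's stationary distribution is only an approximation of the target.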
Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions
Generative Adversarial Networks (GANs) are a novel class of deep generative
models that have recently gained significant attention. GANs learn complex and
high-dimensional distributions implicitly over images, audio, and data.
However, there exist major challenges in the training of GANs, i.e., mode
collapse, non-convergence and instability, due to inappropriate design of
network architecture, use of objective function and selection of optimization
algorithm. Recently, to address these challenges, several solutions for better
design and optimization of GANs have been investigated based on techniques of
re-engineered network architectures, new objective functions and alternative
optimization algorithms. To the best of our knowledge, there is no existing
survey that has particularly focused on broad and systematic developments of
these solutions. In this study, we perform a comprehensive survey of the
advancements in GANs design and optimization solutions proposed to handle GANs
challenges. We first identify key research issues within each design and
optimization technique and then propose a new taxonomy to structure solutions
by key research issues. In accordance with the taxonomy, we provide a detailed
discussion on different GANs variants proposed within each solution and their
relationships. Finally, based on the insights gained, we present the promising
research directions in this rapidly growing field. Comment: 42 pages, Figure 13, Table