Balancing reconstruction error and Kullback-Leibler divergence in Variational Autoencoders
In the loss function of Variational Autoencoders there is a well known
tension between two components: the reconstruction loss, improving the quality
of the resulting images, and the Kullback-Leibler divergence, acting as a
regularizer of the latent space. Correctly balancing these two components is a
delicate issue, easily resulting in poor generative behaviours. In a recent
work, Dai and Wipf obtained a notable improvement by allowing the network to
learn the balancing factor during training, according to a suitable loss
function. In this article, we show that learning can be replaced by a simple
deterministic computation, helping to understand the underlying mechanism, and
resulting in faster and more accurate behaviour. On typical datasets such as
CIFAR and CelebA, our technique noticeably outperforms all previous VAE
architectures.
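As a minimal sketch of the trade-off described in this abstract, the snippet below scales the reconstruction error by a balancing factor and adds the KL term. The abstract does not state the deterministic rule, so `update_gamma_sq` (a running average of the batch MSE) and every function and parameter name here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)

def balanced_vae_loss(x, x_hat, mu, log_var, gamma_sq):
    """Return (mean loss, mean MSE); the MSE is weighted by 1 / (2 * gamma_sq)."""
    mse = np.mean((x - x_hat) ** 2, axis=1)   # per-sample reconstruction error
    recon = mse / (2.0 * gamma_sq)            # smaller gamma_sq favours reconstruction
    kl = kl_to_standard_normal(mu, log_var)   # regulariser of the latent space
    return float(np.mean(recon + kl)), float(np.mean(mse))

def update_gamma_sq(gamma_sq, batch_mse, momentum=0.9):
    """Assumed deterministic rule: track a running average of the batch MSE."""
    return momentum * gamma_sq + (1.0 - momentum) * batch_mse
```

In a training loop one would call `balanced_vae_loss` on each batch and feed the returned MSE to `update_gamma_sq`, so the balancing factor is computed rather than learned by gradient descent.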
Variance Loss in Variational Autoencoders
In this article, we highlight what appears to be a major issue of Variational
Autoencoders, evinced from an extensive experimentation with different network
architectures and datasets: the variance of generated data is significantly
lower than that of training data. Since generative models are usually evaluated
with metrics such as the Frechet Inception Distance (FID) that compare the
distributions of (features of) real versus generated images, the variance loss
typically results in degraded scores. This problem is particularly relevant in
a two stage setting, where we use a second VAE to sample in the latent space of
the first VAE. The reduced variance creates a mismatch between the actual
distribution of latent variables and those generated by the second VAE, which
hinders the beneficial effects of the second stage. Renormalizing the output of
the second VAE towards the expected normal spherical distribution, we obtain a
sudden burst in the quality of generated samples, as also confirmed in terms of
FID.
Comment: Article accepted at the Sixth International Conference on Machine
Learning, Optimization, and Data Science. July 19-23, 2020 - Certosa di
Pontignano, Siena, Italy
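The renormalization step mentioned in this abstract can be illustrated with a short sketch. The abstract does not give the exact procedure, so per-dimension standardization to zero mean and unit variance is an assumption, and `renormalize_latents` is a hypothetical helper name. The low-variance samples are synthetic stand-ins for second-stage output.

```python
import numpy as np

def renormalize_latents(z, eps=1e-8):
    """Shift and rescale each latent dimension to zero mean and unit variance."""
    mean = z.mean(axis=0, keepdims=True)
    std = z.std(axis=0, keepdims=True)
    return (z - mean) / (std + eps)

rng = np.random.default_rng(0)
# Stand-in for second-stage samples whose spread is too small (std 0.7, not 1).
z_generated = 0.7 * rng.standard_normal((10_000, 16))
z_fixed = renormalize_latents(z_generated)
```

The corrected samples `z_fixed` would then be passed to the first-stage decoder in place of `z_generated`.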
Sparsity in Variational Autoencoders
Working in high-dimensional latent spaces, the internal encoding of data in
Variational Autoencoders becomes naturally sparse. We discuss this known but
controversial phenomenon, sometimes referred to as overpruning, to emphasize the
under-use of the model capacity. In fact, it is an important form of
self-regularization, with all the typical benefits associated with sparsity: it
forces the model to focus on the really important features, highly reducing the
risk of overfitting. In particular, it is a major methodological guide for the
correct tuning of the model capacity, progressively augmenting it to attain
sparsity, or conversely reducing the dimension of the network by removing links to
zeroed out neurons. The degree of sparsity crucially depends on the network
architecture: for instance, convolutional networks typically show less
sparsity, likely due to the tighter relation of features to different spatial
regions of the input.
Comment: An Extended Abstract of this survey will be presented at the 1st
International Conference on Advances in Signal Processing and Artificial
Intelligence (ASPAI' 2019), 20-22 March 2019, Barcelona, Spain
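One way to quantify the sparsity discussed in this abstract is to count latent dimensions whose encoder mean barely changes across the dataset. The sketch below uses synthetic encoder means. The variance threshold and the helper name `inactive_latents` are assumptions chosen for illustration, not a criterion taken from the paper.

```python
import numpy as np

def inactive_latents(mu, var_threshold=1e-2):
    """Flag latent dimensions whose encoder mean barely varies across the data."""
    return mu.var(axis=0) < var_threshold

rng = np.random.default_rng(0)
# Synthetic encoder means: 5 informative dimensions and 3 collapsed ones.
mu_active = rng.standard_normal((1000, 5))
mu_collapsed = 1e-3 * rng.standard_normal((1000, 3))
mu = np.concatenate([mu_active, mu_collapsed], axis=1)

mask = inactive_latents(mu)   # boolean vector, True where a variable is unused
sparsity = mask.mean()        # fraction of inactive latent variables
```

Tracking `sparsity` while growing or shrinking the latent dimension is one possible way to use the phenomenon as a capacity-tuning guide.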
Variational Autoencoders and the Variable Collapse Phenomenon
In Variational Autoencoders, when working in high-dimensional latent spaces, there is a natural collapse of latent variables with minor significance, which are altogether neglected by the generator. We discuss this known but controversial phenomenon, sometimes referred to as overpruning, to emphasize the under-use of the model capacity. In fact, it is an important form of self-regularization, with all the typical benefits associated with sparsity: it forces the model to focus on the really important features, enhancing their disentanglement and reducing the risk of overfitting. In this article, we discuss the issue, surveying past works, and particularly focusing on the exploitation of the variable collapse phenomenon as a methodological guideline for the correct tuning of the model capacity, and of the loss function parameters.
Constraining Variational Inference with Geometric Jensen-Shannon Divergence
We examine the problem of controlling divergences for latent space
regularisation in variational autoencoders. Specifically, we consider
reconstructing an example via a latent space, while balancing this against
the need for generalisable latent representations. We present a
regularisation mechanism based on the skew-geometric Jensen-Shannon
divergence ($\mathrm{JS}^{G_\alpha}$). We find a variation in
$\mathrm{JS}^{G_\alpha}$, motivated by limiting cases, which leads
to an intuitive interpolation between forward and reverse KL in the space of
both distributions and divergences. We motivate its potential benefits for VAEs
through low-dimensional examples, before presenting quantitative and
qualitative results. Our experiments demonstrate that skewing our variant of
$\mathrm{JS}^{G_\alpha}$, in the context of
$\mathrm{JS}^{G_\alpha}$-VAEs, leads to better reconstruction and
generation when compared to several baseline VAEs. Our approach is entirely
unsupervised and utilises only one hyperparameter which can be easily
interpreted in latent space.
Comment: Camera-ready version, accepted at NeurIPS 202
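To make the interpolation between the two KL directions concrete, here is a minimal sketch for discrete distributions. It assumes the standard skew-geometric definition, (1 - alpha) KL(p || G) + alpha KL(q || G), with G the normalised weighted geometric mean of p and q. The `dual` flag, which swaps the weights inside the mean, is an assumption about the variant the abstract alludes to; the paper itself works with Gaussian distributions, not discrete ones.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def geometric_mean(p, q, alpha):
    """Normalised weighted geometric mean, proportional to p**(1-alpha) * q**alpha."""
    g = p ** (1.0 - alpha) * q ** alpha
    return g / g.sum()

def js_geometric(p, q, alpha, dual=False):
    """Skew-geometric JS divergence; dual=True swaps the weights inside the mean."""
    g = geometric_mean(p, q, 1.0 - alpha if dual else alpha)
    return (1.0 - alpha) * kl(p, g) + alpha * kl(q, g)

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.5, 0.25, 0.25])
forward = js_geometric(p, q, alpha=0.0, dual=True)   # equals KL(p || q)
reverse = js_geometric(p, q, alpha=1.0, dual=True)   # equals KL(q || p)
```

With `dual=False` both limiting cases vanish, whereas the swapped weights recover KL(p || q) at alpha = 0 and KL(q || p) at alpha = 1, so intermediate values of alpha interpolate between the two.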
Combining Variational Autoencoders and Physical Bias for Improved Microscopy Data Analysis
Electron and scanning probe microscopy produce vast amounts of data in the
form of images or hyperspectral data, such as EELS or 4D STEM, that contain
information on a wide range of structural, physical, and chemical properties of
materials. To extract valuable insights from these data, it is crucial to
identify physically separate regions in the data, such as phases, ferroic
variants, and boundaries between them. To derive an easily
interpretable feature analysis, combined with well-defined boundaries, in a
principled and unsupervised manner, we present a physics-augmented machine
learning method which combines the capability of Variational Autoencoders to
disentangle factors of variability within the data and the physics driven loss
function that seeks to minimize the total length of the discontinuities in
images corresponding to latent representations. Our method is applied to
various materials, including NiO-LSMO, BiFeO3, and graphene. The results
demonstrate the effectiveness of our approach in extracting meaningful
information from large volumes of imaging data. The full notebook containing
the implementation of the code and the analysis workflow is available at
https://github.com/arpanbiswas52/PaperNotebooks
Comment: 20 pages, 7 figures in main text, 4 figures in Supp Ma
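A penalty on the total length of discontinuities can be approximated in several ways. The sketch below uses anisotropic total variation of a 2D latent map as a stand-in. This choice, the weighting scheme, and the function names are assumptions made for illustration and are not taken from the linked notebook.

```python
import numpy as np

def discontinuity_length(latent_map):
    """Anisotropic total variation: summed absolute jumps between neighbouring pixels."""
    dy = np.abs(np.diff(latent_map, axis=0)).sum()   # jumps between rows
    dx = np.abs(np.diff(latent_map, axis=1)).sum()   # jumps between columns
    return float(dx + dy)

def physics_augmented_loss(vae_loss, latent_map, weight=0.1):
    """Add the (assumed) boundary-length penalty to an ordinary VAE loss value."""
    return vae_loss + weight * discontinuity_length(latent_map)

# Toy latent map: two flat domains separated by one straight vertical boundary.
two_domains = np.zeros((4, 4))
two_domains[:, 2:] = 1.0
penalty = discontinuity_length(two_domains)
```

For a map of flat regions with unit jumps, this quantity counts boundary pixels, which is why it serves as a rough proxy for boundary length.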
Notes on the use of variational autoencoders for speech and audio spectrogram modeling
Variational autoencoders (VAEs) are powerful (deep) generative artificial neural networks. They have recently been used in several papers for speech and audio processing, in particular for the modeling of speech/audio spectrograms. In these papers, little theoretical support is given to justify the chosen data representation and decoder likelihood function, or the corresponding cost function used for training the VAE. Yet a well-established statistical framework exists and has been extensively presented and discussed in papers dealing with nonnegative matrix factorization (NMF) of audio spectrograms and its application to audio source separation. In the present paper, we show how this statistical framework applies to VAE-based speech/audio spectrogram modeling. This yields insights into the choice and interpretability of the data representation and model parameterization.
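The abstract does not name a specific likelihood. As one illustration from the NMF literature, the Itakura-Saito divergence on power spectrograms matches, up to a constant, the negative log-likelihood of a zero-mean circular complex Gaussian model of the STFT coefficients. Relating this to the VAE decoder is an assumption made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def itakura_saito(power, variance):
    """Itakura-Saito divergence between a power spectrogram and model variances."""
    ratio = power / variance
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def complex_gaussian_nll(power, variance):
    """NLL of STFT coefficients with |s|^2 = power under a circular complex Gaussian."""
    return float(np.sum(np.log(np.pi * variance) + power / variance))

rng = np.random.default_rng(0)
power = rng.uniform(0.1, 2.0, size=(8, 5))   # toy power spectrogram |s|^2
var_a = rng.uniform(0.1, 2.0, size=(8, 5))   # variances from one candidate decoder
var_b = rng.uniform(0.1, 2.0, size=(8, 5))   # variances from another candidate

# The gap between the two costs does not depend on the model variances.
gap_a = complex_gaussian_nll(power, var_a) - itakura_saito(power, var_a)
gap_b = complex_gaussian_nll(power, var_b) - itakura_saito(power, var_b)
```

Because the gap depends only on the data, minimising either cost over the decoder variances selects the same model.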