Dynamic Variational Autoencoders for Visual Process Modeling
This work studies the problem of modeling visual processes by leveraging deep
generative architectures for learning linear, Gaussian representations from
observed sequences. We propose a joint learning framework, combining a vector
autoregressive model and Variational Autoencoders. This results in an
architecture that allows Variational Autoencoders to simultaneously learn a
non-linear observation model as well as a linear state model from sequences of
frames. We validate our approach on artificial sequences and dynamic textures.
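The linear state model described above can be illustrated with a minimal sketch: a first-order vector autoregressive (VAR) model z_{t+1} = A z_t + noise, fitted to a latent trajectory by least squares. All names, dimensions, and the fitting procedure here are illustrative assumptions, not the paper's actual training scheme.

```python
import numpy as np

def fit_var(z):
    """Least-squares estimate of A for the linear state model z_{t+1} ~ A z_t.

    z: array of shape (T, d) -- a latent trajectory, standing in for the
    sequence of encoder outputs.
    """
    z_past, z_next = z[:-1], z[1:]
    # Solve z_next ~ z_past @ A.T in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(z_past, z_next, rcond=None)
    return A_T.T

# Simulate a stable 2-D linear system and recover its transition matrix.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
z = np.zeros((200, 2))
for t in range(199):
    z[t + 1] = A_true @ z[t] + 0.01 * rng.standard_normal(2)
A_hat = fit_var(z)
```

In the joint framework, such a linear model would be fitted in latent space while the VAE's decoder supplies the non-linear observation model.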
Sparsity in Variational Autoencoders
Working in high-dimensional latent spaces, the internal encoding of data in
Variational Autoencoders becomes naturally sparse. We discuss this known but
controversial phenomenon, sometimes referred to as overpruning, to emphasize the
under-use of model capacity. In fact, it is an important form of
self-regularization, with all the typical benefits associated with sparsity: it
forces the model to focus on the truly important features, greatly reducing the
risk of overfitting. In particular, it provides a methodological guide for the
correct tuning of model capacity: progressively augmenting it to attain
sparsity, or conversely reducing the dimension of the network by removing links
to zeroed-out neurons. The degree of sparsity crucially depends on the network
architecture: for instance, convolutional networks typically show less
sparsity, likely due to the tighter relation of features to different spatial
regions of the input.
Comment: An Extended Abstract of this survey will be presented at the 1st
International Conference on Advances in Signal Processing and Artificial
Intelligence (ASPAI' 2019), 20-22 March 2019, Barcelona, Spain.
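The zeroed-out neurons the abstract mentions can be detected from the per-dimension KL divergence between the Gaussian posterior and the standard-normal prior: dimensions whose KL is near zero have collapsed to the prior and carry no information. The threshold and the toy posterior statistics below are illustrative assumptions.

```python
import numpy as np

def per_dim_kl(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)) per latent dimension, averaged over a batch."""
    return 0.5 * np.mean(mu**2 + np.exp(logvar) - logvar - 1.0, axis=0)

# Toy posterior statistics: two informative dimensions and one that has
# collapsed to the prior (mu = 0, logvar = 0, hence KL = 0).
mu = np.array([[1.2, -0.8, 0.0],
               [0.9,  1.1, 0.0]])
logvar = np.array([[-2.0, -1.5, 0.0],
                   [-2.2, -1.8, 0.0]])
kl = per_dim_kl(mu, logvar)
active = kl > 0.01   # illustrative threshold for "active" dimensions
```

Following the tuning recipe above, one would shrink the latent dimension until few or no units fall below the threshold.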
Subitizing with Variational Autoencoders
Numerosity, the number of objects in a set, is a basic property of a given
visual scene. Many animals develop the perceptual ability to subitize: the
near-instantaneous identification of the numerosity in small sets of visual
items. In computer vision, it has been shown that numerosity emerges as a
statistical property in neural networks during unsupervised learning from
simple synthetic images. In this work, we focus on more complex natural images
using unsupervised hierarchical neural networks. Specifically, we show that
variational autoencoders are able to spontaneously perform subitizing after
training without supervision on a large number of images from the Salient Object
Subitizing dataset. While our method is unable to outperform supervised
convolutional networks for subitizing, we observe that the networks learn to
encode numerosity as a basic visual property. Moreover, we find that the learned
representations are likely invariant to object area, an observation in
alignment with studies on biological neural networks in cognitive neuroscience.
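The claim that numerosity is encoded in the learned representation is typically checked with a linear probe: if a simple linear readout of the latent codes recovers the count, the property is present in the representation. The synthetic latents below stand in for encoder outputs, and all names are illustrative.

```python
import numpy as np

# Hypothetical setup: latent codes that carry numerosity along one direction
# w, plus noise -- a stand-in for VAE encodings of images with 1..4 objects.
rng = np.random.default_rng(1)
d = 8
w = rng.standard_normal(d)
counts = rng.integers(1, 5, size=200)               # subitizing range 1..4
z = counts[:, None] * w + 0.05 * rng.standard_normal((200, d))

# Linear probe: least-squares readout of the count from the latent code.
probe, *_ = np.linalg.lstsq(z, counts.astype(float), rcond=None)
pred = np.rint(z @ probe).astype(int)
accuracy = np.mean(pred == counts)
```

A high probe accuracy on held-out codes is the kind of evidence behind "numerosity emerges as a statistical property" of the representation.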
Resampled Priors for Variational Autoencoders
We propose Learned Accept/Reject Sampling (LARS), a method for constructing
richer priors using rejection sampling with a learned acceptance function. This
work is motivated by recent analyses of the VAE objective, which pointed out
that commonly used simple priors can lead to underfitting. As the distribution
induced by LARS involves an intractable normalizing constant, we show how to
estimate it and its gradients efficiently. We demonstrate that LARS priors
improve VAE performance on several standard datasets both when they are learned
jointly with the rest of the model and when they are fitted to a pretrained
model. Finally, we show that LARS can be combined with existing methods for
defining flexible priors for an additional boost in performance.
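The accept/reject construction can be sketched in a few lines: a simple proposal (here a standard normal) is filtered through an acceptance function, and the intractable normalizing constant Z is estimated by Monte Carlo as the expected acceptance probability under the proposal. The acceptance function below is a hand-picked sigmoid, a placeholder for the learned network in LARS.

```python
import numpy as np

rng = np.random.default_rng(2)

def accept_prob(z):
    """Stand-in 'learned' acceptance function: a sigmoid bump favoring z > 1.
    Purely illustrative -- LARS learns this function from data."""
    return 1.0 / (1.0 + np.exp(-(3.0 * z - 3.0)))

def lars_sample(n):
    """Draw n samples from the resampled prior p(z) ~ N(z; 0, 1) * a(z) / Z."""
    out = []
    while len(out) < n:
        z = rng.standard_normal(1000)
        keep = rng.random(1000) < accept_prob(z)
        out.extend(z[keep].tolist())
    return np.array(out[:n])

# Monte Carlo estimate of the normalizing constant Z = E_{N(0,1)}[a(z)].
Z = accept_prob(rng.standard_normal(100_000)).mean()
samples = lars_sample(5000)
```

The resampled distribution is visibly shifted toward the region the acceptance function favors, while the proposal remains easy to sample from.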
Auxiliary Guided Autoregressive Variational Autoencoders
Generative modeling of high-dimensional data is a key problem in machine
learning. Successful approaches include latent variable models and
autoregressive models. The complementary strengths of these approaches, to
model global and local image statistics respectively, suggest hybrid models
that encode global image structure into latent variables while autoregressively
modeling low level detail. Previous approaches to such hybrid models restrict
the capacity of the autoregressive decoder to prevent degenerate models that
ignore the latent variables and only rely on autoregressive modeling. Our
contribution is a training procedure relying on an auxiliary loss function that
controls which information is captured by the latent variables and what is left
to the autoregressive decoder. Our approach can leverage arbitrarily powerful
autoregressive decoders, achieves state-of-the-art quantitative performance
among models with latent variables, and generates qualitatively convincing
samples.
Comment: Published as a conference paper at ECML-PKDD 201
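The division of labor the abstract describes can be made concrete as an objective with three terms: the autoregressive decoder's likelihood for fine detail, an auxiliary reconstruction loss that forces the latents to capture global structure (here, a downsampled image), and the usual KL term. All function names and the weight `aux_weight` are illustrative placeholders, not the paper's exact formulation.

```python
import numpy as np

def downsample(x, factor=4):
    """Average-pool a square image by `factor` -- the auxiliary target
    carrying only global structure."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def hybrid_loss(x, aux_recon, ar_nll, kl, aux_weight=1.0):
    """Total loss = autoregressive NLL + weighted auxiliary MSE + KL.

    aux_recon: the latent-only reconstruction of the downsampled image;
    ar_nll, kl: scalars assumed to come from the decoder and the encoder.
    """
    aux_mse = np.mean((downsample(x) - aux_recon) ** 2)
    return ar_nll + aux_weight * aux_mse + kl

# Toy values: a perfect auxiliary reconstruction leaves only NLL + KL.
x = np.ones((8, 8))
loss = hybrid_loss(x, aux_recon=np.ones((2, 2)), ar_nll=1.5, kl=0.25)
```

Because the auxiliary term depends only on the latents, the autoregressive decoder cannot satisfy it by itself, which prevents the degenerate solutions mentioned above.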
Towards Visually Explaining Variational Autoencoders
Recent advances in Convolutional Neural Network (CNN) model interpretability
have led to impressive progress in visualizing and understanding model
predictions. In particular, gradient-based visual attention methods have driven
much recent effort in using visual attention maps as a means for visual
explanations. A key problem, however, is that these methods are designed for
classification and categorization tasks, and their extension to explaining
generative models, e.g., variational autoencoders (VAEs), is not trivial. In this
work, we take a step towards bridging this crucial gap, proposing the first
technique to visually explain VAEs by means of gradient-based attention. We
present methods to generate visual attention from the learned latent space, and
also demonstrate that such attention explanations serve more than just explaining
VAE predictions. We show how these attention maps can be used to localize
anomalies in images, demonstrating state-of-the-art performance on the MVTec-AD
dataset. We also show how they can be infused into model training, helping
bootstrap the VAE into learning improved latent space disentanglement,
demonstrated on the dSprites dataset.
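The core idea of gradient-based attention from a latent space can be sketched as follows: the saliency of input pixel (i, j) for latent unit k is |∂z_k/∂x_ij|, approximated here by central finite differences on a toy linear encoder. The encoder, its weights, and the choice of latent unit are assumptions for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((4, 16))          # toy linear encoder: z = W @ x.ravel()

def encoder(x):
    return W @ x.ravel()

def attention_map(x, k, eps=1e-4):
    """|d z_k / d x| via central finite differences, reshaped to the image.
    For a real VAE this gradient would come from backpropagation instead."""
    grad = np.zeros(x.size)
    flat = x.ravel()
    for i in range(x.size):
        up, dn = flat.copy(), flat.copy()
        up[i] += eps
        dn[i] -= eps
        grad[i] = (encoder(up.reshape(x.shape))[k]
                   - encoder(dn.reshape(x.shape))[k]) / (2 * eps)
    return np.abs(grad).reshape(x.shape)

x = rng.standard_normal((4, 4))
attn = attention_map(x, k=0)
```

High-attention regions mark the pixels that most influence the chosen latent unit, which is the mechanism behind using such maps to localize anomalies.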