Geometry of Deep Generative Models for Disentangled Representations
Deep generative models like variational autoencoders approximate the
intrinsic geometry of high dimensional data manifolds by learning
low-dimensional latent-space variables and an embedding function. The geometric
properties of these latent spaces have been studied through the lens of Riemannian
geometry, via analysis of the non-linearity of the generator function. In recent
developments, deep generative models have been used for learning semantically
meaningful `disentangled' representations that capture task-relevant
attributes while being invariant to other attributes. In this work, we explore
the geometry of popular generative models for disentangled representation
learning. We use several metrics to compare the properties of latent spaces of
disentangled representation models in terms of class separability and curvature
of the latent space. Our results establish that the class-distinguishing
features in the disentangled latent space exhibit higher
curvature than those of a variational autoencoder. We evaluate and compare the
geometry of three such models with that of a variational autoencoder on two different
datasets. Further, our results show that distances and interpolation in the
latent space are significantly improved with Riemannian metrics derived from
the curvature of the space. We expect these results to have implications for
understanding how deep networks can be made more robust, generalizable, and
interpretable.
Comment: Accepted at ICVGIP, 201
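The gap between latent-space distance and distance measured along the data manifold can be illustrated with a minimal sketch. It assumes a toy decoder (`toy_decoder`, a hypothetical paraboloid map, not one of the models studied in the paper) and approximates the length of a straight latent line under the metric induced by the decoder by summing the lengths of the decoded segments. This is the length of one particular curve, not a geodesic.

```python
import math

def toy_decoder(z):
    """Hypothetical nonlinear generator: maps a 2-D latent point to 3-D data space."""
    x, y = z
    return (x, y, x * x + y * y)  # a paraboloid, so the induced geometry is curved

def euclidean(a, b):
    """Plain Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def decoded_curve_length(decoder, z_start, z_end, steps=1000):
    """Approximate the length, measured in data space, of the straight latent
    line from z_start to z_end by summing decoded segment lengths."""
    total = 0.0
    prev = decoder(z_start)
    for i in range(1, steps + 1):
        t = i / steps
        z = tuple((1 - t) * a + t * b for a, b in zip(z_start, z_end))
        cur = decoder(z)
        total += euclidean(prev, cur)
        prev = cur
    return total

z_a, z_b = (-1.0, 0.0), (1.0, 0.0)
latent_distance = euclidean(z_a, z_b)                          # flat latent metric
manifold_length = decoded_curve_length(toy_decoder, z_a, z_b)  # induced metric
print(latent_distance, manifold_length)
```

For this toy map the decoded curve is the parabola (x, 0, x^2), so the second number is noticeably larger than the flat latent distance of 2; the more the decoder bends, the larger the gap.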
Disentangling Multiple Features in Video Sequences using Gaussian Processes in Variational Autoencoders
We introduce MGP-VAE (Multi-disentangled-features Gaussian Processes
Variational AutoEncoder), a variational autoencoder which uses Gaussian
processes (GP) to model the latent space for the unsupervised learning of
disentangled representations in video sequences. We improve upon previous work
by establishing a framework by which multiple features, static or dynamic, can
be disentangled. Specifically, we use fractional Brownian motions (fBM) and
Brownian bridges (BB) to enforce an inter-frame correlation structure in each
independent channel, and show that varying this structure enables one to
capture different factors of variation in the data. We demonstrate the quality
of our representations with experiments on three publicly available datasets,
and also quantify the improvement using a video prediction task. Moreover, we
introduce a novel geodesic loss function which takes into account the curvature
of the data manifold to improve learning. Our experiments show that the
combination of the improved representations with the novel loss function enables
MGP-VAE to outperform the baselines in video prediction.
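The inter-frame correlation structures mentioned above can be made concrete with the standard covariance functions of the two processes. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function names, the frame times, and the Hurst values are all hypothetical choices.

```python
def fbm_cov(s, t, hurst):
    """Covariance of fractional Brownian motion at times s and t
    for a Hurst parameter in (0, 1)."""
    h2 = 2.0 * hurst
    return 0.5 * (abs(s) ** h2 + abs(t) ** h2 - abs(t - s) ** h2)

def bridge_cov(s, t, horizon):
    """Covariance of a Brownian bridge pinned to zero at times 0 and horizon."""
    return min(s, t) - s * t / horizon

def cov_matrix(kernel, times, **params):
    """Gram matrix of a covariance function over a list of frame times."""
    return [[kernel(s, t, **params) for t in times] for s in times]

# Hypothetical setup: 8 frames at equally spaced times in (0, 1].
num_frames = 8
times = [(i + 1) / num_frames for i in range(num_frames)]

# Different Hurst values give different inter-frame correlation structures.
rough_channel = cov_matrix(fbm_cov, times, hurst=0.1)
smooth_channel = cov_matrix(fbm_cov, times, hurst=0.9)
bridge_channel = cov_matrix(bridge_cov, times, horizon=1.0)
```

With a Hurst parameter of 0.5 the fBM covariance reduces to that of ordinary Brownian motion, min(s, t). The bridge covariance vanishes at the final frame time, which pins the channel at both ends of the sequence.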
Product of Orthogonal Spheres Parameterization for Disentangled Representation Learning
Learning representations that can disentangle explanatory attributes
underlying the data improves interpretability and provides control over
data generation. Various learning frameworks such as VAEs, GANs and
auto-encoders have been used in the literature to learn such representations.
Most often, the latent space is constrained to a partitioned representation or
structured by a prior to impose disentangling. In this work, we advance the use
of a latent representation based on a product space of Orthogonal Spheres
(PrOSe). The PrOSe model is motivated by the reasoning that latent variables
related to the physics of image formation can, under certain relaxed assumptions,
lead to spherical spaces. Orthogonality between the spheres is motivated via
physical independence models. Imposing the orthogonal-sphere constraint is much
simpler than other complicated physical models, is fairly general and flexible,
and extensible beyond the factors used to motivate its development. Under
further relaxed assumptions of equal-sized latent blocks per factor, the
constraint can be written down in closed form as an ortho-normality term in the
loss function. We show that our approach improves the quality of
disentanglement significantly. We find consistent improvement in
disentanglement compared to several state-of-the-art approaches, across several
benchmarks and metrics.
Comment: Accepted at British Machine Vision Conference (BMVC) 201
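One way an ortho-normality term of this kind might look is sketched below. The exact form used in the paper is not given in the abstract, so this is an assumption: the latent vector is split into equal-sized blocks, and the penalty is the squared Frobenius distance between the Gram matrix of the blocks and the identity. The helper names are hypothetical.

```python
def split_blocks(z, num_blocks):
    """Split a flat latent vector into equal-sized blocks, one per factor."""
    size, remainder = divmod(len(z), num_blocks)
    if remainder != 0:
        raise ValueError("latent size must be divisible by the number of blocks")
    return [z[i * size:(i + 1) * size] for i in range(num_blocks)]

def orthonormality_penalty(z, num_blocks):
    """Assumed penalty: sum over block pairs of (<b_i, b_j> - delta_ij)^2.
    It is zero exactly when every block has unit norm (lies on a sphere)
    and distinct blocks are mutually orthogonal."""
    blocks = split_blocks(z, num_blocks)
    penalty = 0.0
    for i, b_i in enumerate(blocks):
        for j, b_j in enumerate(blocks):
            dot = sum(a * b for a, b in zip(b_i, b_j))
            target = 1.0 if i == j else 0.0
            penalty += (dot - target) ** 2
    return penalty

print(orthonormality_penalty([1.0, 0.0, 0.0, 1.0], num_blocks=2))  # orthonormal blocks
print(orthonormality_penalty([1.0, 0.0, 1.0, 0.0], num_blocks=2))  # identical blocks
```

In training, a term like this would be added to the reconstruction loss with a weighting coefficient; the weighting and the block size are further assumptions not specified here.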
Mixing Consistent Deep Clustering
Finding well-defined clusters in data represents a fundamental challenge for
many data-driven applications, and largely depends on good data representation.
Drawing on literature regarding representation learning, studies suggest that
one key characteristic of good latent representations is the ability to produce
semantically mixed outputs when decoding linear interpolations of two latent
representations. We propose the Mixing Consistent Deep Clustering method which
encourages interpolations to appear realistic while adding the constraint that
interpolations of two data points must look like one of the two inputs. By
applying this training method to various clustering (non-)specific autoencoder
models, we found that it systematically changed the structure of a model's
learned representations and improved clustering
performance for the tested ACAI, IDEC, and VAE models on the MNIST, SVHN, and
CIFAR-10 datasets. These outcomes have practical implications for numerous
real-world clustering tasks, as they show that the proposed method can be added
to existing autoencoders to further improve clustering performance.
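The linear interpolation of two latent representations referred to above is simple to write down. The sketch below covers only that step, with hypothetical function names; the decoder and the training constraint that makes decoded interpolations resemble one of the two inputs are not shown.

```python
def interpolate(z_a, z_b, alpha):
    """Convex combination of two latent codes; alpha=0 gives z_a, alpha=1 gives z_b."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]

def interpolation_path(z_a, z_b, num_points):
    """Evenly spaced latent codes on the straight line from z_a to z_b."""
    if num_points < 2:
        raise ValueError("need at least the two endpoints")
    return [interpolate(z_a, z_b, i / (num_points - 1)) for i in range(num_points)]

# Hypothetical 2-D latent codes of two data points.
path = interpolation_path([0.0, 0.0], [2.0, 4.0], num_points=5)
for z in path:
    print(z)  # each code would be passed to the decoder in a real model
```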
Dimensionality compression and expansion in Deep Neural Networks
Datasets such as images, text, or movies are embedded in high-dimensional
spaces. However, in important cases such as images of objects, the statistical
structure in the data constrains samples to a manifold of dramatically lower
dimensionality. Learning to identify and extract task-relevant variables from
this embedded manifold is crucial when dealing with high-dimensional problems.
We find that neural networks are often very effective at solving this task and
investigate why. To this end, we apply state-of-the-art techniques for
intrinsic dimensionality estimation to show that neural networks learn
low-dimensional manifolds in two phases: first, dimensionality expansion driven
by feature generation in initial layers, and second, dimensionality compression
driven by the selection of task-relevant features in later layers. We model
noise generated by Stochastic Gradient Descent and show how this noise balances
the dimensionality of neural representations by inducing an effective
regularization term in the loss. We highlight the important relationship
between low-dimensional compressed representations and generalization
properties of the network. Our work contributes by shedding light on the
success of deep neural networks in disentangling data in high-dimensional space
while achieving good generalization. Furthermore, it invites new learning
strategies focused on optimizing measurable geometric properties of learned
representations, beginning with their intrinsic dimensionality.
Comment: Submitted to NeurIPS 2019. First two authors contributed equally
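To give a sense of what intrinsic dimensionality estimation involves, here is a minimal sketch of one simple estimator based on the ratio of second to first nearest-neighbor distances (the maximum-likelihood form of the two-nearest-neighbor idea). It is offered as an illustration only; the abstract does not say which estimators the authors used, and the synthetic data below is a made-up example.

```python
import math
import random

def two_nn_dimension(points):
    """Estimate intrinsic dimension from the ratios r2 / r1 of each point's
    second and first nearest-neighbor distances. Under locally uniform density
    the ratios follow a Pareto law whose shape parameter is the dimension,
    so the maximum-likelihood estimate is N / sum(log(r2 / r1))."""
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        if 0.0 < r1 < r2:  # skip duplicates and exact ties
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)

# Synthetic data: a 2-D sheet linearly embedded in a 5-D ambient space.
rng = random.Random(0)
sheet = []
for _ in range(400):
    u, v = rng.random(), rng.random()
    sheet.append((u, v, u + 2.0 * v, u - v, 0.5 * u))

estimated_dim = two_nn_dimension(sheet)
print(estimated_dim)  # should land near 2 although the ambient dimension is 5
```

Applied layer by layer to a network's activations, an estimator of this kind is what would reveal an expansion-then-compression profile; the brute-force neighbor search here is quadratic in the number of points and only suitable for small samples.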