Disentangled Non-Local Neural Networks
The non-local block is a popular module for strengthening the context
modeling ability of a regular convolutional neural network. This paper first
studies the non-local block in depth, where we find that its attention
computation can be split into two terms, a whitened pairwise term accounting
for the relationship between two pixels and a unary term representing the
saliency of every pixel. We also observe that, when trained alone, the two
terms tend to model different visual cues: the whitened pairwise term learns
within-region relationships, while the unary term learns salient boundaries.
However, the two terms are tightly coupled in the non-local block, which
hinders the learning of each. Based on these findings, we present the
disentangled non-local block, where the two terms are decoupled to facilitate
learning for both terms. We demonstrate the effectiveness of the decoupled
design on various tasks, such as semantic segmentation on Cityscapes, ADE20K,
and PASCAL Context, object detection on COCO, and action recognition on
Kinetics.
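
To make the decoupled design concrete, here is a minimal PyTorch sketch of a disentangled non-local block, assuming the whitened pairwise and unary formulation described above; the layer names, channel reduction, and residual connection are illustrative, not the authors' exact implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class DisentangledNonLocal2d(nn.Module):
    """Minimal sketch: the whitened pairwise term and the unary term
    get their own softmax normalization and are summed afterwards,
    instead of being entangled inside a single attention computation."""

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inter = channels // reduction
        self.query = nn.Conv2d(channels, inter, 1)
        self.key = nn.Conv2d(channels, inter, 1)
        self.value = nn.Conv2d(channels, inter, 1)
        self.unary = nn.Conv2d(channels, 1, 1)  # per-pixel saliency logit
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x):
        n, _, h, w = x.shape
        q = self.query(x).flatten(2)  # (n, c', hw)
        k = self.key(x).flatten(2)
        v = self.value(x).flatten(2)

        # Whitening: subtracting the means leaves a pure pairwise
        # pixel-to-pixel term; saliency is handled by the unary branch.
        q = q - q.mean(dim=2, keepdim=True)
        k = k - k.mean(dim=2, keepdim=True)

        pairwise = F.softmax(q.transpose(1, 2) @ k, dim=-1)  # (n, hw, hw)
        unary = F.softmax(self.unary(x).flatten(2), dim=-1)  # (n, 1, hw)

        # Decoupled design: sum the two separately normalized maps.
        attn = pairwise + unary  # unary broadcasts over query positions
        y = (attn @ v.transpose(1, 2)).transpose(1, 2).reshape(n, -1, h, w)
        return x + self.out(y)
```

The decisive detail is that each term is normalized on its own, so neither dominates the other's gradients during training.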
Leveraging Latent Features for Local Explanations
As the application of deep neural networks proliferates in numerous areas
such as medical imaging, video surveillance, and self-driving cars, the need
to explain the decisions of these models has become an active research topic,
at both the global and the local level. Locally, most explanation methods have
focused on identifying the relevance of input features, which limits the types
of explanations possible. In this paper, we investigate a new direction by
leveraging latent features to generate contrastive explanations; predictions
are explained not only by highlighting aspects that are in themselves
sufficient to justify the classification, but also by new aspects which, if
added, would change the classification. The key contribution of this paper lies
in how we add features to rich data in a formal yet humanly interpretable way
that leads to meaningful results. Our new definition of "addition" uses latent
features to move beyond the limitations of previous explanations and resolves
an open question laid out in Dhurandhar et al. (2018), whose method creates
local contrastive explanations but is limited to simple datasets such as grayscale
images. The strength of our approach in creating intuitive explanations that
are also quantitatively superior to other methods is demonstrated on three
diverse image datasets (skin lesions, faces, and fashion apparel). A user study
with 200 participants further exemplifies the benefits of contrastive
information, which can be viewed as complementary to other state-of-the-art
interpretability methods.
Comment: Accepted to KDD 2021.
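
The idea of "adding" latent features can be sketched as a small optimization, assuming a pretrained classifier and a latent-space encoder/decoder; the loss form, hyperparameters, and every name below are hypothetical, not the paper's algorithm.

```python
import torch

def contrastive_explanation(x, classifier, encoder, decoder,
                            steps=200, lr=0.05, sparsity=0.1):
    """Search for a small latent 'addition' that flips the prediction.
    The classifier/encoder/decoder interfaces and all hyperparameters
    are assumptions for illustration."""
    orig_class = classifier(x).argmax(dim=1)
    z = encoder(x).detach()
    delta = torch.zeros_like(z, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = classifier(decoder(z + delta))
        # Push down the original class while keeping delta sparse,
        # so the added latent features stay interpretable.
        loss = logits[torch.arange(len(x)), orig_class].sum() \
               + sparsity * delta.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z + delta.detach())
```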
Geometry of Deep Generative Models for Disentangled Representations
Deep generative models like variational autoencoders approximate the
intrinsic geometry of high dimensional data manifolds by learning
low-dimensional latent-space variables and an embedding function. The geometric
properties of these latent spaces have been studied through the lens of
Riemannian geometry, via analysis of the non-linearity of the generator
function. More recently, deep generative models have been used to learn
semantically meaningful 'disentangled' representations that capture
task-relevant attributes while being invariant to other attributes. In this
work, we explore
the geometry of popular generative models for disentangled representation
learning. We use several metrics to compare the properties of latent spaces of
disentangled representation models in terms of class separability and curvature
of the latent space. Our results establish that class-distinguishable features
in the disentangled latent space exhibit higher curvature than in a variational
autoencoder. We evaluate and compare the geometry of three such models against
a variational autoencoder on two different datasets. Further, our results show
that distances and interpolation in the
latent space are significantly improved with Riemannian metrics derived from
the curvature of the space. We expect these results to have implications for
understanding how deep networks can be made more robust, generalizable, and
interpretable.
Comment: Accepted at ICVGIP 2018.
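
The Riemannian machinery the abstract refers to can be made concrete: the generator pulls back a metric M(z) = J(z)^T J(z) onto the latent space, and latent distances are path lengths under that metric. A minimal PyTorch sketch, assuming a decoder that maps a latent vector to a flattened output:

```python
import torch

def pullback_metric(decoder, z):
    """Metric induced on the latent space by the generator:
    M(z) = J(z)^T J(z), with J the Jacobian of `decoder`. Assumes
    `decoder` maps a latent vector (d,) to a flattened output (D,)."""
    J = torch.autograd.functional.jacobian(decoder, z)  # (D, d)
    return J.T @ J

def curve_length(decoder, z_path):
    """Riemannian length of a discretized latent path z_path (T, d):
    the sum of sqrt(dz^T M(z) dz) over consecutive points."""
    length = 0.0
    for a, b in zip(z_path[:-1], z_path[1:]):
        dz = b - a
        M = pullback_metric(decoder, (a + b) / 2)  # metric at midpoint
        length = length + torch.sqrt(dz @ M @ dz)
    return length
```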
ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes
Recent studies on face attribute transfer have achieved great success, and
many models can transfer face attributes given an input image. However, they
suffer from three limitations: (1) inability to generate images by exemplar;
(2) inability to transfer multiple face attributes simultaneously; (3) low
quality of generated images, such as low resolution or visible artifacts. To
address these limitations, we propose a novel model which
receives two images of opposite attributes as inputs. Our model can transfer
exactly the same type of attributes from one image to another by exchanging
certain parts of their latent encodings. All the attributes are encoded in a
disentangled manner in the latent space, which enables us to manipulate several
attributes simultaneously. Besides, our model learns the residual images so as
to facilitate training on higher resolution images. With the help of
multi-scale discriminators for adversarial training, it can even generate
high-quality images with finer details and fewer artifacts. We demonstrate the
effectiveness of our model in overcoming the above three limitations by
comparing with other methods on the CelebA face database. A PyTorch
implementation is available at https://github.com/Prinsphield/ELEGANT.
Comment: GitHub: https://github.com/Prinsphield/ELEGANT
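
As an illustration of the exchange mechanism (a sketch under assumed interfaces, not the linked implementation), the following swaps one disentangled slice of two latent codes and decodes residual images; `attr_slice` and the module signatures are assumptions.

```python
def exchange_attribute(enc, dec, x_pos, x_neg, attr_slice):
    """Swap the latent slice encoding one attribute between an image
    that has the attribute (x_pos) and one that lacks it (x_neg).
    enc/dec are assumed PyTorch modules; attr_slice marks which
    latent dimensions hold the chosen attribute."""
    z_pos, z_neg = enc(x_pos), enc(x_neg)
    z_pos_new, z_neg_new = z_pos.clone(), z_neg.clone()
    # Exchange only the disentangled part holding the chosen attribute.
    z_pos_new[:, attr_slice] = z_neg[:, attr_slice]
    z_neg_new[:, attr_slice] = z_pos[:, attr_slice]
    # The decoder outputs residual images added back to the inputs.
    return x_pos + dec(z_pos_new), x_neg + dec(z_neg_new)
```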
Variational Inference of Disentangled Latent Concepts from Unlabeled Observations
Disentangled representations, where the higher level data generative factors
are reflected in disjoint latent dimensions, offer several benefits such as
ease of deriving invariant representations, transferability to other tasks,
interpretability, etc. We consider the problem of unsupervised learning of
disentangled representations from a large pool of unlabeled observations, and
propose a variational-inference-based approach to infer disentangled latent
factors. We introduce a regularizer on the expectation of the approximate
posterior over observed data that encourages disentanglement. We also
propose a new disentanglement metric which is better aligned with the
qualitative disentanglement observed in the decoder's output. We empirically
observe significant improvement over existing methods in terms of both
disentanglement and data likelihood (reconstruction quality).
Comment: ICLR 2018 version.
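
For intuition, one simple way to write a regularizer of this kind in PyTorch is to match the covariance of the inferred latent means to the identity, discouraging correlated latents; the penalty weights and the use of posterior means are assumptions.

```python
import torch

def dip_regularizer(mu, lambda_od=10.0, lambda_d=10.0):
    """Moment-matching penalty on the aggregate approximate posterior:
    push the covariance of the inferred latent means (mu: batch x d)
    toward the identity. Off-diagonal entries (correlations between
    latents) and diagonal deviations are penalized separately."""
    centered = mu - mu.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (mu.shape[0] - 1)  # (d, d)
    diag = torch.diagonal(cov)
    off_diag = cov - torch.diag(diag)
    return (lambda_od * (off_diag ** 2).sum()
            + lambda_d * ((diag - 1.0) ** 2).sum())
```

In training, such a term would simply be added to the standard ELBO objective.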
Counterfactuals uncover the modular structure of deep generative models
Deep generative models can emulate the perceptual properties of complex image
datasets, providing a latent representation of the data. However, manipulating
such representation to perform meaningful and controllable transformations in
the data space remains challenging without some form of supervision. While
previous work has focused on exploiting statistical independence to disentangle
latent factors, we argue that such a requirement is too restrictive and propose
instead a non-statistical framework that relies on counterfactual manipulations
to uncover a modular structure of the network composed of disentangled groups
of internal variables. Experiments with a variety of generative models trained
on complex image datasets show that the obtained modules can be used to design
targeted interventions. This opens the way to applications such as
computationally efficient style transfer and the automated assessment of
robustness to contextual changes in pattern recognition systems.
Comment: 26 pages, 17 figures.
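
A minimal sketch of such a counterfactual manipulation in PyTorch: clamp one group of internal variables of a generator via a forward hook and compare the two outputs. The choice of layer and channel group is an assumption for illustration.

```python
def counterfactual_intervention(generator, layer, channels, value, z):
    """Clamp a group of internal variables (feature-map channels of
    `layer`, a submodule of `generator`) during a forward pass and
    return both outputs for comparison."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, channels] = value  # intervene on the chosen group
        return output  # a non-None return replaces the layer's output

    baseline = generator(z)
    handle = layer.register_forward_hook(hook)
    try:
        intervened = generator(z)
    finally:
        handle.remove()
    return baseline, intervened
```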
Learning Hierarchical Features from Generative Models
Deep neural networks have been shown to be very successful at learning
feature hierarchies in supervised learning tasks. Generative models, on the
other hand, have benefited less from hierarchical models with multiple layers
of latent variables. In this paper, we prove that hierarchical latent variable
models do not take advantage of the hierarchical structure when trained with
existing variational methods, and characterize limitations on the kinds of
features existing models can learn. Finally, we propose an alternative
architecture that does not suffer from these limitations. Our model is able to
learn highly interpretable and disentangled hierarchical features on several
natural image datasets with no task-specific regularization or prior knowledge.
Comment: ICML 2017.
LOGAN: Unpaired Shape Transform in Latent Overcomplete Space
We introduce LOGAN, a deep neural network aimed at learning general-purpose
shape transforms from unpaired domains. The network is trained on two sets of
shapes, e.g., tables and chairs, with neither a pairing between shapes across
the two domains as supervision nor any point-wise correspondence between
shapes. Once trained, LOGAN takes a shape from one domain and
transforms it into the other. Our network consists of an autoencoder to encode
shapes from the two input domains into a common latent space, where the latent
codes concatenate multi-scale shape features, resulting in an overcomplete
representation. The translator is based on a generative adversarial network
(GAN), operating in the latent space, where an adversarial loss enforces
cross-domain translation while a feature preservation loss ensures that the
right shape features are preserved for a natural shape transform. We conduct
ablation studies to validate each of our key network designs and demonstrate
superior capabilities in unpaired shape transforms on a variety of examples
over baselines and state-of-the-art approaches. We show that LOGAN is able to
learn what shape features to preserve during shape translation, either local or
non-local, whether content or style, depending solely on the input domains for
training.
Comment: Download supplementary material here: https://kangxue.org/papers/logan_supp.pdf
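
For intuition, a minimal sketch of the translator's objective, combining a latent-space adversarial term with a feature-preservation term; the least-squares GAN form and the L1 penalty below are assumptions, not necessarily the paper's exact losses.

```python
def translator_loss(translator, disc, z_src, fp_weight=1.0):
    """Generator-side objective of a latent-space translator: an
    adversarial term pushes translated codes toward the target domain,
    while a feature-preservation term keeps the overcomplete code close
    to the input so only the features that must change do."""
    z_trans = translator(z_src)
    adv = ((disc(z_trans) - 1.0) ** 2).mean()  # fool the latent critic
    fp = (z_trans - z_src).abs().mean()        # preserve shape features
    return adv + fp_weight * fp
```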
Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies
Intelligent behaviour in the real world requires the ability to acquire new
knowledge from an ongoing sequence of experiences while preserving and reusing
past knowledge. We propose a novel algorithm for unsupervised representation
learning from piecewise-stationary visual data: the Variational Autoencoder with
Shared Embeddings (VASE). Based on the Minimum Description Length principle,
VASE automatically detects shifts in the data distribution and allocates spare
representational capacity to new knowledge, while simultaneously protecting
previously learnt representations from catastrophic forgetting. Our approach
encourages the learnt representations to be disentangled, which imparts a
number of desirable properties: VASE can deal sensibly with ambiguous inputs,
it can enhance its own representations through imagination-based exploration,
and most importantly, it exhibits semantically meaningful sharing of latents
between different datasets. Compared to baselines with entangled
representations, our approach is able to reason beyond surface-level statistics
and perform semantically meaningful cross-domain inference.
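
One plausible reading of the shift-detection step, sketched under assumed interfaces: treat the negative ELBO of a batch as a description-length proxy and flag a new environment when the batch is described markedly worse than the running average. The threshold and momentum values are illustrative.

```python
def detect_shift(neg_elbo, running_avg, threshold=2.0, momentum=0.99):
    """Flag a distribution shift when the current batch's description
    length (negative ELBO) far exceeds its running average, then update
    the running average."""
    shifted = neg_elbo > threshold * running_avg
    running_avg = momentum * running_avg + (1.0 - momentum) * neg_elbo
    return shifted, running_avg
```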
TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation aims at learning a mapping between
two visual domains. However, learning a translation across large geometric
variations usually fails. In this work, we present a novel
disentangle-and-translate framework to tackle image-to-image translation for
complex objects. Instead of learning the mapping in the image
space directly, we disentangle image space into a Cartesian product of the
appearance and the geometry latent spaces. Specifically, we first introduce a
geometry prior loss and a conditional VAE loss to encourage the network to
learn independent but complementary representations. Translation is then
performed in the appearance and geometry spaces separately. Extensive experiments
demonstrate the superior performance of our method over other state-of-the-art
approaches, especially in the challenging near-rigid and non-rigid object
translation tasks. In addition, by taking different exemplars as the appearance
references, our method also supports multimodal translation. Project page:
https://wywu.github.io/projects/TGaGa/TGaGa.html
Comment: Accepted to CVPR 2019. Project page: https://wywu.github.io/projects/TGaGa/TGaGa.html
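
A minimal sketch of the disentangle-and-translate step, assuming separate appearance and geometry encoders, per-component translators, and a target-domain decoder (all module names are hypothetical):

```python
def translate_a_to_b(x_a, enc_app, enc_geo, t_app, t_geo, dec_b, x_ref=None):
    """Encode an image into separate appearance and geometry codes,
    translate each component to the target domain, and decode there.
    Supplying a different appearance reference x_ref yields multimodal
    outputs."""
    geo = t_geo(enc_geo(x_a))          # translate the geometry code
    app_src = x_ref if x_ref is not None else x_a
    app = t_app(enc_app(app_src))      # translate the appearance code
    return dec_b(app, geo)             # recombine in domain B
```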