3,555 research outputs found

    Disentangled Non-Local Neural Networks

    The non-local block is a popular module for strengthening the context modeling ability of a regular convolutional neural network. This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms: a whitened pairwise term accounting for the relationship between two pixels, and a unary term representing the saliency of every pixel. We also observe that the two terms trained alone tend to model different visual cues, e.g., the whitened pairwise term learns within-region relationships while the unary term learns salient boundaries. However, the two terms are tightly coupled in the non-local block, which hinders the learning of each. Based on these findings, we present the disentangled non-local block, where the two terms are decoupled to facilitate learning for both. We demonstrate the effectiveness of the decoupled design on various tasks, such as semantic segmentation on Cityscapes, ADE20K and PASCAL Context, object detection on COCO, and action recognition on Kinetics.
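    The decoupled attention described above can be sketched roughly as follows: the whitened pairwise term is a softmax over mean-subtracted query/key dot products, the unary term is a separate per-pixel saliency softmax shared by all query positions, and the two attention maps are summed before being applied to the values. This is a minimal illustrative sketch in PyTorch; channel sizes, the 1x1-conv unary branch, and the residual output are assumptions rather than the authors' exact configuration.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DisentangledNonLocal2d(nn.Module):
        """Sketch: attention = softmax(whitened pairwise) + softmax(unary saliency)."""

        def __init__(self, in_channels, inter_channels=None):
            super().__init__()
            inter_channels = inter_channels or in_channels // 2
            self.query = nn.Conv2d(in_channels, inter_channels, 1)
            self.key = nn.Conv2d(in_channels, inter_channels, 1)
            self.value = nn.Conv2d(in_channels, inter_channels, 1)
            self.unary = nn.Conv2d(in_channels, 1, 1)  # per-pixel saliency logit
            self.out = nn.Conv2d(inter_channels, in_channels, 1)

        def forward(self, x):
            b, _, h, w = x.shape
            q = self.query(x).flatten(2).transpose(1, 2)  # (b, hw, c')
            k = self.key(x).flatten(2).transpose(1, 2)    # (b, hw, c')
            v = self.value(x).flatten(2)                  # (b, c', hw)

            # Whitened pairwise term: subtract the means so only pure pairwise
            # relationships between positions remain.
            q = q - q.mean(dim=1, keepdim=True)
            k = k - k.mean(dim=1, keepdim=True)
            pairwise = F.softmax(q @ k.transpose(1, 2), dim=-1)  # (b, hw, hw)

            # Unary term: one saliency distribution shared by every query position.
            unary = F.softmax(self.unary(x).flatten(2), dim=-1)  # (b, 1, hw)

            attn = pairwise + unary                  # broadcast over query positions
            y = (v @ attn.transpose(1, 2)).view(b, -1, h, w)
            return x + self.out(y)
    ```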

    Leveraging Latent Features for Local Explanations

    As the application of deep neural networks proliferates in numerous areas such as medical imaging, video surveillance, and self-driving cars, the need to explain the decisions of these models has become an active research topic, both at the global and local level. Locally, most explanation methods have focused on identifying the relevance of features, limiting the types of explanations possible. In this paper, we investigate a new direction by leveraging latent features to generate contrastive explanations; predictions are explained not only by highlighting aspects that are in themselves sufficient to justify the classification, but also by new aspects which, if added, will change the classification. The key contribution of this paper lies in how we add features to rich data in a formal yet humanly interpretable way that leads to meaningful results. Our new definition of "addition" uses latent features to move beyond the limitations of previous explanations and resolve an open question laid out in Dhurandhar et al. (2018), which creates local contrastive explanations but is limited to simple datasets such as grayscale images. The strength of our approach in creating intuitive explanations that are also quantitatively superior to other methods is demonstrated on three diverse image datasets (skin lesions, faces, and fashion apparel). A user study with 200 participants further exemplifies the benefits of contrastive information, which can be viewed as complementary to other state-of-the-art interpretability methods. Comment: Accepted to KDD 2021.
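    As a rough illustration of the latent-feature "addition" idea (not the authors' algorithm), one can search for a small latent perturbation whose decoded image flips a classifier's decision. The encoder/decoder/classifier interfaces, the optimiser, and the weights below are assumptions made for the sketch.

    ```python
    import torch
    import torch.nn.functional as F

    def contrastive_addition(encoder, decoder, classifier, x, target_class,
                             steps=200, lr=0.05, sparsity=0.1):
        """Search for a sparse latent 'addition' that changes the classification of x,
        so the explanation can state which added aspects would flip the prediction."""
        z = encoder(x).detach()
        delta = torch.zeros_like(z, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        target = torch.tensor([target_class])
        for _ in range(steps):
            logits = classifier(decoder(z + delta))
            # Push the prediction towards the contrast class; keep the addition small.
            loss = F.cross_entropy(logits, target) + sparsity * delta.abs().sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return decoder(z + delta).detach(), delta.detach()
    ```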

    Geometry of Deep Generative Models for Disentangled Representations

    Deep generative models like variational autoencoders approximate the intrinsic geometry of high-dimensional data manifolds by learning low-dimensional latent-space variables and an embedding function. The geometric properties of these latent spaces have been studied under the lens of Riemannian geometry, via analysis of the non-linearity of the generator function. In new developments, deep generative models have been used for learning semantically meaningful 'disentangled' representations that capture task-relevant attributes while being invariant to other attributes. In this work, we explore the geometry of popular generative models for disentangled representation learning. We use several metrics to compare the properties of the latent spaces of disentangled representation models in terms of class separability and curvature of the latent space. Our results establish that the class-distinguishing features in the disentangled latent space exhibit higher curvature than in a variational autoencoder. We evaluate and compare the geometry of three such models with a variational autoencoder on two different datasets. Further, our results show that distances and interpolation in the latent space are significantly improved with Riemannian metrics derived from the curvature of the space. We expect these results to have implications for understanding how deep networks can be made more robust, generalizable, and interpretable. Comment: Accepted at ICVGIP 2018.
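    A minimal sketch of the central object such analyses rely on: the Riemannian (pullback) metric that a decoder induces on the latent space. It assumes a single 1-D latent vector and a differentiable decoder; the paper's specific curvature and class-separability measures are not reproduced here.

    ```python
    import torch

    def pullback_metric(decoder, z):
        """M(z) = J_g(z)^T J_g(z): the metric the generator g induces on the latent
        space; distances and interpolations measured with M(z) follow the learned
        data manifold instead of straight Euclidean lines."""
        z = z.detach().requires_grad_(True)
        J = torch.autograd.functional.jacobian(
            lambda latent: decoder(latent).flatten(), z)   # (output_dim, latent_dim)
        return J.T @ J                                     # (latent_dim, latent_dim)
    ```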

    ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes

    Recent studies on face attribute transfer have achieved great success. Many models can transfer face attributes given an input image. However, they suffer from three limitations: (1) inability to generate images by exemplars; (2) inability to transfer multiple face attributes simultaneously; (3) low quality of generated images, such as low resolution or artifacts. To address these limitations, we propose a novel model that receives two images of opposite attributes as inputs. Our model can transfer exactly the same type of attributes from one image to another by exchanging certain parts of their encodings. All the attributes are encoded in a disentangled manner in the latent space, which enables us to manipulate several attributes simultaneously. Besides, our model learns residual images so as to facilitate training on higher-resolution images. With the help of multi-scale discriminators for adversarial training, it can even generate high-quality images with finer details and fewer artifacts. We demonstrate the effectiveness of our model in overcoming the above three limitations by comparing it with other methods on the CelebA face database. A PyTorch implementation is available at https://github.com/Prinsphield/ELEGANT. Comment: GitHub: https://github.com/Prinsphield/ELEGANT
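    The core exchange operation can be sketched as below, assuming each attribute owns a fixed slice of the latent code and the decoder outputs residual images; the slice bookkeeping and function signatures are placeholders, not the released implementation.

    ```python
    import torch

    def swap_attribute(encoder, decoder, img_a, img_b, attr_slices, attr_id):
        """Transfer one attribute between two images of opposite attribute values by
        exchanging the latent slice that encodes it, then decoding residual images."""
        z_a, z_b = encoder(img_a), encoder(img_b)
        s = attr_slices[attr_id]                      # e.g. slice(0, 64) for attribute 0
        z_a_new, z_b_new = z_a.clone(), z_b.clone()
        z_a_new[:, s], z_b_new[:, s] = z_b[:, s], z_a[:, s]
        # Residual learning: the decoder predicts a change added back to the input.
        return img_a + decoder(z_a_new), img_b + decoder(z_b_new)
    ```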

    Variational Inference of Disentangled Latent Concepts from Unlabeled Observations

    Disentangled representations, in which the higher-level generative factors of the data are reflected in disjoint latent dimensions, offer several benefits, such as ease of deriving invariant representations, transferability to other tasks, and interpretability. We consider the problem of unsupervised learning of disentangled representations from a large pool of unlabeled observations, and propose a variational inference based approach to infer disentangled latent factors. We introduce a regularizer on the expectation of the approximate posterior over observed data that encourages disentanglement. We also propose a new disentanglement metric that is better aligned with the qualitative disentanglement observed in the decoder's output. We empirically observe significant improvement over existing methods in terms of both disentanglement and data likelihood (reconstruction quality). Comment: ICLR 2018 version.
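    A sketch of such a regularizer: push the covariance of the aggregated (expected) approximate posterior towards the identity, penalizing off-diagonal entries and deviations of the diagonal from one. Using only the batch of posterior means and the particular weights are simplifying assumptions.

    ```python
    import torch

    def dip_regularizer(mu, lambda_od=10.0, lambda_d=10.0):
        """Penalize deviation of the covariance of the expected approximate posterior
        (approximated here by the batch of posterior means `mu`, shape
        (batch, latent_dim)) from the identity matrix."""
        mu_centered = mu - mu.mean(dim=0, keepdim=True)
        cov = mu_centered.T @ mu_centered / (mu.shape[0] - 1)   # (d, d) covariance
        diag = torch.diag(cov)
        off_diag = cov - torch.diag(diag)
        return lambda_od * (off_diag ** 2).sum() + lambda_d * ((diag - 1.0) ** 2).sum()
    ```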

    Counterfactuals uncover the modular structure of deep generative models

    Deep generative models can emulate the perceptual properties of complex image datasets, providing a latent representation of the data. However, manipulating such a representation to perform meaningful and controllable transformations in the data space remains challenging without some form of supervision. While previous work has focused on exploiting statistical independence to disentangle latent factors, we argue that this requirement is too restrictive and instead propose a non-statistical framework that relies on counterfactual manipulations to uncover a modular structure of the network composed of disentangled groups of internal variables. Experiments with a variety of generative models trained on complex image datasets show that the obtained modules can be used to design targeted interventions. This opens the way to applications such as computationally efficient style transfer and the automated assessment of robustness to contextual changes in pattern recognition systems. Comment: 26 pages, 17 figures.
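    One way to picture such counterfactual manipulations (an illustrative sketch, not the paper's exact procedure): split the generator in two, swap the activations of a candidate group of channels between two latent samples, and compare the decoded outputs to see what that group of internal variables controls.

    ```python
    import torch

    def counterfactual_hybrid(generator_head, generator_tail, z, z_alt, channels):
        """Intervene on one group of internal channels: replace their activations
        with those computed from another latent sample, then decode. Comparing the
        two returned images shows which output properties that group controls."""
        h = generator_head(z)
        h_alt = generator_head(z_alt)
        h_hybrid = h.clone()
        h_hybrid[:, channels] = h_alt[:, channels]   # the counterfactual intervention
        return generator_tail(h_hybrid), generator_tail(h)
    ```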

    Learning Hierarchical Features from Generative Models

    Deep neural networks have been shown to be very successful at learning feature hierarchies in supervised learning tasks. Generative models, on the other hand, have benefited less from hierarchical models with multiple layers of latent variables. In this paper, we prove that hierarchical latent variable models do not take advantage of the hierarchical structure when trained with existing variational methods, and we derive limitations on the kinds of features existing models can learn. Finally, we propose an alternative architecture that does not suffer from these limitations. Our model is able to learn highly interpretable and disentangled hierarchical features on several natural image datasets with no task-specific regularization or prior knowledge. Comment: ICML 2017.

    LOGAN: Unpaired Shape Transform in Latent Overcomplete Space

    We introduce LOGAN, a deep neural network aimed at learning general-purpose shape transforms from unpaired domains. The network is trained on two sets of shapes, e.g., tables and chairs, with neither a pairing between shapes from the two domains as supervision nor any point-wise correspondence between shapes. Once trained, LOGAN takes a shape from one domain and transforms it into the other. Our network consists of an autoencoder that encodes shapes from the two input domains into a common latent space, where the latent codes concatenate multi-scale shape features, resulting in an overcomplete representation. The translator is based on a generative adversarial network (GAN) operating in the latent space, where an adversarial loss enforces cross-domain translation while a feature-preservation loss ensures that the right shape features are preserved for a natural shape transform. We conduct ablation studies to validate each of our key network designs and demonstrate superior capabilities in unpaired shape transforms on a variety of examples over baselines and state-of-the-art approaches. We show that LOGAN is able to learn which shape features to preserve during shape translation, whether local or non-local, content or style, depending solely on the input domains used for training. Comment: Download supplementary material here -> https://kangxue.org/papers/logan_supp.pdf
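    A rough sketch of the latent-space translation objective: an adversarial loss (a least-squares form is assumed here) makes translated codes indistinguishable from target-domain codes, while a feature-preservation loss keeps the translated overcomplete code close to the input code so the right shape features survive. The loss forms and the weight are assumptions, not the authors' exact formulation.

    ```python
    import torch
    import torch.nn.functional as F

    def logan_translator_losses(translator, discriminator, z_src, z_tgt, fp_weight=10.0):
        """Latent-space translation: adversarial loss towards the target domain plus a
        feature-preservation loss on the overcomplete latent code."""
        z_trans = translator(z_src)

        # Least-squares adversarial losses on latent codes.
        real_score = discriminator(z_tgt)
        fake_score_d = discriminator(z_trans.detach())  # detached: no gradient to the translator
        fake_score_g = discriminator(z_trans)
        d_loss = F.mse_loss(real_score, torch.ones_like(real_score)) + \
                 F.mse_loss(fake_score_d, torch.zeros_like(fake_score_d))

        # Translator: fool the discriminator while preserving the input code.
        g_adv = F.mse_loss(fake_score_g, torch.ones_like(fake_score_g))
        g_fp = F.l1_loss(z_trans, z_src)
        return d_loss, g_adv + fp_weight * g_fp
    ```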

    Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies

    Intelligent behaviour in the real world requires the ability to acquire new knowledge from an ongoing sequence of experiences while preserving and reusing past knowledge. We propose a novel algorithm for unsupervised representation learning from piece-wise stationary visual data: the Variational Autoencoder with Shared Embeddings (VASE). Based on the Minimum Description Length principle, VASE automatically detects shifts in the data distribution and allocates spare representational capacity to new knowledge, while simultaneously protecting previously learnt representations from catastrophic forgetting. Our approach encourages the learnt representations to be disentangled, which imparts a number of desirable properties: VASE can deal sensibly with ambiguous inputs, it can enhance its own representations through imagination-based exploration, and, most importantly, it exhibits semantically meaningful sharing of latents between different datasets. Compared to baselines with entangled representations, our approach is able to reason beyond surface-level statistics and perform semantically meaningful cross-domain inference.

    TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation

    Unsupervised image-to-image translation aims to learn a mapping between two visual domains. However, learning a translation across large geometry variations commonly ends in failure. In this work, we present a novel disentangle-and-translate framework to tackle image-to-image translation for complex objects. Instead of learning the mapping on the image space directly, we disentangle the image space into a Cartesian product of the appearance and geometry latent spaces. Specifically, we first introduce a geometry prior loss and a conditional VAE loss to encourage the network to learn independent but complementary representations. The translation is then built on the appearance and geometry spaces separately. Extensive experiments demonstrate the superior performance of our method over other state-of-the-art approaches, especially on the challenging near-rigid and non-rigid object translation tasks. In addition, by taking different exemplars as appearance references, our method also supports multimodal translation. Project page: https://wywu.github.io/projects/TGaGa/TGaGa.html Comment: Accepted to CVPR 2019. Project page: https://wywu.github.io/projects/TGaGa/TGaGa.html
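    The disentangle-and-translate pipeline amounts to encoding an image into separate appearance and geometry codes, translating each code to the target domain with its own translator, and letting the target-domain decoder recombine them; the function names below are placeholders for illustration.

    ```python
    def translate_a_to_b(x_a, enc_appearance, enc_geometry, trans_appearance, trans_geometry, dec_b):
        """Geometry-aware translation from domain A to domain B via disentangled codes."""
        appearance = enc_appearance(x_a)   # what the object looks like (texture, colour)
        geometry = enc_geometry(x_a)       # how it is structured (e.g. a landmark heatmap)
        # Translate the two factors independently, then recombine in the target domain.
        return dec_b(trans_appearance(appearance), trans_geometry(geometry))
    ```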