Disentangled Non-Local Neural Networks
The non-local block is a popular module for strengthening the context
modeling ability of a regular convolutional neural network. This paper first
studies the non-local block in depth, where we find that its attention
computation can be split into two terms, a whitened pairwise term accounting
for the relationship between two pixels and a unary term representing the
saliency of every pixel. We also observe that, when trained alone, the two
terms tend to model different visual cues: the whitened pairwise term learns
within-region relationships, while the unary term learns salient boundaries.
However, the two terms are tightly coupled in the non-local block, which
hinders the learning of each. Based on these findings, we present the
disentangled non-local block, where the two terms are decoupled to facilitate
learning for both terms. We demonstrate the effectiveness of the decoupled
design on various tasks, such as semantic segmentation on Cityscapes, ADE20K,
and PASCAL Context, object detection on COCO, and action recognition on
Kinetics.
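
To make the decoupled design concrete, here is a minimal PyTorch sketch of a disentangled non-local block, assuming the whitened pairwise and unary formulation described above; the layer names, channel reduction, and residual connection are illustrative, not the authors' exact implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class DisentangledNonLocal2d(nn.Module):
    """Minimal sketch: the whitened pairwise term and the unary term
    get their own softmax normalization and are summed afterwards,
    instead of being entangled inside a single attention computation."""

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inter = channels // reduction
        self.query = nn.Conv2d(channels, inter, 1)
        self.key = nn.Conv2d(channels, inter, 1)
        self.value = nn.Conv2d(channels, inter, 1)
        self.unary = nn.Conv2d(channels, 1, 1)  # per-pixel saliency logit
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x):
        n, _, h, w = x.shape
        q = self.query(x).flatten(2)  # (n, c', hw)
        k = self.key(x).flatten(2)
        v = self.value(x).flatten(2)

        # Whitening: subtracting the means leaves a pure pairwise
        # pixel-to-pixel term; saliency is handled by the unary branch.
        q = q - q.mean(dim=2, keepdim=True)
        k = k - k.mean(dim=2, keepdim=True)

        pairwise = F.softmax(q.transpose(1, 2) @ k, dim=-1)  # (n, hw, hw)
        unary = F.softmax(self.unary(x).flatten(2), dim=-1)  # (n, 1, hw)

        # Decoupled design: sum the two separately normalized maps.
        attn = pairwise + unary  # unary broadcasts over query positions
        y = (attn @ v.transpose(1, 2)).transpose(1, 2).reshape(n, -1, h, w)
        return x + self.out(y)
```

The decisive detail is that each term is normalized on its own, so neither dominates the other's gradients during training.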
Leveraging Latent Features for Local Explanations
As the application of deep neural networks proliferates in numerous areas
such as medical imaging, video surveillance, and self-driving cars, the need
to explain the decisions of these models has become an active research topic,
at both the global and the local level. Locally, most explanation methods have
focused on identifying the relevance of input features, which limits the types
of explanations possible. In this paper, we investigate a new direction by
leveraging latent features to generate contrastive explanations; predictions
are explained not only by highlighting aspects that are in themselves
sufficient to justify the classification, but also by new aspects which, if
added, would change the classification. The key contribution of this paper lies
in how we add features to rich data in a formal yet humanly interpretable way
that leads to meaningful results. Our new definition of "addition" uses latent
features to move beyond the limitations of previous explanations and resolves
an open question laid out in Dhurandhar et al. (2018), whose method creates
local contrastive explanations but is limited to simple datasets such as grayscale
images. The strength of our approach in creating intuitive explanations that
are also quantitatively superior to other methods is demonstrated on three
diverse image datasets (skin lesions, faces, and fashion apparel). A user study
with 200 participants further exemplifies the benefits of contrastive
information, which can be viewed as complementary to other state-of-the-art
interpretability methods.
Comment: Accepted to KDD 2021.
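
The idea of "adding" latent features can be sketched as a small optimization, assuming a pretrained classifier and a latent-space encoder/decoder; the loss form, hyperparameters, and every name below are hypothetical, not the paper's algorithm.

```python
import torch

def contrastive_explanation(x, classifier, encoder, decoder,
                            steps=200, lr=0.05, sparsity=0.1):
    """Search for a small latent 'addition' that flips the prediction.
    The classifier/encoder/decoder interfaces and all hyperparameters
    are assumptions for illustration."""
    orig_class = classifier(x).argmax(dim=1)
    z = encoder(x).detach()
    delta = torch.zeros_like(z, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = classifier(decoder(z + delta))
        # Push down the original class while keeping delta sparse,
        # so the added latent features stay interpretable.
        loss = logits[torch.arange(len(x)), orig_class].sum() \
               + sparsity * delta.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z + delta.detach())
```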
Geometry of Deep Generative Models for Disentangled Representations
Deep generative models like variational autoencoders approximate the
intrinsic geometry of high dimensional data manifolds by learning
low-dimensional latent-space variables and an embedding function. The geometric
properties of these latent spaces have been studied through the lens of
Riemannian geometry, via analysis of the non-linearity of the generator
function. More recently, deep generative models have been used to learn
semantically meaningful 'disentangled' representations that capture
task-relevant attributes while being invariant to other attributes. In this
work, we explore
the geometry of popular generative models for disentangled representation
learning. We use several metrics to compare the properties of latent spaces of
disentangled representation models in terms of class separability and curvature
of the latent space. Our results establish that class-distinguishable features
in the disentangled latent space exhibit higher curvature than in a variational
autoencoder. We evaluate and compare the geometry of three such models against
a variational autoencoder on two different datasets. Further, our results show
that distances and interpolation in the
latent space are significantly improved with Riemannian metrics derived from
the curvature of the space. We expect these results to have implications for
understanding how deep networks can be made more robust, generalizable, and
interpretable.
Comment: Accepted at ICVGIP 2018.
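
The Riemannian machinery the abstract refers to can be made concrete: the generator pulls back a metric M(z) = J(z)^T J(z) onto the latent space, and latent distances are path lengths under that metric. A minimal PyTorch sketch, assuming a decoder that maps a latent vector to a flattened output:

```python
import torch

def pullback_metric(decoder, z):
    """Metric induced on the latent space by the generator:
    M(z) = J(z)^T J(z), with J the Jacobian of `decoder`. Assumes
    `decoder` maps a latent vector (d,) to a flattened output (D,)."""
    J = torch.autograd.functional.jacobian(decoder, z)  # (D, d)
    return J.T @ J

def curve_length(decoder, z_path):
    """Riemannian length of a discretized latent path z_path (T, d):
    the sum of sqrt(dz^T M(z) dz) over consecutive points."""
    length = 0.0
    for a, b in zip(z_path[:-1], z_path[1:]):
        dz = b - a
        M = pullback_metric(decoder, (a + b) / 2)  # metric at midpoint
        length = length + torch.sqrt(dz @ M @ dz)
    return length
```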
ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes
Recent studies on face attribute transfer have achieved great success, and
many models can transfer face attributes given an input image. However, they
suffer from three limitations: (1) inability to generate images by exemplar;
(2) inability to transfer multiple face attributes simultaneously; (3) low
quality of generated images, such as low resolution or visible artifacts. To
address these limitations, we propose a novel model which
receives two images of opposite attributes as inputs. Our model can transfer
exactly the same type of attributes from one image to another by exchanging
certain parts of their latent encodings. All the attributes are encoded in a
disentangled manner in the latent space, which enables us to manipulate several
attributes simultaneously. Besides, our model learns the residual images so as
to facilitate training on higher resolution images. With the help of
multi-scale discriminators for adversarial training, it can even generate
high-quality images with finer details and fewer artifacts. We demonstrate the
effectiveness of our model in overcoming the above three limitations by
comparing with other methods on the CelebA face database. A PyTorch
implementation is available at https://github.com/Prinsphield/ELEGANT.
Comment: GitHub: https://github.com/Prinsphield/ELEGANT
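
As an illustration of the exchange mechanism (a sketch under assumed interfaces, not the linked implementation), the following swaps one disentangled slice of two latent codes and decodes residual images; `attr_slice` and the module signatures are assumptions.

```python
def exchange_attribute(enc, dec, x_pos, x_neg, attr_slice):
    """Swap the latent slice encoding one attribute between an image
    that has the attribute (x_pos) and one that lacks it (x_neg).
    enc/dec are assumed PyTorch modules; attr_slice marks which
    latent dimensions hold the chosen attribute."""
    z_pos, z_neg = enc(x_pos), enc(x_neg)
    z_pos_new, z_neg_new = z_pos.clone(), z_neg.clone()
    # Exchange only the disentangled part holding the chosen attribute.
    z_pos_new[:, attr_slice] = z_neg[:, attr_slice]
    z_neg_new[:, attr_slice] = z_pos[:, attr_slice]
    # The decoder outputs residual images added back to the inputs.
    return x_pos + dec(z_pos_new), x_neg + dec(z_neg_new)
```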
Variational Inference of Disentangled Latent Concepts from Unlabeled Observations
Disentangled representations, where the higher level data generative factors
are reflected in disjoint latent dimensions, offer several benefits such as
ease of deriving invariant representations, transferability to other tasks,
interpretability, etc. We consider the problem of unsupervised learning of
disentangled representations from a large pool of unlabeled observations, and
propose a variational-inference-based approach to infer disentangled latent
factors. We introduce a regularizer on the expectation of the approximate
posterior over observed data that encourages disentanglement. We also
propose a new disentanglement metric which is better aligned with the
qualitative disentanglement observed in the decoder's output. We empirically
observe significant improvement over existing methods in terms of both
disentanglement and data likelihood (reconstruction quality).
Comment: ICLR 2018 version.
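
For intuition, one simple way to write a regularizer of this kind in PyTorch is to match the covariance of the inferred latent means to the identity, discouraging correlated latents; the penalty weights and the use of posterior means are assumptions.

```python
import torch

def dip_regularizer(mu, lambda_od=10.0, lambda_d=10.0):
    """Moment-matching penalty on the aggregate approximate posterior:
    push the covariance of the inferred latent means (mu: batch x d)
    toward the identity. Off-diagonal entries (correlations between
    latents) and diagonal deviations are penalized separately."""
    centered = mu - mu.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (mu.shape[0] - 1)  # (d, d)
    diag = torch.diagonal(cov)
    off_diag = cov - torch.diag(diag)
    return (lambda_od * (off_diag ** 2).sum()
            + lambda_d * ((diag - 1.0) ** 2).sum())
```

In training, such a term would simply be added to the standard ELBO objective.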
Counterfactuals uncover the modular structure of deep generative models
Deep generative models can emulate the perceptual properties of complex image
datasets, providing a latent representation of the data. However, manipulating
such representation to perform meaningful and controllable transformations in
the data space remains challenging without some form of supervision. While
previous work has focused on exploiting statistical independence to disentangle
latent factors, we argue that such a requirement is too restrictive and propose
instead a non-statistical framework that relies on counterfactual manipulations
to uncover a modular structure of the network composed of disentangled groups
of internal variables. Experiments with a variety of generative models trained
on complex image datasets show that the obtained modules can be used to design
targeted interventions. This opens the way to applications such as
computationally efficient style transfer and the automated assessment of
robustness to contextual changes in pattern recognition systems.
Comment: 26 pages, 17 figures.
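
A minimal sketch of such a counterfactual manipulation in PyTorch: clamp one group of internal variables of a generator via a forward hook and compare the two outputs. The choice of layer and channel group is an assumption for illustration.

```python
def counterfactual_intervention(generator, layer, channels, value, z):
    """Clamp a group of internal variables (feature-map channels of
    `layer`, a submodule of `generator`) during a forward pass and
    return both outputs for comparison."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, channels] = value  # intervene on the chosen group
        return output  # a non-None return replaces the layer's output

    baseline = generator(z)
    handle = layer.register_forward_hook(hook)
    try:
        intervened = generator(z)
    finally:
        handle.remove()
    return baseline, intervened
```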
Learning Hierarchical Features from Generative Models
Deep neural networks have been shown to be very successful at learning
feature hierarchies in supervised learning tasks. Generative models, on the
other hand, have benefited less from hierarchical models with multiple layers
of latent variables. In this paper, we prove that hierarchical latent variable
models do not take advantage of the hierarchical structure when trained with
existing variational methods, and characterize limitations on the kinds of
features existing models can learn. Finally, we propose an alternative
architecture that does not suffer from these limitations. Our model is able to
learn highly interpretable and disentangled hierarchical features on several
natural image datasets with no task-specific regularization or prior knowledge.
Comment: ICML 2017.
LOGAN: Unpaired Shape Transform in Latent Overcomplete Space
We introduce LOGAN, a deep neural network aimed at learning general-purpose
shape transforms from unpaired domains. The network is trained on two sets of
shapes, e.g., tables and chairs, with neither a pairing between shapes across
the two domains as supervision nor any point-wise correspondence between
shapes. Once trained, LOGAN takes a shape from one domain and
transforms it into the other. Our network consists of an autoencoder to encode
shapes from the two input domains into a common latent space, where the latent
codes concatenate multi-scale shape features, resulting in an overcomplete
representation. The translator is based on a generative adversarial network
(GAN), operating in the latent space, where an adversarial loss enforces
cross-domain translation while a feature preservation loss ensures that the
right shape features are preserved for a natural shape transform. We conduct
ablation studies to validate each of our key network designs and demonstrate
superior capabilities in unpaired shape transforms on a variety of examples
over baselines and state-of-the-art approaches. We show that LOGAN is able to
learn what shape features to preserve during shape translation, either local or
non-local, whether content or style, depending solely on the input domains for
training.
Comment: Download supplementary material here: https://kangxue.org/papers/logan_supp.pdf
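
For intuition, a minimal sketch of the translator's objective, combining a latent-space adversarial term with a feature-preservation term; the least-squares GAN form and the L1 penalty below are assumptions, not necessarily the paper's exact losses.

```python
def translator_loss(translator, disc, z_src, fp_weight=1.0):
    """Generator-side objective of a latent-space translator: an
    adversarial term pushes translated codes toward the target domain,
    while a feature-preservation term keeps the overcomplete code close
    to the input so only the features that must change do."""
    z_trans = translator(z_src)
    adv = ((disc(z_trans) - 1.0) ** 2).mean()  # fool the latent critic
    fp = (z_trans - z_src).abs().mean()        # preserve shape features
    return adv + fp_weight * fp
```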
Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies
Intelligent behaviour in the real world requires the ability to acquire new
knowledge from an ongoing sequence of experiences while preserving and reusing
past knowledge. We propose a novel algorithm for unsupervised representation
learning from piecewise-stationary visual data: the Variational Autoencoder with
Shared Embeddings (VASE). Based on the Minimum Description Length principle,
VASE automatically detects shifts in the data distribution and allocates spare
representational capacity to new knowledge, while simultaneously protecting
previously learnt representations from catastrophic forgetting. Our approach
encourages the learnt representations to be disentangled, which imparts a
number of desirable properties: VASE can deal sensibly with ambiguous inputs,
it can enhance its own representations through imagination-based exploration,
and most importantly, it exhibits semantically meaningful sharing of latents
between different datasets. Compared to baselines with entangled
representations, our approach is able to reason beyond surface-level statistics
and perform semantically meaningful cross-domain inference.
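
One plausible reading of the shift-detection step, sketched under assumed interfaces: treat the negative ELBO of a batch as a description-length proxy and flag a new environment when the batch is described markedly worse than the running average. The threshold and momentum values are illustrative.

```python
def detect_shift(neg_elbo, running_avg, threshold=2.0, momentum=0.99):
    """Flag a distribution shift when the current batch's description
    length (negative ELBO) far exceeds its running average, then update
    the running average."""
    shifted = neg_elbo > threshold * running_avg
    running_avg = momentum * running_avg + (1.0 - momentum) * neg_elbo
    return shifted, running_avg
```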
TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation aims at learning a mapping between
two visual domains. However, learning a translation across large geometric
variations usually fails. In this work, we present a novel
disentangle-and-translate framework to tackle image-to-image translation for
complex objects. Instead of learning the mapping in the image
space directly, we disentangle image space into a Cartesian product of the
appearance and the geometry latent spaces. Specifically, we first introduce a
geometry prior loss and a conditional VAE loss to encourage the network to
learn independent but complementary representations. Translation is then
performed in the appearance and geometry spaces separately. Extensive experiments
demonstrate the superior performance of our method over other state-of-the-art
approaches, especially in the challenging near-rigid and non-rigid object
translation tasks. In addition, by taking different exemplars as the appearance
references, our method also supports multimodal translation. Project page:
https://wywu.github.io/projects/TGaGa/TGaGa.html
Comment: Accepted to CVPR 2019. Project page: https://wywu.github.io/projects/TGaGa/TGaGa.html
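
A minimal sketch of the disentangle-and-translate step, assuming separate appearance and geometry encoders, per-component translators, and a target-domain decoder (all module names are hypothetical):

```python
def translate_a_to_b(x_a, enc_app, enc_geo, t_app, t_geo, dec_b, x_ref=None):
    """Encode an image into separate appearance and geometry codes,
    translate each component to the target domain, and decode there.
    Supplying a different appearance reference x_ref yields multimodal
    outputs."""
    geo = t_geo(enc_geo(x_a))          # translate the geometry code
    app_src = x_ref if x_ref is not None else x_a
    app = t_app(enc_app(app_src))      # translate the appearance code
    return dec_b(app, geo)             # recombine in domain B
```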