
    Stacked Wasserstein Autoencoder

    Approximating distributions over complicated manifolds, such as natural images, is conceptually attractive. The deep latent variable model, trained using variational autoencoders and generative adversarial networks, is now a key technique for representation learning. However, it is difficult to unify these two models for exact latent-variable inference and to parallelize both reconstruction and sampling, partly because the latent variables are regularized to match a simple explicit prior distribution. Such approaches tend to oversimplify and can characterize only a few modes of the true distribution. Building on the recently proposed Wasserstein autoencoder (WAE), which regularizes the latent space through optimal transport, this paper proposes a stacked Wasserstein autoencoder (SWAE) to learn a deep latent variable model. SWAE is a hierarchical model that relaxes the optimal transport constraints in two stages. At the first stage, SWAE flexibly learns a representation distribution, i.e., the encoded prior; at the second stage, the encoded representation distribution is approximated with a latent variable model under a regularizer that encourages the latent distribution to match the explicit prior. This model allows us to generate natural textural outputs as well as perform manipulations in the latent space that induce changes in the output space. Both quantitative and qualitative results demonstrate the superior performance of SWAE compared with state-of-the-art approaches in terms of faithful reconstruction and generation quality. Comment: arXiv admin note: text overlap with arXiv:1902.0558
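
    The two-stage relaxation can be pictured with a minimal sketch (the architecture sizes, the MMD regularizer, and the loss weights below are illustrative assumptions, not the authors' exact configuration): stage one trains an autoencoder for faithful reconstruction without matching any prior, and stage two fits a second latent-variable model to the encoded codes while an MMD penalty pulls its latent distribution toward the explicit prior.

```python
# Minimal two-stage WAE-style sketch in PyTorch (illustrative sizes and losses).
import torch
import torch.nn as nn

def mlp(d_in, d_out, hidden=256):
    return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(),
                         nn.Linear(hidden, d_out))

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared RBF-kernel MMD between two batches."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

x_dim, c_dim, z_dim = 784, 64, 16                      # sizes are assumptions
enc1, dec1 = mlp(x_dim, c_dim), mlp(c_dim, x_dim)      # stage 1: data <-> code
enc2, dec2 = mlp(c_dim, z_dim), mlp(z_dim, c_dim)      # stage 2: code <-> latent
opt = torch.optim.Adam([*enc1.parameters(), *dec1.parameters(),
                        *enc2.parameters(), *dec2.parameters()], lr=1e-3)

x = torch.rand(128, x_dim)                             # stand-in data batch
for _ in range(10):
    c = enc1(x)                                        # flexible encoded prior, no constraint here
    rec1 = (dec1(c) - x).pow(2).mean()                 # stage-1 reconstruction
    z = enc2(c.detach())                               # stage 2 models the code distribution
    rec2 = (dec2(z) - c.detach()).pow(2).mean()
    penalty = rbf_mmd2(z, torch.randn_like(z))         # pull q(z) toward the explicit prior N(0, I)
    loss = rec1 + rec2 + 10.0 * penalty                # weight 10.0 is an arbitrary placeholder
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling chains the two decoders: z ~ N(0, I) -> code -> data space.
samples = dec1(dec2(torch.randn(16, z_dim)))
```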

    Transport Analysis of Infinitely Deep Neural Network

    We investigated the feature maps inside deep neural networks (DNNs) by tracking the transport map. We are interested in the role of depth (why do DNNs perform better than shallow models?) and in the interpretation of DNNs (what do intermediate layers do?). Despite the rapid development of their applications, DNNs remain analytically unexplained because the hidden layers are nested and the parameters are not faithful. Inspired by the integral representation of shallow NNs, which is the continuum limit of the width, or the number of hidden units, we developed the flow representation and transport analysis of DNNs. The flow representation is the continuum limit of the depth, or the number of hidden layers, and it is specified by an ordinary differential equation with a vector field. We interpret an ordinary DNN as a transport map, or an Euler broken-line approximation of the flow. Technically speaking, a dynamical system is a natural model for the nested feature maps. In addition, it opens a new way toward a coordinate-free treatment of DNNs by avoiding their redundant parametrization. Following Wasserstein geometry, we analyze a flow in three aspects: the dynamical system, the continuity equation, and the Wasserstein gradient flow. A key contribution is that we specify a series of transport maps of the denoising autoencoder (DAE). Starting from the shallow DAE, this paper develops three topics: the transport map of the deep DAE, the equivalence between the stacked DAE and the composition of DAEs, and the double continuum limit, i.e., the integral representation of the flow representation. As partial answers to the research questions, we found that deeper DAEs converge faster and extract better features; in addition, a deep Gaussian DAE transports mass so as to decrease the Shannon entropy of the data distribution.
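
    The "Euler broken line" reading admits a small self-contained sketch (the vector field below is an arbitrary stand-in, not a trained layer): a stack of residual updates x_{k+1} = x_k + h f(x_k, t_k) is exactly the explicit Euler discretization of the ODE dx/dt = f(x, t), so growing the depth while shrinking the step size recovers the flow representation.

```python
# A residual stack as an explicit Euler discretization of dx/dt = f(x, t).
import numpy as np

def f(x, t):
    """Placeholder vector field; in a DNN this would be a learned layer."""
    return np.tanh(x) * np.cos(t)

def deep_resnet_forward(x0, depth, T=1.0):
    """More layers with smaller steps over the same horizon T."""
    h = T / depth                      # step size shrinks as depth grows
    x = x0.copy()
    for k in range(depth):
        x = x + h * f(x, k * h)        # one residual block = one Euler step
    return x

x0 = np.random.randn(5, 3)
shallow = deep_resnet_forward(x0, depth=4)
deep = deep_resnet_forward(x0, depth=4096)
# As depth grows, the broken line converges to the flow map of the ODE,
# so the two outputs differ only by the discretization error.
print(np.abs(shallow - deep).max())
```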

    Geometric Understanding of Deep Learning

    Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, speech recognition, and so on. It has outperformed conventional methods in various fields and achieved great success. Unfortunately, our understanding of how it works remains limited, and laying down a theoretical foundation for deep learning is of central importance. In this work, we give a geometric view to understand deep learning: we show that the fundamental principle behind its success is the manifold structure in data; namely, natural high-dimensional data concentrate close to a low-dimensional manifold, and deep learning learns the manifold and the probability distribution on it. We further introduce the rectified linear complexity of a deep neural network, which measures its learning capability, and the rectified linear complexity of an embedding manifold, which describes the difficulty of learning it. We then show that for any deep neural network with a fixed architecture, there exists a manifold that cannot be learned by the network. Finally, we propose to apply optimal mass transportation theory to control the probability distribution in the latent space.
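
    As a generic illustration of controlling a latent distribution with optimal mass transportation (a sketch under assumed choices, not the authors' construction: entropic regularization, uniform marginals, and random stand-ins for encoder outputs), the snippet below computes a Sinkhorn transport plan between latent codes and prior samples and reads off barycentric targets for the codes.

```python
# Entropic optimal transport (Sinkhorn) between latent codes and a target prior.
import numpy as np

def sinkhorn_plan(X, Y, reg=1.0, iters=200):
    """Approximate OT plan between two point clouds with uniform marginals.
    For very small reg or large costs, a log-domain implementation is needed."""
    n, m = len(X), len(Y)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared-Euclidean cost
    K = np.exp(-C / reg)
    a, b = np.ones(n) / n, np.ones(m) / m                # uniform marginals
    u, v = np.ones(n) / n, np.ones(m) / m
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]                   # transport plan

latent = np.random.randn(200, 2) + 1.0    # stand-in encoder outputs
prior = np.random.randn(200, 2)           # target latent distribution
P = sinkhorn_plan(latent, prior)
# Barycentric projection: where each latent code should move to match the prior.
targets = (P @ prior) / P.sum(axis=1, keepdims=True)
print(targets.shape)  # (200, 2)
```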

    Transportation analysis of denoising autoencoders: a novel method for analyzing deep neural networks

    The feature map obtained from the denoising autoencoder (DAE), a cornerstone of deep learning, is investigated by determining the transportation dynamics of the DAE. Despite the rapid development of their applications, deep neural networks remain analytically unexplained, because the feature maps are nested and the parameters are not faithful. In this paper, we address the problem posed by this nested, unfaithful parametrization by regarding the feature map as a transport map. Even when a feature map has different input and output dimensions, we can regard it as a transport map by considering that both the input and output spaces are embedded in a common high-dimensional space. In addition, the trajectory is a geometric object and is thus independent of the parametrization. In this manner, transportation can be regarded as a universal character of deep neural networks. By determining and analyzing the transportation dynamics, we can understand the behavior of a deep neural network. In this paper, we investigate a fundamental case of deep neural networks: the DAE. We derive the transport map of the DAE, and reveal that the infinitely deep DAE transports mass to decrease a certain quantity, such as the entropy, of the data distribution. These results, though analytically simple, shed light on the correspondence between deep neural networks and Wasserstein gradient flows. Comment: Accepted at the NIPS 2017 workshop on Optimal Transport & Machine Learning (OTML 2017).
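
    For reference, the commonly used small-noise expansion of the optimal Gaussian DAE, written here as a sketch consistent with the claims above rather than as the paper's exact statement, is:

```latex
% Sketch: small-noise expansion of the optimal Gaussian DAE, i.e. the map g^*
% minimizing E || g(x + \varepsilon) - x ||^2 with \varepsilon \sim N(0, \sigma^2 I).
\begin{align}
  g^{*}(x) &= x + \sigma^{2}\, \nabla_{x} \log p(x) + o(\sigma^{2}), \\
  \partial_{t} p_{t} &= -\nabla \cdot \big( p_{t}\, \nabla \log p_{t} \big) = -\Delta p_{t}.
\end{align}
```

    Each infinitesimal DAE layer thus pushes mass along the score, and composing infinitely many layers yields a backward heat flow, which matches the entropy-decreasing transport described in the abstract.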

    Adversarially Approximated Autoencoder for Image Generation and Manipulation

    Regularized autoencoders learn latent codes whose distribution is shaped by a regularizer, which gives them the capability to infer the latent codes of observations and to generate new samples from given codes. However, they are sometimes ambiguous, as they tend to produce reconstructions that are not necessarily faithful reproductions of the inputs. The main reason is that the learned latent code distribution is forced to match a prior distribution while the true distribution remains unknown. To improve reconstruction quality and give the latent space a manifold structure, this work presents a novel approach, the adversarially approximated autoencoder (AAAE), to investigate the latent codes with adversarial approximation. Instead of regularizing the latent codes by penalizing the distance between the model and target distributions, AAAE learns the autoencoder flexibly and approximates the latent space with a simpler generator. The density ratio is estimated using a generative adversarial network (GAN) to enforce the similarity of the distributions. Additionally, the image space is regularized with an additional adversarial regularizer. The proposed approach unifies two deep generative models for both latent space inference and diverse generation. The learning scheme is realized without regularization on the latent codes, which also encourages faithful reconstruction. Extensive validation experiments on four real-world datasets demonstrate the superior performance of AAAE. In comparison to state-of-the-art approaches, AAAE generates samples with better quality and shares the properties of a regularized autoencoder with a well-structured latent manifold.
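
    A minimal sketch of the adversarial approximation idea (the architectures, optimizers, and plain GAN loss are assumptions for illustration, not the AAAE authors' exact design): a simple generator is trained to cover the encoder's unregularized code distribution, with a discriminator whose logits estimate the density ratio between the two distributions.

```python
# Sketch: approximate a latent code distribution with a simpler generator via a GAN.
import torch
import torch.nn as nn

code_dim, noise_dim = 32, 8
gen = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(), nn.Linear(128, code_dim))
disc = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(), nn.Linear(128, 1))
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def step(real_codes):
    """One adversarial update on a batch of codes produced by the encoder."""
    fake = gen(torch.randn(len(real_codes), noise_dim))
    # Discriminator logits estimate the log density ratio between the encoder's
    # code distribution and the generator's approximation of it.
    d_loss = bce(disc(real_codes), torch.ones(len(real_codes), 1)) + \
             bce(disc(fake.detach()), torch.zeros(len(fake), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: pull its samples toward the code distribution.
    g_loss = bce(disc(fake), torch.ones(len(fake), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

step(torch.randn(64, code_dim))   # stand-in batch of encoder outputs
```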

    Fixing Bias in Reconstruction-based Anomaly Detection with Lipschitz Discriminators

    Anomaly detection is of great interest in fields where abnormalities need to be identified and corrected (e.g., medicine and finance). Deep learning methods for this task often rely on autoencoder reconstruction error, sometimes in conjunction with other errors. We show that this approach exhibits intrinsic biases that lead to undesirable results: reconstruction-based methods are sensitive to training-data outliers and to points that are simple to reconstruct. Instead, we introduce a new unsupervised Lipschitz anomaly discriminator that does not suffer from these biases. Our anomaly discriminator is trained, similarly to the discriminators used in GANs, to detect the difference between the training data and corruptions of the training data. We show that this procedure successfully detects unseen anomalies, with guarantees for anomalies that lie at a certain Wasserstein distance from the data or the corrupted training set. These additions allow us to show improved performance on MNIST, CIFAR10, and health record data. Comment: 6 pages, 4 figures, 2 tables, presented at IEEE MLS
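
    One way to realize such a Lipschitz discriminator is sketched below (the spectral-norm constraint, the additive-noise corruption, and the thresholding rule are illustrative assumptions, not the paper's exact recipe): a critic is trained to score training data above corruptions of it, and low critic values are then flagged as anomalous.

```python
# Sketch: a Lipschitz critic separating data from corruptions, used as an anomaly score.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

d = 20
critic = nn.Sequential(spectral_norm(nn.Linear(d, 128)), nn.ReLU(),
                       spectral_norm(nn.Linear(128, 1)))  # spectral norm bounds the Lipschitz constant
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def corrupt(x):
    return x + 2.0 * torch.randn_like(x)      # assumed corruption: heavy additive noise

data = torch.randn(512, d)                    # stand-in "normal" training data
for _ in range(200):
    x = data[torch.randint(0, len(data), (64,))]
    # Wasserstein-style objective: high score on data, low score on corruptions.
    loss = -(critic(x).mean() - critic(corrupt(x)).mean())
    opt.zero_grad(); loss.backward(); opt.step()

# Lower critic value => more anomalous; threshold on a training-score quantile.
scores = critic(data).detach().squeeze(1)
threshold = scores.quantile(0.05)
test_point = 10 * torch.ones(1, d)
print(bool(critic(test_point) < threshold))   # flagged as anomalous
```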

    How Generative Adversarial Networks and Their Variants Work: An Overview

    Generative Adversarial Networks (GANs) have received wide attention in the machine learning field for their potential to learn high-dimensional, complex real data distributions. Specifically, they do not rely on any assumptions about the distribution and can generate real-like samples from latent space in a simple manner. This powerful property has led GANs to be applied to various applications such as image synthesis, image attribute editing, image translation, domain adaptation, and other fields. In this paper, we aim to discuss the details of GANs for readers who are familiar with, but do not deeply comprehend, GANs, or who wish to view GANs from various perspectives. In addition, we explain how GANs operate and the fundamental meaning of the various objective functions that have been suggested recently. We then focus on how the GAN can be combined with an autoencoder framework. Finally, we enumerate the GAN variants that are applied to various tasks and other fields, for those who are interested in exploiting GANs for their research. Comment: 41 pages, 16 figures, published in ACM Computing Surveys (CSUR).
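
    For reference, the original minimax objective that the surveyed variants modify is:

```latex
% The original GAN minimax objective (Goodfellow et al., 2014), for reference.
\begin{equation}
  \min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
\end{equation}
% For a fixed G, the optimal discriminator is D^{*}(x) = p_data(x) / (p_data(x) + p_g(x)),
% and substituting it back reduces V to 2 * JSD(p_data || p_g) - log 4.
```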

    ClusterGAN: Latent Space Clustering in Generative Adversarial Networks

    Generative Adversarial Networks (GANs) have obtained remarkable success in many unsupervised learning tasks, and clustering is unarguably an important unsupervised learning problem. While one can potentially exploit the latent-space back-projection in GANs to cluster, we demonstrate that the cluster structure is not retained in the GAN latent space. In this paper, we propose ClusterGAN as a new mechanism for clustering using GANs. By sampling latent variables from a mixture of one-hot encoded variables and continuous latent variables, coupled with an inverse network (which projects the data to the latent space) trained jointly with a clustering-specific loss, we are able to achieve clustering in the latent space. Our results show the remarkable phenomenon that GANs can preserve latent space interpolation across categories, even though the discriminator is never exposed to such vectors. We compare our results with various clustering baselines and demonstrate superior performance on both synthetic and real datasets. Comment: GANs, Clustering, Latent Space, Interpolation (v2: typos fixed, some new experiments added, reported metrics on best validated model).
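
    The mixed discrete-continuous sampling can be sketched as follows (the dimensions and noise scale are illustrative, not the paper's exact settings):

```python
# Sketch of ClusterGAN-style latent sampling: concatenate a one-hot cluster code
# with continuous Gaussian noise.
import torch

def sample_latent(batch, n_clusters=10, zn_dim=30, sigma=0.1):
    zn = sigma * torch.randn(batch, zn_dim)                      # continuous part
    idx = torch.randint(0, n_clusters, (batch,))                 # cluster assignment
    zc = torch.nn.functional.one_hot(idx, n_clusters).float()    # discrete one-hot part
    return torch.cat([zn, zc], dim=1), idx

z, labels = sample_latent(16)
print(z.shape)        # torch.Size([16, 40])
# The inverse network E is trained jointly so that E(G(z)) recovers both zn and zc;
# at test time, the argmax over the recovered zc gives the cluster assignment.
```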

    NTIRE 2020 Challenge on Image and Video Deblurring

    Motion blur is one of the most common degradation artifacts in dynamic scene photography. This paper reviews the NTIRE 2020 Challenge on Image and Video Deblurring. In this challenge, we present the evaluation results from 3 competition tracks as well as the proposed solutions. Track 1 aims to develop single-image deblurring methods focusing on restoration quality. In Track 2, the image deblurring methods are executed on a mobile platform to find the balance between running speed and restoration accuracy. Track 3 targets developing video deblurring methods that exploit the temporal relation between input frames. The three tracks had 163, 135, and 102 registered participants, respectively, and 9, 4, and 7 teams competed in the final testing phase. The winning methods demonstrate state-of-the-art performance on image and video deblurring tasks. Comment: To be published in CVPR 2020 Workshop (New Trends in Image Restoration and Enhancement).

    Symmetric Variational Autoencoder and Connections to Adversarial Learning

    A new form of the variational autoencoder (VAE) is proposed, based on the symmetric Kullback-Leibler divergence. It is demonstrated that learning of the resulting symmetric VAE (sVAE) has close connections to previously developed adversarial-learning methods. This relationship helps unify the previously distinct techniques of VAEs and adversarial learning, and provides insights that allow us to ameliorate shortcomings of some previously developed adversarial methods. In addition to an analysis that motivates and explains the sVAE, an extensive set of experiments validates the utility of the approach.
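
    One way to read the starting point (a sketch of the setup, not the paper's exact objective) is the symmetric KL divergence applied to the two joint distributions an autoencoding model defines:

```latex
% Symmetric KL between the inference joint q(x, z) = q(x)\, q_\phi(z \mid x)
% and the generative joint p(x, z) = p(z)\, p_\theta(x \mid z)  (sketch of the setup).
\begin{equation}
  D_{\mathrm{sKL}}\big(q(x,z),\, p(x,z)\big)
  = \mathrm{KL}\big(q(x,z) \,\|\, p(x,z)\big) + \mathrm{KL}\big(p(x,z) \,\|\, q(x,z)\big)
\end{equation}
```

    The forward term reproduces the usual VAE evidence bound up to a constant, while the reverse term is intractable in closed form; estimating it is plausibly where the connection to adversarial learning enters.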