Stacked Wasserstein Autoencoder
Approximating distributions over complicated manifolds, such as natural
images, is conceptually attractive. The deep latent variable model, trained
using variational autoencoders and generative adversarial networks, is now a
key technique for representation learning. However, it is difficult to unify
these two models for exact latent-variable inference and to parallelize both
reconstruction and sampling, partly because the latent variables are
regularized to match a simple explicit prior distribution. These approaches
are prone to oversimplification and can only characterize a few modes of the
true distribution. Building on the recently proposed Wasserstein autoencoder
(WAE), which regularizes the latent distribution through optimal transport,
this paper proposes a stacked Wasserstein autoencoder (SWAE) to learn a deep
latent variable model. SWAE is a
hierarchical model, which relaxes the optimal transport constraints at two
stages. At the first stage, the SWAE flexibly learns a representation
distribution, i.e., the encoded prior; and at the second stage, the encoded
representation distribution is approximated with a latent variable model under
a regularization that encourages the latent distribution to match the explicit
prior. This model allows us to generate natural textual outputs as well as
perform manipulations in the latent space to induce changes in the output
space. Both quantitative and qualitative results demonstrate the superior
performance of SWAE compared with the state-of-the-art approaches in terms of
faithful reconstruction and generation quality.
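As a rough illustration of the two-stage idea described above, the following minimal sketch trains an unconstrained autoencoder first and then fits a second latent-variable model to its codes, with an MMD penalty pushing the second-stage latent toward a standard Gaussian prior. The MLP architectures, the MMD regularizer (borrowed from WAE-MMD), and all dimensions and weights are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(din, dout, hidden=256):
    return nn.Sequential(nn.Linear(din, hidden), nn.ReLU(), nn.Linear(hidden, dout))

def mmd_rbf(a, b, sigma=1.0):
    # RBF-kernel MMD estimate (biased V-statistic form) between two batches.
    def k(x, y):
        return torch.exp(-torch.cdist(x, y).pow(2) / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

d_x, d_c, d_z = 784, 64, 16                  # data, stage-1 code, stage-2 latent dims
enc1, dec1 = mlp(d_x, d_c), mlp(d_c, d_x)    # stage 1: unconstrained autoencoder
enc2, dec2 = mlp(d_c, d_z), mlp(d_z, d_c)    # stage 2: latent-variable model on the codes
params = (list(enc1.parameters()) + list(dec1.parameters()) +
          list(enc2.parameters()) + list(dec2.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(128, d_x)                     # stand-in batch; replace with real images
c = enc1(x)                                  # stage 1 learns the "encoded prior" freely
loss1 = F.mse_loss(dec1(c), x)
z = enc2(c.detach())                         # stage 2 approximates the code distribution
loss2 = F.mse_loss(dec2(z), c.detach())
prior = torch.randn_like(z)                  # explicit prior: standard Gaussian
loss = loss1 + loss2 + 10.0 * mmd_rbf(z, prior)
opt.zero_grad(); loss.backward(); opt.step()
```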
Transport Analysis of Infinitely Deep Neural Network
We investigated the feature map inside deep neural networks (DNNs) by
tracking the transport map. We are interested in the role of depth (why do DNNs
perform better than shallow models?) and the interpretation of DNNs (what do
intermediate layers do?). Despite the rapid development in their application,
DNNs remain analytically unexplained because the hidden layers are nested and
the parameters are not faithful. Inspired by the integral representation of
shallow NNs, which is the continuum limit of the width, or the hidden unit
number, we developed the flow representation and transport analysis of DNNs.
The flow representation is the continuum limit of the depth or the hidden layer
number, and it is specified by an ordinary differential equation with a vector
field. We interpret an ordinary DNN as a transport map or an Euler broken line
approximation of the flow. Technically speaking, a dynamical system is a
natural model for the nested feature maps. In addition, it opens a new way to
the coordinate-free treatment of DNNs by avoiding the redundant parametrization
of DNNs. Following Wasserstein geometry, we analyze a flow in three aspects:
dynamical system, continuity equation, and Wasserstein gradient flow. A key
finding is that we specified a series of transport maps of the denoising
autoencoder (DAE). Starting from the shallow DAE, this paper develops three
topics: the transport map of the deep DAE, the equivalence between the stacked
DAE and the composition of DAEs, and the development of the double continuum
limit or the integral representation of the flow representation. As partial
answers to the research questions, we found that deeper DAEs converge faster
and the extracted features are better; in addition, a deep Gaussian DAE
transports mass to decrease the Shannon entropy of the data distribution.
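The "Euler broken line" view can be made concrete with a short sketch: a stack of residual updates x <- x + h*v(x) is a forward-Euler discretization of the flow dx/dt = v(x), so increasing the depth refines the approximation of the same flow. The vector field, dimensions, and step count below are illustrative assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    # A learnable vector field v(x) defining the flow dx/dt = v(x).
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))
    def forward(self, x):
        return self.net(x)

def euler_flow(v, x0, depth=20, t1=1.0):
    # Euler broken-line approximation of the flow on [0, t1]:
    # each of the `depth` steps plays the role of one hidden layer.
    h = t1 / depth
    x = x0
    for _ in range(depth):
        x = x + h * v(x)
    return x

v = VectorField(dim=2)
x0 = torch.randn(5, 2)
out = euler_flow(v, x0, depth=20)   # a deeper stack is a finer discretization of the same flow
```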
Geometric Understanding of Deep Learning
Deep learning is the mainstream technique for many machine learning tasks,
including image recognition, machine translation, speech recognition, and so
on. It has outperformed conventional methods in various fields and achieved
great success. Unfortunately, how it works remains poorly understood; laying
down a theoretical foundation for deep learning is therefore of central
importance.
In this work, we give a geometric view to understand deep learning: we show
that the fundamental principle underlying its success is the manifold
structure in data, namely that natural high-dimensional data concentrate close
to a low-dimensional manifold, and deep learning learns the manifold and the
probability distribution on it.
We further introduce two concepts: the rectified linear complexity of a deep
neural network, which measures its learning capability, and the rectified
linear complexity of an embedding manifold, which describes the difficulty of
learning it. We then show that for any deep neural network with a fixed
architecture, there exists a manifold that cannot be learned by the network.
Finally, we propose to apply optimal mass transportation theory to control the
probability distribution in the latent space.
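The proposal to control the latent distribution with optimal mass transportation can be illustrated with a generic entropic-OT (Sinkhorn) sketch that couples latent-code samples to prior samples; the paper's own construction may use a different OT solver, so this is only an assumed, simplified stand-in.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=200):
    # Entropic optimal transport: a, b are source/target weights (each summing
    # to 1), C is the cost matrix; returns the transport plan.
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
codes = rng.normal(size=(100, 2)) * 0.5 + 1.0     # stand-in latent codes
prior = rng.normal(size=(100, 2))                 # samples from the target prior
C = ((codes[:, None, :] - prior[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
P = sinkhorn(np.full(100, 1 / 100), np.full(100, 1 / 100), C)
mapped = (P @ prior) / P.sum(axis=1, keepdims=True)          # barycentric map of each code
```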
Transportation analysis of denoising autoencoders: a novel method for analyzing deep neural networks
The feature map obtained from the denoising autoencoder (DAE) is investigated
by determining transportation dynamics of the DAE, which is a cornerstone for
deep learning. Despite the rapid development in its application, deep neural
networks remain analytically unexplained, because the feature maps are nested
and parameters are not faithful. In this paper, we address the problem of this
nested, complex parameterization by regarding the feature map as a transport
map. Even when a feature map has different dimensions between input
and output, we can regard it as a transportation map by considering that both
the input and output spaces are embedded in a common high-dimensional space. In
addition, the trajectory is a geometric object and thus, is independent of
parameterization. In this manner, transportation can be regarded as a universal
character of deep neural networks. By determining and analyzing the
transportation dynamics, we can understand the behavior of a deep neural
network. In this paper, we investigate a fundamental case of deep neural
networks: the DAE. We derive the transport map of the DAE, and reveal that the
infinitely deep DAE transports mass to decrease a certain quantity, such as
entropy, of the data distribution. These results, though analytically simple,
shed light on the correspondence between deep neural networks and
Wasserstein gradient flows.
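A small numerical sketch of the transport-map view: after denoising training, the DAE residual r(x) - x points along the score of the smoothed data density, so applying the map moves mass toward high-density regions (here, toward the mean of a 1-D Gaussian), consistent with the entropy-decreasing transport described above. The network size, noise level, and training budget are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
data = torch.randn(4096, 1) * 2.0            # 1-D data ~ N(0, 4)
sigma = 0.5                                  # corruption noise level
dae = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(dae.parameters(), lr=1e-3)

for _ in range(2000):                        # denoising training: reconstruct x from x + noise
    x = data[torch.randint(len(data), (256,))]
    loss = F.mse_loss(dae(x + sigma * torch.randn_like(x)), x)
    opt.zero_grad(); loss.backward(); opt.step()

x = torch.tensor([[3.0], [-3.0]])
with torch.no_grad():
    print(dae(x) - x)    # residuals point back toward the mean: mass is transported inward
```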
Adversarially Approximated Autoencoder for Image Generation and Manipulation
Regularized autoencoders learn latent codes whose distribution is shaped by a
regularizer, which enables them to infer latent codes from observations and to
generate new samples from codes. However, they are sometimes ambiguous, as
they tend to produce reconstructions that are not necessarily faithful
reproductions of the inputs. The main reason is that the learned latent code
distribution is forced to match a prior distribution while the true
distribution remains unknown. To improve reconstruction quality and learn a
manifold structure in the latent space, this work presents a novel approach
using the adversarially approximated autoencoder
(AAAE) to investigate the latent codes with adversarial approximation. Instead
of regularizing the latent codes by penalizing on the distance between the
distributions of the model and the target, AAAE learns the autoencoder flexibly
and approximates the latent space with a simpler generator. The ratio is
estimated using a generative adversarial network (GAN) to enforce the similarity
of the distributions. Additionally, the image space is regularized with an
additional adversarial regularizer. The proposed approach unifies two deep
generative models for both latent space inference and diverse generation. The
learning scheme is realized without regularization on the latent codes, which
also encourages faithful reconstruction. Extensive validation experiments on
four real-world datasets demonstrate the superior performance of AAAE. In
comparison to the state-of-the-art approaches, AAAE generates samples with
better quality and shares the properties of a regularized autoencoder with a
nice latent manifold structure.
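A minimal sketch of the adversarial-approximation scheme described above, under assumed details: the autoencoder is trained for reconstruction only, while a simple generator is trained, through a GAN discriminator on codes, to match the encoder's code distribution. Architectures, dimensions, and the omission of the extra image-space regularizer are simplifications for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(din, dout, hidden=128):
    return nn.Sequential(nn.Linear(din, hidden), nn.ReLU(), nn.Linear(hidden, dout))

d_x, d_c, d_n = 784, 32, 8
enc, dec = mlp(d_x, d_c), mlp(d_c, d_x)      # autoencoder: faithful reconstruction only
gen = mlp(d_n, d_c)                          # simple generator over codes
disc = mlp(d_c, 1)                           # discriminator on codes (density-ratio estimate)

opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

x = torch.rand(64, d_x)                      # stand-in batch
c_real = enc(x)
opt_ae.zero_grad(); F.mse_loss(dec(c_real), x).backward(); opt_ae.step()

# Discriminator: separate encoder codes from generated codes.
c_fake = gen(torch.randn(64, d_n))
d_loss = (F.binary_cross_entropy_with_logits(disc(c_real.detach()), torch.ones(64, 1)) +
          F.binary_cross_entropy_with_logits(disc(c_fake.detach()), torch.zeros(64, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator: make its codes indistinguishable from the encoder's codes.
g_loss = F.binary_cross_entropy_with_logits(disc(gen(torch.randn(64, d_n))), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```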
Fixing Bias in Reconstruction-based Anomaly Detection with Lipschitz Discriminators
Anomaly detection is of great interest in fields where abnormalities need to
be identified and corrected (e.g., medicine and finance). Deep learning methods
for this task often rely on autoencoder reconstruction error, sometimes in
conjunction with other errors. We show that this approach exhibits intrinsic
biases that lead to undesirable results. Reconstruction-based methods are
sensitive to training-data outliers and simple-to-reconstruct points. Instead,
we introduce a new unsupervised Lipschitz anomaly discriminator that does not
suffer from these biases. Our anomaly discriminator is trained, similar to the
ones used in GANs, to detect the difference between the training data and
corruptions of the training data. We show that this procedure successfully
detects unseen anomalies with guarantees on those that have a certain
Wasserstein distance from the data or corrupted training set. These additions
allow us to show improved performance on MNIST, CIFAR10, and health record
data.
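A sketch of the discriminator-based anomaly scoring under assumed details: a critic is trained to assign high scores to training data and low scores to corruptions of it, with a gradient penalty standing in for the Lipschitz constraint; its output then serves as an anomaly score. The corruption scheme and penalty weight below are illustrative assumptions.

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

x = torch.rand(64, 784)                       # stand-in "normal" training batch
x_cor = x + 0.5 * torch.randn_like(x)         # illustrative corruption of the data

# Wasserstein-style objective: high scores on data, low scores on corruptions.
loss = critic(x_cor).mean() - critic(x).mean()

# Gradient penalty on interpolates keeps the critic approximately 1-Lipschitz.
alpha = torch.rand(64, 1)
xi = (alpha * x + (1 - alpha) * x_cor).requires_grad_(True)
grad = torch.autograd.grad(critic(xi).sum(), xi, create_graph=True)[0]
loss = loss + 10.0 * ((grad.norm(dim=1) - 1) ** 2).mean()

opt.zero_grad(); loss.backward(); opt.step()

anomaly_score = -critic(torch.rand(5, 784))   # lower critic value => more anomalous
```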
How Generative Adversarial Networks and Their Variants Work: An Overview
Generative Adversarial Networks (GAN) have received wide attention in the
machine learning field for their potential to learn high-dimensional, complex
real data distribution. Specifically, they do not rely on any assumptions about
the distribution and can generate real-like samples from latent space in a
simple manner. This powerful property has led GAN to be applied in various
areas, such as image synthesis, image attribute editing, image translation,
domain adaptation, and other academic fields. In this paper, we aim
to discuss the details of GAN for those readers who are familiar with GAN but
do not comprehend it deeply, or who wish to view GAN from various perspectives. In
addition, we explain how GAN operates and the fundamental meaning of various
objective functions that have been suggested recently. We then focus on how the
GAN can be combined with an autoencoder framework. Finally, we enumerate the
GAN variants that are applied to various tasks and other fields for those who
are interested in exploiting GAN for their research.
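For reference, the standard minimax objective of the original GAN formulation, which the survey's discussion of objective functions builds on:

\[
\min_G \max_D \; V(D, G) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big].
\]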
ClusterGAN : Latent Space Clustering in Generative Adversarial Networks
Generative Adversarial Networks (GANs) have obtained remarkable success in
many unsupervised learning tasks and unarguably, clustering is an important
unsupervised learning problem. While one can potentially exploit the
latent-space back-projection in GANs to cluster, we demonstrate that the
cluster structure is not retained in the GAN latent space.
In this paper, we propose ClusterGAN as a new mechanism for clustering using
GANs. By sampling latent variables from a mixture of one-hot encoded variables
and continuous latent variables, coupled with an inverse network (which
projects the data to the latent space) trained jointly with a clustering
specific loss, we are able to achieve clustering in the latent space. Our
results show a remarkable phenomenon that GANs can preserve latent space
interpolation across categories, even though the discriminator is never exposed
to such vectors. We compare our results with various clustering baselines and
demonstrate superior performance on both synthetic and real datasets.
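The latent sampling described above can be sketched in a few lines: each latent vector concatenates continuous Gaussian noise with a one-hot cluster code, and an inverse (encoder) network trained jointly can recover both parts. The dimensions and noise scale are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def sample_latent(batch, n_clusters=10, z_dim=30, sigma=0.1):
    k = torch.randint(n_clusters, (batch,))
    one_hot = F.one_hot(k, n_clusters).float()       # discrete part: cluster identity
    z_n = sigma * torch.randn(batch, z_dim)          # continuous part
    return torch.cat([z_n, one_hot], dim=1), k       # generator input and cluster labels

z, labels = sample_latent(64)
# An inverse network trained jointly with a clustering-specific loss can recover
# both parts from generated samples, yielding clustering in the latent space.
```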
NTIRE 2020 Challenge on Image and Video Deblurring
Motion blur is one of the most common degradation artifacts in dynamic scene
photography. This paper reviews the NTIRE 2020 Challenge on Image and Video
Deblurring. In this challenge, we present the evaluation results from 3
competition tracks as well as the proposed solutions. Track 1 aims to develop
single-image deblurring methods focusing on restoration quality. On Track 2,
the image deblurring methods are executed on a mobile platform to find the
balance of the running speed and the restoration accuracy. Track 3 targets
developing video deblurring methods that exploit the temporal relation between
input frames. The three tracks had 163, 135, and 102 registered participants,
respectively, and in the final testing phase, 9, 4, and 7 teams competed. The
winning methods demonstrate state-of-the-art performance on image and video
deblurring tasks.
Symmetric Variational Autoencoder and Connections to Adversarial Learning
A new form of the variational autoencoder (VAE) is proposed, based on the
symmetric Kullback-Leibler divergence. It is demonstrated that learning of the
resulting symmetric VAE (sVAE) has close connections to previously developed
adversarial-learning methods. This relationship helps unify the previously
distinct techniques of VAE and adversarial learning, and provides insights
that allow us to ameliorate shortcomings with some previously developed
adversarial methods. In addition to an analysis that motivates and explains the
sVAE, an extensive set of experiments validates the utility of the approach.
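For reference, the symmetric Kullback-Leibler divergence on which the sVAE is based is the sum of the two directed divergences (how these expectations enter the sVAE objective itself is detailed in the paper):

\[
\mathrm{KL}_{\mathrm{sym}}(p \,\|\, q) \;=\; \mathrm{KL}(p\,\|\,q) + \mathrm{KL}(q\,\|\,p)
\;=\; \mathbb{E}_{p}\!\left[\log\frac{p(x)}{q(x)}\right] + \mathbb{E}_{q}\!\left[\log\frac{q(x)}{p(x)}\right].
\]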
- …