On GANs and GMMs
A longstanding problem in machine learning is to find unsupervised methods
that can learn the statistical structure of high dimensional signals. In recent
years, GANs have gained much attention as a possible solution to the problem,
and in particular have shown the ability to generate remarkably realistic high
resolution sampled images. At the same time, many authors have pointed out that
GANs may fail to model the full distribution ("mode collapse") and that using
the learned models for anything other than generating samples may be very
difficult. In this paper, we examine the utility of GANs in learning
statistical models of images by comparing them to perhaps the simplest
statistical model, the Gaussian Mixture Model. First, we present a simple
method to evaluate generative models based on relative proportions of samples
that fall into predetermined bins. Unlike previous automatic methods for
evaluating models, our method does not rely on an additional neural network nor
does it require approximating intractable computations. Second, we compare the
performance of GANs to GMMs trained on the same datasets. While GMMs have
previously been shown to be successful in modeling small patches of images, we
show how to train them on full-sized images despite the high dimensionality.
Our results show that GMMs can generate realistic samples (although less sharp
than those of GANs) but also capture the full distribution, which GANs fail to
do. Furthermore, GMMs allow efficient inference and explicit representation of
the underlying statistical structure. Finally, we discuss how GMMs can be used
to generate sharp images.
Comment: Accepted to NIPS 201
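The bin-based evaluation the abstract describes can be sketched concretely. The sketch below is a hedged illustration, not the paper's exact procedure: it uses fixed 1-D bin edges where the paper derives its bins by clustering training images, and a simple two-sample z-test on per-bin proportions; the function name and threshold are invented for the example.

```python
import math
from collections import Counter

def count_differing_bins(real, fake, edges, z_thresh=2.0):
    """Count bins whose sample proportions differ significantly
    between real and generated data (a proxy for mode coverage)."""
    def bin_counts(xs):
        counts = Counter()
        for x in xs:
            counts[sum(1 for e in edges if x >= e)] += 1
        return counts

    cr, cf = bin_counts(real), bin_counts(fake)
    nr, nf = len(real), len(fake)
    differing = 0
    for b in range(len(edges) + 1):
        pr, pf = cr[b] / nr, cf[b] / nf
        pooled = (cr[b] + cf[b]) / (nr + nf)
        se = math.sqrt(pooled * (1 - pooled) * (1 / nr + 1 / nf)) or 1e-12
        if abs(pr - pf) / se > z_thresh:   # two-sample z-test on proportions
            differing += 1
    return differing
```

A mode-collapsed generator that piles all of its mass into a few bins scores high on this count, while a model matching the training distribution scores near zero.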
Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step
Generative adversarial networks (GANs) are a family of generative models that
do not minimize a single training criterion. Unlike other generative models,
the data distribution is learned via a game between a generator (the generative
model) and a discriminator (a teacher providing training signal) that each
minimize their own cost. GANs are designed to reach a Nash equilibrium at which
each player cannot reduce their cost without changing the other players'
parameters. One useful approach for the theory of GANs is to show that a
divergence between the training distribution and the model distribution obtains
its minimum value at equilibrium. Several recent research directions have been
motivated by the idea that this divergence is the primary guide for the
learning process and that every step of learning should decrease the
divergence. We show that this view is overly restrictive. During GAN training,
the discriminator provides learning signal in situations where the gradients of
the divergences between distributions would not be useful. We provide empirical
counterexamples to the view of GAN training as divergence minimization.
Specifically, we demonstrate that GANs are able to learn distributions in
situations where the divergence minimization point of view predicts they would
fail. We also show that gradient penalties motivated by the divergence
minimization perspective are equally helpful when applied in other contexts in
which the divergence minimization perspective does not predict they would be
helpful. This contributes to a growing body of evidence that GAN training may
be more usefully viewed as approaching Nash equilibria via trajectories that do
not necessarily minimize a specific divergence at each step.
Comment: 18 pages
The Cramer Distance as a Solution to Biased Wasserstein Gradients
The Wasserstein probability metric has received much attention from the
machine learning community. Unlike the Kullback-Leibler divergence, which
strictly measures change in probability, the Wasserstein metric reflects the
underlying geometry between outcomes. The value of being sensitive to this
geometry has been demonstrated, among other settings, in ordinal regression and
generative modelling. In this paper we describe three natural properties of
probability divergences that reflect requirements from machine learning: sum
invariance, scale sensitivity, and unbiased sample gradients. The Wasserstein
metric possesses the first two properties but, unlike the Kullback-Leibler
divergence, does not possess the third. We provide empirical evidence
suggesting that this is a serious issue in practice. Leveraging insights from
probabilistic forecasting, we propose an alternative to the Wasserstein metric,
the Cramér distance. We show that the Cramér distance possesses all three
desired properties, combining the best of the Wasserstein and Kullback-Leibler
divergences. To illustrate the relevance of the Cramér distance in practice,
we design a new algorithm, the Cramér Generative Adversarial Network (GAN),
and show that it performs significantly better than the related Wasserstein
GAN.
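In one dimension the Cramér distance between P and Q is the integral of (F_P(x) − F_Q(x))² over x, and it admits the energy-distance form E|X−Y| − ½E|X−X'| − ½E|Y−Y'|. The pairwise estimator below is a minimal sketch of that 1-D quantity (in the biased V-statistic form that includes self-pairs), not the multivariate critic used in the Cramér GAN itself.

```python
def cramer_distance(xs, ys):
    """Pairwise-sample estimate of E|X-Y| - 0.5*E|X-X'| - 0.5*E|Y-Y'|,
    which for 1-D samples estimates the Cramér distance
    integral of (F_P - F_Q)^2. Uses the (biased) V-statistic form."""
    def mean_abs(a, b):
        return sum(abs(u - v) for u in a for v in b) / (len(a) * len(b))
    return mean_abs(xs, ys) - 0.5 * mean_abs(xs, xs) - 0.5 * mean_abs(ys, ys)
```

For point masses at 0 and 1, `cramer_distance([0.0], [1.0])` gives 1.0, matching the integral of (F_P − F_Q)² = 1 over [0, 1).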
Is Generator Conditioning Causally Related to GAN Performance?
Recent work (Pennington et al., 2017) suggests that controlling the entire
distribution of Jacobian singular values is an important design consideration
in deep learning. Motivated by this, we study the distribution of singular
values of the Jacobian of the generator in Generative Adversarial Networks
(GANs). We find that this Jacobian generally becomes ill-conditioned at the
beginning of training. Moreover, we find that the conditioning of the generator,
averaged over z drawn from p(z), is highly predictive of two other ad hoc metrics
for measuring the 'quality' of trained GANs: the Inception Score and the
Fréchet Inception Distance (FID). We test the hypothesis that this relationship
is causal by proposing a 'regularization' technique (called Jacobian Clamping)
that softly penalizes the condition number of the generator Jacobian. Jacobian
Clamping improves the mean Inception Score and the mean FID for GANs trained on
several datasets. It also greatly reduces inter-run variance of the
aforementioned scores, addressing (at least partially) one of the main
criticisms of GANs.
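The penalty idea can be roughly sketched with finite differences: perturb a latent code z by a small random step, measure how much the generator output moves relative to the step size, and penalize that ratio when it leaves a target band. The function, band limits, and finite-difference approximation below are illustrative assumptions, not the paper's exact training-time implementation.

```python
import math
import random

def jacobian_clamp_penalty(gen, z, lam_min=1.0, lam_max=20.0,
                           eps=1e-2, trials=4):
    """Finite-difference penalty on the generator's local sensitivity:
    for a random unit direction d, Q = ||gen(z + eps*d) - gen(z)|| / eps
    approximates a directional Jacobian norm; Q outside [lam_min, lam_max]
    is penalized quadratically. 'gen' maps a list of floats to a list
    of floats; all names here are illustrative."""
    gz = gen(z)
    total = 0.0
    for _ in range(trials):
        d = [random.gauss(0.0, 1.0) for _ in z]
        norm = math.sqrt(sum(v * v for v in d)) or 1.0
        step = [eps * v / norm for v in d]          # step of length eps
        gz2 = gen([a + b for a, b in zip(z, step)])
        q = math.sqrt(sum((u - v) ** 2 for u, v in zip(gz2, gz))) / eps
        total += max(0.0, q - lam_max) ** 2 + max(0.0, lam_min - q) ** 2
    return total / trials
```

An identity generator has Q ≈ 1 everywhere and incurs essentially zero penalty; a generator that scales its input by 100 has Q ≈ 100 and is penalized toward the band.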
GILBO: One Metric to Measure Them All
We propose a simple, tractable lower bound on the mutual information
contained in the joint generative density of any latent variable generative
model: the GILBO (Generative Information Lower BOund). It offers a
data-independent measure of the complexity of the learned latent variable
description, giving the log of the effective description length. It is
well-defined for both VAEs and GANs. We compute the GILBO for 800 GANs and VAEs
each trained on four datasets (MNIST, FashionMNIST, CIFAR-10 and CelebA) and
discuss the results.
Comment: Accepted at NeurIPS 201
On the Discrimination-Generalization Tradeoff in GANs
Generative adversarial training can be generally understood as minimizing a
certain moment-matching loss defined by a set of discriminator functions,
typically neural networks. The discriminator set should be large enough to be
able to uniquely identify the true distribution (discriminative), and also be
small enough to go beyond memorizing samples (generalizable). In this paper, we
show that a discriminator set is guaranteed to be discriminative whenever its
linear span is dense in the set of bounded continuous functions. This is a very
mild condition satisfied even by neural networks with a single neuron. Further,
we develop generalization bounds between the learned distribution and true
distribution under different evaluation metrics. When evaluated with neural
distance, our bounds show that generalization is guaranteed as long as the
discriminator set is small enough, regardless of the size of the generator or
hypothesis set. When evaluated with KL divergence, our bound provides an
explanation on the counter-intuitive behaviors of testing likelihood in GAN
training. Our analysis sheds light on the practical performance of GANs.
Comment: ICLR 201
Restricting Greed in Training of Generative Adversarial Network
Generative adversarial networks (GANs) have attracted wide research interest in
the field of deep learning. Variations of GAN have achieved competitive results
on specific tasks. However, the stability of training and the diversity of
generated instances are still worth studying further. Training a GAN can be
thought of as a greedy procedure, in which the generative net makes the
locally optimal choice (minimizing the discriminator's loss function) in each
iteration. Unfortunately, this often makes the generated data resemble only a
few modes of the real data and cycle between those modes. To alleviate these
problems, we propose a novel training strategy that restricts greed in the
training of GANs. With the help of our method, the generated samples cover more
modes of the data and the training process is more stable. Evaluating our
method on several representative datasets, we demonstrate the superiority of
the improved training strategy on typical GAN models with different distance
metrics.
Training Generative Reversible Networks
Generative models with an encoding component such as autoencoders currently
receive great interest. However, training of autoencoders is typically
complicated by the need to train a separate encoder and decoder that
must be constrained to be inverses of each other. To overcome this problem,
by-design reversible neural networks (RevNets) have previously been used as
generative models, either directly optimizing the likelihood of the data under
the model or using an adversarial approach on the generated data. Here, we
instead investigate their performance using an adversary on the latent space in
the adversarial autoencoder framework. We investigate the generative
performance of RevNets on the CelebA dataset, showing that generative RevNets
can generate coherent faces with similar quality as Variational Autoencoders.
This first attempt to use RevNets inside the adversarial autoencoder framework
slightly underperformed relative to recent advanced generative models using an
autoencoder component on CelebA, but this gap may diminish with further
optimization of the training setup of generative RevNets. In addition to the
experiments on CelebA, we show a proof-of-principle experiment on the MNIST
dataset suggesting that adversary-free trained RevNets can discover meaningful
latent dimensions without pre-specifying the number of dimensions of the latent
sampling distribution. In summary, this study shows that RevNets can be
employed in different generative training settings.
Source code for this study is at
https://github.com/robintibor/generative-reversible
PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows
As 3D point clouds become the representation of choice for multiple vision
and graphics applications, the ability to synthesize or reconstruct
high-resolution, high-fidelity point clouds becomes crucial. Despite the recent
success of deep learning models in discriminative tasks of point clouds,
generating point clouds remains challenging. This paper proposes a principled
probabilistic framework to generate 3D point clouds by modeling them as a
distribution of distributions. Specifically, we learn a two-level hierarchy of
distributions where the first level is the distribution of shapes and the
second level is the distribution of points given a shape. This formulation
allows us to both sample shapes and sample an arbitrary number of points from a
shape. Our generative model, named PointFlow, learns each level of the
distribution with a continuous normalizing flow. The invertibility of
normalizing flows enables the computation of the likelihood during training and
allows us to train our model in the variational inference framework.
Empirically, we demonstrate that PointFlow achieves state-of-the-art
performance in point cloud generation. We additionally show that our model can
faithfully reconstruct point clouds and learn useful representations in an
unsupervised manner. The code will be available at
https://github.com/stevenygd/PointFlow.
Comment: Published in ICCV 201
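The two-level "distribution of distributions" idea can be illustrated with a toy sampler in which plain Gaussians stand in for the learned continuous normalizing flows; this is a hedged sketch of the sampling structure only, not PointFlow's actual model.

```python
import random

def sample_point_cloud(n_points):
    """Hierarchical sampling: first draw a shape-level latent, then draw
    an arbitrary number of points conditioned on that latent. Here the
    'shape' is just a 3-D mean vector; in PointFlow both levels are
    continuous normalizing flows."""
    shape = [random.gauss(0.0, 1.0) for _ in range(3)]       # level 1: shape
    return [[random.gauss(m, 0.1) for m in shape]            # level 2: points
            for _ in range(n_points)]
```

Because the second level is sampled independently per point, the same shape latent can yield any number of points, which is the property the abstract highlights.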
How Generative Adversarial Networks and Their Variants Work: An Overview
Generative Adversarial Networks (GANs) have received wide attention in the
machine learning field for their potential to learn high-dimensional, complex
real data distributions. Specifically, they do not rely on any assumptions about
the distribution and can generate real-like samples from latent space in a
simple manner. This powerful property has led GANs to be applied to various
applications such as image synthesis, image attribute editing, image
translation, domain adaptation, and other academic fields. In this paper, we aim
to discuss the details of GANs for readers who are familiar with GANs but do
not comprehend them deeply, or who wish to view GANs from various perspectives.
In addition, we explain how GANs operate and the fundamental meaning of the
various objective functions that have been suggested recently. We then focus on
how GANs can be combined with an autoencoder framework. Finally, we enumerate
the GAN variants that are applied to various tasks and other fields, for those
who are interested in exploiting GANs for their research.
Comment: 41 pages, 16 figures, Published in ACM Computing Surveys (CSUR)