Some Theoretical Properties of GANs
Generative Adversarial Networks (GANs) are a class of generative algorithms
that have been shown to produce state-of-the-art samples, especially in the
domain of image creation. The fundamental principle of GANs is to approximate
the unknown distribution of a given data set by optimizing an objective
function through an adversarial game between a family of generators and a
family of discriminators. In this paper, we offer a better theoretical
understanding of GANs by analyzing some of their mathematical and statistical
properties. We study the deep connection between the adversarial principle
underlying GANs and the Jensen-Shannon divergence, together with some
optimality characteristics of the problem. An analysis of the role of the
discriminator family via approximation arguments is also provided. In addition,
taking a statistical point of view, we study the large sample properties of the
estimated distribution and prove in particular a central limit theorem. Some of
our results are illustrated with simulated examples.
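For readers who want the computation behind this connection (the standard textbook derivation, not a result specific to this paper): the GAN objective is

\[
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big].
\]

For a fixed generator with induced density $p_g$, the inner maximum is attained at $D^\ast(x) = p_{\mathrm{data}}(x) / (p_{\mathrm{data}}(x) + p_g(x))$, and substituting this back yields

\[
C(G) = -\log 4 + 2\,\mathrm{JSD}\big(p_{\mathrm{data}} \,\|\, p_g\big),
\]

so minimizing over the generator amounts to minimizing the Jensen-Shannon divergence to the data distribution.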
Some Theoretical Insights into Wasserstein GANs
Generative Adversarial Networks (GANs) have been successful in producing
outstanding results in areas as diverse as image, video, and text generation.
Building on these successes, a large number of empirical studies have validated
the benefits of the cousin approach called Wasserstein GANs (WGANs), which
brings stability to the training process. In the present paper, we add a
new stone to the edifice by proposing some theoretical advances in the
properties of WGANs. First, we properly define the architecture of WGANs in the
context of integral probability metrics parameterized by neural networks and
highlight some of their basic mathematical features. We stress in particular
interesting optimization properties arising from the use of a parametric
1-Lipschitz discriminator. Then, in a statistically driven approach, we study
the convergence of empirical WGANs as the sample size tends to infinity, and
clarify the adversarial effects of the generator and the discriminator by
underlining some trade-off properties. These features are finally illustrated
with experiments using both synthetic and real-world datasets.
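As background for the abstract's framing (the standard Kantorovich-Rubinstein duality, stated here for reference): the Wasserstein-1 distance admits the dual form

\[
W_1(\mu, \nu) = \sup_{\|f\|_{\mathrm{Lip}} \le 1} \; \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{y \sim \nu}[f(y)],
\]

and WGANs replace the supremum over all 1-Lipschitz functions by a supremum over a parametric class $\mathcal{F} = \{f_\varphi : \|f_\varphi\|_{\mathrm{Lip}} \le 1\}$ of neural networks, which turns the objective into the integral probability metric $d_{\mathcal{F}}(\mu, \nu) = \sup_{f \in \mathcal{F}} \mathbb{E}_\mu[f] - \mathbb{E}_\nu[f]$.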
GANs with Conditional Independence Graphs: On Subadditivity of Probability Divergences
Generative Adversarial Networks (GANs) are modern methods to learn the
underlying distribution of a data set. GANs have been widely used in sample
synthesis, de-noising, domain transfer, etc. GANs, however, are designed in a
model-free fashion where no additional information about the underlying
distribution is available. In many applications, practitioners have
access to the underlying independence graph of the variables, either as a
Bayesian network or a Markov Random Field (MRF). We ask: how can one use this
additional information in designing model-based GANs? In this paper, we provide
theoretical foundations to answer this question by studying subadditivity
properties of probability divergences, which establish upper bounds on the
distance between two high-dimensional distributions by the sum of distances
between their marginals over (local) neighborhoods of the graphical structure
of the Bayes-net or the MRF. We prove that several popular probability
divergences satisfy some notion of subadditivity under mild conditions. These
results lead to a principled design of a model-based GAN that uses a set of
simple discriminators on the neighborhoods of the Bayes-net/MRF, rather than a
giant discriminator on the entire network, providing significant statistical
and computational benefits. Our experiments on synthetic and real-world
datasets demonstrate the benefits of our principled design of model-based GANs.
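Schematically, the subadditivity statements referred to above take the following shape (the exact neighborhoods and constants depend on the divergence and on the graph, and are precisely what the paper establishes): for distributions $P$ and $Q$ that are Markov with respect to a graph with local neighborhoods $N_1, \dots, N_m$,

\[
d(P, Q) \;\le\; C \sum_{i=1}^{m} d\big(P_{N_i}, Q_{N_i}\big),
\]

where $P_{N_i}$ denotes the marginal of $P$ on the variables in $N_i$. A GAN built on such a bound trains one small discriminator per marginal instead of one discriminator over the full joint distribution.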
Generative Adversarial Networks (GANs): What it can generate and What it cannot?
In recent years, Generative Adversarial Networks (GANs) have received
significant attention from the research community. With a straightforward
implementation and outstanding results, GANs have been used for numerous
applications. Despite the success, GANs lack a proper theoretical explanation.
These models suffer from issues like mode collapse, non-convergence, and
instability during training. To address these issues, researchers have proposed
theoretically rigorous frameworks inspired by fields as varied as game theory,
statistical theory, and dynamical systems.
In this paper, we propose a structure for studying these contributions
systematically. We categorize the papers based on the
issues they raise and the kind of novelty they introduce to address them.
We also provide insight into how each of the discussed articles addresses the
problems concerned. We compare and contrast different results and put forth a
summary of theoretical contributions about GANs with a focus on image/visual
applications. We expect this summary paper to give a bird's-eye view to a
person wishing to understand the theoretical progress in GANs so far.
Relaxed Wasserstein with Applications to GANs
Wasserstein Generative Adversarial Networks (WGANs) provide a versatile class
of models, which have attracted great attention in various applications.
However, this framework has two main drawbacks: (i) the Wasserstein-1 (or
Earth-Mover) distance is so restrictive that WGANs cannot always fit the data
geometry well; (ii) it is difficult to achieve fast training of WGANs. In this
paper, we propose a new class of \textit{Relaxed Wasserstein} (RW) distances by
generalizing Wasserstein-1 distance with Bregman cost functions. We show that
RW distances achieve favorable statistical properties without sacrificing
computational tractability. Combined with the GAN framework, we develop
Relaxed WGANs (RWGANs), which are not only statistically flexible but can also
be approximated efficiently using heuristic approaches. Experiments on real images
demonstrate that the RWGAN with Kullback-Leibler (KL) cost function outperforms
other competing approaches, e.g., WGANs, even with gradient penalty.
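One plausible way to read the construction from the abstract alone (the precise definition is in the paper, so treat this as an assumption): a Bregman cost replaces the Euclidean ground cost in the primal optimal-transport problem. For a strictly convex, differentiable $\phi$,

\[
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle, \qquad
\mathrm{RW}_\phi(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \mathbb{E}_{(x, y) \sim \pi}\big[D_\phi(x, y)\big],
\]

so that $\phi(x) = \|x\|^2$ recovers a squared-Euclidean transport cost, while a negative-entropy $\phi$ gives a KL-type cost of the kind mentioned in the experiments.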
Non-parametric estimation of Jensen-Shannon Divergence in Generative Adversarial Network training
Generative Adversarial Networks (GANs) have become a widely popular framework
for generative modelling of high-dimensional datasets. However, their training
is well known to be difficult. This work presents a rigorous statistical
analysis of GANs, providing straightforward explanations for common training
pathologies such as vanishing gradients. Furthermore, it proposes a new
training objective, Kernel GANs, and demonstrates its practical effectiveness
on large-scale real-world data sets. A key element in the analysis is the
distinction between training with respect to the (unknown) data distribution,
and its empirical counterpart. To overcome issues in GAN training, we pursue
the idea of smoothing the Jensen-Shannon Divergence (JSD) by incorporating
noise in the input distributions of the discriminator. As we show, this
effectively leads to an empirical version of the JSD in which the true and the
generator densities are replaced by kernel density estimates, giving rise to
Kernel GANs.
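A minimal numerical sketch of the smoothing idea (our illustration, not the paper's code; `smoothed_jsd`, the bandwidth, and the evaluation grid are all assumptions): adding noise to both inputs of the discriminator corresponds to replacing the raw empirical distributions with kernel density estimates, so the resulting JSD estimate stays finite and informative even when the raw supports barely overlap.

```python
import numpy as np
from scipy.stats import gaussian_kde

def smoothed_jsd(x_real, x_gen, grid, bandwidth=0.2):
    """JSD between Gaussian kernel density estimates of the two samples."""
    dx = grid[1] - grid[0]
    p = gaussian_kde(x_real, bw_method=bandwidth)(grid)
    q = gaussian_kde(x_gen, bw_method=bandwidth)(grid)
    p, q = p / (p.sum() * dx), q / (q.sum() * dx)   # renormalize on the grid
    m = 0.5 * (p + q)
    eps = 1e-12                                     # avoid log(0)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps))) * dx
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# The smoothed JSD stays strictly below log(2) even when the two raw
# samples barely overlap, so the objective does not saturate the way the
# unsmoothed empirical JSD does.
grid = np.linspace(-8.0, 8.0, 2000)
x_real = np.random.randn(500)
x_gen = np.random.randn(500) + 4.0
print(smoothed_jsd(x_real, x_gen, grid))
```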
The Numerics of GANs
In this paper, we analyze the numerics of common algorithms for training
Generative Adversarial Networks (GANs). Using the formalism of smooth
two-player games we analyze the associated gradient vector field of GAN
training objectives. Our findings suggest that the convergence of current
algorithms suffers from two factors: (i) the presence of eigenvalues of the
Jacobian of the gradient vector field with zero real part, and (ii) eigenvalues
with a large imaginary part. Using these findings, we design a new algorithm that
overcomes some of these limitations and has better convergence properties.
Experimentally, we demonstrate its superiority on training common GAN
architectures and show convergence on GAN architectures that are known to be
notoriously hard to train.
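A toy illustration of the first finding (our sketch; the bilinear game and the regularization constant are assumptions, not the paper's experiments): for the two-player game $\min_x \max_y xy$, the gradient vector field is $v(x, y) = (-y, x)$, whose Jacobian has purely imaginary eigenvalues, so simultaneous gradient steps cycle instead of converging. The paper's proposed fix, consensus optimization, adds the regularizer $-\gamma \nabla \tfrac{1}{2}\|v\|^2$, which pushes the real parts of the eigenvalues below zero.

```python
import numpy as np

# Gradient vector field of the bilinear game min_x max_y x*y:
# descent on x, ascent on y gives v(x, y) = (-y, x).
J = np.array([[0.0, -1.0],
              [1.0,  0.0]])              # Jacobian of v; constant for this game
print(np.linalg.eigvals(J))              # eigenvalues +/- i: zero real part,
                                         # so simultaneous gradient steps cycle

# Consensus regularization: follow v - gamma * grad(||v||^2 / 2) instead.
# Here grad(||v||^2 / 2) = J.T @ v, so the regularized Jacobian is J - gamma * J.T @ J.
gamma = 0.1
J_reg = J - gamma * (J.T @ J)
print(np.linalg.eigvals(J_reg))          # eigenvalues -gamma +/- i: negative
                                         # real parts, the dynamics now converge
```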
The Inductive Bias of Restricted f-GANs
Generative adversarial networks are novel methods for statistical inference
that have achieved much empirical success; however, the factors contributing to
this success remain ill-understood. In this work, we attempt to analyze
generative adversarial learning -- that is, statistical inference as the result
of a game between a generator and a discriminator -- with the view of
understanding how it differs from classical statistical inference solutions
such as maximum likelihood inference and the method of moments.
Specifically, we provide a theoretical characterization of the distribution
inferred by a simple form of generative adversarial learning called restricted
f-GANs -- where the discriminator is a function in a given function class, the
distribution induced by the generator is restricted to lie in a pre-specified
distribution class and the objective is similar to a variational form of the
f-divergence. A consequence of our result is that for linear KL-GANs -- that
is, when the discriminator is a linear function over some feature space and f
corresponds to the KL-divergence -- the distribution induced by the optimal
generator is neither the maximum likelihood nor the method of moments solution,
but an interesting combination of both.
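For concreteness, the variational objective alluded to above is the standard f-GAN form (stated here from the f-divergence literature, not copied from the paper): with $f^\ast$ the convex conjugate of $f$, $\mathcal{Q}$ the generator's distribution class, and $\mathcal{F}$ the discriminator class,

\[
\inf_{Q \in \mathcal{Q}} \; \sup_{T \in \mathcal{F}} \; \mathbb{E}_{x \sim P}\big[T(x)\big] - \mathbb{E}_{x \sim Q}\big[f^\ast(T(x))\big].
\]

For the KL case, $f(u) = u \log u$ and $f^\ast(t) = e^{t-1}$; a linear KL-GAN then takes $T(x) = \langle \theta, \varphi(x) \rangle$ for a fixed feature map $\varphi$.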
PacGAN: The power of two samples in generative adversarial networks
Generative adversarial networks (GANs) are innovative techniques for learning
generative models of complex data distributions from samples. Despite
remarkable recent improvements in generating realistic images, one of their
major shortcomings is the fact that in practice, they tend to produce samples
with little diversity, even when trained on diverse datasets. This phenomenon,
known as mode collapse, has been the main focus of several recent advances in
GANs. Yet there is little understanding of why mode collapse happens and how
existing approaches are able to mitigate it. We propose a principled
approach to handling mode collapse, which we call packing. The main idea is to
modify the discriminator to make decisions based on multiple samples from the
same class, either real or artificially generated. We borrow analysis tools
from binary hypothesis testing---in particular the seminal result of Blackwell
[Bla53]---to prove a fundamental connection between packing and mode collapse.
We show that packing naturally penalizes generators with mode collapse, thereby
favoring generator distributions with less mode collapse during the training
process. Numerical experiments on benchmark datasets suggest that packing
provides significant improvements in practice as well.
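A minimal sketch of a packed discriminator (our simplification; the class name, sizes, and MLP architecture are illustrative assumptions, not the paper's implementation): the only change relative to a standard discriminator is that each input is a tuple of m samples drawn from the same source, concatenated before the first layer.

```python
import torch
import torch.nn as nn

class PackedDiscriminator(nn.Module):
    """Discriminator that judges m samples from the same source jointly."""
    def __init__(self, dim, m=2, hidden=128):
        super().__init__()
        self.m = m
        self.net = nn.Sequential(
            nn.Linear(dim * m, hidden),   # m packed samples, concatenated
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1),         # one real/fake logit per packed tuple
        )

    def forward(self, x):                 # x: (batch, m, dim), all real or all fake
        return self.net(x.flatten(1))

# A mode-collapsed generator produces packed tuples whose m entries look
# suspiciously alike, which the packed discriminator can exploit.
D = PackedDiscriminator(dim=2, m=2)
fake = torch.randn(64, 2, 2)              # 64 packed pairs of 2-D samples
logits = D(fake)                          # shape: (64, 1)
```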
Implicit Manifold Learning on Generative Adversarial Networks
This paper raises an implicit manifold learning perspective in Generative
Adversarial Networks (GANs), by studying how the support of the learned
distribution, modelled as a submanifold $\mathcal{M}_{\theta}$, perfectly matches
$\mathcal{M}_{r}$, the support of the real data distribution. We show that
optimizing the Jensen-Shannon divergence forces $\mathcal{M}_{\theta}$ to perfectly
match $\mathcal{M}_{r}$, while optimizing the Wasserstein distance does not.
On the other hand, by comparing the gradients of the Jensen-Shannon divergence
and of the Wasserstein distances ($W_1$ and $W_2^2$) in their primal forms, we
conjecture that Wasserstein $W_2^2$ may enjoy desirable properties such as
reduced mode collapse. It is therefore interesting to design new distances that
inherit the best from both.
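For reference, the primal (optimal-transport) forms compared in the abstract are the standard ones:

\[
W_1(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \mathbb{E}_{(x, y) \sim \pi}\,\|x - y\|, \qquad
W_2^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \mathbb{E}_{(x, y) \sim \pi}\,\|x - y\|^2,
\]

where $\Pi(\mu, \nu)$ is the set of couplings of $\mu$ and $\nu$.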