Revisiting Stochastic Extragradient
We fix a fundamental issue in the stochastic extragradient method by
providing a new sampling strategy that is motivated by approximating implicit
updates. Since the existing stochastic extragradient algorithm of Juditsky et al. (2011), known as Mirror-Prox, diverges on a simple bilinear problem when the domain is not bounded, we prove guarantees for solving variational inequalities that go beyond the existing settings. Furthermore, we illustrate
numerically that the proposed variant converges faster than many other methods
on bilinear saddle-point problems. We also discuss how extragradient can be
applied to training Generative Adversarial Networks (GANs) and how it compares
to other methods. Our experiments on GANs demonstrate that the introduced
approach may make the training faster in terms of data passes, while its higher
iteration complexity makes the advantage smaller.
Comment: Accepted to AISTATS 2020. 16 pages, 9 figures, 2 algorithms.
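To make the divergence issue concrete, here is a minimal sketch (ours, not the paper's method; in particular it omits the new sampling strategy) comparing simultaneous gradient descent-ascent with the deterministic extragradient update on the unconstrained bilinear problem $\min_x \max_y xy$, where the former spirals away from the solution and the latter contracts toward it:

import numpy as np

# Bilinear game min_x max_y x*y. The associated vector field is F(x, y) = (y, -x).
def F(z):
    x, y = z
    return np.array([y, -x])

gamma = 0.1
z_gda = np.array([1.0, 1.0])  # simultaneous gradient descent-ascent iterate
z_eg = np.array([1.0, 1.0])   # extragradient iterate

for _ in range(100):
    # GDA: step along the current field; the iterate's norm grows every step.
    z_gda = z_gda - gamma * F(z_gda)
    # Extragradient: extrapolate first, then update using the lookahead field.
    z_half = z_eg - gamma * F(z_eg)
    z_eg = z_eg - gamma * F(z_half)

# GDA has drifted outward; extragradient has contracted toward the solution (0, 0).
print(np.linalg.norm(z_gda), np.linalg.norm(z_eg))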
Implicit Manifold Learning on Generative Adversarial Networks
This paper raises an implicit manifold learning perspective in Generative
Adversarial Networks (GANs), by studying how the support of the learned
distribution, modelled as a submanifold $\mathcal{M}_{\theta}$, perfectly matches with $\mathcal{M}_{r}$, the support of the real data distribution. We show that optimizing the Jensen-Shannon divergence forces $\mathcal{M}_{\theta}$ to perfectly match with $\mathcal{M}_{r}$, while optimizing the Wasserstein distance does not. On the other hand, by comparing the gradients of the Jensen-Shannon divergence and the Wasserstein distances ($W_1$ and $W_2^2$) in their primal forms, we conjecture that Wasserstein $W_2^2$ may enjoy desirable properties such as reduced mode collapse. It is therefore interesting to design new distances that inherit the best from both.
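For reference, the standard definitions of the distances being compared (our statement of textbook formulas, not notation taken from the paper):

$$\mathrm{JS}(\mathbb{P}_r \,\|\, \mathbb{P}_\theta) = \tfrac{1}{2}\,\mathrm{KL}\big(\mathbb{P}_r \,\big\|\, \tfrac{1}{2}(\mathbb{P}_r + \mathbb{P}_\theta)\big) + \tfrac{1}{2}\,\mathrm{KL}\big(\mathbb{P}_\theta \,\big\|\, \tfrac{1}{2}(\mathbb{P}_r + \mathbb{P}_\theta)\big),$$

$$W_p(\mathbb{P}_r, \mathbb{P}_\theta) = \Big(\inf_{\gamma \in \Pi(\mathbb{P}_r, \mathbb{P}_\theta)} \mathbb{E}_{(x,y) \sim \gamma}\,\|x - y\|^p\Big)^{1/p},$$

where $\Pi(\mathbb{P}_r, \mathbb{P}_\theta)$ is the set of couplings; $W_1$ is the $p = 1$ case and $W_2^2$ is the $p = 2$ case without the outer root. The qualitative difference driving the conjecture is that the Wasserstein family remains finite and provides usable gradients even when $\mathcal{M}_{\theta}$ and $\mathcal{M}_{r}$ do not overlap, whereas JS saturates.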
MINE: Mutual Information Neural Estimation
We argue that the estimation of mutual information between high dimensional
continuous random variables can be achieved by gradient descent over neural
networks. We present a Mutual Information Neural Estimator (MINE) that is
linearly scalable in dimensionality as well as in sample size, trainable
through back-prop, and strongly consistent. We present a handful of
applications on which MINE can be used to minimize or maximize mutual
information. We apply MINE to improve adversarially trained generative models.
We also use MINE to implement Information Bottleneck, applying it to supervised
classification; our results demonstrate substantial improvement in flexibility
and performance in these settings.
Comment: 19 pages, 6 figures.
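A minimal sketch of the estimator's core, the Donsker-Varadhan bound $I(X;Z) \ge \mathbb{E}_{P_{XZ}}[T] - \log \mathbb{E}_{P_X \otimes P_Z}[e^T]$, on toy correlated Gaussians (the tiny network and training loop are stand-ins, not the authors' setup, and the bias-corrected gradient discussed in the paper is omitted):

import math
import torch
import torch.nn as nn

# Statistics network T(x, z); any small network suffices for this toy problem.
T = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(T.parameters(), lr=1e-3)

for _ in range(500):
    # Correlated toy data: z = x + noise, so I(X; Z) > 0.
    x = torch.randn(256, 1)
    z = x + 0.5 * torch.randn(256, 1)
    # Shuffling z across the batch gives samples from the product of marginals.
    z_marg = z[torch.randperm(z.size(0))]

    t_joint = T(torch.cat([x, z], dim=1)).mean()
    log_mean_exp = torch.logsumexp(T(torch.cat([x, z_marg], dim=1)), dim=0) - math.log(256)
    mi_bound = t_joint - log_mean_exp  # Donsker-Varadhan lower bound on I(X; Z)

    opt.zero_grad()
    (-mi_bound).backward()  # gradient ascent on the bound
    opt.step()

print(float(mi_bound))  # approaches the true mutual information from below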
Concept-Oriented Deep Learning: Generative Concept Representations
Generative concept representations have three major advantages over
discriminative ones: they can represent uncertainty, they support integration
of learning and reasoning, and they are good for unsupervised and
semi-supervised learning. We discuss probabilistic and generative deep learning, on which generative concept representations are based, and the use of variational autoencoders and generative adversarial networks for learning generative concept representations, particularly for concepts whose data are sequences, structured data, or graphs.
A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling
This document aims to provide a review on learning with deep generative
models (DGMs), which is a highly active area in machine learning and, more
generally, artificial intelligence. This review is not meant to be a tutorial,
but when necessary, we provide self-contained derivations for completeness.
This review has two features. First, though there are different perspectives to
classify DGMs, we choose to organize this review from the perspective of
graphical modeling, because the learning methods for directed DGMs and
undirected DGMs are fundamentally different. Second, we differentiate model
definitions from model learning algorithms, since different learning algorithms
can be applied to solve the learning problem on the same model, and an
algorithm can be applied to learn different models. We thus separate model
definition and model learning, with more emphasis on reviewing, differentiating
and connecting different learning algorithms. We also discuss promising future
research directions.
Comment: Adds SN-GANs, SA-GANs, and conditional generation (cGANs, AC-GANs).
Robust Estimation and Generative Adversarial Nets
Robust estimation under Huber's $\epsilon$-contamination model has become an
important topic in statistics and theoretical computer science. Statistically
optimal procedures such as Tukey's median and other estimators based on depth
functions are impractical because of their computational intractability. In
this paper, we establish an intriguing connection between $f$-GANs and various depth functions through the lens of $f$-Learning. Similar to the derivation of $f$-GANs, we show that the depth functions that lead to statistically optimal robust estimators can all be viewed as variational lower bounds of the total variation distance in the framework of $f$-Learning. This connection opens the door to computing robust estimators using tools developed for training GANs. In particular, we show in both theory and experiments that appropriate structures of discriminator networks with hidden layers in GANs lead to statistically optimal robust location estimators for both the Gaussian distribution and general elliptical distributions whose first moment may not exist.
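The variational form invoked here is the standard identity for total variation, stated in our notation:

$$d_{\mathrm{TV}}(P, Q) = \sup_{D:\,\mathcal{X} \to [0,1]} \Big\{ \mathbb{E}_{X \sim P}[D(X)] - \mathbb{E}_{X \sim Q}[D(X)] \Big\}.$$

Restricting the supremum to a parametric discriminator class turns the right-hand side into a lower bound, and optimizing that bound adversarially, maximizing over discriminators while minimizing over the candidate distribution, is exactly a GAN-style training problem; this is what makes the otherwise intractable depth-based estimators computable.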
Imitation Learning as f-Divergence Minimization
We address the problem of imitation learning with multi-modal demonstrations.
Instead of attempting to learn all modes, we argue that in many tasks it is
sufficient to imitate any one of them. We show that the state-of-the-art
methods such as GAIL and behavior cloning, due to their choice of loss
function, often incorrectly interpolate between such modes. Our key insight is
to minimize the right divergence between the learner and the expert
state-action distributions, namely the reverse KL divergence or I-projection.
We propose a general imitation learning framework for estimating and minimizing
any $f$-divergence. By plugging in different divergences, we are able to recover existing algorithms such as Behavior Cloning (Kullback-Leibler), GAIL (Jensen-Shannon), and DAgger (Total Variation). Empirical results show that our approximate I-projection technique is able to imitate multi-modal behaviors more reliably than GAIL and behavior cloning.
Comment: International Workshop on the Algorithmic Foundations of Robotics (WAFR) 2020.
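The divergence distinction at the heart of the argument, in standard notation (here $\rho_{\mathrm{exp}}$ and $\rho_\pi$ denote the expert and learner state-action distributions; the symbols are ours):

$$\text{M-projection (behavior cloning):}\quad \min_\pi\, \mathrm{KL}(\rho_{\mathrm{exp}} \,\|\, \rho_\pi), \qquad \text{I-projection:}\quad \min_\pi\, \mathrm{KL}(\rho_\pi \,\|\, \rho_{\mathrm{exp}}).$$

The forward KL is mode-covering: the learner is penalized for assigning low density anywhere the expert has mass, which drives the interpolation across modes. The reverse KL is mode-seeking: the learner is penalized for placing mass where the expert has none, so it can safely commit to a single demonstrated mode.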
Finding Mixed Nash Equilibria of Generative Adversarial Networks
We reconsider the training objective of Generative Adversarial Networks
(GANs) from the mixed Nash Equilibria (NE) perspective. Inspired by the
classical prox methods, we develop a novel algorithmic framework for GANs via
an infinite-dimensional two-player game and prove rigorous convergence rates to
the mixed NE, resolving the longstanding problem that no provably convergent
algorithm exists for general GANs. We then propose a principled procedure to
reduce our novel prox methods to simple sampling routines, leading to
practically efficient algorithms. Finally, we provide experimental evidence
that our approach outperforms methods that seek pure strategy equilibria, such
as SGD, Adam, and RMSProp, both in speed and quality.
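In a mixed-strategy formulation (standard game-theoretic notation; the loss $L$ and parameter spaces are generic placeholders), the two players optimize over probability measures on the generator and discriminator parameter spaces rather than over single parameter vectors:

$$\min_{\mu \in \mathcal{P}(\Theta)}\; \max_{\nu \in \mathcal{P}(\Omega)}\; \mathbb{E}_{\theta \sim \mu,\, \omega \sim \nu}\big[L(\theta, \omega)\big].$$

Lifting to distributions makes the objective bilinear in $(\mu, \nu)$ even when $L$ is non-convex non-concave in $(\theta, \omega)$, which is what permits provably convergent prox-type updates; the practical cost, addressed by the paper's reduction to sampling routines, is that the iterates are distributions over networks rather than single networks.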
Variational Approaches for Auto-Encoding Generative Adversarial Networks
Auto-encoding generative adversarial networks (GANs) combine the standard GAN
algorithm, which discriminates between real and model-generated data, with a
reconstruction loss given by an auto-encoder. Such models aim to prevent mode
collapse in the learned generative model by ensuring that it is grounded in all
the available training data. In this paper, we develop a principle upon which
auto-encoders can be combined with generative adversarial networks by
exploiting the hierarchical structure of the generative model. The underlying
principle shows that variational inference can be used as a basic tool for learning, but with the intractable likelihood replaced by a synthetic likelihood and the unknown posterior distribution replaced by an implicit distribution; both synthetic likelihoods and implicit posterior distributions
can be learned using discriminators. This allows us to develop a natural fusion
of variational auto-encoders and generative adversarial networks, combining the
best of both these methods. We describe a unified objective for optimization,
discuss the constraints needed to guide learning, connect to the wide range of
existing work, and use a battery of tests to systematically and quantitatively
assess the performance of our method.
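Both replacements rest on the standard density-ratio trick: a discriminator $D$ trained with the usual binary cross-entropy loss to separate samples of $p$ from samples of $q$ satisfies, at its optimum,

$$\frac{p(x)}{q(x)} \approx \frac{D(x)}{1 - D(x)},$$

so its logit serves as a learned stand-in for an intractable log-density ratio, whether that ratio defines the synthetic likelihood or the implicit posterior.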
MMD GAN: Towards Deeper Understanding of Moment Matching Network
Generative moment matching network (GMMN) is a deep generative model that
differs from Generative Adversarial Network (GAN) by replacing the
discriminator in GAN with a two-sample test based on kernel maximum mean
discrepancy (MMD). Although some theoretical guarantees of MMD have been
studied, the empirical performance of GMMN is still not as competitive as that
of GAN on challenging and large benchmark datasets. The computational
efficiency of GMMN is also less desirable in comparison with GAN, partially due
to its requirement for a rather large batch size during training. In this
paper, we propose to improve both the model expressiveness of GMMN and its
computational efficiency by introducing adversarial kernel learning techniques as a replacement for the fixed Gaussian kernel in the original GMMN. The new
approach combines the key ideas in both GMMN and GAN, hence we name it MMD GAN.
The new distance measure in MMD GAN is a meaningful loss that enjoys the
advantage of weak topology and can be optimized via gradient descent with
relatively small batch sizes. In our evaluation on multiple benchmark datasets,
including MNIST, CIFAR-10, CelebA, and LSUN, MMD GAN significantly outperforms GMMN and is competitive with other representative GAN methods.
Comment: In the Proceedings of the Thirty-first Annual Conference on Neural Information Processing Systems (NIPS 2017).
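For concreteness, a minimal sketch of the unbiased squared-MMD statistic with a fixed Gaussian kernel, the quantity GMMN minimizes (MMD GAN additionally learns the kernel adversarially, roughly by composing a fixed kernel with a trained feature embedding, which this sketch does not do):

import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * sigma^2)).
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def mmd2_unbiased(x, y, sigma=1.0):
    # Unbiased estimate of MMD^2(P, Q) from samples x ~ P, y ~ Q.
    m, n = len(x), len(y)
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    # Drop diagonal terms so the within-sample averages are unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()

x = np.random.randn(500, 2)        # samples from P = N(0, I)
y = np.random.randn(500, 2) + 2.0  # samples from Q = N(2, I)
print(mmd2_unbiased(x, y))                        # clearly positive
print(mmd2_unbiased(x, np.random.randn(500, 2)))  # near zero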