16 research outputs found
Tighter Variational Representations of f-Divergences via Restriction to Probability Measures
We show that the variational representations for f-divergences currently used
in the literature can be tightened. This has implications for a number of
methods recently proposed based on this representation. As an example
application we use our tighter representation to derive a general f-divergence
estimator based on two i.i.d. samples and derive the dual program for this
estimator that performs well empirically. We also point out a connection
between our estimator and MMD. Comment: ICML201
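As a rough illustration of what "tightening" a variational bound can mean, the sketch below compares two well-known lower bounds on the KL divergence for the same critic function g: the conjugate-based bound E_P[g] - E_Q[exp(g - 1)] and the Donsker-Varadhan form E_P[g] - log E_Q[exp(g)]. The Gaussian test case, the critic, the convention KL(P||Q) = E_P[log dP/dQ], and all function names are assumptions made for this example; it is not the estimator proposed in the paper.

```python
import math
import random

def log_mean_exp(vals):
    # Numerically stable log of the empirical mean of exp(v).
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals) / len(vals))

def kl_bounds(g, p_samples, q_samples):
    """Return (standard, tightened) lower bounds on KL(P||Q) for a fixed critic g."""
    mean_p = sum(g(x) for x in p_samples) / len(p_samples)
    gq = [g(x) for x in q_samples]
    # Conjugate-based bound: E_P[g] - E_Q[exp(g - 1)]
    standard = mean_p - sum(math.exp(v - 1.0) for v in gq) / len(gq)
    # Donsker-Varadhan bound: E_P[g] - log E_Q[exp(g)]
    tightened = mean_p - log_mean_exp(gq)
    return standard, tightened

# Hypothetical test case: P = N(1, 1), Q = N(0, 1), so KL(P||Q) = 0.5.
rng = random.Random(0)
p_samples = [rng.gauss(1.0, 1.0) for _ in range(20000)]
q_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

def critic(x):
    # log dP/dQ for the two Gaussians above.
    return x - 0.5

standard, tightened = kl_bounds(critic, p_samples, q_samples)
print(f"standard bound:  {standard:.3f}")
print(f"tightened bound: {tightened:.3f}")
```

Because log u <= u / e for every u > 0, the second bound is never smaller than the first for the same critic, which is why it is the tighter of the two.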
Robust bounds on risk-sensitive functionals via Renyi divergence
We extend the duality between exponential integrals and relative entropy to a
variational formula for exponential integrals involving the Renyi divergence.
This formula characterizes the dependence of risk-sensitive functionals and
related quantities determined by tail behavior to perturbations in the
underlying distributions, in terms of the Renyi divergence. The
characterization gives rise to upper and lower bounds that are meaningful for
all values of a large deviation scaling parameter, allowing one to quantify in
explicit terms the robustness of risk-sensitive costs. As applications we
consider problems of uncertainty quantification when aspects of the model are
not fully known, as well as their use in bounding tail properties of an
intractable model in terms of a tractable one. Comment: 20 pages, 2 figures
The Inductive Bias of Restricted f-GANs
Generative adversarial networks are a novel method for statistical inference
that has achieved much empirical success; however, the factors contributing to
this success remain ill-understood. In this work, we attempt to analyze
generative adversarial learning -- that is, statistical inference as the result
of a game between a generator and a discriminator -- with the view of
understanding how it differs from classical statistical inference solutions
such as maximum likelihood inference and the method of moments.
Specifically, we provide a theoretical characterization of the distribution
inferred by a simple form of generative adversarial learning called restricted
f-GANs -- where the discriminator is a function in a given function class, the
distribution induced by the generator is restricted to lie in a pre-specified
distribution class and the objective is similar to a variational form of the
f-divergence. A consequence of our result is that for linear KL-GANs -- that
is, when the discriminator is a linear function over some feature space and f
corresponds to the KL-divergence -- the distribution induced by the optimal
generator is neither the maximum likelihood nor the method of moments solution,
but an interesting combination of both.
Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning
Talking face generation aims to synthesize a face video with precise lip
synchronization as well as a smooth transition of facial motion over the entire
video via the given speech clip and facial image. Most existing methods mainly
focus on either disentangling the information in a single image or learning
temporal information between frames. However, cross-modality coherence between
audio and video information has not been well addressed during synthesis. In
this paper, we propose a novel arbitrary talking face generation framework by
discovering the audio-visual coherence via the proposed Asymmetric Mutual
Information Estimator (AMIE). In addition, we propose a Dynamic Attention (DA)
block that selectively focuses on the lip area of the input image during the
training stage, to further enhance lip synchronization. Experimental results on
the benchmark LRW and GRID datasets surpass state-of-the-art methods
on prevalent metrics, with robust high-resolution synthesis across gender and
pose variations. Comment: IJCAI-202
Deep Learning for Channel Coding via Neural Mutual Information Estimation
End-to-end deep learning for communication systems, i.e., systems whose
encoder and decoder are learned, has attracted significant interest recently,
due to its performance which comes close to well-developed classical
encoder-decoder designs. However, one of the drawbacks of current learning
approaches is that a differentiable channel model is needed for the training of
the underlying neural networks. In real-world scenarios, such a channel model
is hardly available and often the channel density is not even known at all.
Some works, therefore, focus on a generative approach, i.e., generating the
channel from samples, or rely on reinforcement learning to circumvent this
problem. We present a novel approach which utilizes a recently proposed neural
estimator of mutual information. We use this estimator to optimize the encoder
for a maximized mutual information, only relying on channel samples. Moreover,
we show that our approach achieves the same performance as state-of-the-art
end-to-end learning with perfect channel model knowledge. Comment: 5 pages, 6 figures
Variational Representations and Neural Network Estimation of Rényi Divergences
We derive a new variational formula for the Rényi family of divergences,
$R_\alpha(Q\|P)$, between probability measures $Q$ and $P$. Our result
generalizes the classical Donsker-Varadhan variational formula for the
Kullback-Leibler divergence. We further show that this Rényi variational
formula holds over a range of function spaces; this leads to a formula for the
optimizer under very weak assumptions and is also key in our development of a
consistency theory for Rényi divergence estimators. By applying this theory
to neural-network estimators, we show that if a neural network family satisfies
one of several strengthened versions of the universal approximation property
then the corresponding Rényi divergence estimator is consistent. In contrast
to density-estimator based methods, our estimators involve only expectations
under $Q$ and $P$ and hence are more effective in high dimensional systems. We
illustrate this via several numerical examples of neural network estimation in
systems of up to 5000 dimensions. Comment: 24 pages, 2 figures
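For reference, the classical Donsker-Varadhan formula mentioned in this abstract can be written as below. The symbols $Q$, $P$ and $g$ are notation chosen here for illustration, and the supremum is taken over bounded measurable functions $g$; the Rényi generalization derived in the paper is not reproduced.

```latex
\[
  D_{\mathrm{KL}}(Q \,\|\, P)
  \;=\;
  \sup_{g} \Big\{ \mathbb{E}_{Q}[g] \;-\; \log \mathbb{E}_{P}\big[e^{g}\big] \Big\}
\]
```

The objective is unchanged when a constant is added to $g$, and when $\log(dQ/dP)$ is an admissible choice the supremum is attained at $g = \log(dQ/dP)$ up to such a constant. The right-hand side involves only expectations under $Q$ and $P$, which is the property the abstract contrasts with density-estimator based methods.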
Parametric Adversarial Divergences are Good Task Losses for Generative Modeling
Generative modeling of high dimensional data like images is a notoriously
difficult and ill-defined problem. In particular, how to evaluate a learned
generative model is unclear. In this position paper, we argue that adversarial
learning, pioneered with generative adversarial networks (GANs), provides an
interesting framework to implicitly define more meaningful task losses for
generative modeling tasks, such as for generating "visually realistic" images.
We refer to those task losses as parametric adversarial divergences and we give
two main reasons why we think parametric divergences are good learning
objectives for generative modeling. Additionally, we unify the processes of
choosing a good structured loss (in structured prediction) and choosing a
discriminator architecture (in generative modeling) using statistical decision
theory; we are then able to formalize and quantify the intuition that "weaker"
losses are easier to learn from, in a specific setting. Finally, we propose two
new challenging tasks to evaluate parametric and nonparametric divergences: a
qualitative task of generating very high-resolution digits, and a quantitative
task of learning data that satisfies high-level algebraic constraints. We use
two common divergences to train a generator and show that the parametric
divergence outperforms the nonparametric divergence on both the qualitative and
the quantitative task. Comment: 22 pages
Bridging the Gap Between f-GANs and Wasserstein GANs
Generative adversarial networks (GANs) have enjoyed much success in learning
high-dimensional distributions. Learning objectives approximately minimize an
f-divergence (f-GANs) or an integral probability metric (Wasserstein GANs)
between the model and the data distribution using a discriminator. Wasserstein
GANs enjoy superior empirical performance, but in f-GANs the discriminator
can be interpreted as a density ratio estimator, which is necessary in some GAN
applications. In this paper, we bridge the gap between f-GANs and Wasserstein
GANs (WGANs). First, we list two constraints over variational f-divergence
estimation objectives that preserve the optimal solution. Next, we minimize
over a Lagrangian relaxation of the constrained objective, and show that it
generalizes critic objectives of both f-GAN and WGAN. Based on this
generalization, we propose a novel practical objective, named KL-Wasserstein
GAN (KL-WGAN). We demonstrate empirical success of KL-WGAN on synthetic
datasets and real-world image generation benchmarks, and achieve
state-of-the-art FID scores on CIFAR10 image generation. Comment: updated for ICML camera ready version
Regularized Policies are Reward Robust
Entropic regularization of policies in Reinforcement Learning (RL) is a
commonly used heuristic to ensure that the learned policy explores the
state-space sufficiently before overfitting to a local optimal policy. The
primary motivation for using entropy is for exploration and disambiguating
optimal policies; however, the theoretical effects are not entirely understood.
In this work, we study the more general regularized RL objective and, using
Fenchel duality, derive the dual problem, which takes the form of an
adversarial reward problem. In particular, we find that the optimal policy
found by a regularized objective is precisely an optimal policy of a
reinforcement learning problem under a worst-case adversarial reward. Our
result allows us to reinterpret the popular entropic regularization scheme as a
form of robustification. Furthermore, due to the generality of our results, we
apply them to other existing regularization schemes. Our results thus give insights
into the effects of regularization of policies and deepen our understanding of
exploration through robust rewards at large.
On Mutual Information Maximization for Representation Learning
Many recent methods for unsupervised or self-supervised representation
learning train feature extractors by maximizing an estimate of the mutual
information (MI) between different views of the data. This comes with several
immediate problems: For example, MI is notoriously hard to estimate, and using
it as an objective for representation learning may lead to highly entangled
representations due to its invariance under arbitrary invertible
transformations. Nevertheless, these methods have been repeatedly shown to
excel in practice. In this paper we argue, and provide empirical evidence, that
the success of these methods cannot be attributed to the properties of MI
alone, and that they strongly depend on the inductive bias in both the choice
of feature extractor architectures and the parametrization of the employed MI
estimators. Finally, we establish a connection to deep metric learning and
argue that this interpretation may be a plausible explanation for the success
of the recently introduced methods. Comment: ICLR 2020. Michael Tschannen and Josip Djolonga contributed equally