16 research outputs found

    Tighter Variational Representations of f-Divergences via Restriction to Probability Measures

    Full text link
    We show that the variational representations for f-divergences currently used in the literature can be tightened. This has implications to a number of methods recently proposed based on this representation. As an example application we use our tighter representation to derive a general f-divergence estimator based on two i.i.d. samples and derive the dual program for this estimator that performs well empirically. We also point out a connection between our estimator and MMD.Comment: ICML201

    Robust bounds on risk-sensitive functionals via Renyi divergence

    Full text link
    We extend the duality between exponential integrals and relative entropy to a variational formula for exponential integrals involving the Renyi divergence. This formula characterizes the dependence of risk-sensitive functionals and related quantities determined by tail behavior to perturbations in the underlying distributions, in terms of the Renyi divergence. The characterization gives rise to upper and lower bounds that are meaningful for all values of a large deviation scaling parameter, allowing one to quantify in explicit terms the robustness of risk-sensitive costs. As applications we consider problems of uncertainty quantification when aspects of the model are not fully known, as well their use in bounding tail properties of an intractable model in terms of a tractable one.Comment: 20 pages, 2 figure

    The Inductive Bias of Restricted f-GANs

    Full text link
    Generative adversarial networks are a novel method for statistical inference that have achieved much empirical success; however, the factors contributing to this success remain ill-understood. In this work, we attempt to analyze generative adversarial learning -- that is, statistical inference as the result of a game between a generator and a discriminator -- with the view of understanding how it differs from classical statistical inference solutions such as maximum likelihood inference and the method of moments. Specifically, we provide a theoretical characterization of the distribution inferred by a simple form of generative adversarial learning called restricted f-GANs -- where the discriminator is a function in a given function class, the distribution induced by the generator is restricted to lie in a pre-specified distribution class and the objective is similar to a variational form of the f-divergence. A consequence of our result is that for linear KL-GANs -- that is, when the discriminator is a linear function over some feature space and f corresponds to the KL-divergence -- the distribution induced by the optimal generator is neither the maximum likelihood nor the method of moments solution, but an interesting combination of both

    Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning

    Full text link
    Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image. Most existing methods mainly focus on either disentangling the information in a single image or learning temporal information between frames. However, cross-modality coherence between audio and video information has not been well addressed during synthesis. In this paper, we propose a novel arbitrary talking face generation framework by discovering the audio-visual coherence via the proposed Asymmetric Mutual Information Estimator (AMIE). In addition, we propose a Dynamic Attention (DA) block by selectively focusing the lip area of the input image during the training stage, to further enhance lip synchronization. Experimental results on benchmark LRW dataset and GRID dataset transcend the state-of-the-art methods on prevalent metrics with robust high-resolution synthesizing on gender and pose variations.Comment: IJCAI-202

    Deep Learning for Channel Coding via Neural Mutual Information Estimation

    Full text link
    End-to-end deep learning for communication systems, i.e., systems whose encoder and decoder are learned, has attracted significant interest recently, due to its performance which comes close to well-developed classical encoder-decoder designs. However, one of the drawbacks of current learning approaches is that a differentiable channel model is needed for the training of the underlying neural networks. In real-world scenarios, such a channel model is hardly available and often the channel density is not even known at all. Some works, therefore, focus on a generative approach, i.e., generating the channel from samples, or rely on reinforcement learning to circumvent this problem. We present a novel approach which utilizes a recently proposed neural estimator of mutual information. We use this estimator to optimize the encoder for a maximized mutual information, only relying on channel samples. Moreover, we show that our approach achieves the same performance as state-of-the-art end-to-end learning with perfect channel model knowledge.Comment: 5 pages, 6 figure

    Variational Representations and Neural Network Estimation of R\'enyi Divergences

    Full text link
    We derive a new variational formula for the R\'enyi family of divergences, Rα(Q∥P)R_\alpha(Q\|P), between probability measures QQ and PP. Our result generalizes the classical Donsker-Varadhan variational formula for the Kullback-Leibler divergence. We further show that this R\'enyi variational formula holds over a range of function spaces; this leads to a formula for the optimizer under very weak assumptions and is also key in our development of a consistency theory for R\'enyi divergence estimators. By applying this theory to neural-network estimators, we show that if a neural network family satisfies one of several strengthened versions of the universal approximation property then the corresponding R\'enyi divergence estimator is consistent. In contrast to density-estimator based methods, our estimators involve only expectations under QQ and PP and hence are more effective in high dimensional systems. We illustrate this via several numerical examples of neural network estimation in systems of up to 5000 dimensions.Comment: 24 pages, 2 figure

    Parametric Adversarial Divergences are Good Task Losses for Generative Modeling

    Full text link
    Generative modeling of high dimensional data like images is a notoriously difficult and ill-defined problem. In particular, how to evaluate a learned generative model is unclear. In this position paper, we argue that adversarial learning, pioneered with generative adversarial networks (GANs), provides an interesting framework to implicitly define more meaningful task losses for generative modeling tasks, such as for generating "visually realistic" images. We refer to those task losses as parametric adversarial divergences and we give two main reasons why we think parametric divergences are good learning objectives for generative modeling. Additionally, we unify the processes of choosing a good structured loss (in structured prediction) and choosing a discriminator architecture (in generative modeling) using statistical decision theory; we are then able to formalize and quantify the intuition that "weaker" losses are easier to learn from, in a specific setting. Finally, we propose two new challenging tasks to evaluate parametric and nonparametric divergences: a qualitative task of generating very high-resolution digits, and a quantitative task of learning data that satisfies high-level algebraic constraints. We use two common divergences to train a generator and show that the parametric divergence outperforms the nonparametric divergence on both the qualitative and the quantitative task.Comment: 22 page

    Bridging the Gap Between ff-GANs and Wasserstein GANs

    Full text link
    Generative adversarial networks (GANs) have enjoyed much success in learning high-dimensional distributions. Learning objectives approximately minimize an ff-divergence (ff-GANs) or an integral probability metric (Wasserstein GANs) between the model and the data distribution using a discriminator. Wasserstein GANs enjoy superior empirical performance, but in ff-GANs the discriminator can be interpreted as a density ratio estimator which is necessary in some GAN applications. In this paper, we bridge the gap between ff-GANs and Wasserstein GANs (WGANs). First, we list two constraints over variational ff-divergence estimation objectives that preserves the optimal solution. Next, we minimize over a Lagrangian relaxation of the constrained objective, and show that it generalizes critic objectives of both ff-GAN and WGAN. Based on this generalization, we propose a novel practical objective, named KL-Wasserstein GAN (KL-WGAN). We demonstrate empirical success of KL-WGAN on synthetic datasets and real-world image generation benchmarks, and achieve state-of-the-art FID scores on CIFAR10 image generation.Comment: updated for ICML camera ready versio

    Regularized Policies are Reward Robust

    Full text link
    Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a local optimal policy. The primary motivation for using entropy is for exploration and disambiguating optimal policies; however, the theoretical effects are not entirely understood. In this work, we study the more general regularized RL objective and using Fenchel duality; we derive the dual problem which takes the form of an adversarial reward problem. In particular, we find that the optimal policy found by a regularized objective is precisely an optimal policy of a reinforcement learning problem under a worst-case adversarial reward. Our result allows us to reinterpret the popular entropic regularization scheme as a form of robustification. Furthermore, due to the generality of our results, we apply to other existing regularization schemes. Our results thus give insights into the effects of regularization of policies and deepen our understanding of exploration through robust rewards at large

    On Mutual Information Maximization for Representation Learning

    Full text link
    Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to its invariance under arbitrary invertible transformations. Nevertheless, these methods have been repeatedly shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning and argue that this interpretation may be a plausible explanation for the success of the recently introduced methods.Comment: ICLR 2020. Michael Tschannen and Josip Djolonga contributed equall
    corecore