
    Learning Generative Models across Incomparable Spaces

    Generative Adversarial Networks have shown remarkable success in learning a distribution that faithfully recovers a reference distribution in its entirety. However, in some cases we may want to learn only some aspects (e.g., cluster or manifold structure) while modifying others (e.g., style, orientation or dimension). In this work, we propose an approach to learn generative models across such incomparable spaces, and demonstrate how to steer the learned distribution towards target properties. A key component of our model is the Gromov-Wasserstein distance, a notion of discrepancy that compares distributions relationally rather than absolutely. While this framework subsumes current generative models in identically reproducing distributions, its inherent flexibility allows application to tasks in manifold learning, relational learning and cross-domain learning.
    Comment: International Conference on Machine Learning (ICML
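    As a concrete illustration of the relational comparison the abstract describes, the following is a minimal sketch (my own illustration, not the authors' training objective) of the discrete Gromov-Wasserstein discrepancy between two point clouds living in spaces of different dimension; the use of the POT library and the toy data are assumptions.

```python
# Sketch: Gromov-Wasserstein discrepancy between samples in incomparable
# spaces (2-D vs. 5-D). Only intra-space distance matrices enter the
# objective, so the two spaces never need to be aligned or embedded jointly.
import numpy as np
import ot  # POT: Python Optimal Transport (assumed dependency)

rng = np.random.default_rng(0)
xs = rng.standard_normal((30, 2))   # toy samples in a 2-D space
xt = rng.standard_normal((40, 5))   # toy samples in a 5-D space

C1 = ot.dist(xs, xs)                # pairwise squared Euclidean distances within each space
C2 = ot.dist(xt, xt)
C1 /= C1.max()                      # normalize the scales of the two spaces
C2 /= C2.max()

p, q = ot.unif(len(xs)), ot.unif(len(xt))   # uniform weights on each sample set
T, log = ot.gromov.gromov_wasserstein(
    C1, C2, p, q, loss_fun="square_loss", log=True
)
print("GW discrepancy:", log["gw_dist"])    # relational mismatch between the two point clouds
```

    In the paper's setting such a discrepancy plays the role of the mismatch signal between generated and reference distributions; the sketch above only evaluates it on fixed point clouds.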

    Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

    This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating $P\ast\mathcal{N}_\sigma$, for $\mathcal{N}_\sigma\triangleq\mathcal{N}(0,\sigma^2 \mathrm{I}_d)$, by $\hat{P}_n\ast\mathcal{N}_\sigma$, where $\hat{P}_n$ is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and $\chi^2$-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate $e^{O(d)}n^{-\frac{1}{2}}$, in remarkable contrast to a typical $n^{-\frac{1}{d}}$ rate for unsmoothed $\mathsf{W}_1$ (and $d\ge 3$). For the KL divergence, squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and $\chi^2$-divergence, the convergence rate is $e^{O(d)}n^{-1}$, but only if $P$ achieves finite input-output $\chi^2$ mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to $\omega(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while the $\chi^2$-divergence becomes infinite, a curious dichotomy. As a main application we consider estimating the differential entropy $h(P\ast\mathcal{N}_\sigma)$ in the high-dimensional regime. The distribution $P$ is unknown, but $n$ i.i.d. samples from it are available. We first show that any good estimator of $h(P\ast\mathcal{N}_\sigma)$ must have sample complexity that is exponential in $d$. Using the empirical approximation results, we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate $e^{O(d)}n^{-\frac{1}{2}}$, thus establishing the minimax rate-optimality of the plug-in. Numerical results that demonstrate a significant empirical superiority of the plug-in approach over general-purpose differential entropy estimators are provided.
    Comment: arXiv admin note: substantial text overlap with arXiv:1810.1158
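    As a concrete illustration of the plug-in estimator discussed above, the following minimal sketch (an illustration under assumed details, not the authors' reference implementation) forms the Gaussian mixture $\hat{P}_n\ast\mathcal{N}_\sigma$ from the samples and evaluates its differential entropy by Monte Carlo, using $h(q) = -\mathbb{E}_{Y\sim q}[\log q(Y)]$.

```python
# Sketch of the plug-in estimator for h(P * N_sigma): the estimate is the
# differential entropy of the smoothed empirical measure, i.e. of the Gaussian
# mixture (1/n) sum_i N(x_i, sigma^2 I_d), evaluated here by Monte Carlo.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import logsumexp

def plug_in_entropy(samples, sigma, n_mc=4000, rng=None):
    """Monte Carlo estimate of h(\\hat{P}_n * N_sigma) in nats."""
    rng = np.random.default_rng(rng)
    n, d = samples.shape
    # Draw Y ~ \hat{P}_n * N_sigma: pick a data point, add Gaussian noise.
    y = samples[rng.integers(n, size=n_mc)] + sigma * rng.standard_normal((n_mc, d))
    # log q(y) for the mixture q = (1/n) sum_i N(x_i, sigma^2 I_d).
    sq = cdist(y, samples, "sqeuclidean")                       # (n_mc, n)
    log_kernel = -0.5 * sq / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)
    log_q = logsumexp(log_kernel, axis=1) - np.log(n)
    return -log_q.mean()                                        # h(q) = -E[log q(Y)]

# Sanity check: for P = N(0, I_d), h(P * N_sigma) = (d/2) log(2 pi e (1 + sigma^2)).
rng = np.random.default_rng(0)
d, sigma = 3, 1.0
x = rng.standard_normal((1000, d))
print("plug-in :", plug_in_entropy(x, sigma, rng=rng))
print("true    :", 0.5 * d * np.log(2 * np.pi * np.e * (1 + sigma**2)))
```

    In the sanity check, $P=\mathcal{N}(0,\mathrm{I}_d)$ gives $h(P\ast\mathcal{N}_\sigma)=\frac{d}{2}\log(2\pi e(1+\sigma^2))$, so the two printed values should roughly agree for moderate $n$; the paper's results quantify how fast this gap closes in $n$ and $d$.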