Search CORE

70 research outputs found

On improving variational inference with low-variance multi-sample estimators

Author: Dhekane Eeshan Gunesh
Publication venue
Publication date: 01/08/2020
Field of study

Les progrès de l’inférence variationnelle, tels que l’approche de variational autoencoder (VI) (Kingma and Welling (2013), Rezende et al. (2014)) et ses nombreuses modifications, se sont avérés très efficaces pour l’apprentissage des représentations latentes de données. Importance-weighted variational inference (IWVI) par Burda et al. (2015) améliore l’inférence variationnelle en utilisant plusieurs échantillons indépendants et répartis de manière identique pour obtenir des limites inférieures variationnelles plus strictes. Des articles récents tels que l’approche de hierarchical importance-weighted autoencoders (HIWVI) par Huang et al. (2019) et la modélisation de la distribution conjointe par Klys et al. (2018) démontrent l’idée de modéliser une distribution conjointe sur des échantillons pour améliorer encore l’IWVI en le rendant efficace pour l’échantillon. L’idée sous-jacente de ce mémoire est de relier les propriétés statistiques des estimateurs au resserrement des limites variationnelles. Pour ce faire, nous démontrons d’abord une borne supérieure sur l’écart variationnel en termes de variance des estimateurs sous certaines conditions. Nous prouvons que l’écart variationnel peut être fait disparaître au taux de O(1/n) pour une grande famille d’approches d’inférence variationelle. Sur la base de ces résultats, nous proposons l’approche de Conditional-IWVI (CIWVI), qui modélise explicitement l’échantillonnage séquentiel et conditionnel de variables latentes pour effectuer importance-weighted variational inference, et une approche connexe de Antithetic-IWVI (AIWVI) par Klys et al. (2018). Nos expériences sur les jeux de données d’analyse comparative, tels que MNIST (LeCun et al. (2010)) et OMNIGLOT (Lake et al. (2015)), démontrent que nos approches fonctionnent soit de manière compétitive, soit meilleures que les références IWVI et HIWVI en tant que le nombre d’échantillons augmente. De plus, nous démontrons que les résultats sont conformes aux propriétés théoriques que nous avons prouvées. En conclusion, nos travaux fournissent une perspective sur le taux d’amélioration de l’inference variationelle avec le nombre d’échantillons utilisés et l’utilité de modéliser la distribution conjointe sur des représentations latentes pour l’efficacité de l’échantillon.Advances in variational inference, such as variational autoencoders (VI) (Kingma and Welling (2013), Rezende et al. (2014)) along with its numerous modifications, have proven highly successful for learning latent representations of data. Importance-weighted variational inference (IWVI) by Burda et al. (2015) improves the variational inference by using multiple i.i.d. samples for obtaining tighter variational lower bounds. Recent works like hierarchical importance-weighted autoencoders (HIWVI) by Huang et al. (2019) and joint distribution modeling by Klys et al. (2018) demonstrate the idea of modeling a joint distribution over samples to further improve over IWVI by making it sample efficient. The underlying idea in this thesis is to connect the statistical properties of the estimators to the tightness of the variational bounds. Towards this, we first demonstrate an upper bound on the variational gap in terms of the variance of the estimators under certain conditions. We prove that the variational gap can be made to vanish at the rate of O(1/n) for a large family of VI approaches. Based on these results, we propose the approach of Conditional-IWVI (CIWVI), which explicitly models the sequential and conditional sampling of latent variables to perform importance-weighted variational inference, and a related approach of Antithetic-IWVI (AIWVI) by Klys et al. (2018). Our experiments on the benchmarking datasets MNIST (LeCun et al. (2010)) and OMNIGLOT (Lake et al. (2015)) demonstrate that our approaches perform either competitively or better than the baselines IWVI and HIWVI as the number of samples increases. Further, we also demonstrate that the results are in accordance with the theoretical properties we proved. In conclusion, our work provides a perspective on the rate of improvement in VI with the number of samples used and the utility of modeling the joint distribution over latent representations for sample efficiency in VI

Dépôt Institutionnel Numérique

Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression

Author: Knowles David A.
Salimans Tim
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/12/2013
Field of study

We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice

arXiv.org e-Print Archive

Crossref

Erasmus University Digital Repository

Storchastic: A Framework for General Stochastic Automatic Differentiation

Author: Teije Annette ten
Tomczak Jakub M.
van Krieken Emile
Publication venue
Publication date: 26/10/2021
Field of study

Modelers use automatic differentiation (AD) of computation graphs to implement complex Deep Learning models without defining gradient computations. Stochastic AD extends AD to stochastic computation graphs with sampling steps, which arise when modelers handle the intractable expectations common in Reinforcement Learning and Variational Inference. However, current methods for stochastic AD are limited: They are either only applicable to continuous random variables and differentiable functions, or can only use simple but high variance score-function estimators. To overcome these limitations, we introduce Storchastic, a new framework for AD of stochastic computation graphs. Storchastic allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step, to optimally reduce the variance of the gradient estimates. Furthermore, Storchastic is provably unbiased for estimation of any-order gradients, and generalizes variance reduction techniques to higher-order gradient estimates. Finally, we implement Storchastic as a PyTorch library at https://github.com/HEmile/storchastic.Comment: 30 pages, 2 figures, 1 table, accepted in NeurIPS 202

arXiv.org e-Print Archive

VU Research Portal