DISCO Nets: DISsimilarity COefficient Networks
We present a new type of probabilistic model which we call DISsimilarity
COefficient Networks (DISCO Nets). DISCO Nets allow us to efficiently sample
from a posterior distribution parametrised by a neural network. During
training, DISCO Nets are learned by minimising the dissimilarity coefficient
between the true distribution and the estimated distribution. This allows us to
tailor the training to the loss related to the task at hand. We empirically
show that (i) by modeling uncertainty on the output value, DISCO Nets
outperform equivalent non-probabilistic predictive networks and (ii) DISCO Nets
accurately model the uncertainty of the output, outperforming existing
probabilistic models based on deep neural networks.
Statistical inference for generative models with maximum mean discrepancy
While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models.
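The minimum distance idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the one-dimensional Gaussian location simulator `simulate`, and the grid search in `minimum_mmd_estimate` are all assumptions chosen for brevity (the abstract describes a natural gradient descent-like algorithm, which is not reproduced here). The sketch uses the biased (V-statistic) estimate of squared MMD between observed data and simulated data, and picks the candidate parameter that minimises it.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars (assumed kernel choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def simulate(theta, n, rng):
    """Hypothetical simulator: n draws from a Gaussian with mean theta, variance 1."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def minimum_mmd_estimate(data, candidates, n_sim=200, seed=0):
    """Grid-search stand-in for the estimator: pick the candidate with smallest MMD."""
    best_theta, best_value = None, float("inf")
    for theta in candidates:
        # Re-seeding gives every candidate the same underlying noise
        # (common random numbers), so the comparison is less noisy.
        rng = random.Random(seed)
        value = mmd_squared(data, simulate(theta, n_sim, rng))
        if value < best_value:
            best_theta, best_value = theta, value
    return best_theta


if __name__ == "__main__":
    observed = simulate(2.0, 200, random.Random(1))
    print(minimum_mmd_estimate(observed, [0.0, 1.0, 2.0, 3.0, 4.0]))
```

Only simulation from the model is needed here, never a likelihood evaluation, which is the setting the abstract targets. The kernel bandwidth is the knob that the abstract says trades statistical efficiency against robustness; the value 1.0 above is arbitrary.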
Statistical computation with kernels
Modern statistical inference has seen a tremendous increase in the size and complexity of models and datasets. As such, it has become reliant on advanced computational tools for implementation. A first canonical problem in this area is the numerical approximation of integrals of complex and expensive functions. Numerical integration is required for a variety of tasks, including prediction, model comparison and model choice. A second canonical problem is that of statistical inference for models with intractable likelihoods. These include models with intractable normalisation constants, or models which are so complex that their likelihood cannot be evaluated, but from which data can be generated. Examples include large graphical models, as well as many models in imaging or spatial statistics.
This thesis proposes to tackle these two problems using tools from the kernel methods and Bayesian non-parametrics literature. First, we analyse a well-known algorithm for numerical integration called Bayesian quadrature, and provide consistency and contraction rates. The algorithm is then assessed on a variety of statistical inference problems, and extended in several directions in order to reduce its computational requirements. We then demonstrate how the combination of reproducing kernels with Stein’s method can lead to computational tools which can be used with unnormalised densities, including numerical integration and approximation of probability measures. We conclude by studying two minimum distance estimators derived from kernel-based statistical divergences which can be used for unnormalised and generative models.
In each instance, the tractability provided by reproducing kernels and their properties allows us to provide easily-implementable algorithms whose theoretical foundations can be studied in depth.
Nonparametric Scoring Rules
A scoring rule is a device for eliciting and assessing probabilistic forecasts from an agent. When dealing with continuous outcome spaces, and absent any prior insights into the structure of the agent's beliefs, the rule should allow for a flexible reporting interface that can accurately represent complicated, multi-modal distributions. In this paper, we provide such a scoring rule based on a nonparametric approach of eliciting a set of samples from the agent and efficiently evaluating the score using kernel methods. We prove that sampled reports of increasing size converge rapidly to the true score, and that sampled reports are approximately optimal. We also demonstrate a connection between the scoring rule and the maximum mean discrepancy divergence. Experimental results are provided that confirm rapid convergence and that the expected score correlates well with standard notions of divergence, both important considerations for ensuring that agents are incentivized to report accurate information.
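As a rough illustration of scoring a sampled report with a kernel, the sketch below evaluates one standard kernel score on a set of samples. Whether this matches the exact rule in the paper is an assumption; the Gaussian kernel, the bandwidth, and the function name `kernel_score` are likewise illustrative choices. The score rewards samples that lie close to the realised outcome and subtracts half the average similarity among the samples themselves; up to a term that depends only on the outcome, it equals minus one half of the squared MMD between the sampled report and a point mass at the outcome.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars (assumed kernel choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def kernel_score(samples, outcome, bandwidth=1.0):
    """Kernel score of a sampled report against a realised outcome.

    Higher is better. Computes
        mean_i k(x_i, outcome) - 0.5 * mean_{i,j} k(x_i, x_j).
    """
    n = len(samples)
    # How well the reported samples match the realised outcome.
    fit = sum(gaussian_kernel(x, outcome, bandwidth) for x in samples) / n
    # Average similarity among the reported samples themselves.
    spread = sum(
        gaussian_kernel(x, xp, bandwidth) for x in samples for xp in samples
    ) / (n * n)
    return fit - 0.5 * spread


if __name__ == "__main__":
    near_report = [-0.1, 0.0, 0.1]  # samples concentrated near the outcome
    far_report = [2.9, 3.0, 3.1]    # samples concentrated far from it
    print(kernel_score(near_report, 0.0), kernel_score(far_report, 0.0))
```

A report made of samples near the outcome receives a higher score than one made of samples far from it, which is the behaviour one would want from an incentive for accurate reporting.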