Bregman divergence as general framework to estimate unnormalized statistical models
We show that the Bregman divergence provides a rich framework to estimate
unnormalized statistical models for continuous or discrete random variables,
that is, models which do not integrate or sum to one, respectively. We prove
that recent estimation methods such as noise-contrastive estimation, ratio
matching, and score matching belong to the proposed framework, and explain
their interconnection based on supervised learning. Further, we discuss the
role of boosting in unsupervised learning.
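For concreteness, the separable Bregman divergence at the heart of this framework can be written as below (standard notation of my choosing; the paper's exact construction may differ):

```latex
% Separable Bregman divergence between nonnegative functions p and q,
% generated by a strictly convex, differentiable \phi:
D_\phi(p \,\|\, q) \;=\; \int \Big[ \phi\big(p(x)\big) - \phi\big(q(x)\big)
    \;-\; \phi'\big(q(x)\big)\big(p(x) - q(x)\big) \Big]\, dx \;\ge\; 0.
% Example: \phi(t) = t\log t - t gives the generalized KL divergence
%   \int \big[ p\log(p/q) - p + q \big]\, dx,
% which is finite and meaningful even when p and q do not normalise to one.
```

Per the abstract, different choices of the generating function φ then yield noise-contrastive estimation, ratio matching, and score matching as members of the same family.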
Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Within the path sampling framework, we show that probability distribution
divergences, such as the Chernoff information, can be estimated via
thermodynamic integration. The Boltzmann-Gibbs distribution pertaining to
different Hamiltonians is implemented to derive tempered transitions along the
path, linking the distributions of interest at the endpoints. Under this
perspective, a geometric approach is feasible, which prompts intuition and
facilitates tuning the error sources. Additionally, there are direct
applications in Bayesian model evaluation. Existing marginal likelihood and
Bayes factor estimators are reviewed here along with their stepping-stone
sampling analogues. New estimators are presented and the use of compound paths
is introduced.
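A minimal sketch of the core identity in a toy Gaussian setting, where the answer is known in closed form (my own example, not one of the paper's estimators):

```python
import numpy as np

# Toy thermodynamic integration (TI): estimate log(Z1/Z0) for two
# unnormalised Gaussian densities, where the answer is known exactly.
# Illustrative sketch only, not the paper's estimators.

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5                        # endpoint-1 parameters

def log_q0(x):                              # unnormalised N(0, 1)
    return -0.5 * x**2

def log_q1(x):                              # unnormalised N(mu, sigma^2)
    return -0.5 * (x - mu) ** 2 / sigma**2

def sample_path(beta, n):
    # Geometric path: log q_beta = (1 - beta) log q0 + beta log q1.
    # Between Gaussians the path density stays Gaussian, so we can
    # sample it exactly instead of running MCMC at each temperature.
    prec = (1 - beta) + beta / sigma**2
    mean = (beta * mu / sigma**2) / prec
    return rng.normal(mean, prec ** -0.5, size=n)

# TI identity: d/dbeta log Z_beta = E_{p_beta}[log q1(x) - log q0(x)]
betas = np.linspace(0.0, 1.0, 51)
grads = np.array([(log_q1(x) - log_q0(x)).mean()
                  for x in (sample_path(b, 10_000) for b in betas)])
estimate = np.sum(0.5 * (grads[1:] + grads[:-1]) * np.diff(betas))

print(f"TI estimate: {estimate:.4f}  exact log(Z1/Z0): {np.log(sigma):.4f}")
```

In realistic problems the path samples come from MCMC at each temperature, and the spacing of the betas becomes one of the tunable error sources that the geometric view helps diagnose.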
Self-Adapting Noise-Contrastive Estimation for Energy-Based Models
Training energy-based models (EBMs) with noise-contrastive estimation (NCE)
is theoretically feasible but practically challenging. Effective learning
requires the noise distribution to be close to the target
distribution, especially in high-dimensional domains. Previous works have
explored modelling the noise distribution as a separate generative model, and
then concurrently training this noise model with the EBM. While this method
allows for more effective noise-contrastive estimation, it comes at the cost of
extra memory and training complexity. Instead, this thesis proposes a
self-adapting NCE algorithm which uses static instances of the EBM along its
training trajectory as the noise distribution. During training, these static
instances progressively converge to the target distribution, thereby
circumventing the need to simultaneously train an auxiliary noise model.
Moreover, we express this self-adapting NCE algorithm in the framework of
Bregman divergences and show that it is a generalization of maximum likelihood
learning for EBMs. The performance of our algorithm is evaluated across a range
of noise update intervals, and experimental results show that shorter update
intervals are conducive to higher synthesis quality.
Comment: MSc thesis submitted to Tsinghua University in July 202
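A minimal PyTorch sketch of this self-adapting scheme as I read it; the names (langevin_sample, update_interval) and the toy target are illustrative, and the normalising constants that the thesis treats within the Bregman framework are glossed over here:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class EBM(nn.Module):
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.SiLU(), nn.Linear(64, 1))

    def forward(self, x):                     # unnormalised log-density log ~p(x)
        return self.net(x).squeeze(-1)

@torch.no_grad()
def langevin_sample(model, n, dim=2, steps=50, eps=0.01):
    # Approximate sampler for the frozen noise EBM (assumes Langevin
    # dynamics mixes well enough in this toy 2-D setting).
    x = torch.randn(n, dim)
    for _ in range(steps):
        with torch.enable_grad():
            x = x.detach().requires_grad_(True)
            grad = torch.autograd.grad(model(x).sum(), x)[0]
        x = x + 0.5 * eps * grad + eps**0.5 * torch.randn_like(x)
    return x.detach()

def nce_loss(model, noise_model, x_data, x_noise):
    # Logistic classification of data vs. noise; the static snapshot's
    # energy serves as the noise log-density.
    logit_data = model(x_data) - noise_model(x_data).detach()
    logit_noise = model(x_noise) - noise_model(x_noise).detach()
    return F.softplus(-logit_data).mean() + F.softplus(logit_noise).mean()

model = EBM()
noise_model = copy.deepcopy(model).eval()     # static snapshot used as noise
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
update_interval = 100                          # noise refresh period (a knob)

for step in range(1000):
    x_data = torch.randn(128, 2) + torch.tensor([3.0, 0.0])   # toy target
    x_noise = langevin_sample(noise_model, 128)
    loss = nce_loss(model, noise_model, x_data, x_noise)
    opt.zero_grad(); loss.backward(); opt.step()
    if (step + 1) % update_interval == 0:
        noise_model = copy.deepcopy(model).eval()  # self-adapting refresh
```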
Quasi-Arithmetic Mixtures, Divergence Minimization, and Bregman Information
Markov Chain Monte Carlo methods for sampling from complex distributions and
estimating normalization constants often simulate samples from a sequence of
intermediate distributions along an annealing path, which bridges between a
tractable initial distribution and a target density of interest. Prior work has
constructed annealing paths using quasi-arithmetic means, and interpreted the
resulting intermediate densities as minimizing an expected divergence to the
endpoints. We provide a comprehensive analysis of this 'centroid' property
using Bregman divergences under a monotonic embedding of the density function,
thereby associating common divergences such as Amari's and Rényi's
α-divergences, (α, β)-divergences, and the Jensen-Shannon
divergence with intermediate densities along an annealing path. Our analysis
highlights the interplay between parametric families, quasi-arithmetic means,
and divergence functions using the rho-tau Bregman divergence framework of
Zhang (2004, 2013).
Comment: 19 pages + appendix (rewritten + changed title in revision)
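To make the centroid property concrete, here is its best-known special case in my own notation (the paper treats general embeddings and divergences):

```latex
% Quasi-arithmetic mean path under a monotone embedding \rho:
\tilde p_\beta(x) \;=\; \rho^{-1}\!\Big( (1-\beta)\,\rho\big(p_0(x)\big)
    \;+\; \beta\,\rho\big(p_1(x)\big) \Big), \qquad \beta \in [0,1].
% For \rho = \log this is the familiar geometric annealing path
p_\beta(x) \;\propto\; p_0(x)^{1-\beta}\, p_1(x)^{\beta},
% which is the centroid of the endpoints in reverse KL divergence:
p_\beta \;=\; \operatorname*{arg\,min}_{q}\; (1-\beta)\,\mathrm{KL}(q\,\|\,p_0)
    \;+\; \beta\,\mathrm{KL}(q\,\|\,p_1).
```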
The Poisson transform for unnormalised statistical models
Contrary to standard statistical models, unnormalised statistical models only
specify the likelihood function up to a constant. While such models are natural
and popular, the lack of normalisation makes inference much more difficult.
Here we show that inferring the parameters of an unnormalised model on a space Ω
can be mapped onto an equivalent problem of estimating the intensity
of a Poisson point process on Ω. The unnormalised statistical model now
specifies an intensity function that does not need to be normalised.
Effectively, the normalisation constant may now be inferred as just another
parameter, at no loss of information. The result can be extended to cover
non-IID models, which includes for example unnormalised models for sequences of
graphs (dynamical graphs), or for sequences of binary vectors. As a
consequence, we prove that unnormalised parametric inference in non-IID models
can be turned into a semi-parametric estimation problem. Moreover, we show that
the noise-contrastive divergence of Gutmann & Hyvärinen (2012) can be
understood as an approximation of the Poisson transform, and extended to
non-IID settings. We use our results to fit spatial Markov chain models of eye
movements, where the Poisson transform allows us to turn a highly non-standard
model into vanilla semi-parametric logistic regression.
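To illustrate the logistic-regression reduction in the simplest IID case (a toy Gaussian example of my own, not the eye-movement application), one can classify data against points from a known reference density and estimate the log normalising constant ν as just another parameter:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import log_expit  # log of the logistic sigmoid

# Toy sketch of the logistic-regression reduction: fit an unnormalised
# model log ~p_theta(x) = theta' f(x) by classifying observed data
# against samples from a known reference density h, with the log
# normalising constant nu treated as just another parameter.

rng = np.random.default_rng(1)
x_data = rng.normal(1.0, 0.7, size=2000)     # observations
x_ref = rng.uniform(-4.0, 6.0, size=2000)    # reference points covering the data
log_h = -np.log(10.0)                        # log density of Uniform(-4, 6)

def features(x):
    # Sufficient statistics of a Gaussian family: log ~p = t1*x - t2*x^2/2
    return np.stack([x, -0.5 * x**2], axis=1)

def neg_log_lik(params):
    theta, nu = params[:-1], params[-1]
    # Optimal classifier logit: log ~p_theta(x) - nu - log h(x)
    logit_d = features(x_data) @ theta - nu - log_h
    logit_r = features(x_ref) @ theta - nu - log_h
    # Bernoulli log-likelihood: data labelled 1, reference labelled 0
    return -(log_expit(logit_d).sum() + log_expit(-logit_r).sum())

res = minimize(neg_log_lik, np.zeros(3), method="BFGS")
theta_hat, nu_hat = res.x[:-1], res.x[-1]
print("mean ~", theta_hat[0] / theta_hat[1],   # expect ~1.0
      " var ~", 1.0 / theta_hat[1])            # expect ~0.49
```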