859 research outputs found
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
A central problem in machine learning involves modeling complex data-sets
using highly flexible families of probability distributions in which learning,
sampling, inference, and evaluation are still analytically or computationally
tractable. Here, we develop an approach that simultaneously achieves both
flexibility and tractability. The essential idea, inspired by non-equilibrium
statistical physics, is to systematically and slowly destroy structure in a
data distribution through an iterative forward diffusion process. We then learn
a reverse diffusion process that restores structure in data, yielding a highly
flexible and tractable generative model of the data. This approach allows us to
rapidly learn, sample from, and evaluate probabilities in deep generative
models with thousands of layers or time steps, as well as to compute
conditional and posterior probabilities under the learned model. We
additionally release an open source reference implementation of the algorithm.
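
As a rough illustration of the forward diffusion process described in this abstract, the sketch below repeatedly applies a Gaussian diffusion kernel, q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I), to toy data until its structure is gone. The function name `forward_diffusion`, the two-mode toy data, and the constant noise schedule are assumptions made for illustration; they are not taken from the paper's reference implementation, and the learned reverse process is not shown.

```python
import numpy as np

def forward_diffusion(x0, betas, rng):
    """Apply one Gaussian diffusion step per entry of `betas` and return x_T.

    Each step draws x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I).
    """
    x = np.asarray(x0, dtype=float)
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

rng = np.random.default_rng(0)

# Hypothetical toy data: a 1-D distribution with two sharp modes at -2 and +2.
x0 = np.concatenate([rng.normal(-2.0, 0.1, 5000), rng.normal(2.0, 0.1, 5000)])

# Hypothetical schedule: 1000 small, equal steps (an assumption, not the paper's schedule).
betas = np.full(1000, 0.01)

x_T = forward_diffusion(x0, betas, rng)

# The data starts with variance near 4; after diffusion the mean and variance
# should be close to those of a standard normal (0 and 1).
print("start:", x0.mean(), x0.var())
print("end:  ", x_T.mean(), x_T.var())
```

Many small steps are used here because the abstract describes destroying structure slowly over a long chain of time steps.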
Training Energy-Based Models with Diffusion Contrastive Divergences
Energy-Based Models (EBMs) have been widely used for generative modeling.
Contrastive Divergence (CD), a prevailing training objective for EBMs, requires
sampling from the EBM with Markov Chain Monte Carlo methods (MCMCs), which
leads to an irreconcilable trade-off between the computational burden and the
validity of the CD. Running MCMCs till convergence is computationally
intensive. On the other hand, short-run MCMC brings in an extra non-negligible
parameter gradient term that is difficult to handle. In this paper, we provide
a general interpretation of CD, viewing it as a special instance of our
proposed Diffusion Contrastive Divergence (DCD) family. By replacing the
Langevin dynamics used in CD with other EBM-parameter-free diffusion processes,
we propose a more efficient divergence. We show that the proposed DCDs are both
more computationally efficient than CD and not limited by a
non-negligible gradient term. We conduct extensive experiments, including both
synthetic data modeling and high-dimensional image denoising and generation, to
show the advantages of the proposed DCDs. On the synthetic data learning and
image denoising experiments, our proposed DCD outperforms CD by a large margin.
In image generation experiments, the proposed DCD is capable of training an
energy-based model for generating the CelebA dataset, with results
comparable to existing EBMs.
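
For context, the baseline this abstract contrasts against can be sketched as follows: standard Contrastive Divergence with short-run Langevin sampling, on a toy one-parameter energy E_theta(x) = 0.5 (x - theta)^2. This is a minimal sketch of plain CD, not of the proposed DCD; the energy function, the names `langevin_samples` and `cd_train`, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def energy_grad_x(x, theta):
    """Gradient of the toy energy E_theta(x) = 0.5 * (x - theta)**2 with respect to x."""
    return x - theta

def langevin_samples(x_init, theta, n_steps, step_size, rng):
    """Short-run unadjusted Langevin dynamics targeting p(x) proportional to exp(-E_theta(x))."""
    x = x_init.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * energy_grad_x(x, theta) + np.sqrt(step_size) * noise
    return x

def cd_train(data, theta0, n_iters, lr, n_steps, step_size, rng):
    """Fit theta with Contrastive Divergence, starting each chain at the data."""
    theta = theta0
    for _ in range(n_iters):
        negatives = langevin_samples(data, theta, n_steps, step_size, rng)
        # dE/dtheta = theta - x, so the CD gradient
        # mean_data(dE/dtheta) - mean_negatives(dE/dtheta)
        # simplifies to mean(negatives) - mean(data) for this toy energy.
        grad = negatives.mean() - data.mean()
        theta -= lr * grad
    return theta

rng = np.random.default_rng(0)
data = rng.normal(3.0, 1.0, 2000)  # hypothetical toy data centred at 3
theta = cd_train(data, theta0=0.0, n_iters=200, lr=0.5,
                 n_steps=10, step_size=0.1, rng=rng)
print("learned theta:", theta, "data mean:", data.mean())
```

The inner Langevin loop runs at every parameter update, which is the computational burden the abstract refers to; the chains here are deliberately short, so the negative samples are only approximate draws from the model.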
Geometric Inference in Bayesian Hierarchical Models with Applications to Topic Modeling
Unstructured data is available in abundance with the rapidly growing size of digital information. Labeling such data is expensive and impractical, making unsupervised learning an increasingly important field. Big data collections often have rich latent structure that the statistical modeler is challenged to uncover. Bayesian hierarchical modeling is a particularly suitable approach for complex latent patterns. The graphical model formalism has been prominent in developing various procedures for inference in Bayesian models; however, the corresponding computational limits often fall behind the demands of modern data sizes. In this thesis we develop new approaches for scalable approximate Bayesian inference. In particular, our approaches are driven by the analysis of latent geometric structures induced by the models.
Our specific contributions include the following. We develop a full geometric recipe of the Latent Dirichlet Allocation topic model. Next, we study several approaches for exploiting the latent geometry to first arrive at a fast weighted clustering procedure augmented with geometric corrections for topic inference, and then a nonparametric approach based on the analysis of the concentration of mass and angular geometry of the topic simplex, a convex polytope constructed by taking the convex hull of vertices representing the latent topics. Estimates produced by our methods are shown to be statistically consistent under some conditions. Finally, we develop a series of models for temporal dynamics of the latent geometric structures where inference can be performed in an online and distributed fashion. All our algorithms are evaluated with extensive experiments on simulated and real datasets, culminating in a method several orders of magnitude faster than existing state-of-the-art topic modeling approaches, as demonstrated by experiments working with several million documents in a dozen minutes.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/146051/1/moonfolk_1.pd
Energy Discrepancies: A Score-Independent Loss for Energy-Based Models
Energy-based models are a simple yet powerful class of probabilistic models,
but their widespread adoption has been limited by the computational burden of
training them. We propose a novel loss function called Energy Discrepancy (ED)
which does not rely on the computation of scores or expensive Markov chain
Monte Carlo. We show that ED approaches the explicit score matching and
negative log-likelihood loss under different limits, effectively interpolating
between both. Consequently, minimum ED estimation overcomes the problem of
nearsightedness encountered in score-based estimation methods, while also
enjoying theoretical guarantees. Through numerical experiments, we demonstrate
that ED learns low-dimensional data distributions faster and more accurately
than explicit score matching or contrastive divergence. For high-dimensional
image data, we describe how the manifold hypothesis puts limitations on our
approach and demonstrate the effectiveness of energy discrepancy by training
the energy-based model as a prior of a variational decoder model.
Efficient Methods for Unsupervised Learning of Probabilistic Models
In this thesis I develop a variety of techniques to train, evaluate, and
sample from intractable and high dimensional probabilistic models. Abstract
exceeds arXiv space limitations -- see PDF.