Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward.
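One recurring pattern in this line of work is trading exact transition kernels for scalability by subsampling the data inside each update. A minimal sketch of one such method, stochastic gradient Langevin dynamics (SGLD), applied to a toy Gaussian-mean posterior; the function names and toy model here are illustrative, not taken from the paper:

```python
import numpy as np

def sgld(grad_log_post, theta0, data, n_iter=1000, batch_size=32, step=1e-3, rng=None):
    """Stochastic Gradient Langevin Dynamics: MCMC whose drift term uses a
    minibatch gradient estimate, with injected Gaussian noise of variance `step`."""
    rng = rng or np.random.default_rng(0)
    n = len(data)
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for _ in range(n_iter):
        idx = rng.choice(n, size=batch_size, replace=False)
        # Unbiased full-data gradient estimate: minibatch gradient rescaled by n / batch_size.
        g = grad_log_post(theta, data[idx], scale=n / batch_size)
        theta = theta + 0.5 * step * g + rng.normal(0.0, np.sqrt(step), size=theta.shape)
        samples.append(theta.copy())
    return np.array(samples)

# Toy target: posterior over the mean of a unit-variance Gaussian, flat prior.
data = np.random.default_rng(1).normal(2.0, 1.0, size=10_000)

def grad_log_post(theta, batch, scale):
    return scale * np.sum(batch - theta)  # gradient of the summed log N(x | theta, 1)

samples = sgld(grad_log_post, np.zeros(1), data, n_iter=2000, batch_size=100, step=1e-5)
```

Each iteration touches only `batch_size` data points rather than all of them, which is exactly the kind of per-step cost reduction the survey's taxonomy organizes.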
Model-Parallel Inference for Big Topic Models
In real-world industrial applications of topic modeling, the ability to
capture a gigantic conceptual space by learning an ultra-high-dimensional
topical representation, i.e., the so-called "big model", is becoming the next
desideratum after the enthusiasm for "big data", especially for fine-grained
downstream tasks such as online advertising, where good performance is
usually achieved by regression-based predictors built on millions if not
billions of input features. The conventional data-parallel approach to
training gigantic topic models turns out to be rather inefficient at
exploiting parallelism, owing to its heavy dependence on a centralized image
of the "model". Big model size also poses a storage challenge: the feasible
model size is bounded by the smallest RAM among the nodes. To address these
issues, we explore another type of parallelism, namely model-parallelism,
which enables disjoint blocks of a big topic model to be trained in parallel.
By integrating data-parallelism with model-parallelism, we show that
dependencies between distributed elements can be handled seamlessly, achieving
not only faster convergence but also the ability to tackle significantly
bigger models. We describe an architecture for model-parallel inference in
LDA and present a variant of the collapsed Gibbs sampling algorithm tailored
to it. Experimental results demonstrate the ability of this system to handle
topic modeling with an unprecedented 200 billion model variables on a
low-end cluster with very limited computational resources and bandwidth.
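The sampler being parallelized here is the standard collapsed Gibbs sampler for LDA, whose dominant state is the topic-word count matrix, i.e., the "big model" that model-parallelism partitions into disjoint blocks. A minimal serial sketch of that core, with the block scheduling omitted; all names and hyperparameter values are illustrative, not the paper's:

```python
import numpy as np

def collapsed_gibbs_lda(docs, n_topics, vocab_size, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (serial core). The model-parallel scheme
    described above would split `n_kw`, the topic-word count matrix, into
    disjoint blocks sampled concurrently on different workers."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # doc-topic counts
    n_kw = np.zeros((n_topics, vocab_size))  # topic-word counts (the "big model")
    n_k = np.zeros(n_topics)                 # per-topic totals
    z = []                                   # topic assignment per token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token, sample its topic from the full conditional, re-add it.
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # p(k) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_kw

docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 2, 1, 3]]  # tiny toy corpus, vocab of 4 words
topic_word = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=4, n_iter=20)
```

Because each token update reads and writes only one column of `n_kw`, workers assigned disjoint vocabulary blocks never touch the same model state, which is what makes the block-parallel scheduling possible.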
Algorithms and architectures for MCMC acceleration in FPGAs
Markov Chain Monte Carlo (MCMC) is a family of stochastic algorithms used to draw random samples from arbitrary probability distributions. This task is necessary to solve a variety of problems in Bayesian modelling, e.g. prediction and model comparison, making MCMC a fundamental tool in modern statistics. Nevertheless, due to the increasing complexity of Bayesian models, the explosion in the amount of data they need to handle and the computational intensity of many MCMC algorithms, performing MCMC-based inference is often impractical in real applications. This thesis tackles this computational problem by proposing Field Programmable Gate Array (FPGA) architectures for accelerating MCMC and by designing novel MCMC algorithms and optimization methodologies which are tailored for FPGA implementation. The contributions of this work include:
1) An FPGA architecture for the Population-based MCMC algorithm, along with two modified versions of the algorithm which use custom arithmetic precision in large parts of the implementation without introducing error in the output. Mapping the two modified versions to an FPGA allows for more parallel modules to be instantiated in the same chip area.
2) An FPGA architecture for the Particle MCMC algorithm, along with a novel algorithm which combines Particle MCMC and Population-based MCMC to tackle multi-modal distributions. A proposed FPGA architecture for the new algorithm achieves higher datapath utilization than the Particle MCMC architecture.
3) A generic method to optimize the arithmetic precision of any MCMC algorithm that is implemented on FPGAs. The method selects the minimum precision among a given set of precisions, while guaranteeing a user-defined bound on the output error.
By applying the above techniques to large-scale Bayesian problems, it is shown that significant speedups (one or two orders of magnitude) are possible compared to state-of-the-art MCMC algorithms implemented on CPUs and GPUs, opening the way for handling complex statistical analyses in the era of ubiquitous, ever-increasing data.
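The precision-selection idea in contribution 3 can be emulated in software: quantize the arithmetic, compare each candidate precision's output against a high-precision reference, and keep the smallest precision whose error stays within a user-defined bound. A hedged sketch using a random-walk Metropolis sampler on a toy Gaussian target; the quantization model and all names are illustrative, not the thesis's actual method:

```python
import numpy as np

def quantize(x, bits):
    """Crude fixed-point rounding to `bits` fractional bits, emulating a
    reduced-precision FPGA datapath."""
    scale = 2.0 ** bits
    return np.round(x * scale) / scale

def mh_mean(log_target, bits, n_iter=20_000, seed=0):
    """Random-walk Metropolis posterior-mean estimate, with the log-density
    evaluated at reduced precision."""
    rng = np.random.default_rng(seed)
    x, lp = 0.0, quantize(log_target(0.0), bits)
    total = 0.0
    for _ in range(n_iter):
        xp = x + rng.normal(0.0, 1.0)
        lpp = quantize(log_target(xp), bits)
        if np.log(rng.random()) < lpp - lp:  # Metropolis accept/reject
            x, lp = xp, lpp
        total += x
    return total / n_iter

log_target = lambda x: -0.5 * (x - 1.0) ** 2  # toy target: N(1, 1)

# Select the smallest candidate precision whose estimate stays within a
# user-defined bound of a high-precision reference run.
reference = mh_mean(log_target, bits=52)
chosen = next(b for b in (4, 8, 12, 16)
              if abs(mh_mean(log_target, b) - reference) < 0.1)
```

Running every chain on the same random stream isolates the effect of quantization from ordinary Monte Carlo noise, mirroring how a hardware precision sweep would hold the algorithm fixed and vary only the datapath width.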