Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward.
Sequential Bayesian updating for Big Data
The velocity, volume, and variety of big data present both challenges and opportunities for cognitive science. We introduce sequential Bayesian updating as a tool to mine these three core properties. In the Bayesian approach, we summarize the current state of knowledge regarding parameters in terms of their posterior distributions, and use these as prior distributions when new data become available. Crucially, we construct posterior distributions in such a way that we avoid recomputing the likelihood of old data as new data become available, allowing the propagation of information without great computational demand. As a result, these Bayesian methods allow continuous inference on voluminous information streams in a timely manner. We illustrate the advantages of sequential Bayesian updating with data from the MindCrowd project, in which crowd-sourced data are used to study Alzheimer’s Dementia. We fit an extended LATER (Linear Approach to Threshold with Ergodic Rate) model to reaction time data from the project in order to separate two distinct aspects of cognitive functioning: speed of information accumulation and caution.
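The core mechanism, in which yesterday's posterior becomes today's prior so that old data never have to be re-scored, can be sketched with a simple conjugate model. The Python snippet below is a minimal illustration assuming a Normal likelihood with known noise variance, not the extended LATER model used in the paper; the point is only that each update touches the current batch of data and nothing else, so the cost per step stays constant as the stream grows.

```python
import numpy as np

def update_normal(prior_mean, prior_var, data, noise_var):
    """One conjugate update for a Normal mean with known noise variance.

    The returned posterior depends only on the prior and the new batch,
    so earlier batches never need to be revisited.
    """
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + np.sum(data) / noise_var)
    return post_mean, post_var

# Stream batches through the updater: today's posterior is tomorrow's prior.
rng = np.random.default_rng(0)
mean, var = 0.0, 10.0  # broad initial prior
for batch in np.split(rng.normal(1.5, 1.0, 900), 9):
    mean, var = update_normal(mean, var, batch, noise_var=1.0)
print(mean, var)  # posterior concentrates around the true mean (1.5)
```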
Collaborative Nested Sampling: Big Data vs. complex physical models
The data torrent unleashed by current and upcoming astronomical surveys
demands scalable analysis methods. Many machine learning approaches scale well,
but separating the instrument measurement from the physical effects of
interest, dealing with variable errors, and deriving parameter uncertainties is
often an afterthought. Classic forward-folding analyses with Markov Chain
Monte Carlo or Nested Sampling enable parameter estimation and model
comparison, even for complex and slow-to-evaluate physical models. However,
these approaches require independent runs for each data set, implying an
infeasible number of model evaluations in the Big Data regime. Here I present a
new algorithm, collaborative nested sampling, for deriving parameter
probability distributions for each observation. Importantly, the number of
physical model evaluations scales sub-linearly with the number of data sets,
and no assumptions about homogeneous errors, Gaussianity, the form of the model
or heterogeneity/completeness of the observations need to be made.
Collaborative nested sampling has immediate application in speeding up analyses
of large surveys, integral-field-unit observations, and Monte Carlo
simulations.

Comment: Resubmitted to PASP Focus on Machine Intelligence in Astronomy and Astrophysics after first referee report. Figure 6 demonstrates the scaling for the Collaborative MultiNest, PolyChord and RadFriends implementations. Figure 10 shows an application to MUSE IFU data. Implementation at https://github.com/JohannesBuchner/massivedatan
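The bottleneck the abstract describes is easiest to see in the classic single-data-set procedure. Below is a minimal, illustrative nested sampling loop in Python; it is not the collaborative algorithm from the paper, and it uses naive rejection sampling for the constrained prior draw. Each iteration replaces the worst live point with a fresh prior sample above the current likelihood threshold, so every iteration costs at least one physical-model evaluation; running this independently for each of N observations multiplies that cost by N, which is exactly the per-data-set expense that collaborative nested sampling is designed to share.

```python
import numpy as np

def nested_sampling(loglike, ndim, nlive=100, niter=2000, rng=None):
    """Minimal nested sampling (Skilling 2004) on the unit hypercube."""
    rng = rng or np.random.default_rng()
    live = rng.uniform(size=(nlive, ndim))
    live_logl = np.array([loglike(p) for p in live])
    logz = -np.inf
    logw = np.log(1.0 - np.exp(-1.0 / nlive))  # weight of the first prior shell
    for _ in range(niter):
        worst = np.argmin(live_logl)
        logz = np.logaddexp(logz, logw + live_logl[worst])
        # Replace the worst point by rejection sampling from the prior
        # above the current likelihood threshold (toy problems only).
        while True:
            candidate = rng.uniform(size=ndim)
            cand_logl = loglike(candidate)
            if cand_logl > live_logl[worst]:
                break
        live[worst], live_logl[worst] = candidate, cand_logl
        logw -= 1.0 / nlive  # expected prior-volume shrinkage per iteration
    # (The final contribution of the remaining live points is omitted for brevity.)
    return logz

# Toy example: Gaussian likelihood centred in the unit square.
logz = nested_sampling(lambda p: -0.5 * np.sum(((p - 0.5) / 0.1) ** 2), ndim=2)
print(logz)
```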