126,465 research outputs found

    Patterns of Scalable Bayesian Inference

    Full text link
    Datasets are growing not just in size but in complexity, creating a demand for rich models and quantification of uncertainty. Bayesian methods are an excellent fit for this demand, but scaling Bayesian inference is a challenge. In response to this challenge, there has been considerable recent work based on varying assumptions about model structure, underlying computational resources, and the importance of asymptotic correctness. As a result, there is a zoo of ideas with few clear overarching principles. In this paper, we seek to identify unifying principles, patterns, and intuitions for scaling Bayesian inference. We review existing work on utilizing modern computing resources with both MCMC and variational approximation techniques. From this taxonomy of ideas, we characterize the general principles that have proven successful for designing scalable inference procedures and comment on the path forward

    Sequential Bayesian updating for Big Data

    Get PDF
    The velocity, volume, and variety of big data present both challenges and opportunities for cognitive science. We introduce sequential Bayesian updat-ing as a tool to mine these three core properties. In the Bayesian approach, we summarize the current state of knowledge regarding parameters in terms of their posterior distributions, and use these as prior distributions when new data become available. Crucially, we construct posterior distributions in such a way that we avoid having to repeat computing the likelihood of old data as new data become available, allowing the propagation of information without great computational demand. As a result, these Bayesian methods allow continuous inference on voluminous information streams in a timely manner. We illustrate the advantages of sequential Bayesian updating with data from the MindCrowd project, in which crowd-sourced data are used to study Alzheimer’s Dementia. We fit an extended LATER (Linear Ap-proach to Threshold with Ergodic Rate) model to reaction time data from the project in order to separate two distinct aspects of cognitive functioning: speed of information accumulation and caution

    Collaborative Nested Sampling: Big Data vs. complex physical models

    Full text link
    The data torrent unleashed by current and upcoming astronomical surveys demands scalable analysis methods. Many machine learning approaches scale well, but separating the instrument measurement from the physical effects of interest, dealing with variable errors, and deriving parameter uncertainties is often an after-thought. Classic forward-folding analyses with Markov Chain Monte Carlo or Nested Sampling enable parameter estimation and model comparison, even for complex and slow-to-evaluate physical models. However, these approaches require independent runs for each data set, implying an unfeasible number of model evaluations in the Big Data regime. Here I present a new algorithm, collaborative nested sampling, for deriving parameter probability distributions for each observation. Importantly, the number of physical model evaluations scales sub-linearly with the number of data sets, and no assumptions about homogeneous errors, Gaussianity, the form of the model or heterogeneity/completeness of the observations need to be made. Collaborative nested sampling has immediate application in speeding up analyses of large surveys, integral-field-unit observations, and Monte Carlo simulations.Comment: Resubmitted to PASP Focus on Machine Intelligence in Astronomy and Astrophysics after first referee report. Figure 6 demonstrates the scaling for Collaborative MultiNest, PolyChord and RadFriends implementations. Figure 10 application to MUSE IFU data. Implementation at https://github.com/JohannesBuchner/massivedatan
    • …
    corecore