8 research outputs found

    Population-Based MCMC on Multi-Core CPUs, GPUs and FPGAs


    Algorithms and architectures for MCMC acceleration in FPGAs

    Markov Chain Monte Carlo (MCMC) is a family of stochastic algorithms which are used to draw random samples from arbitrary probability distributions. This task is necessary to solve a variety of problems in Bayesian modelling, e.g. prediction and model comparison, making MCMC a fundamental tool in modern statistics. Nevertheless, due to the increasing complexity of Bayesian models, the explosion in the amount of data they need to handle and the computational intensity of many MCMC algorithms, performing MCMC-based inference is often impractical in real applications. This thesis tackles this computational problem by proposing Field Programmable Gate Array (FPGA) architectures for accelerating MCMC and by designing novel MCMC algorithms and optimization methodologies which are tailored for FPGA implementation. The contributions of this work include: 1) An FPGA architecture for the Population-based MCMC algorithm, along with two modified versions of the algorithm which use custom arithmetic precision in large parts of the implementation without introducing error in the output. Mapping the two modified versions to an FPGA allows for more parallel modules to be instantiated in the same chip area. 2) An FPGA architecture for the Particle MCMC algorithm, along with a novel algorithm which combines Particle MCMC and Population-based MCMC to tackle multi-modal distributions. A proposed FPGA architecture for the new algorithm achieves higher datapath utilization than the Particle MCMC architecture. 3) A generic method to optimize the arithmetic precision of any MCMC algorithm that is implemented on FPGAs. The method selects the minimum precision among a given set of precisions, while guaranteeing a user-defined bound on the output error. By applying the above techniques to large-scale Bayesian problems, it is shown that significant speedups (one or two orders of magnitude) are possible compared to state-of-the-art MCMC algorithms implemented on CPUs and GPUs, opening the way for handling complex statistical analyses in the era of ubiquitous, ever-increasing data.
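
    To make the Population-based MCMC algorithm mentioned above concrete, the sketch below is a minimal NumPy implementation of parallel tempering on a toy bimodal target. It is illustrative only, not the thesis's FPGA implementation: the target density, temperature ladder and step size are assumptions, and the per-chain Metropolis loop is roughly the part that parallel hardware would replicate.

```python
# Minimal sketch of Population-based MCMC (parallel tempering) on a toy
# bimodal target; illustrative only, not the thesis's FPGA architecture.
import numpy as np

def log_target(x):
    # Toy bimodal density: mixture of two unit Gaussians at -3 and +3.
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

def population_mcmc(n_iters=10_000, temps=(1.0, 0.5, 0.25, 0.1), step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.array(temps)
    n_chains = len(temps)
    x = rng.normal(size=n_chains)            # one state per tempered chain
    samples = np.empty(n_iters)
    for i in range(n_iters):
        # Local Metropolis update for every chain: the embarrassingly
        # parallel part that maps onto replicated hardware modules.
        prop = x + step * rng.normal(size=n_chains)
        log_alpha = betas * (log_target(prop) - log_target(x))
        accept = np.log(rng.uniform(size=n_chains)) < log_alpha
        x = np.where(accept, prop, x)
        # Exchange move between a random pair of adjacent temperatures,
        # which lets the cold chain escape local modes.
        j = rng.integers(n_chains - 1)
        log_swap = (betas[j] - betas[j + 1]) * (log_target(x[j + 1]) - log_target(x[j]))
        if np.log(rng.uniform()) < log_swap:
            x[j], x[j + 1] = x[j + 1], x[j]
        samples[i] = x[0]                     # keep the untempered chain only
    return samples

samples = population_mcmc()
print(samples.mean(), samples.std())
```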

    Acceleration of Bayesian model based data analysis

    Inverse problems for parameter estimation often face a choice between the use of a real-time scheme with strong approximations or rigorous post-processing with explicit uncertainty handling. Plasma physics experiments place particularly high demands on both, and a solution that meets all of these requirements is missing. Standard Bayesian analysis is an excellent tool for the case at hand, with the disadvantage of extensive processing times. This work therefore presents a solution that satisfies the scientific requirements while reducing the need for a speed vs. rigour trade-off.

    Parallel resampling in the particle filter

    Modern parallel computing devices, such as the graphics processing unit (GPU), have gained significant traction in scientific and statistical computing. They are particularly well-suited to data-parallel algorithms such as the particle filter, or more generally Sequential Monte Carlo (SMC), which are increasingly used in statistical inference. SMC methods carry a set of weighted particles through repeated propagation, weighting and resampling steps. The propagation and weighting steps are straightforward to parallelise, as they require only independent operations on each particle. The resampling step is more difficult, as standard schemes require a collective operation, such as a sum, across particle weights. Focusing on this resampling step, we analyse two alternative schemes that do not involve a collective operation (Metropolis and rejection resamplers), and compare them to standard schemes (multinomial, stratified and systematic resamplers). We find that, in certain circumstances, the alternative resamplers can perform significantly faster on a GPU, and to a lesser extent on a CPU, than the standard approaches. Moreover, in single precision, the standard approaches are numerically biased for upwards of hundreds of thousands of particles, while the alternatives are not. This is particularly important given greater single- than double-precision throughput on modern devices, and the consequent temptation to use single precision with a greater number of particles. Finally, we provide auxiliary functions useful for implementation, such as for the permutation of ancestry vectors to enable in-place propagation. (21 pages, 6 figures)
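
    To illustrate the distinction drawn above, here is a minimal NumPy sketch contrasting the standard systematic resampler, which needs a collective cumulative sum over the weights, with a collective-free Metropolis resampler of the kind analysed in the paper. The function names, the number of Metropolis steps and the test weights are illustrative assumptions, not the paper's reference code.

```python
# Toy comparison of a collective systematic resampler and a
# collective-free Metropolis resampler over ancestor indices.
import numpy as np

def systematic_resample(weights, rng):
    # Standard scheme: requires a prefix sum (cumsum) across all weights.
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights / weights.sum()), positions)

def metropolis_resample(weights, rng, n_steps=20):
    # Collective-free scheme: each offspring slot independently runs a
    # short Metropolis chain over ancestor indices using only weight ratios.
    n = len(weights)
    ancestors = np.arange(n)
    for _ in range(n_steps):
        proposal = rng.integers(n, size=n)
        accept = rng.uniform(size=n) < weights[proposal] / weights[ancestors]
        ancestors = np.where(accept, proposal, ancestors)
    return ancestors

rng = np.random.default_rng(1)
w = rng.gamma(1.0, size=1024)             # arbitrary positive test weights
print(np.bincount(systematic_resample(w, rng), minlength=1024)[:8])
print(np.bincount(metropolis_resample(w, rng), minlength=1024)[:8])
```

    The Metropolis variant trades exactness for locality: each slot only ever compares two weights at a time, which is what removes the collective operation at the cost of a small, tunable bias controlled by the number of steps.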

    A Data-Analysis and Sensitivity-Optimization Framework for the KATRIN Experiment

    Presently under construction, the Karlsruhe TRitium Neutrino (KATRIN) experiment is the next-generation tritium beta-decay experiment to perform a direct kinematical measurement of the electron neutrino mass with an unprecedented sensitivity of 200 meV (90% C.L.). This thesis describes the implementation of a consistent data analysis framework, addressing technical aspects of the data taking process and statistical challenges of neutrino mass estimation from the beta-decay electron spectrum.

    Form vs. Function: Theory and Models for Neuronal Substrates

    The quest for endowing form with function represents the fundamental motivation behind all neural network modeling. In this thesis, we discuss various functional neuronal architectures and their implementation in silico, both on conventional computer systems and on neuromorphic devices. Necessarily, such casting to a particular substrate will constrain their form, either by requiring a simplified description of neuronal dynamics and interactions or by imposing physical limitations on important characteristics such as network connectivity or parameter precision. While our main focus lies on the computational properties of the studied models, we augment our discussion with rigorous mathematical formalism. We start by investigating the behavior of point neurons under synaptic bombardment and provide analytical predictions of single-unit and ensemble statistics. These considerations later become useful when moving to the functional network level, where we study the effects of an imperfect physical substrate on the computational properties of several cortical networks. Finally, we return to the single neuron level to discuss a novel interpretation of spiking activity in the context of probabilistic inference through sampling. We provide analytical derivations for the translation of this "neural sampling" framework to networks of biologically plausible and hardware-compatible neurons and later take this concept beyond the realm of brain science when we discuss applications in machine learning and analogies to solid-state systems.
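
    As a rough illustration of the sampling view of spiking activity mentioned above, the sketch below Gibbs-samples from a two-unit Boltzmann distribution with stochastic binary units. This is only the abstract-level picture behind "neural sampling", not the thesis's translation to biologically plausible or hardware neurons, and the weights and biases are arbitrary assumptions.

```python
# Minimal sketch: stochastic binary units sampling from a Boltzmann
# distribution via Gibbs updates (the abstract-level "neural sampling" idea).
import numpy as np

def gibbs_sample(W, b, n_samples=5000, seed=0):
    # W: symmetric coupling matrix with zero diagonal, b: biases.
    rng = np.random.default_rng(seed)
    n = len(b)
    z = rng.integers(2, size=n)
    samples = np.empty((n_samples, n), dtype=int)
    for t in range(n_samples):
        for k in range(n):
            # Unit k switches on with probability sigmoid(u_k), where
            # u_k = b_k + sum_j W_kj z_j plays the role of a membrane potential.
            u = b[k] + W[k] @ z
            z[k] = rng.uniform() < 1.0 / (1.0 + np.exp(-u))
        samples[t] = z
    return samples

W = np.array([[0.0, 1.5], [1.5, 0.0]])    # two mutually excitatory units
b = np.array([-0.5, -0.5])
s = gibbs_sample(W, b)
print(s.mean(axis=0))                      # empirical on-probabilities
```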