8 research outputs found
Algorithms and architectures for MCMC acceleration in FPGAs
Markov Chain Monte Carlo (MCMC) is a family of stochastic algorithms which are used to draw random samples from arbitrary probability distributions. This task is necessary to solve a variety of problems in Bayesian modelling, e.g. prediction and model comparison, making MCMC a fundamental tool in modern statistics. Nevertheless, due to the increasing complexity of Bayesian models, the explosion in the amount of data they need to handle and the computational intensity of many MCMC algorithms, performing MCMC-based inference is often impractical in real applications. This thesis tackles this computational problem by proposing Field Programmable Gate Array (FPGA) architectures for accelerating MCMC and by designing novel MCMC algorithms and optimization methodologies which are tailored for FPGA implementation. The contributions of this work include: 1) An FPGA architecture for the Population-based MCMC algorithm, along with two modified versions of the algorithm which use custom arithmetic precision in large parts of the implementation without introducing error in the output. Mapping the two modified versions to an FPGA allows for more parallel modules to be instantiated in the same chip area. 2) An FPGA architecture for the Particle MCMC algorithm, along with a novel algorithm which combines Particle MCMC and Population-based MCMC to tackle multi-modal distributions. A proposed FPGA architecture for the new algorithm achieves higher datapath utilization than the Particle MCMC architecture. 3) A generic method to optimize the arithmetic precision of any MCMC algorithm that is implemented on FPGAs. The method selects the minimum precision among a given set of precisions, while guaranteeing a user-defined bound on the output error. By applying the above techniques to large-scale Bayesian problems, it is shown that significant speedups (one or two orders of magnitude) are possible compared to state-of-the-art MCMC algorithms implemented on CPUs and GPUs, opening the way for handling complex statistical analyses in the era of ubiquitous, ever-increasing data.Open Acces
Acceleration of Bayesian model based data analysis
Inverse problems for parameter estimation often face a choice between the use of a real-time scheme with strong approximations or rigorous post-processing with explicit uncertainty handling. Plasma physics experiments set a particularly high demand of both and a solution that meets all of these requirements is missing. Standard Bayesian analysis is an excellent tool for the case at hand, with the disadvantage of extensive processing times. This work therefore presents a solution that satisfies the scientific requirements while reducing the need for a speed vs. rigorosity trade-off.Die Bestimmung von Parametern bei inversen Problemen beinhaltet eine Abwägung zwischen vereinfachenden Annahmen für Echtzeitverfahren und rigoroser Datenanalyse mit Fehlerbetrachtung. Experimente in der Plasmaphysik stellen besonders hohe Anforderungen an beide, und eine Lösung, die diese Anforderungen erfüllt, fehlt. Die Bayessche Analyse ist ein exzellentes Werkzeug für diese Problemstellung, mit dem Nachteil langer Laufzeiten. Diese Arbeit stellt eine Lösung dar, die den Anforderungen entspricht und die Notwendigkeit der Abwägung zwischen Geschwindigkeit und Rigorosität reduziert
Parallel resampling in the particle filter
Modern parallel computing devices, such as the graphics processing unit
(GPU), have gained significant traction in scientific and statistical
computing. They are particularly well-suited to data-parallel algorithms such
as the particle filter, or more generally Sequential Monte Carlo (SMC), which
are increasingly used in statistical inference. SMC methods carry a set of
weighted particles through repeated propagation, weighting and resampling
steps. The propagation and weighting steps are straightforward to parallelise,
as they require only independent operations on each particle. The resampling
step is more difficult, as standard schemes require a collective operation,
such as a sum, across particle weights. Focusing on this resampling step, we
analyse two alternative schemes that do not involve a collective operation
(Metropolis and rejection resamplers), and compare them to standard schemes
(multinomial, stratified and systematic resamplers). We find that, in certain
circumstances, the alternative resamplers can perform significantly faster on a
GPU, and to a lesser extent on a CPU, than the standard approaches. Moreover,
in single precision, the standard approaches are numerically biased for upwards
of hundreds of thousands of particles, while the alternatives are not. This is
particularly important given greater single- than double-precision throughput
on modern devices, and the consequent temptation to use single precision with a
greater number of particles. Finally, we provide auxiliary functions useful for
implementation, such as for the permutation of ancestry vectors to enable
in-place propagation.Comment: 21 pages, 6 figure
A Data-Analysis and Sensitivity-Optimization Framework for the KATRIN Experiment
Presently under construction, the Karlsruhe TRitium Neutrino (KATRIN) experiment is the next generation tritium beta-decay experiment to perform a direct kinematical measurement of the electron neutrino mass with an unprecedented sensitivity of 200 meV (90% C.L.). This thesis describes the implementation of a consistent data analysis framework, addressing technical aspects of the data taking process and statistical challenges of a neutrino mass estimation from the beta-decay electron spectrum
Recommended from our members
Laboratory Directed Research and Development Program FY 2004 Annual Report
The Oak Ridge National Laboratory (ORNL) Laboratory Directed Research and Development (LDRD) Program reports its status to the U.S. Department of Energy (DOE) in March of each year. The program operates under the authority of DOE Order 413.2A, 'Laboratory Directed Research and Development' (January 8, 2001), which establishes DOE's requirements for the program while providing the Laboratory Director broad flexibility for program implementation. LDRD funds are obtained through a charge to all Laboratory programs. This report describes all ORNL LDRD research activities supported during FY 2004 and includes final reports for completed projects and shorter progress reports for projects that were active, but not completed, during this period. The FY 2004 ORNL LDRD Self-Assessment (ORNL/PPA-2005/2) provides financial data about the FY 2004 projects and an internal evaluation of the program's management process. ORNL is a DOE multiprogram science, technology, and energy laboratory with distinctive capabilities in materials science and engineering, neutron science and technology, energy production and end-use technologies, biological and environmental science, and scientific computing. With these capabilities ORNL conducts basic and applied research and development (R&D) to support DOE's overarching national security mission, which encompasses science, energy resources, environmental quality, and national nuclear security. As a national resource, the Laboratory also applies its capabilities and skills to the specific needs of other federal agencies and customers through the DOE Work For Others (WFO) program. Information about the Laboratory and its programs is available on the Internet at <http://www.ornl.gov/>. LDRD is a relatively small but vital DOE program that allows ORNL, as well as other multiprogram DOE laboratories, to select a limited number of R&D projects for the purpose of: (1) maintaining the scientific and technical vitality of the Laboratory; (2) enhancing the Laboratory's ability to address future DOE missions; (3) fostering creativity and stimulating exploration of forefront science and technology; (4) serving as a proving ground for new research; and (5) supporting high-risk, potentially high-value R&D. Through LDRD the Laboratory is able to improve its distinctive capabilities and enhance its ability to conduct cutting-edge R&D for its DOE and WFO sponsors. To meet the LDRD objectives and fulfill the particular needs of the Laboratory, ORNL has established a program with two components: the Director's R&D Fund and the Seed Money Fund. As outlined in Table 1, these two funds are complementary. The Director's R&D Fund develops new capabilities in support of the Laboratory initiatives, while the Seed Money Fund is open to all innovative ideas that have the potential for enhancing the Laboratory's core scientific and technical competencies. Provision for multiple routes of access to ORNL LDRD funds maximizes the likelihood that novel and seminal ideas with scientific and technological merit will be recognized and supported
Form vs. Function: Theory and Models for Neuronal Substrates
The quest for endowing form with function represents the fundamental motivation behind all neural network modeling. In this thesis, we discuss various functional neuronal architectures and their implementation in silico, both on conventional computer systems and on neuromorpic devices. Necessarily, such casting to a particular substrate will constrain their form, either by requiring a simplified description of neuronal dynamics and interactions or by imposing physical limitations on important characteristics such as network connectivity or parameter precision. While our main focus lies on the computational properties of the studied models, we augment our discussion with rigorous mathematical formalism. We start by investigating the behavior of point neurons under synaptic bombardment and provide analytical predictions of single-unit and ensemble statistics. These considerations later become useful when moving to the functional network level, where we study the effects of an imperfect physical substrate on the computational properties of several cortical networks. Finally, we return to the single neuron level to discuss a novel interpretation of spiking activity in the context of probabilistic inference through sampling. We provide analytical derivations for the translation of this ``neural sampling'' framework to networks of biologically plausible and hardware-compatible neurons and later take this concept beyond the realm of brain science when we discuss applications in machine learning and analogies to solid-state systems