15 research outputs found

    Acceleration of MCMC-based algorithms using reconfigurable logic

    Monte Carlo (MC) methods such as Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) have emerged as popular tools to sample from high-dimensional probability distributions. Because these algorithms can draw samples effectively from arbitrary distributions in Bayesian inference problems, they have been widely used in a range of statistical applications. However, they are often too time-consuming due to prohibitively costly likelihood evaluations, and thus cannot be practically applied to complex models with large-scale datasets. Currently, the lack of sufficiently fast MCMC methods limits their applicability in many modern applications such as genetics and machine learning, and this situation is bound to get worse given the increasing adoption of big data in many fields. The objective of this dissertation is to develop, design and build efficient hardware architectures for MCMC-based algorithms on Field Programmable Gate Arrays (FPGAs), and thereby bring them closer to practical applications. The contributions of this work include: 1) Novel parallel FPGA architectures of state-of-the-art resampling algorithms for SMC methods. The proposed architectures allow for parallel implementations and thus improve the processing speed. 2) A novel mixed-precision MCMC algorithm, along with a tailored FPGA architecture. The proposed design allows for more parallelism and achieves low latency for a given set of hardware resources, while still guaranteeing unbiased estimates. 3) A new variant of subsampling MCMC based on unequal probability sampling, along with a highly optimized FPGA architecture. The proposed method significantly reduces off-chip memory access and achieves high accuracy in estimates for a given time budget. By applying the above techniques, this work has produced hardware accelerators of MCMC and SMC for very large-scale Bayesian tasks, achieving notable speedups compared to the respective state-of-the-art CPU and GPU implementations.
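    For illustration of contribution 1, the sketch below shows systematic resampling, one of the standard SMC resampling schemes, in plain sequential form; the hardware architectures in the thesis parallelise algorithms of this kind. The function name and example weights are illustrative, not taken from the thesis.

        import numpy as np

        def systematic_resampling(weights, rng=None):
            # Resample particle indices using a single uniform draw and a
            # stratified grid of positions, then invert the weight CDF.
            if rng is None:
                rng = np.random.default_rng()
            n = len(weights)
            positions = (rng.random() + np.arange(n)) / n
            cumulative = np.cumsum(weights / np.sum(weights))
            return np.searchsorted(cumulative, positions)

        # Example: resample 5 particles with uneven weights; heavy particles
        # are duplicated, light ones tend to disappear.
        w = np.array([0.05, 0.05, 0.1, 0.3, 0.5])
        print(systematic_resampling(w))

    The single random draw and the monotone CDF inversion are what make this scheme attractive for a streaming hardware pipeline, compared to multinomial resampling which needs one independent draw per particle.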

    Algorithms and architectures for MCMC acceleration in FPGAs

    Markov Chain Monte Carlo (MCMC) is a family of stochastic algorithms which are used to draw random samples from arbitrary probability distributions. This task is necessary to solve a variety of problems in Bayesian modelling, e.g. prediction and model comparison, making MCMC a fundamental tool in modern statistics. Nevertheless, due to the increasing complexity of Bayesian models, the explosion in the amount of data they need to handle and the computational intensity of many MCMC algorithms, performing MCMC-based inference is often impractical in real applications. This thesis tackles this computational problem by proposing Field Programmable Gate Array (FPGA) architectures for accelerating MCMC and by designing novel MCMC algorithms and optimization methodologies which are tailored for FPGA implementation. The contributions of this work include: 1) An FPGA architecture for the Population-based MCMC algorithm, along with two modified versions of the algorithm which use custom arithmetic precision in large parts of the implementation without introducing error in the output. Mapping the two modified versions to an FPGA allows for more parallel modules to be instantiated in the same chip area. 2) An FPGA architecture for the Particle MCMC algorithm, along with a novel algorithm which combines Particle MCMC and Population-based MCMC to tackle multi-modal distributions. A proposed FPGA architecture for the new algorithm achieves higher datapath utilization than the Particle MCMC architecture. 3) A generic method to optimize the arithmetic precision of any MCMC algorithm that is implemented on FPGAs. The method selects the minimum precision among a given set of precisions, while guaranteeing a user-defined bound on the output error. By applying the above techniques to large-scale Bayesian problems, it is shown that significant speedups (one or two orders of magnitude) are possible compared to state-of-the-art MCMC algorithms implemented on CPUs and GPUs, opening the way for handling complex statistical analyses in the era of ubiquitous, ever-increasing data.
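    Population-based MCMC runs several chains of the same target at different "temperatures" and occasionally exchanges their states, which is why it maps naturally onto many parallel modules on one chip. Below is a minimal software-only parallel tempering loop, a common form of Population-based MCMC; the bimodal target, temperature ladder and proposal scale are assumptions for illustration, not the thesis's configuration.

        import numpy as np

        rng = np.random.default_rng(0)

        def log_target(x):
            # Illustrative bimodal target: mixture of Gaussians at +/-3.
            return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

        temps = np.array([1.0, 0.5, 0.25, 0.125])   # tempering ladder
        chains = rng.normal(size=len(temps))        # one state per chain

        for it in range(10000):
            # Local Metropolis move per chain; these are independent and
            # would run concurrently in hardware.
            for i, beta in enumerate(temps):
                prop = chains[i] + rng.normal(scale=1.0)
                if np.log(rng.random()) < beta * (log_target(prop) - log_target(chains[i])):
                    chains[i] = prop
            # Exchange move between a random adjacent pair of temperatures.
            j = rng.integers(len(temps) - 1)
            a = (temps[j] - temps[j + 1]) * (log_target(chains[j + 1]) - log_target(chains[j]))
            if np.log(rng.random()) < a:
                chains[j], chains[j + 1] = chains[j + 1], chains[j]

    The hot (low-beta) chains cross between modes easily and feed those moves to the cold chain through the exchange step, which is how the scheme tackles multi-modal distributions.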

    F-E3D: FPGA-based acceleration of an efficient 3D convolutional neural network for human action recognition

    Three-dimensional convolutional neural networks (3D CNNs) have demonstrated their outstanding classification accuracy for human action recognition (HAR). However, the large number of computations and parameters in 3D CNNs limits their deployability in real-life applications. To address this challenge, this paper adopts an algorithm-hardware co-design method by proposing an efficient 3D CNN building unit called 3D-1 bottleneck residual block (3D-1 BRB) at the algorithm level, and a corresponding FPGA-based hardware architecture called F-E3D at the hardware level. Based on 3D-1 BRB, a novel 3D CNN model called E3DNet is developed, which achieves a nearly 37-fold reduction in model size and a 5% improvement in accuracy compared to standard 3D CNNs on the UCF101 dataset. Together with several hardware optimizations, including the 3D fused BRB, online blocking and kernel reuse, the proposed F-E3D is nearly 13 times faster than a previous FPGA design for 3D CNNs, with performance and accuracy comparable to other state-of-the-art 3D CNN models on GPU platforms while requiring only 7% of their energy consumption.
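    To make the building unit concrete, here is a generic 3D bottleneck residual block in PyTorch (1x1x1 reduce, 3x3x3, 1x1x1 expand, plus a skip connection). This is only a sketch of the general bottleneck pattern; the paper's 3D-1 BRB may differ in kernel shapes and channel ratios, and the class name and sizes below are assumptions.

        import torch
        import torch.nn as nn

        class Bottleneck3D(nn.Module):
            # Generic 3D bottleneck: pointwise reduce -> 3x3x3 conv on the
            # reduced channels -> pointwise expand, with identity skip.
            def __init__(self, channels, reduced):
                super().__init__()
                self.body = nn.Sequential(
                    nn.Conv3d(channels, reduced, kernel_size=1, bias=False),
                    nn.BatchNorm3d(reduced), nn.ReLU(inplace=True),
                    nn.Conv3d(reduced, reduced, kernel_size=3, padding=1, bias=False),
                    nn.BatchNorm3d(reduced), nn.ReLU(inplace=True),
                    nn.Conv3d(reduced, channels, kernel_size=1, bias=False),
                    nn.BatchNorm3d(channels),
                )
                self.act = nn.ReLU(inplace=True)

            def forward(self, x):
                return self.act(x + self.body(x))

        # A clip of 8 frames at 56x56 with 64 channels.
        x = torch.randn(1, 64, 8, 56, 56)
        print(Bottleneck3D(64, 16)(x).shape)   # torch.Size([1, 64, 8, 56, 56])

    The expensive 3x3x3 convolution runs on the reduced channel count, which is where the large savings in parameters and operations come from.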

    Optimizing CNN-based segmentation with deeply customized convolutional and deconvolutional architectures on FPGA

    Convolutional Neural Network (CNN) based algorithms have been successful in solving image recognition problems, showing very large accuracy improvements. In recent years, deconvolution layers have been widely used as key components in state-of-the-art CNNs for end-to-end training and in models supporting tasks such as image segmentation and super-resolution. However, deconvolution algorithms are computationally intensive, which limits their applicability to real-time applications. In particular, there has been little research on efficient implementations of deconvolution algorithms on FPGA platforms, which are widely used by practitioners and researchers to accelerate CNN algorithms due to their high performance and power efficiency. In this work, we propose and develop a deconvolution architecture for efficient FPGA implementation. FPGA-based accelerators are proposed for both deconvolution and CNN algorithms. In addition, memory sharing between the computation modules is proposed for the FPGA-based CNN accelerator, along with other optimization techniques. A non-linear optimization model based on the performance model is introduced to efficiently explore the design space, in order to achieve optimal processing speed and improve power efficiency. Furthermore, a hardware mapping framework is developed to automatically generate a low-latency hardware design for any given CNN model on the target device. Finally, we implement our designs on a Xilinx Zynq ZC706 board: the deconvolution accelerator achieves a performance of 90.1 GOPS at a 200 MHz working frequency and a performance density of 0.10 GOPS/DSP using 32-bit quantization, which significantly outperforms previous FPGA designs. A real-time scene-segmentation application on the Cityscapes dataset is used to evaluate our CNN accelerator on the Zynq ZC706 board; the system achieves 107 GOPS and 0.12 GOPS/DSP using 16-bit quantization, and supports up to 17 frames per second for 512x512 image inputs with a power consumption of only 9.6 W.
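    For readers unfamiliar with why deconvolution is awkward to accelerate, the sketch below is a naive 2D transposed convolution (deconvolution) in NumPy: each input pixel scatters a weighted copy of the kernel into a stride-upsampled output, so output locations are written to by overlapping, irregular sets of inputs. The function and shapes are illustrative only.

        import numpy as np

        def deconv2d(x, k, stride=2):
            # Transposed convolution by scattering: every input pixel adds a
            # weighted kernel patch into the upsampled output grid.
            h, w = x.shape
            kh = k.shape[0]
            out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kh))
            for i in range(h):
                for j in range(w):
                    out[i*stride:i*stride+kh, j*stride:j*stride+kh] += x[i, j] * k
            return out

        x = np.arange(9.0).reshape(3, 3)
        kernel = np.ones((3, 3))
        print(deconv2d(x, kernel, stride=2).shape)   # (7, 7)

    The overlapping accumulations in the output are what make a direct hardware mapping inefficient, and they motivate the customized deconvolution dataflow proposed in this work.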

    A Self-adaptive Fireworks Algorithm for Classification Problems

    This work was supported in part by the National Natural Science Foundation of China under Grants 61403206 and 61771258, in part by the Natural Science Foundation of Jiangsu Province under Grants BK20141005 and BK20160910, in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 14KJB520025, in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions, in part by the Open Research Fund of Jiangsu Engineering Research Center of Communication and Network Technology, NJUPT, under Grant JSGCZX17001, and in part by the Shaanxi Key Laboratory of Complex System Control and Intelligent Information Processing, Xi’an University of Technology, under Contract SKL2017CP01.

    Analysis tools and methods for tritium data taking with the KATRIN experiment

    The KArlsruhe TRItium Neutrino (KATRIN) experiment is designed to determine the neutrino mass with an unprecedented sensitivity of 200 meV (90% C.L.). To this end, KATRIN combines a strong gaseous tritium source with a high-resolution MAC-E-filter spectrometer. Since the completion of the experimental setup in autumn 2016, the KATRIN experiment has reached several central milestones. This thesis presents a gas model for the most recent of these milestones, the first tritium circulation in the tritium source in May and June 2018, and compares it against different measurement methods. Using the gas model, the first tritium spectra are analysed, and different methods for treating systematics are examined with the gas model as an example. In preparation for the upcoming neutrino-mass data, this thesis presents methods for a blind neutrino-mass analysis of β-decay-based tritium spectra, compares them against each other, and tests them on krypton data. The thesis closes with an outlook on KATRIN's potential to probe physics beyond the neutrino mass. For three examples, the statistical sensitivity of KATRIN is determined and compared with existing experimental results.

    An unbiased MCMC FPGA-based accelerator in the land of custom precision arithmetic

    Markov Chain Monte Carlo (MCMC) based methods have been the main tool used for Bayesian inference by practitioners and researchers due to their flexibility and theoretical properties that guarantee unbiased sampling-based estimates. Nevertheless, the availability of large data sets and the constant need to develop more complex models that better capture the targeted problem have presented significant computational challenges. Current approaches, based on multi-core CPUs, GPUs, and FPGAs, aim to accelerate the execution time of MCMC methods using subsampling techniques or custom precision arithmetic, resulting in biased estimates. In this work, a novel FPGA-based construction is proposed that utilises the custom precision support of FPGA devices in order to accelerate the computations, while at the same time guaranteeing asymptotically unbiased estimates. Key to this approach is the extension of the parameter space by an extra parameter that indicates the required precision in the computation of the likelihood of a data point. The work proposes an FPGA architecture for the above algorithm and discusses its tuning for maximising system performance. The performance of the FPGA-mapped sampler is evaluated using two Bayesian logistic regression case studies of varying complexity, which show significant speedups compared to existing FPGA- and CPU-based works that utilise double-precision floating-point arithmetic, without any bias in the sampling-based estimates.
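    The sketch below is not the paper's extended-parameter-space construction, but delayed-acceptance Metropolis-Hastings, a well-known related recipe that also uses a cheap low-precision likelihood without biasing the stationary distribution: a reduced-precision pass screens proposals, and a full-precision correction stage restores exactness. Here float32 stands in for the custom FPGA precision, and the synthetic data, proposal scale and flat prior are assumptions.

        import numpy as np

        rng = np.random.default_rng(1)
        data = rng.normal(1.0, 1.0, size=1000)

        def loglik(theta, dtype=np.float64):
            # Gaussian log-likelihood (up to a constant) evaluated at a chosen
            # arithmetic precision; float32 mimics a reduced-precision datapath.
            x = data.astype(dtype)
            return float(np.sum(-0.5 * (x - dtype(theta)) ** 2))

        theta, n_full = 0.0, 0
        for it in range(5000):
            prop = theta + rng.normal(scale=0.1)
            # Stage 1: screen with the cheap, low-precision likelihood ratio.
            a1 = loglik(prop, np.float32) - loglik(theta, np.float32)
            if np.log(rng.random()) >= min(0.0, a1):
                continue   # cheaply rejected; no full-precision work needed
            # Stage 2: correct with the exact ratio so the chain targets the
            # exact posterior (asymptotically unbiased estimates).
            n_full += 1
            a2 = (loglik(prop) - loglik(theta)) - a1
            if np.log(rng.random()) < min(0.0, a2):
                theta = prop
        print(theta, n_full)

    Most proposals are settled by the stage-1 screen, so the expensive full-precision evaluation runs only for the fraction of iterations counted in n_full; the paper's precision-as-a-parameter scheme pursues the same goal with a finer-grained, per-data-point choice of precision.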
