7 research outputs found

    FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods

    Get PDF
    Background Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA\u27s on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. Results We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10× speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Conclusions Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs)

    Phylogenetic Inference via Sequential Monte Carlo

    Get PDF
    Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework—which we refer to as PosetSMC—to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC–SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/ bouchard/PosetSMC

    Statistical Methods for Evaluating the Diagnostic Accuracy of Incomplete Multiple Tests

    Get PDF
    The accurate diagnosis of a molecularly-defined subtype of cancer is often a very important step toward its effective prevention and treatment. For the diagnosis of some subtypes of certain cancers, a gold standard with perfect sensitivity and specificity may be unavailable. In those scenarios, the status of the tumor subtype commonly is measured by multiple imperfect diagnostic markers. In many such studies, some subjects are only measured by a subset of diagnostic tests and the missing probabilities may depend on the unknown disease status. In this research, we present novel statistical methods based on an EM algorithm to evaluate incomplete multiple imperfect diagnostic tests under conditional independence and conditional dependence assumptions. We applied the proposed methods to a set of real data from the NCI Colon Cancer Family Registry (C-CFR) on diagnosing microsatellite instability (MSI) for hereditary nonpolyposis colorectal cancer (HNPCC) to estimate diagnostic accuracy (i.e., sensitivities and specificities) and prevalence for 11 biomarker tests. Simulations are conducted to evaluate the small-sample performance of our methods. The advantages and limitations of our methods are discussed. An R package was developed for easy implementation of our methods. Finally, a proposal for future research also was presented.Doctor of Public Healt

    Parallel Markov Chain Monte Carlo

    Get PDF
    The increasing availability of multi-core and multi-processor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate counting problems, Bayesian inference and as a means for estimating very highdimensional integrals. As such MCMC has found a wide variety of applications in fields including computational biology and physics,financial econometrics, machine learning and image processing. This thesis presents a number of new method for reducing the runtime of Markov Chain Monte Carlo simulations by using SMP machines and/or clusters. Two of the methods speculatively perform iterations in parallel, reducing the runtime of MCMC programs whilst producing statistically identical results to conventional sequential implementations. The other methods apply only to problem domains that can be presented as an image, and involve using various means of dividing the image into subimages that can be proceed with some degree of independence. Where possible the thesis includes a theoretical analysis of the reduction in runtime that may be achieved using our technique under perfect conditions, and in all cases the methods are tested and compared on selection of multi-core and multi-processor architectures. A framework is provided to allow easy construction of MCMC application that implement these parallelisation methods

    Parallel Markov Chain Monte Carlo

    Get PDF
    The increasing availability of multi-core and multi-processor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate counting problems, Bayesian inference and as a means for estimating very highdimensional integrals. As such MCMC has found a wide variety of applications in fields including computational biology and physics,financial econometrics, machine learning and image processing. This thesis presents a number of new method for reducing the runtime of Markov Chain Monte Carlo simulations by using SMP machines and/or clusters. Two of the methods speculatively perform iterations in parallel, reducing the runtime of MCMC programs whilst producing statistically identical results to conventional sequential implementations. The other methods apply only to problem domains that can be presented as an image, and involve using various means of dividing the image into subimages that can be proceed with some degree of independence. Where possible the thesis includes a theoretical analysis of the reduction in runtime that may be achieved using our technique under perfect conditions, and in all cases the methods are tested and compared on selection of multi-core and multi-processor architectures. A framework is provided to allow easy construction of MCMC application that implement these parallelisation methods.EThOS - Electronic Theses Online ServiceUniversity of Warwick. Dept. of Computer ScienceGBUnited Kingdo
    corecore