7 research outputs found

FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods

Author: A Stamatakis
B Minh
C Than
CL Schoch
D Zwickl
DR Robinson
F de Dinechin
F Ronquist
G Altekar
H Fu
H Schmidt
J Felsenstein
J Williams
Jason D Bakos
JW Spatafora
KH Abed
L Zhuo
M A Suchard
M Binder
ME Alfaro
ML Berbee
N Alachiotis
R Bauer
R-C Li
SM Barns
Stephanie Zierke
T Hamada
T Keane
TST Mak
X Feng
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such, it has high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. Results: We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10× speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and using a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Conclusions: Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference, as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
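The abstract's remark that the PLF is a loop with no dependencies between iterations can be illustrated with a minimal Python sketch. It is not the authors' FPGA design or the MrBayes code: the Jukes-Cantor substitution model, the function names (`jc69_matrix`, `combine_children`, `tip`), and the toy sequences are all assumptions chosen for illustration. Each site's conditional likelihood vector at a parent node depends only on the same site at its two children, which is what makes the loop easy to pipeline or parallelize.

```python
import math

def jc69_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t (assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def combine_children(left, right, p_left, p_right):
    """Conditional likelihoods at a parent node, one 4-vector per site."""
    parent = []
    for l_site, r_site in zip(left, right):  # each iteration is independent
        site = []
        for s in range(4):  # parent state
            a = sum(p_left[s][t] * l_site[t] for t in range(4))
            b = sum(p_right[s][t] * r_site[t] for t in range(4))
            site.append(a * b)
        parent.append(site)
    return parent

def tip(base):
    """Tip vector: 1 for the observed nucleotide, 0 for the others."""
    return [1.0 if b == base else 0.0 for b in "ACGT"]

# Hypothetical two-taxon alignment of four sites.
left = [tip(b) for b in "ACGT"]
right = [tip(b) for b in "ACGA"]
p = jc69_matrix(0.1)
parent = combine_children(left, right, p, p)

# Log-likelihood at the root with uniform base frequencies.
log_lik = sum(math.log(sum(0.25 * x for x in site)) for site in parent)
```

Because no site reads another site's result, the outer loop can be unrolled across hardware units or vector lanes without changing the answer.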

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Scholar Commons - Institutional Repository of the University of South Carolina

Phylogenetic Inference via Sequential Monte Carlo

Author: Alexandre Bouchard-Côté
Altekar
Andrieu
Beaumont
Bourque
Cannone
Cappé
Carpenter
Crisan
Doob
Douc
Doucet
Drummond
Fan
Felsenstein
Feng
Gelman
Griffiths
Görür
Huelsenbeck
Iorio
Keane
Kimura
Kitagawa
Kong
Kuhner
Lakner
Lartillot
Li
Liu
Marjoram
Michael I. Jordan
Moral
Neal
Newton
Paul
Penny
Rannala
Redelings
Rivas
Robert
Robinson
Saitou
Semple
Siepel
Sriram Sankararaman
Suchard
Swendsen
Tavaré
Teh
Thorne
Tom
Xie
Publication venue: Oxford University Press
Publication date: 01/07/2012

Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework, which we refer to as PosetSMC, to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC–SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/~bouchard/PosetSMC.

Crossref

Harvard University - DASH

PubMed Central

eScholarship - University of California

Statistical Methods for Evaluating the Diagnostic Accuracy of Incomplete Multiple Tests

Author: Zhang Yi
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/01/2013

The accurate diagnosis of a molecularly defined subtype of cancer is often a very important step toward its effective prevention and treatment. For the diagnosis of some subtypes of certain cancers, a gold standard with perfect sensitivity and specificity may be unavailable. In those scenarios, the status of the tumor subtype is commonly measured by multiple imperfect diagnostic markers. In many such studies, some subjects are only measured by a subset of diagnostic tests, and the missing probabilities may depend on the unknown disease status. In this research, we present novel statistical methods based on an EM algorithm to evaluate incomplete multiple imperfect diagnostic tests under conditional independence and conditional dependence assumptions. We applied the proposed methods to a set of real data from the NCI Colon Cancer Family Registry (C-CFR) on diagnosing microsatellite instability (MSI) for hereditary nonpolyposis colorectal cancer (HNPCC) to estimate diagnostic accuracy (i.e., sensitivities and specificities) and prevalence for 11 biomarker tests. Simulations are conducted to evaluate the small-sample performance of our methods. The advantages and limitations of our methods are discussed. An R package was developed for easy implementation of our methods. Finally, a proposal for future research is presented.
Doctor of Public Health
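As a rough illustration of the kind of EM algorithm the abstract refers to, the sketch below fits a two-class latent class model to binary test results with missing entries. It is not the thesis's method or its R package: it assumes conditional independence between tests and, unlike the thesis, assumes that missingness does not depend on disease status, so missing results are simply skipped. The function name `em_latent_class`, the starting values, and the simulated data are hypothetical.

```python
import math
import random

def em_latent_class(data, n_iter=200):
    """EM for prevalence, sensitivities and specificities without a gold standard.

    data: list of rows; each entry is 1, 0, or None (test not performed).
    Assumes conditional independence and ignorable missingness.
    """
    n_tests = len(data[0])
    prev, se, sp = 0.5, [0.8] * n_tests, [0.8] * n_tests
    history = []  # observed-data log-likelihood at each iteration
    for _ in range(n_iter):
        # E-step: posterior probability that each subject is diseased.
        weights, loglik = [], 0.0
        for row in data:
            pos, neg = prev, 1.0 - prev
            for j, y in enumerate(row):
                if y is None:
                    continue  # missing results contribute no factor
                pos *= se[j] if y else 1.0 - se[j]
                neg *= 1.0 - sp[j] if y else sp[j]
            weights.append(pos / (pos + neg))
            loglik += math.log(pos + neg)
        history.append(loglik)
        # M-step: weighted proportions over the observed entries only.
        prev = sum(weights) / len(weights)
        for j in range(n_tests):
            obs = [(w, row[j]) for w, row in zip(weights, data) if row[j] is not None]
            se[j] = sum(w * y for w, y in obs) / sum(w for w, _ in obs)
            sp[j] = sum((1 - w) * (1 - y) for w, y in obs) / sum(1 - w for w, _ in obs)
    return prev, se, sp, history

# Simulated data: three tests, about 20% of results missing at random.
rng = random.Random(0)
true_prev, true_se, true_sp = 0.3, [0.9, 0.85, 0.8], [0.95, 0.9, 0.85]
data = []
for _ in range(2000):
    diseased = rng.random() < true_prev
    row = []
    for j in range(3):
        p_pos = true_se[j] if diseased else 1 - true_sp[j]
        y = 1 if rng.random() < p_pos else 0
        row.append(None if rng.random() < 0.2 else y)
    data.append(row)

prev, se, sp, history = em_latent_class(data)
```

Handling missingness that depends on the unknown disease status, as the abstract describes, would require modeling the missing-data mechanism as well, which this sketch deliberately leaves out.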

Carolina Digital Repository

Parallel Markov Chain Monte Carlo

Author: Byrd Jonathan M. R.

The increasing availability of multi-core and multi-processor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate counting problems, Bayesian inference and as a means for estimating very high-dimensional integrals. As such, MCMC has found a wide variety of applications in fields including computational biology and physics, financial econometrics, machine learning and image processing. This thesis presents a number of new methods for reducing the runtime of Markov Chain Monte Carlo simulations by using SMP machines and/or clusters. Two of the methods speculatively perform iterations in parallel, reducing the runtime of MCMC programs whilst producing statistically identical results to conventional sequential implementations. The other methods apply only to problem domains that can be presented as an image, and involve using various means of dividing the image into subimages that can be processed with some degree of independence. Where possible the thesis includes a theoretical analysis of the reduction in runtime that may be achieved using our technique under perfect conditions, and in all cases the methods are tested and compared on a selection of multi-core and multi-processor architectures. A framework is provided to allow easy construction of MCMC applications that implement these parallelisation methods.
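One way to read "speculatively perform iterations in parallel" is sketched below in Python. This is an assumed reading, not the thesis's framework: because a rejected Metropolis proposal leaves the state unchanged, several consecutive proposals can be evaluated from the current state at once, and everything after the first acceptance is discarded. The toy target, the function names, and the batch `width` are illustrative assumptions; the thread pool only stands in for real parallel hardware, since Python threads will not speed up this CPU-bound toy.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def log_target(x):
    """Log-density of a standard normal, up to a constant (assumed toy target)."""
    return -0.5 * x * x

def sequential_chain(x0, draws):
    """Plain random-walk Metropolis; draws is a list of (step, uniform) pairs."""
    x, chain = x0, []
    for step, u in draws:
        cand = x + step
        if math.log(u) < log_target(cand) - log_target(x):
            x = cand
        chain.append(x)
    return chain

def speculative_chain(x0, draws, width=4):
    """Same chain, but evaluates up to `width` iterations speculatively per round."""
    x, chain, i = x0, [], 0
    with ThreadPoolExecutor(max_workers=width) as pool:
        while i < len(draws):
            batch = draws[i:i + width]
            # Speculate that every earlier proposal in the batch is rejected,
            # so all candidates start from the same current state x.
            cands = [x + step for step, _ in batch]
            logs = list(pool.map(log_target, cands))
            base = log_target(x)
            used = len(batch)
            for k, ((_, u), cand, lp) in enumerate(zip(batch, cands, logs)):
                if math.log(u) < lp - base:
                    chain.extend([x] * k)  # k rejections precede the acceptance
                    x = cand
                    chain.append(x)
                    used = k + 1  # later speculative work is thrown away
                    break
            else:
                chain.extend([x] * len(batch))  # whole batch rejected
            i += used
    return chain

rng = random.Random(1)
# 1.0 - random() lies in (0, 1], which keeps math.log well defined.
draws = [(rng.gauss(0.0, 1.0), 1.0 - rng.random()) for _ in range(200)]
seq = sequential_chain(0.0, draws)
spec = speculative_chain(0.0, draws)
```

Indexing the random draws by iteration means both functions consume the same proposal and uniform at each step, so the two chains agree exactly; the gain from speculation comes only from evaluating expensive targets concurrently when rejections are common.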

Warwick Research Archives Portal Repository

Parallel Markov Chain Monte Carlo

Author: Byrd Jonathan Michael Robert
Publication date: 01/01/2010

OpenGrey Repository