MPI-PHYLIP: Parallelizing Computationally Intensive Phylogenetic Analysis Routines for the Analysis of Large Protein Families

A Stamatakis; A Stamatakis; A Stamatakis; AE Darling; Alexander J. Ropelewski; AP Stamatakis; B Reva; BQ Minh; CA Russo; CJ Douady; D Durand; D Durand; D Kordis; DA Janies; E Mayr; F Delsuc; FD Ciccarelli; FR Opperdoes; G Altekar; G Talavera; GJ Olsen; HA Schmidt; Hugh B. Nicholas; I. King Jordan; J Felsenstein; J Felsenstein; J Felsenstein; J Hempel; J Perozich; JA Sheps; JT Bridgham; K Hamacher; K Tamura; KB Li; MJ Sanderson; Ricardo R. Gonzalez Mendez; S Sankararaman; S Yang; SB Hedges; SL Kosakovsky-Pond; T Wymore; TL Williams; TM Keane; TM Keane

Publication date: 1 January 2010
Publisher: Public Library of Science

Abstract

Background: Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems, as well as into the origins and development of physiological features in present-day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in analyzing the robustness of the trees produced for these analyses.

Methodology: Our focus was to increase the number of bootstrap replications that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We have modified the PHYLIP package using MPI so that large-scale phylogenetic studies of protein sequences, using a statistically robust number of bootstrapped datasets, can be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports the performance of the parallel PHYLIP programs that are relevant to the study of protein evolution on several protein datasets.

Conclusions: Calculations that currently take a few days on a state-of-the-art desktop workstation are reduced to calculations that can be performed over lunchtime on a modern parallel computer. Of the three protein methods tested, the maximum likelihood method scales the best, followed by the distance method, and then the maximum parsimony method. However, the maximum likelihood method requires significant memory resources, which limits its application to mor
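As a rough illustration of why bootstrap replications parallelize well, the sketch below resamples alignment columns with replacement and assigns replicates to workers round-robin. It is not the MPI-PHYLIP implementation: the function names (`bootstrap_columns`, `replicates_for_rank`, `run_rank`), the round-robin scheme, and the per-replicate seeding are all assumptions made for illustration, and `rank` and `size` merely stand in for the values an MPI program would obtain from its communicator.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns with replacement.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def replicates_for_rank(n_replicates, rank, size):
    """Hypothetical round-robin assignment: worker `rank` of `size` gets every size-th replicate."""
    return list(range(rank, n_replicates, size))


def run_rank(alignment, n_replicates, rank, size, base_seed=0):
    """Generate the replicates owned by one worker.

    Each replicate is seeded by its own index, so the resampled datasets
    do not depend on how many workers share the job (an illustrative choice).
    """
    results = {}
    for rep in replicates_for_rank(n_replicates, rank, size):
        rng = random.Random(base_seed + rep)
        results[rep] = bootstrap_columns(alignment, rng)
    return results
```

In a real run each worker would pass its replicates to a tree-building method (parsimony, distance, or likelihood) and the resulting trees would be gathered to build the consensus. Because the replicates are independent, the workers need to communicate only at the start and end of the job.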
