Search CORE

1,074 research outputs found

MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation

Author: Flouri Tomáš
Haeseler Arndt von
Hoang Diep Thi
Minh Bui Quang
Stamatakis Alexandros
Vinh Le Sy
Publication venue: BioMed Central
Publication date: 27/06/2018

Phylogenetic Trees and Their Analysis

Author: Ford Eric
Publication venue: CUNY Academic Works
Publication date: 01/02/2014

Determining the best possible evolutionary history, the lowest-cost phylogenetic tree, to fit a given set of taxa and character sequences using maximum parsimony is an active area of research due to its underlying importance in understanding biological processes. Because several steps in this process are NP-hard under popular, biologically motivated optimality criteria, significant resources are dedicated both to heuristics and to making exact methods more computationally tractable. We examine both phylogenetic data and the structure of the search space to suggest methods that reduce the number of trees that must be examined to find an exact solution for any given set of taxa and associated character data. Our work on four related problems combines theoretical insight with empirical study to improve searching of the tree space. First, we show that there is a Hamiltonian path through tree space for the most common tree metrics, answering Bryant's Challenge for the minimal such path. We next examine the topology of the search space under various metrics, showing that some metrics have local maxima and minima even with perfect data, while others do not. We further characterize conditions under which sequences simulated under the Jukes-Cantor model of evolution yield well-behaved search spaces. Next, we reduce the search space needed for an exact solution by splitting the set of characters into mutually incompatible subsets of compatible characters, building trees based on the perfect phylogenies implied by these sets, and then searching in the neighborhoods of these trees. We validate this work empirically. Finally, on both biological and simulated data, we compare two approaches to the generalized tree alignment problem (GTAP): sequence alignment followed by tree search, and Direct Optimization.
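A minimal sketch of the pairwise test behind the character-splitting step, assuming binary characters with no missing data, each written as a string holding one '0'/'1' state per taxon. Two such characters are compatible exactly when at most three of the four state combinations occur across the taxa (the four-gamete test), and a set of pairwise-compatible binary characters admits a perfect phylogeny. The grouping function `greedy_compatible_groups` is a hypothetical illustration of splitting characters into compatible subsets; it is not the procedure from the thesis and does not guarantee that the resulting groups are mutually incompatible.

```python
def compatible(a: str, b: str) -> bool:
    """Four-gamete test for two binary characters (one '0'/'1' state per taxon).

    The characters are compatible iff fewer than four distinct state pairs occur.
    """
    return len(set(zip(a, b))) < 4


def greedy_compatible_groups(chars: list[str]) -> list[list[int]]:
    """Hypothetical greedy split of characters into pairwise-compatible groups.

    Each character joins the first group whose members it is compatible with;
    otherwise it starts a new group.
    """
    groups: list[list[int]] = []
    for i, c in enumerate(chars):
        for group in groups:
            if all(compatible(c, chars[j]) for j in group):
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Four taxa (A, B, C, D); each string is one character's states across the taxa.
chars = ["1100", "1010", "1110"]
print(greedy_compatible_groups(chars))  # [[0, 2], [1]]
```

Characters 0 and 1 show all four state pairs, so they end up in different groups, while character 2 is compatible with both.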

City University of New York

Edge Ratchet and Simulated Annealing to Improve RF Score of the Supertree of Life

Author: Manshouri Reza
Publication date: 21/09/2018

Constructing the Supertree of Life can provide crucially valuable knowledge for addressing many critical contemporary challenges, such as fighting diseases, improving global agriculture, and protecting ecosystems. However, building such a tree is among the most complicated and challenging scientific problems. For biological data, the true species tree is not available, so the accuracy of a supertree is usually evaluated by its similarity to the given source input trees. In this work, we aim to improve the accuracy of the supertree in terms of its cumulative Robinson-Foulds (RF) distance to the source trees. This problem is NP-hard, so we resort to heuristic algorithms. We make two main contributions. First, we propose a new technique, Edge Ratchet, which is used in a hill-climbing algorithm to escape local optima. Second, we develop a Simulated Annealing algorithm to minimize the total RF distance of the supertree to the source trees. Our results demonstrate that these two algorithms improve the accuracy of the best existing supertree algorithms with regard to RF distance.
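The Simulated Annealing component can be pictured with a generic skeleton. This is a minimal sketch, assuming a geometric cooling schedule and caller-supplied `cost` and `neighbor` functions (both hypothetical here); in the supertree setting the state would be a tree, `neighbor` a tree rearrangement, and `cost` the total RF distance to the source trees. None of the parameter values come from the thesis.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def simulated_annealing(
    start: S,
    cost: Callable[[S], float],
    neighbor: Callable[[S, random.Random], S],
    t0: float = 1.0,
    cooling: float = 0.95,
    steps: int = 1000,
    seed: int = 0,
) -> tuple[S, float]:
    """Minimize `cost` by simulated annealing; return the best state seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept a worse move with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp = max(temp * cooling, 1e-12)  # geometric cooling, floored to avoid division by zero
    return best, best_cost


# Toy usage: minimize (x - 3)^2 over the integers using +/-1 steps.
best, best_cost = simulated_annealing(
    start=10,
    cost=lambda x: (x - 3) ** 2,
    neighbor=lambda x, rng: x + rng.choice((-1, 1)),
)
print(best, best_cost)
```

Accepting some worse moves early on is what lets annealing leave local optima that trap a pure hill climber; as the temperature drops, the search behaves more and more like hill climbing.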

Texas A&M Repository


Representation in stochastic search for phylogenetic tree reconstruction

Author: Ohno-Machado Lucila
Shieber Stuart
Weber Griffin
Publication venue: Elsevier BV
Publication date: 01/01/2006

Phylogenetic tree reconstruction is a process in which the ancestral relationships among a group of organisms are inferred from their DNA sequences. For all but trivially sized data sets, finding the optimal tree is computationally intractable. Many heuristic algorithms exist, but the branch-swapping algorithm used in the software package PAUP is the most popular. This method performs a stochastic search over the space of trees, using a branch-swapping operation to construct neighboring trees in the search space. This study introduces a new stochastic search algorithm that operates over an alternative representation of trees, namely as permutations of taxa giving the order in which they are processed during stepwise addition. Experiments on several data sets suggest that this algorithm for generating an initial tree, when followed by branch-swapping, can produce better trees for a given total amount of time.

Engineering and Applied Science
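A minimal sketch of searching over taxon-addition orders instead of tree topologies, assuming aligned sequences of equal length, unordered characters scored with Fitch parsimony, and a simple hill climb that swaps two taxa in the order. The function names and the acceptance rule (keep an order if it is not worse) are assumptions for illustration; the study's own stochastic search, and the branch-swapping phase that follows it, are not reproduced here.

```python
import random
from collections import defaultdict

# Unrooted binary tree as an edge list: leaves are taxon names (str),
# internal nodes are ints.
Edge = tuple[object, object]


def fitch_length(edges: list[Edge], seqs: dict[str, str]) -> int:
    """Fitch parsimony length of an unrooted binary tree, computed by rooting at a leaf."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    root = next(x for edge in edges for x in edge if x in seqs)
    cost = 0

    def down(node, parent):
        nonlocal cost
        if node in seqs:  # leaf: one singleton state set per site
            return [{ch} for ch in seqs[node]]
        left, right = (down(c, node) for c in adj[node] if c != parent)
        merged = []
        for a, b in zip(left, right):
            if a & b:
                merged.append(a & b)
            else:  # no shared state: take the union and count one change
                merged.append(a | b)
                cost += 1
        return merged

    (child,) = adj[root]
    below = down(child, root)
    # The root leaf adds a change wherever its state is absent from its neighbor's set.
    cost += sum(ch not in s for ch, s in zip(seqs[root], below))
    return cost


def stepwise_addition(order: list[str], seqs: dict[str, str]) -> tuple[int, list[Edge]]:
    """Add taxa in the given order, each on the edge giving the shortest tree."""
    a, b, c = order[:3]
    edges: list[Edge] = [(0, a), (0, b), (0, c)]
    for new_node, taxon in enumerate(order[3:], start=1):
        best = None
        for i, (u, v) in enumerate(edges):
            # Subdivide edge (u, v) with new_node and attach the taxon to it.
            trial = edges[:i] + edges[i + 1:] + [(u, new_node), (new_node, v), (new_node, taxon)]
            score = fitch_length(trial, seqs)
            if best is None or score < best[0]:
                best = (score, trial)
        edges = best[1]
    return fitch_length(edges, seqs), edges


def permutation_search(seqs: dict[str, str], iters: int = 200, seed: int = 0):
    """Hill climbing over addition orders: swap two taxa, keep the order if not worse."""
    rng = random.Random(seed)
    order = list(seqs)
    rng.shuffle(order)
    best_score, best_edges = stepwise_addition(order, seqs)
    for _ in range(iters):
        cand = order[:]
        i, j = rng.sample(range(len(cand)), 2)
        cand[i], cand[j] = cand[j], cand[i]
        score, edges = stepwise_addition(cand, seqs)
        if score <= best_score:  # accepting ties lets the search drift across plateaus
            order, best_score, best_edges = cand, score, edges
    return best_score, order, best_edges


seqs = {"A": "AAGG", "B": "AAGG", "C": "CCTT", "D": "CCTT"}
score, order, edges = permutation_search(seqs, iters=20)
print(score, order)  # score is 4: the tree grouping A with B and C with D
```

Each point in this search space is an order, and the tree is only produced by decoding it through stepwise addition, which is the change of representation the abstract describes.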

Harvard University - DASH

Robinson-Foulds Supertrees

Author: David Fernández-baca
J Gordon Burleigh
Mukul S Bansal
Oliver Eulenstein
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Supertree methods synthesize collections of small phylogenetic trees with incomplete taxon overlap into comprehensive trees, or supertrees, that include all taxa found in the input trees. Supertree methods based on the well-established Robinson-Foulds (RF) distance have the potential to build supertrees that retain much information from the input trees. Specifically, the RF supertree problem seeks a binary supertree that minimizes the sum of the RF distances from the supertree to the input trees. Thus, an RF supertree is a supertree that is consistent with the largest number of clusters (or clades) from the input trees. Results: We introduce efficient, local search based, hill-climbing heuristics for the intrinsically hard RF supertree problem on rooted trees. These heuristics use novel non-trivial algorithms for the SPR and TBR local search problems which improve on the time complexity of the best known (naïve) solutions by a factor of Θ(n) and Θ(n²) respectively (where n is the number of taxa, or leaves, in the supertree). We use an implementation of our new algorithms to examine the performance of the RF supertree method and compare it to matrix representation with parsimony (MRP) and the triplet supertree method using four supertree data sets. Not only did our RF heuristic provide fast estimates of RF supertrees in all data sets, but the RF supertrees also retained more of the information from the input trees (based on the RF distance) than the other supertree methods. Conclusions: Our heuristics for the RF supertree problem, based on our new local search algorithms, make it possible for the first time to estimate large supertrees by directly optimizing the RF distance from rooted input trees to the supertrees. This provides a new and fast method to build accurate supertrees. RF supertrees may also be useful for estimating majority-rule(-) supertrees, which are a generalization of majority-rule consensus trees.
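The RF supertree objective can be made concrete with a brute-force scorer. This is a minimal sketch, assuming rooted trees written as nested tuples of leaf labels and RF distance counted as the size of the symmetric difference between non-trivial cluster sets; the supertree is restricted to each input tree's taxa by intersecting its clusters with that taxon set. It only evaluates the objective for a given supertree and does not implement the paper's fast SPR and TBR local search algorithms. All function names are hypothetical.

```python
from typing import Union

# A rooted tree is a leaf label (str) or a tuple of child subtrees.
Tree = Union[str, tuple]


def clusters(tree: Tree) -> set[frozenset[str]]:
    """Leaf sets below every internal node of a rooted tree (root included)."""
    found: set[frozenset[str]] = set()

    def walk(node: Tree) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def leaves(tree: Tree) -> frozenset[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def nontrivial(cs: set[frozenset[str]], taxa: frozenset[str]) -> set[frozenset[str]]:
    """Drop empty sets, single leaves, and the full taxon set (the root)."""
    return {c for c in cs if 1 < len(c) < len(taxa)}


def restricted_rf(supertree: Tree, source: Tree) -> int:
    """RF distance between `source` and `supertree` restricted to the source's taxa."""
    taxa = leaves(source)
    restricted = {c & taxa for c in clusters(supertree)}
    return len(nontrivial(restricted, taxa) ^ nontrivial(clusters(source), taxa))


def rf_supertree_score(supertree: Tree, sources: list[Tree]) -> int:
    """Objective minimized by an RF supertree: summed RF distance to the inputs."""
    return sum(restricted_rf(supertree, t) for t in sources)


S = ((("A", "B"), "C"), ("D", "E"))
sources = [(("A", "B"), "D"), (("A", "C"), "E"), (("B", "D"), "C")]
print(rf_supertree_score(S, sources))  # 2
```

Only the third input tree disagrees with `S`: restricted to {B, C, D}, the supertree groups B with C, whereas the input groups B with D, giving one cluster on each side of the symmetric difference.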

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

On the use of cartographic projections in visualizing phylo-genetic tree space

Author: A Clark
A Hultman
A Kupczok
A Stamatakis
B Allen
B Chor
B Herring
B Jenkins
C Sing
D Gusfield
D Hillis
DJ Zwickl
DL Swofford
F Ronquist
G Ganapathy
H Carroll
J Keith
J Thompson
K Crandall
Kenneth Sundberg
L Bugayevskiy
LJ Billera
M Chase
M Waterman
Mark Clement
N Amenta
N Pattengale
Quinn Snell
R DeSalle
R Meier
S Guindon
W Basalaj
W Day
Publication venue: BioMed Central
Publication date: 01/01/2010

Phylogenetic analysis is becoming an increasingly important tool for biological research. Applications include epidemiological studies, drug development, and evolutionary analysis. Phylogenetic search is known to be NP-hard. The size of the data sets that can be analyzed is limited by the exponential growth in the number of trees that must be considered as the problem size increases. A better understanding of the problem space could lead to better methods, which in turn could make the analysis of more data sets feasible. We present a definition of phylogenetic tree space and a visualization of this space that shows significant exploitable structure. This structure can be used to develop search methods capable of handling much larger data sets.
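The growth in the number of trees can be quantified: the number of distinct unrooted binary trees on n labelled taxa is (2n - 5)!! = 3 · 5 · ... · (2n - 5) for n ≥ 3, and the number of rooted binary trees on n taxa is (2n - 3)!!. A minimal sketch that computes the unrooted count; `num_unrooted_binary_trees` is a hypothetical helper name, and for 10 taxa it already gives 2,027,025 trees.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Number of distinct unrooted binary trees on n labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count


for n in (5, 10, 20, 50):
    print(n, num_unrooted_binary_trees(n))
```

Each added taxon multiplies the count by the number of edges it can be attached to, which is why exhaustive search is hopeless beyond a few dozen taxa.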

Crossref

Springer - Publisher Connector

PubMed Central

MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation

Author: Flouri T
Hoang DT
Minh BQ
Stamatakis A
Vinh LS
von Haeseler A
Publication date: 01/02/2018

Background: The nonparametric bootstrap is widely used to measure the branch support of phylogenetic trees. However, bootstrapping is computationally expensive and remains a bottleneck in phylogenetic analyses. Recently, an ultrafast bootstrap approximation (UFBoot) approach was proposed for maximum likelihood analyses. However, such an approach is still missing for maximum parsimony. Results: To close this gap we present MPBoot, an adaptation and extension of UFBoot to compute branch supports under the maximum parsimony principle. MPBoot works for both uniform and non-uniform cost matrices. Our analyses of biological DNA and protein data showed that under uniform cost matrices, MPBoot runs on average 4.7 (DNA) to 7 times (protein data) (range: 1.2–20.7) faster than the standard parsimony bootstrap implemented in PAUP*, but 1.6 (DNA) to 4.1 times (protein data) slower than the standard bootstrap with a fast search routine in TNT (fast-TNT). However, for non-uniform cost matrices MPBoot is 5 (DNA) to 13 times (protein data) (range: 0.3–63.9) faster than fast-TNT. We note that MPBoot achieves better scores more frequently than PAUP* and fast-TNT. However, this effect is less pronounced if an intensive but slower search in TNT is invoked. Moreover, experiments on large-scale simulated data show that while both PAUP* and TNT bootstrap estimates are too conservative, MPBoot bootstrap estimates appear more unbiased. Conclusions: MPBoot provides an efficient alternative to the standard maximum parsimony bootstrap procedure. It shows favorable performance in terms of run time, the capability of finding a maximum parsimony tree, and high bootstrap accuracy on simulated as well as empirical data sets. MPBoot is easy-to-use, open-source and available at http://www.cibiv.at/software/mpboo
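For reference, a minimal sketch of the resampling step of the standard nonparametric bootstrap, assuming an alignment stored as a dict from taxon name to an equal-length sequence string. In the standard procedure a full tree search is run on every replicate, which is the cost the abstract describes; the sketch does not reproduce MPBoot's approximation, and `bootstrap_replicate` is a hypothetical name.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One nonparametric bootstrap replicate: resample alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    (n_sites,) = lengths
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


alignment = {"t1": "ACGTAC", "t2": "ACGAAC", "t3": "TCGAAT", "t4": "TCGTAT"}
rng = random.Random(42)
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
# Each replicate would then be searched for an MP tree; a branch's support is the
# fraction of replicate trees that contain it.
```

Whole columns are resampled so that the states of all taxa at a site stay together; only the site composition of the alignment changes between replicates.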

KITopen

Directory of Open Access Journals

UCL Discovery

Phylogenetic search through partial tree mixing

Author: Clement Mark
Crandall Keith
Snell Quinn
Sundberg Kenneth
Ventura Dan
Whiting Michael
Publication venue: Health Sciences Research Commons
Publication date: 01/01/2012

BACKGROUND: Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques. RESULTS: When compared to popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application available at http://dna.cs.byu.edu/psoda. CONCLUSIONS: The use of Partial Tree Mixing in a partition-based tree space allows the algorithm to quickly converge on near-optimal tree regions. These regions can then be searched in a methodical way to determine the overall optimal phylogenetic solution.

PubMed Central

George Washington University: Health Sciences Research Commons (HSRC)

A genetic algorithm based global search strategy for population pharmacokinetic/pharmacodynamic model selection

Author: Sale Mark
Sherer Eric A.
Publication venue: Wiley
Publication date: 01/01/2015

The current algorithm for selecting a population pharmacokinetic/pharmacodynamic model is based on the well-established forward addition/backward elimination method. A central strength of this approach is the opportunity for a modeller to continuously examine the data and postulate new hypotheses to explain observed biases. This algorithm has served the modelling community well, but the model selection process has essentially remained unchanged for the last 30 years. During this time, more robust approaches to model selection have been made feasible by new technology and dramatic increases in computation speed. We review these methods, with emphasis on genetic algorithm approaches and discuss the role these methods may play in population pharmacokinetic/pharmacodynamic model selection
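A minimal sketch of a genetic algorithm over candidate models, assuming each model is encoded as a bit string (for example, which covariate effects are included) and scored by a caller-supplied `fitness` function where lower is better. In practice each evaluation would mean fitting a population PK/PD model; here `fitness` is a hypothetical stand-in. The operators (binary tournament selection, uniform crossover, bit-flip mutation, one elite survivor) are common textbook choices, not the specific method from this review.

```python
import random
from typing import Callable

Genome = tuple[int, ...]


def genetic_search(
    n_bits: int,
    fitness: Callable[[Genome], float],  # lower is better, e.g. a penalized objective (assumed)
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> tuple[Genome, float]:
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_bits)) for _ in range(pop_size)]
    cache: dict[Genome, float] = {}

    def score(g: Genome) -> float:
        if g not in cache:
            cache[g] = fitness(g)  # each evaluation would mean fitting one candidate model
        return cache[g]

    def tournament() -> Genome:
        a, b = rng.sample(pop, 2)
        return a if score(a) <= score(b) else b

    for _ in range(generations):
        children = [min(pop, key=score)]  # elitism: keep the best model found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover, then flip each bit with probability mutation_rate.
            child = tuple(
                (x if rng.random() < 0.5 else y) ^ (rng.random() < mutation_rate)
                for x, y in zip(p1, p2)
            )
            children.append(child)
        pop = children
    best = min(pop, key=score)
    return best, score(best)
```

Caching scores matters in this setting because repeated genomes would otherwise trigger repeated, expensive model fits, and elitism guarantees that the best score never gets worse from one generation to the next.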

IUPUIScholarWorks