90 research outputs found

    A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees

    Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work and are publicly available. We also apply our methods to real data

    Computing hybridization networks using agreement forests

    Rooted phylogenetic trees are widely used in biology to represent the evolutionary history of certain species. Usually, such a tree is a simple binary tree containing only internal nodes of in-degree one and out-degree two, each representing a specific speciation event. In applied phylogenetics, however, trees can contain nodes of out-degree larger than two, because there is often insufficient information to resolve the order of some speciation events, and the common way to model this uncertainty is to use nonbinary nodes (i.e., nodes of out-degree at least three), also called polytomies. Moreover, in addition to such speciation events, there exist certain biological events that cannot be modeled by a tree and, thus, require the more general concept of rooted phylogenetic networks or, more specifically, of hybridization networks. Examples of such reticulate events are horizontal gene transfer, hybridization, and recombination. Nevertheless, in order to construct hybridization networks, the less general concept of a phylogenetic tree can still be used as a building block. More precisely, in a first step, phylogenetic trees for a set of species, each based on a distinct orthologous gene, are constructed. In a second step, specific sets containing common subtrees of those trees, known as maximum acyclic agreement forests, are calculated, which are then glued together into a single hybridization network. In such a network, hybridization nodes (i.e., nodes of in-degree at least two) can exist, representing potential reticulate events of the underlying evolutionary history. Because such events are considered rare, networks representing a minimum number of reticulate events, called the hybridization number, are of particular biological interest. Consequently, in mathematical terms, the problem of calculating hybridization networks can be briefly described as follows. 
Given a set T of rooted phylogenetic trees sharing the same set of taxa, compute a hybridization network N displaying T with minimum hybridization number. In this context, we say that such a network N displays a phylogenetic tree T if we can obtain T from N by removing and contracting some of its nodes and edges. Unfortunately, this is a computationally hard problem (i.e., it is NP-hard), even for the simplest case of just two binary input trees. In this thesis, we present several methods tackling this NP-hard problem. Our first approach describes how to compute a representative set of minimum hybridization networks for two binary input trees. For that purpose, our approach implements the first non-naive algorithm - called allMAAFs - calculating all maximum acyclic agreement forests for two rooted binary phylogenetic trees on the same set of taxa. In a subsequent step, in order to maximize the efficiency of the algorithm allMAAFs, we have additionally developed several modifications, each reducing the number of computational steps and, thus, significantly improving its practical runtime. Our second approach is an extension of our first approach, making the underlying algorithm applicable to more than two binary input trees. For this purpose, our approach implements allHNetworks, the first algorithm to calculate all relevant hybridization networks displaying a set of rooted binary phylogenetic trees on the same set of taxa, which is a preferable feature when studying hybridization events. Lastly, we have developed a generalization of our second approach that can now deal with multiple nonbinary input trees. For that purpose, our approach implements the first non-naive algorithm - called allMulMAAFs - calculating a relevant set of nonbinary maximum acyclic agreement forests for two rooted (nonbinary) phylogenetic trees on the same set of taxa. 
Each of the algorithms above is integrated into our user-friendly Java-based software package Hybroscale, which is freely available and platform independent, so that it runs on all major operating systems. Our program provides a graphical user interface for visualizing trees and networks. Moreover, it facilitates the interpretation of computed hybridization networks by adding specific features to its graphical representation and, thus, supports biologists in investigating reticulate evolution. In addition, we have implemented a method using a user-friendly SQL-style modeling language for filtering the usually large number of reported networks

    A quadratic kernel for computing the hybridization number of multiple trees

    It has recently been shown that the NP-hard problem of calculating the minimum number of hybridization events that is needed to explain a set of rooted binary phylogenetic trees by means of a hybridization network is fixed-parameter tractable if an instance of the problem consists of precisely two such trees. In this paper, we show that this problem remains fixed-parameter tractable for an arbitrarily large set of rooted binary phylogenetic trees. In particular, we present a quadratic kernel

    Kernelizations for the hybridization number problem on multiple nonbinary trees

    Given a finite set $X$, a collection $\mathcal{T}$ of rooted phylogenetic trees on $X$ and an integer $k$, the Hybridization Number problem asks if there exists a phylogenetic network on $X$ that displays all trees from $\mathcal{T}$ and has reticulation number at most $k$. We show two kernelization algorithms for Hybridization Number, with kernel sizes $4k(5k)^t$ and $20k^2(\Delta^+-1)$ respectively, with $t$ the number of input trees and $\Delta^+$ their maximum outdegree. Experiments on simulated data demonstrate the practical relevance of these kernelization algorithms. In addition, we present an $n^{f(k)}t$-time algorithm, with $n=|X|$ and $f$ some computable function of $k$
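As a rough illustration of how the two kernel size bounds quoted in this abstract compare, the following sketch evaluates both expressions for given parameters. The function names are hypothetical and not taken from the paper; the code only does the arithmetic of the stated bounds and is not part of either kernelization algorithm.

```python
def kernel_bound_by_trees(k: int, t: int) -> int:
    """Evaluate the first stated bound, 4k(5k)^t (t = number of input trees)."""
    return 4 * k * (5 * k) ** t


def kernel_bound_by_outdegree(k: int, max_outdegree: int) -> int:
    """Evaluate the second stated bound, 20k^2(Delta^+ - 1)."""
    return 20 * k ** 2 * (max_outdegree - 1)


def smaller_bound(k: int, t: int, max_outdegree: int) -> int:
    """Return whichever of the two stated bounds is smaller for these parameters."""
    return min(kernel_bound_by_trees(k, t),
               kernel_bound_by_outdegree(k, max_outdegree))
```

For example, with k = 2, t = 2 and maximum outdegree 3, the first expression gives 800 and the second gives 160, so under these assumed parameters the outdegree-based bound is the tighter one.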

    A simple fixed parameter tractable algorithm for computing the hybridization number of two (not necessarily binary) trees

    Here we present a new fixed-parameter tractable algorithm to compute the hybridization number $r$ of two rooted, not necessarily binary phylogenetic trees on taxon set $X$ in time $(6^r \cdot r!) \cdot \mathrm{poly}(n)$, where $n=|X|$. The novelty of this approach is its use of terminals, which are maximal elements of a natural partial order on $X$, and several insights from the softwired clusters literature. This yields a surprisingly simple and practical bounded-search algorithm and offers an alternative perspective on the underlying combinatorial structure of the hybridization number problem
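To get a feel for how the parameter-dependent part of the stated running time grows, the sketch below evaluates the factor 6^r · r! for a given hybridization number r. The function name is hypothetical, and the polynomial factor poly(n) is left out because the abstract does not specify it.

```python
from math import factorial


def search_factor(r: int) -> int:
    """Evaluate 6^r * r!, the part of the stated bound that depends only on r."""
    if r < 0:
        raise ValueError("hybridization number must be non-negative")
    return 6 ** r * factorial(r)
```

For r = 3 this gives 216 · 6 = 1296, which illustrates why a bounded-search approach of this kind is practical only while r stays small.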