27 research outputs found

    Efficient FPT algorithms for (strict) compatibility of unrooted phylogenetic trees

    In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species $X$; these relationships are often depicted via a phylogenetic tree (a tree whose leaves are uniquely labeled by the elements of $X$ and which has no degree-2 nodes) called the "species tree". One common approach for reconstructing a species tree consists of first constructing several phylogenetic trees from primary data (e.g. DNA sequences originating from some species in $X$), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The resulting tree is our estimate of the species tree and, when the input trees are defined on overlapping, but not identical, sets of labels, it is called a "supertree". In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in graph theory. Both problems are known to be fixed-parameter tractable in the number $k$ of input trees, by using their expressibility in Monadic Second Order Logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on $k$ of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time $2^{O(k^2)} \cdot n$, where $n$ is the total size of the input. Comment: 18 pages, 1 figure.
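
    The two notions can be made concrete with splits (bipartitions of the leaf set induced by edges). A classical fact, not taken from this paper, is that $T$ displays $T_i$ (compatibility, a minor) iff every split of $T_i$ is a restriction of a split of $T$ to $L(T_i)$, and that strict compatibility (a topological minor) holds iff the two split sets coincide. The sketch below assumes each tree is given as its set of nontrivial splits; the helper names (split, restrict, displays, strictly_displays) are hypothetical. It checks one pair of trees and is not the paper's $2^{O(k^2)} \cdot n$ dynamic program.

```python
from typing import FrozenSet, Iterable, Set

# A split A|B of a leaf set, stored as an unordered pair of frozensets.
Split = FrozenSet[FrozenSet[str]]


def split(a: Iterable[str], b: Iterable[str]) -> Split:
    return frozenset({frozenset(a), frozenset(b)})


def nontrivial(splits: Set[Split]) -> Set[Split]:
    """Drop trivial splits (a side of size < 2, i.e. pendant edges)."""
    return {s for s in splits if all(len(side) >= 2 for side in s)}


def restrict(splits: Set[Split], leaves: FrozenSet[str]) -> Set[Split]:
    """Nontrivial splits of T|leaves, computed from the splits of T."""
    out = set()
    for s in splits:
        a, b = tuple(s)
        a2, b2 = a & leaves, b & leaves
        if len(a2) >= 2 and len(b2) >= 2:
            out.add(split(a2, b2))
    return out


def displays(t: Set[Split], ti: Set[Split], ti_leaves: FrozenSet[str]) -> bool:
    """Compatibility: T_i is T|L(T_i) with some edges contracted."""
    return nontrivial(ti) <= restrict(t, ti_leaves)


def strictly_displays(t: Set[Split], ti: Set[Split], ti_leaves: FrozenSet[str]) -> bool:
    """Strict compatibility: T|L(T_i) equals T_i, no contraction allowed."""
    return nontrivial(ti) == restrict(t, ti_leaves)


# Usage: T = ((a,b),c,(d,e)) has nontrivial splits ab|cde and abc|de.
T = {split("ab", "cde"), split("abc", "de")}
L = frozenset("abcd")
print(displays(T, {split("ab", "cd")}, L))           # True
print(strictly_displays(T, set(), L))                # False: the star needs a contraction
print(displays(T, {split("ac", "bd")}, L))           # False
```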

    Compatibility of unrooted phylogenetic trees is FPT

    A collection $T_1, T_2, \ldots, T_k$ of unrooted, leaf-labelled (phylogenetic) trees, all with different leaf sets, is said to be compatible if there exists a tree $T$ such that each tree $T_i$ can be obtained from $T$ by deleting leaves and contracting edges. Determining compatibility is NP-hard, and the fastest algorithm to date has a worst-case complexity of around $n^k$ time, $n$ being the number of leaves. Here, we present an $O(n f(k))$ algorithm, proving that compatibility of unrooted phylogenetic trees is fixed-parameter tractable (FPT) with respect to the number $k$ of trees.
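
    For contrast with the NP-hard general case, a classical result (the splits-equivalence theorem, not this paper's algorithm) says that when all trees share one leaf set $X$, they are compatible iff all of their splits are pairwise compatible, where $A|B$ and $C|D$ are compatible iff one of $A \cap C$, $A \cap D$, $B \cap C$, $B \cap D$ is empty. The sketch below assumes trees are given as sets of splits; the function names are hypothetical. It does not apply when leaf sets differ, which is exactly the case the abstract addresses.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Split = FrozenSet[FrozenSet[str]]


def split(a: Iterable[str], b: Iterable[str]) -> Split:
    return frozenset({frozenset(a), frozenset(b)})


def splits_compatible(s1: Split, s2: Split) -> bool:
    """A|B and C|D are compatible iff some pairwise intersection of sides is empty."""
    a, b = tuple(s1)
    c, d = tuple(s2)
    return any(not (x & y) for x in (a, b) for y in (c, d))


def same_leafset_compatible(trees: List[Set[Split]]) -> bool:
    """Trees on one common leaf set: compatible iff all splits are pairwise compatible."""
    union = set().union(*trees)
    return all(splits_compatible(s, t) for s, t in combinations(union, 2))
```

    Usage: with leaf set {a, b, c, d, e}, the trees with splits {ab|cde} and {abc|de} are compatible, while {ab|cde} and {ac|bde} are not, because every intersection of their sides is nonempty.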

    Treewidth of display graphs: bounds, brambles and applications

    Phylogenetic trees and networks are leaf-labelled graphs used to model evolution. Display graphs are created by identifying common leaf labels in two or more phylogenetic trees or networks. The treewidth of such graphs is bounded as a function of many common dissimilarity measures between phylogenetic trees, and this has been leveraged in fixed-parameter tractability results. Here we further elucidate the properties of display graphs and their interaction with treewidth. We show that it is NP-hard to recognize display graphs, but that display graphs of bounded treewidth can be recognized in linear time. Next we show that if a phylogenetic network displays (i.e. topologically embeds) a phylogenetic tree, the treewidth of their display graph is bounded by a function of the treewidth of the original network (and also by various other parameters). In fact, using a bramble argument, we show that this treewidth bound is sharp up to an additive term of 1. We leverage this bound to give an FPT algorithm, parameterized by treewidth, for determining whether a network displays a tree, which is an intensively studied problem in the field. We conclude with a discussion on the future use of display graphs and treewidth in phylogenetics.
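
    To make "identifying common leaf labels" concrete, the sketch below builds the display graph of unrooted trees given as edge lists (a hypothetical input format in which degree-1 nodes are labelled leaves) and computes an upper bound on its treewidth with the standard min-degree elimination heuristic. The heuristic only bounds treewidth from above; it is neither the linear-time recognition procedure nor the FPT algorithm from the abstract.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[str, str]
Graph = Dict[Hashable, Set[Hashable]]


def display_graph(trees: List[List[Edge]]) -> Graph:
    """Union of the trees with equal leaf labels identified.

    Leaves (degree-1 nodes) keep their label; internal nodes are renamed
    (tree_index, name) so internal nodes of different trees never merge.
    """
    adj: Graph = {}
    for idx, edges in enumerate(trees):
        degree: Dict[str, int] = {}
        for u, v in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        for u, v in edges:
            a = u if degree[u] == 1 else (idx, u)
            b = v if degree[v] == 1 else (idx, v)
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def treewidth_upper_bound(adj: Graph) -> int:
    """Min-degree elimination: the largest neighbourhood eliminated bounds treewidth."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    width = 0
    while g:
        v = min(g, key=lambda x: len(g[x]))
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        for x, y in combinations(nbrs, 2):  # turn the neighbourhood into a clique
            g[x].add(y)
            g[y].add(x)
        for x in nbrs:
            g[x].discard(v)
    return width
```

    Usage: two copies of the quartet ab|cd, given as [("a","u"), ("b","u"), ("u","v"), ("c","v"), ("d","v")], yield an 8-vertex display graph whose heuristic bound is 2.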

    Composing dynamic programming tree-decomposition-based algorithms

    Given two integers $\ell$ and $p$ as well as $\ell$ graph classes $\mathcal{H}_1, \ldots, \mathcal{H}_\ell$, the problems $\mathsf{GraphPart}(\mathcal{H}_1, \ldots, \mathcal{H}_\ell, p)$, $\mathsf{VertPart}(\mathcal{H}_1, \ldots, \mathcal{H}_\ell)$, and $\mathsf{EdgePart}(\mathcal{H}_1, \ldots, \mathcal{H}_\ell)$ ask, given a graph $G$ as input, whether $V(G)$, $V(G)$, and $E(G)$, respectively, can be partitioned into $\ell$ sets $S_1, \ldots, S_\ell$ such that, for each $i$ between $1$ and $\ell$, $G[S_i] \in \mathcal{H}_i$, $G[S_i] \in \mathcal{H}_i$, and $(V(G), S_i) \in \mathcal{H}_i$, respectively. Moreover, in $\mathsf{GraphPart}(\mathcal{H}_1, \ldots, \mathcal{H}_\ell, p)$, we require that the number of edges with endpoints in different sets of the partition be bounded by $p$. We show that if there exist dynamic programming tree-decomposition-based algorithms for recognizing the graph classes $\mathcal{H}_i$, for each $i$, then we can constructively create dynamic programming tree-decomposition-based algorithms for $\mathsf{GraphPart}(\mathcal{H}_1, \ldots, \mathcal{H}_\ell, p)$, $\mathsf{VertPart}(\mathcal{H}_1, \ldots, \mathcal{H}_\ell)$, and $\mathsf{EdgePart}(\mathcal{H}_1, \ldots, \mathcal{H}_\ell)$. We show that, in some known cases, the obtained running times are comparable to those of the best known algorithms.
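
    A toy illustration of composing per-class dynamic programs, under strong simplifying assumptions: the input graph is itself a tree (treewidth 1, so the tree serves as its own decomposition), and each hypothetical class $\mathcal{H}_i$ is "graphs of maximum degree at most $d_i$". The DP state pairs a part label with the counter a standalone max-degree DP would track. This only sketches the product-of-states idea for $\mathsf{VertPart}$; it is not the paper's general construction over arbitrary tree decompositions.

```python
from typing import Dict, Hashable, List, Set


def vertpart_max_degree_on_tree(adj: Dict[Hashable, List[Hashable]], root: Hashable,
                                bounds: List[int]) -> bool:
    """Can the tree's vertices be split into len(bounds) parts with max degree
    inside part i at most bounds[i]?

    State for vertex v: (part c of v, number k of v's children also in part c).
    """
    ell = len(bounds)
    parent: Dict[Hashable, Hashable] = {root: None}
    order = [root]
    for v in order:  # BFS; the list grows while we iterate over it
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                order.append(u)
    # table[v][c] = achievable k for the subtree of v with v in part c
    table: Dict[Hashable, List[Set[int]]] = {}
    for v in reversed(order):  # children are processed before their parent
        children = [u for u in adj[v] if u != parent[v]]
        table[v] = []
        for c in range(ell):
            reachable = {0}
            for u in children:
                options = set()  # contribution of child u to v's count k
                if any(table[u][c2] for c2 in range(ell) if c2 != c):
                    options.add(0)  # u in another part: edge uv is between parts
                if any(k + 1 <= bounds[c] for k in table[u][c]):
                    options.add(1)  # u in part c: u's degree becomes k + 1
                reachable = {r + x for r in reachable for x in options
                             if r + x <= bounds[c]}
            table[v].append(reachable)
    return any(table[root][c] for c in range(ell))
```

    Usage: the path a-b-c splits into two independent sets (bounds [0, 0]) but does not fit in a single part of maximum degree 1 (bounds [1]).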

    Embedding Phylogenetic Trees in Networks of Low Treewidth

    Given a rooted, binary phylogenetic network and a rooted, binary phylogenetic tree, can the tree be embedded into the network? This problem, called Tree Containment, arises when validating networks constructed by phylogenetic inference methods. We present the first algorithm for (rooted) Tree Containment using the treewidth $t$ of the input network $N$ as parameter, showing that the problem can be solved in $2^{O(t^2)} \cdot |N|$ time and space.
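
    For intuition about what Tree Containment asks, here is a brute-force baseline (a swapped-in technique, not the paper's $2^{O(t^2)} \cdot |N|$ algorithm): a binary network displays a tree iff, for some choice of one incoming edge per reticulation, the resulting tree has the same clusters as the input tree once unlabeled dead ends are discarded. The dictionary-of-children input format and the function names are assumptions. Its running time is exponential in the number of reticulations.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set

Children = Dict[str, List[str]]  # node -> children; labelled leaves have none


def tree_clusters(children: Children, root: str) -> Set[FrozenSet[str]]:
    """All clusters (leaf sets below a node) of a rooted tree."""
    out: Set[FrozenSet[str]] = set()

    def visit(v: str) -> FrozenSet[str]:
        kids = children.get(v, [])
        s = frozenset([v]) if not kids else frozenset().union(*(visit(c) for c in kids))
        out.add(s)
        return s

    visit(root)
    return out


def network_displays(net: Children, root: str, tree: Children, tree_root: str) -> bool:
    """Try every way of keeping exactly one parent edge per reticulation."""
    parents: Dict[str, List[str]] = {}
    for u, kids in net.items():
        for v in kids:
            parents.setdefault(v, []).append(u)
    leaves = {v for v in parents if not net.get(v)}
    retics = [v for v, ps in parents.items() if len(ps) > 1]
    target = tree_clusters(tree, tree_root)

    for choice in product(*(parents[h] for h in retics)):
        keep = dict(zip(retics, choice))  # reticulation -> the parent it keeps
        found: Set[FrozenSet[str]] = set()

        def visit(v: str) -> FrozenSet[str]:
            if v in leaves:
                s = frozenset([v])
            else:
                s = frozenset()
                for c in net[v]:
                    if keep.get(c, v) == v:  # skip reticulation edges switched off
                        s = s | visit(c)
            if s:  # empty clusters are unlabeled dead ends and disappear
                found.add(s)
            return s

        visit(root)
        if found == target:
            return True
    return False
```

    Usage: the network r -> a, b; a -> x, h; b -> h, z; h -> y (one reticulation h) displays ((x,y),z) and (x,(y,z)) but not ((x,z),y).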

    Treewidth distance on phylogenetic trees

    In this article we study the treewidth of the display graph, an auxiliary graph structure obtained from the fusion of phylogenetic (i.e., evolutionary) trees at their leaves. Earlier work has shown that the treewidth of the display graph is bounded if the trees are in some formal sense topologically similar. Here we further expand upon this relationship. We analyse a number of reduction rules commonly used in the phylogenetics literature to obtain fixed-parameter tractable algorithms. In some cases (the subtree reduction) the reduction rules behave similarly with respect to treewidth, while others (the cluster reduction) behave very differently, and the behaviour of the chain reduction is particularly intriguing because of its link with graph separators and forbidden minors. We also show that the gap between treewidth and Tree Bisection and Reconnect (TBR) distance can be infinitely large, and that, unlike, for example, planar graphs, the treewidth of the display graph can be as much as linear in its number of vertices. A number of other auxiliary results are given. We conclude with a discussion and list a number of open problems.
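
    The subtree reduction replaces a pendant subtree common to all trees by a single leaf. A minimal sketch for two unrooted binary trees on the same leaf set, assuming adjacency-set input and internal node names distinct from leaf labels: repeatedly collapse a cherry present in both trees into one new leaf, which shrinks every common pendant subtree to a single leaf. The stopping rule at three leaves and the "(a,b)" naming of new leaves are assumptions of this sketch, and it does not compute treewidth.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[str, Set[str]]  # unrooted binary tree as adjacency sets


def leaves(t: Tree) -> Set[str]:
    return {v for v, nbrs in t.items() if len(nbrs) == 1}


def cherries(t: Tree) -> Set[Tuple[str, ...]]:
    """Pairs of leaves attached to the same internal node."""
    by_parent: Dict[str, List[str]] = {}
    for v in leaves(t):
        (p,) = t[v]
        by_parent.setdefault(p, []).append(v)
    return {tuple(sorted(ls)) for ls in by_parent.values() if len(ls) == 2}


def collapse(t: Tree, a: str, b: str, new: str) -> None:
    """Replace cherry {a, b} by a single leaf labelled `new` (in place)."""
    (p,) = t[a]
    for x in (a, b):
        t[p].discard(x)
        del t[x]
    (q,) = t[p]  # p now has one neighbour; it becomes the new leaf
    t[q].discard(p)
    t[q].add(new)
    t[new] = {q}
    del t[p]


def common_cherry_reduction(t1: Tree, t2: Tree) -> Tuple[Tree, Tree]:
    t1 = {v: set(n) for v, n in t1.items()}  # work on copies
    t2 = {v: set(n) for v, n in t2.items()}
    while len(leaves(t1)) > 3:
        common = cherries(t1) & cherries(t2)
        if not common:
            break
        a, b = min(common)  # deterministic choice among common cherries
        new = f"({a},{b})"
        collapse(t1, a, b, new)
        collapse(t2, a, b, new)
    return t1, t2
```

    Usage: for ((a,b),c,(d,e)) and ((a,b),e,(c,d)), only the cherry {a,b} is shared, so both trees shrink to the leaves (a,b), c, d, e.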

    Maximum agreement and compatible supertrees

    Given a set of leaf-labelled trees with identical leaf sets, the MAST problem (respectively, the MCT problem) consists of finding a largest subset of leaves such that all input trees restricted to these leaves are isomorphic (respectively, compatible). In this paper, we propose extensions of these problems to the context of supertree inference, where input trees have non-identical leaf sets. This situation is of particular interest in phylogenetics. The resulting problems are called SMAST and SMCT. A sufficient condition is given that identifies cases where these problems can be solved by resorting to MAST and MCT as subproblems. This condition is met, for instance, when only two input trees are considered. We then give algorithms for SMAST and SMCT that benefit from the link with the subtree problems. These algorithms run in time linear in the time needed to solve MAST (respectively, MCT) on an instance of the same or smaller size. It is shown that arbitrary instances of SMAST and SMCT can be turned in polynomial time into instances composed of trees with a bounded number of leaves. SMAST is shown to be W[2]-hard when the considered parameter is the number of input leaves that have to be removed to obtain the agreement of the input trees. A similar result holds for SMCT. Moreover, the corresponding optimization problems, that is, the complements of SMAST and SMCT, cannot be approximated in polynomial time within any constant factor, unless P = NP. These results also hold when the input trees have a bounded number of leaves. The presented results apply to both collections of rooted and unrooted trees.
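
    Since SMAST on two trees can be solved via MAST, the classical dynamic program for MAST on two rooted binary trees (a well-known recurrence, not taken from this paper) is sketched below for intuition. Trees are assumed to be nested 2-tuples with string leaves; the function returns only the size of a maximum agreement subtree, not the subtree itself.

```python
from functools import lru_cache
from typing import FrozenSet, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf label or a pair of subtrees


def leaf_set(t: Tree) -> FrozenSet[str]:
    return frozenset([t]) if isinstance(t, str) else leaf_set(t[0]) | leaf_set(t[1])


def mast_size(t1: Tree, t2: Tree) -> int:
    @lru_cache(maxsize=None)
    def m(u: Tree, v: Tree) -> int:
        # A single leaf agrees with v iff v also contains that label.
        if isinstance(u, str):
            return 1 if u in leaf_set(v) else 0
        if isinstance(v, str):
            return 1 if v in leaf_set(u) else 0
        (u1, u2), (v1, v2) = u, v
        return max(
            m(u1, v1) + m(u2, v2),  # children matched straight
            m(u1, v2) + m(u2, v1),  # children matched crosswise
            m(u1, v), m(u2, v),     # agreement subtree lies inside one child of u
            m(u, v1), m(u, v2),     # ... or inside one child of v
        )

    return m(t1, t2)
```

    Usage: ((a,b),(c,d)) and ((a,c),(b,d)) disagree on every three-leaf subset, so their MAST has size 2, while a tree compared with itself gives its full leaf count.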

    Contributions to computational phylogenetics and algorithmic self-assembly

    This dissertation addresses some of the algorithmic and combinatorial problems at the interface between biology and computation. In particular, it focuses on problems in both computational phylogenetics, an area of study in which computation is used to better understand evolutionary relationships, and algorithmic self-assembly, an area of study in which biological processes are used to perform computation. The first set of results investigates inferring phylogenetic trees from multi-state character data. We give a novel characterization of when a set of three-state characters has a perfect phylogeny and make progress on a long-standing conjecture regarding the compatibility of multi-state characters. The next set of results investigates inferring phylogenetic supertrees from collections of smaller input trees when the input trees do not fully agree on the relative positions of the taxa. Two approaches to dealing with such conflicting input trees are considered. The first is to contract a set of edges in the input trees so that the resulting trees have an agreement supertree. The second is to remove a set of taxa from the input trees so that the resulting trees have an agreement supertree. We give fixed-parameter tractable algorithms for both approaches. We then turn to the algorithmic self-assembly of fractal structures from DNA tiles and investigate approximating the Sierpinski triangle and the Sierpinski carpet with strict self-assembly. We prove tight bounds on approximating the Sierpinski triangle and exhibit a class of fractals that are generalizations of the Sierpinski carpet that can approximately self-assemble. We conclude by discussing some ideas for further research.
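
    The dissertation's characterization concerns three-state characters. For the simpler binary case a classical fact applies: a set of binary characters has a perfect phylogeny iff every pair of characters is compatible, i.e. no pair shows all four state combinations (the "four-gamete test"). The sketch below implements only that binary test, assuming a character matrix given as equal-length strings over '0' and '1'; it is not the three-state characterization.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def first_incompatible_pair(matrix: List[str]) -> Optional[Tuple[int, int]]:
    """Four-gamete test for binary characters.

    matrix[t][c] is the state ('0' or '1') of taxon t for character c.
    Returns the first pair of characters showing all four state combinations,
    or None if every pair is compatible (so a perfect phylogeny exists).
    """
    n_chars = len(matrix[0]) if matrix else 0
    for i, j in combinations(range(n_chars), 2):
        if len({(row[i], row[j]) for row in matrix}) == 4:
            return (i, j)
    return None
```

    Usage: the taxa "000", "011", "101" admit a perfect phylogeny, whereas adding all four combinations "00", "01", "10", "11" for two characters makes characters 0 and 1 incompatible.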

    Advancing Divide-And-Conquer Phylogeny Estimation Using Robinson-Foulds Supertrees

    One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on Earth. However, constructing the Tree of Life is enormously challenging computationally, as all of the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving the scalability and accuracy of phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and these trees are then merged using a "supertree method". Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees under the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS, a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees until all the trees are merged into a single supertree. We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS.
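
    The RFS criterion scores a candidate supertree by its total Robinson-Foulds distance to the input trees, after restricting the supertree to each input's leaf set; the RFS is a supertree minimizing that total. The sketch below only evaluates this objective for a given candidate, assuming trees are given as sets of nontrivial splits and using the unnormalized RF distance (size of the symmetric difference). It does not implement Exact-RFS-2 or GreedyRFS, and the helper names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Split = FrozenSet[FrozenSet[str]]  # unordered bipartition {A, B}


def split(a: Iterable[str], b: Iterable[str]) -> Split:
    return frozenset({frozenset(a), frozenset(b)})


def restrict(splits: Set[Split], leaves: FrozenSet[str]) -> Set[Split]:
    """Nontrivial splits of the tree restricted to `leaves`."""
    out = set()
    for s in splits:
        a, b = tuple(s)
        if len(a & leaves) >= 2 and len(b & leaves) >= 2:
            out.add(split(a & leaves, b & leaves))
    return out


def rf_distance(s1: Set[Split], s2: Set[Split]) -> int:
    """Unnormalized Robinson-Foulds distance: splits in exactly one tree."""
    return len(s1 ^ s2)


def rfs_score(supertree: Set[Split],
              inputs: List[Tuple[Set[Split], FrozenSet[str]]]) -> int:
    """Sum over inputs (splits, leaf set) of RF(supertree restricted to the leaf set, input)."""
    return sum(rf_distance(restrict(supertree, leaves), t) for t, leaves in inputs)
```

    Usage: the supertree ((a,b),c,(d,e)) scores 0 against the inputs ab|cd on {a,b,c,d} and bc|de on {b,c,d,e}, and 2 against the conflicting input ac|bd.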