42 research outputs found

    Fixed Parameter Polynomial Time Algorithms for Maximum Agreement and Compatible Supertrees

    Consider a set of labels $L$ and a set of trees ${\mathcal T} = \{{\mathcal T}^{(1)}, {\mathcal T}^{(2)}, \ldots, {\mathcal T}^{(k)}\}$ where each tree ${\mathcal T}^{(i)}$ is distinctly leaf-labeled by some subset of $L$. One fundamental problem is to find the biggest tree (denoted as supertree) to represent ${\mathcal T}$ which minimizes the disagreements with the trees in ${\mathcal T}$ under certain criteria. This problem finds applications in phylogenetics, database, and data mining. In this paper, we focus on two particular supertree problems, namely, the maximum agreement supertree problem (MASP) and the maximum compatible supertree problem (MCSP). These two problems are known to be NP-hard for $k \geq 3$. This paper gives the first polynomial time algorithms for both MASP and MCSP when both $k$ and the maximum degree $D$ of the trees are constant.

    Fixed parameter algorithms for compatible and agreement supertree problems

    Biologists represent the evolutionary history of species through phylogenetic trees. Leaves of a phylogenetic tree represent the species and internal vertices represent the extinct ancestors. Given a collection of input phylogenetic trees, a common problem in computational biology is to build a supertree that captures the evolutionary history of all the species in the input trees, and is consistent with each of the input trees. In this document we study the tree compatibility and agreement supertree problems. The tree compatibility problem is NP-complete but has been shown to be fixed-parameter tractable when parameterized by the number of input trees. We characterize the compatible supertree problem in terms of triangulation of a structure called the display graph. We also give an alternative characterization in terms of cuts of the display graph. We show how these characterizations are related to the characterization given in terms of triangulation of the edge label intersection graph. We then give a characterization of the agreement supertree problem. In real-world data, consistent supertrees do not always exist. Inconsistencies can be dealt with by contraction of edges or removal of taxa. The agreement supertree edge contraction (AST-EC) problem asks if a collection of $k$ rooted trees can be made to agree by contraction of at most $p$ edges. Similarly, the agreement supertree taxon removal (AST-TR) problem asks if a collection of $k$ rooted trees can be made to agree by removal of at most $p$ taxa. We give fixed-parameter algorithms for both cases when parameterized by $k$ and $p$. We study the long-standing conjecture on the perfect phylogeny problem: there exists a function $f(r)$ such that a given collection $C$ of $r$-state characters is compatible if and only if every subset of $C$ of size $f(r)$ is compatible. We show that for $r \geq 2$, $f(r) \geq \lceil r/2 \rceil \cdot \lfloor r/2 \rfloor + 1$.
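
As a small illustration of the lower bound quoted at the end of this abstract, the sketch below evaluates ceil(r/2) * floor(r/2) + 1 for a few values of r. The function name `conjecture_lower_bound` is hypothetical and does not come from the cited work; the code only does the arithmetic of the stated formula and says nothing about how the bound is proved.

```python
def conjecture_lower_bound(r: int) -> int:
    """Evaluate ceil(r/2) * floor(r/2) + 1, the lower bound on f(r) stated in the abstract.

    Hypothetical helper for illustration only; the abstract states the bound for r >= 2.
    """
    if r < 2:
        raise ValueError("the bound is stated for r >= 2")
    floor_half = r // 2        # floor(r/2)
    ceil_half = (r + 1) // 2   # ceil(r/2), using integer arithmetic
    return ceil_half * floor_half + 1


if __name__ == "__main__":
    # Print the bound for a handful of state counts r.
    for r in range(2, 7):
        print(f"r = {r}: f(r) >= {conjecture_lower_bound(r)}")
```

For r = 2 the expression gives 2, and it grows roughly as r^2/4, so the bound is quadratic in the number of states.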

    Contributions to computational phylogenetics and algorithmic self-assembly

    This dissertation addresses some of the algorithmic and combinatorial problems at the interface between biology and computation. In particular, it focuses on problems in both computational phylogenetics, an area of study in which computation is used to better understand evolutionary relationships, and algorithmic self-assembly, an area of study in which biological processes are used to perform computation. The first set of results investigates inferring phylogenetic trees from multi-state character data. We give a novel characterization of when a set of three-state characters has a perfect phylogeny and make progress on a long-standing conjecture regarding the compatibility of multi-state characters. The next set of results investigates inferring phylogenetic supertrees from collections of smaller input trees when the input trees do not fully agree on the relative positions of the taxa. Two approaches to dealing with such conflicting input trees are considered. The first is to contract a set of edges in the input trees so that the resulting trees have an agreement supertree. The second is to remove a set of taxa from the input trees so that the resulting trees have an agreement supertree. We give fixed-parameter tractable algorithms for both approaches. We then turn to the algorithmic self-assembly of fractal structures from DNA tiles and investigate approximating the Sierpinski triangle and the Sierpinski carpet with strict self-assembly. We prove tight bounds on approximating the Sierpinski triangle and exhibit a class of fractals, generalizations of the Sierpinski carpet, that can approximately self-assemble. We conclude by discussing some ideas for further research.

    A list of parameterized problems in bioinformatics

    In this report we present a list of problems that originated in bioinformatics. Our aim is to collect information on such problems that have been analyzed from the point of view of Parameterized Complexity. For every problem we give its definition and biological motivation together with known complexity results.

    Advancing Divide-And-Conquer Phylogeny Estimation Using Robinson-Foulds Supertrees

    One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a "supertree method". Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS

    Computing Maximum Agreement Forests without Cluster Partitioning is Folly

    Computing a maximum (acyclic) agreement forest (M(A)AF) of a pair of phylogenetic trees is known to be fixed-parameter tractable; the two main techniques are kernelization and depth-bounded search. In theory, kernelization-based algorithms for this problem are not competitive, but they perform remarkably well in practice. We shed light on why this is the case. Our results show that, probably unsurprisingly, the kernel is often much smaller in practice than the theoretical worst case, but not small enough to fully explain the good performance of these algorithms. The key to performance is cluster partitioning, a technique used in almost all fast M(A)AF algorithms. In theory, cluster partitioning does not help: some instances are highly clusterable, others not at all. However, our experiments show that cluster partitioning leads to substantial performance improvements for kernelization-based M(A)AF algorithms. In contrast, kernelizing the individual clusters before solving them using exponential search yields only very modest performance improvements or even hurts performance; for the vast majority of inputs, kernelization leads to no reduction in the maximal cluster size at all. The choice of the algorithm applied to solve individual clusters also significantly impacts performance, even though our limited experiment to evaluate this produced no clear winner; depth-bounded search, exponential search interleaved with kernelization, and an ILP-based algorithm all achieved competitive performance

    Large-scale Tree Parsimony

    Finding the tree of life is one of the major challenges that scientists are attempting to solve. It is widely believed that the evolution of species can (mostly) be depicted in a tree graph, the phylogenetic tree. However, the true phylogenetic species tree is often unknown. One approach is to computationally infer phylogenetic trees from phylogenetic information encoded in genomic data. With the advancement of sequencing techniques, we have a rapidly growing availability of phylogenetic data, which enable the construction of large-scale phylogenetic trees. This thesis addresses algorithmic issues for the construction of large-scale phylogenetic species trees, the supertrees, and the exploration and analysis of large-scale phylogenetic trees. We present (i) new algorithms for local search methods for supertree construction that reduce the time complexity by an order of magnitude and a parallelization for these methods, (ii) new methods for constructing better supertrees from estimated trees and inferring small, exact phylogenetic trees, and (iii) a novel, interactive visual method for the large-scale tree exploration and the concurrent analysis of multiple gene trees and one species tree