
    Average-case analysis of perfect sorting by reversals (Journal Version)

    Perfect sorting by reversals, a problem originating in computational genomics, is the process of sorting a signed permutation to either the identity or the reversed identity permutation by a sequence of reversals that do not break any common interval. Bérard et al. (2007) use strong interval trees to describe an algorithm for sorting signed permutations by reversals. Combinatorial properties of this family of trees are essential to the analysis of the algorithm. Here, we use the expected values of certain tree parameters to prove that the average run-time of the algorithm is at worst polynomial, and that, for sufficiently long permutations, the sorting algorithm runs in polynomial time with probability one. Furthermore, our analysis of the subclass of commuting scenarios yields precise results on the average length of a reversal and the average number of reversals.
    Comment: A preliminary version of this work appeared in the proceedings of Combinatorial Pattern Matching (CPM) 2009. See arXiv:0901.2847; Discrete Mathematics, Algorithms and Applications, vol. 3(3), 201
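    To make the objects in this abstract concrete, here is a minimal brute-force Python sketch of a single perfect-reversal check. It rests on several assumptions. Positions are 0-based. Reversing positions i..j reverses that segment and flips every sign. Common intervals are taken between the given permutation and the identity, meaning position ranges of length at least 2 whose absolute values are consecutive integers. A reversal "breaks" a common interval when the two position ranges overlap without one containing the other. All function names are hypothetical. This is not the strong-interval-tree algorithm of Bérard et al., only an illustration of the definitions.

```python
from typing import List, Tuple

def apply_reversal(perm: List[int], i: int, j: int) -> List[int]:
    """Reverse positions i..j (inclusive, 0-based) and flip the sign of each element."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]

def common_intervals(perm: List[int]) -> List[Tuple[int, int]]:
    """Position ranges (i, j), j > i, whose absolute values form a run of consecutive
    integers, i.e. intervals shared with the identity. O(n^2) brute force."""
    result = []
    n = len(perm)
    for i in range(n):
        lo = hi = abs(perm[i])
        for j in range(i + 1, n):
            v = abs(perm[j])
            lo, hi = min(lo, v), max(hi, v)
            # values are distinct, so a span of exactly j - i means they are consecutive
            if hi - lo == j - i:
                result.append((i, j))
    return result

def breaks(rev: Tuple[int, int], interval: Tuple[int, int]) -> bool:
    """True if the reversal range overlaps the interval without nesting (assumed definition)."""
    (a, b), (c, d) = rev, interval
    disjoint = b < c or d < a
    nested = (a <= c and d <= b) or (c <= a and b <= d)
    return not (disjoint or nested)

def is_perfect_reversal(perm: List[int], i: int, j: int) -> bool:
    """True if reversing positions i..j breaks none of perm's common intervals."""
    return not any(breaks((i, j), iv) for iv in common_intervals(perm))
```

    For example, on [-3, 1, 2] the reversal of positions 0..1 breaks the common interval at positions 1..2. By contrast, the reversal of positions 1..2 is perfect and yields [-3, -2, -1], the reversed identity. Reversing all three positions of that result then gives the identity [1, 2, 3].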

    Phylogenetic information complexity: Is testing a tree easier than finding it?

    Phylogenetic trees describe the evolutionary history of a group of present-day species from a common ancestor. These trees are typically reconstructed from aligned DNA sequence data. In this paper we analytically address the following question: is the amount of sequence data required to accurately reconstruct a tree significantly more than the amount required to test whether or not a candidate tree was the 'true' tree? By 'significantly', we mean that the two quantities do not behave the same way as a function of the number of species being considered. We prove that, for a certain type of model, the amount of information required is not significantly different, while for another type of model, the information required to test a tree is independent of the number of leaves, whereas that required to reconstruct it grows with this number. Our results combine probabilistic and combinatorial arguments.
    Comment: 15 pages, 3 figures

    A simple fixed parameter tractable algorithm for computing the hybridization number of two (not necessarily binary) trees

    Here we present a new fixed-parameter tractable algorithm to compute the hybridization number $r$ of two rooted, not necessarily binary, phylogenetic trees on taxon set $X$ in time $(6^r \cdot r!) \cdot \mathrm{poly}(n)$, where $n = |X|$. The novelty of this approach is its use of terminals, which are maximal elements of a natural partial order on $X$, and several insights from the softwired clusters literature. This yields a surprisingly simple and practical bounded-search algorithm and offers an alternative perspective on the underlying combinatorial structure of the hybridization number problem.
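    To see why this running time counts as fixed-parameter tractable, it helps to evaluate only the parameter-dependent factor $6^r \cdot r!$. That factor depends on $r$ alone, while the dependence on $n$ stays polynomial. The Python sketch below is just this arithmetic. It is not the bounded-search algorithm itself, and `search_bound` is a hypothetical helper name.

```python
from math import factorial

def search_bound(r: int) -> int:
    """Parameter-dependent factor 6^r * r! of the stated running time (poly(n) omitted)."""
    return 6 ** r * factorial(r)

for r in range(1, 6):
    print(r, search_bound(r))
```

    For $r = 1, \dots, 5$ this prints 6, 72, 1296, 31104 and 933120. These numbers grow quickly with $r$ but do not depend on the number of taxa.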

    The identifiability of tree topology for phylogenetic models, including covarion and mixture models

    For a model of molecular evolution to be useful for phylogenetic inference, the topology of evolutionary trees must be identifiable. That is, it must be possible to recover the tree parameter from the joint distribution the model predicts. We establish tree identifiability for a number of phylogenetic models, including a covarion model and a variety of mixture models with a limited number of classes. The proof is based on the introduction of a more general model, allowing more states at internal nodes of the tree than at leaves, and the study of the algebraic variety formed by the joint distributions to which it gives rise. Tree identifiability is first established for this general model through the use of certain phylogenetic invariants.
    Comment: 20 pages, 1 figure

    Phylogenetic mixtures on a single tree can mimic a tree of another topology

    Phylogenetic mixtures model the inhomogeneous molecular evolution commonly observed in data. The performance of phylogenetic reconstruction methods when the underlying data are generated by a mixture model has stimulated considerable recent debate. Much of the controversy stems from simulations of mixture model data on a given tree topology for which reconstruction algorithms output a tree of a different topology; these findings were held up to show the shortcomings of particular tree reconstruction methods. In so doing, the underlying assumption was that mixture model data on one topology can be distinguished from data evolved on an unmixed tree of another topology given enough data and the "correct" method. Here we show that this assumption can be false. For biologists, our results imply that, for example, the combined data from two genes whose phylogenetic trees differ only in terms of branch lengths can perfectly fit a tree of a different topology.

    Analysis of top-swap shuffling for genome rearrangements

    We study Markov chains that model genome rearrangements. These models are useful for studying the equilibrium distribution of chromosomal lengths and are used in methods for estimating genomic distances. The primary Markov chain studied in this paper is the top-swap Markov chain. The top-swap chain is a card-shuffling process with $n$ cards divided over $k$ decks, where the cards are ordered within each deck. A transition consists of choosing a random pair of cards and, if the cards lie in different decks, cutting each deck at the chosen card and exchanging the tops of the two decks. We prove precise bounds on the relaxation time (inverse spectral gap) of the top-swap chain. In particular, we prove that the relaxation time is $\Theta(n+k)$. This resolves an open question of Durrett.
    Comment: Published at http://dx.doi.org/10.1214/105051607000000177 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
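    The transition described above can be sketched in Python as follows. Each deck is a list whose index 0 is assumed to be the top, and cards are distinct labels. The abstract does not say which side of the cut the chosen card falls on. Here the "top" cut at a chosen card is assumed to consist of that card and every card above it. The function names are hypothetical, and the sketch only simulates the chain. It does not estimate the relaxation time.

```python
import random
from typing import List, Tuple

Deck = List[int]

def find(decks: List[Deck], card: int) -> Tuple[int, int]:
    """Return (deck index, position in deck) of a card label."""
    for di, deck in enumerate(decks):
        if card in deck:
            return di, deck.index(card)
    raise ValueError(f"card {card} not found")

def swap_tops(decks: List[Deck], c1: int, c2: int) -> None:
    """Apply one top-swap move for chosen cards c1, c2, modifying decks in place.
    Assumption: a deck's top part is the chosen card plus everything above it."""
    d1, p1 = find(decks, c1)
    d2, p2 = find(decks, c2)
    if d1 == d2:
        return  # both cards in the same deck: the chain stays put
    top1, bottom1 = decks[d1][:p1 + 1], decks[d1][p1 + 1:]
    top2, bottom2 = decks[d2][:p2 + 1], decks[d2][p2 + 1:]
    decks[d1] = top2 + bottom1  # exchange the two tops
    decks[d2] = top1 + bottom2

def top_swap_step(decks: List[Deck], rng: random.Random) -> None:
    """One transition: choose a uniformly random pair of distinct cards, then swap tops."""
    cards = [c for deck in decks for c in deck]
    c1, c2 = rng.sample(cards, 2)
    swap_tops(decks, c1, c2)
```

    For instance, with decks [[1, 2, 3], [4, 5]], choosing cards 2 and 4 gives [[4, 3], [1, 2, 5]]. Under this convention, every transition preserves $n$, $k$ and the set of cards, and a nonempty deck never becomes empty.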