149 research outputs found

Average-case analysis of perfect sorting by reversals (Journal Version)

Author: Bouvel Mathilde
Chauve Cedric
Mishna Marni
Rossin Dominique
Publication date: 01/01/2011

Perfect sorting by reversals, a problem originating in computational genomics, is the process of sorting a signed permutation to either the identity or the reversed identity permutation by a sequence of reversals that do not break any common interval. Bérard et al. (2007) use strong interval trees to describe an algorithm for sorting signed permutations by reversals. Combinatorial properties of this family of trees are essential to the analysis of the algorithm. Here, we use the expected value of certain tree parameters to prove that the average run-time of the algorithm is at worst polynomial and that, for sufficiently long permutations, the sorting algorithm runs in polynomial time with probability one. Furthermore, our analysis of the subclass of commuting scenarios yields precise results on the average length of a reversal and the average number of reversals. Comment: A preliminary version of this work appeared in the proceedings of Combinatorial Pattern Matching (CPM) 2009. See arXiv:0901.2847; Discrete Mathematics, Algorithms and Applications, vol. 3(3), 201
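
The reversal operation named in this abstract can be illustrated with a minimal Python sketch. It is not the algorithm of Bérard et al.; it only shows a signed reversal and a brute-force test of whether a reversal "breaks" a common interval. The sketch assumes that a common interval (relative to the identity) is a run of positions whose absolute values form a contiguous integer range, and that a reversal breaks an interval when the two overlap without one containing the other. All function names are hypothetical.

```python
def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of each element."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def common_intervals(perm):
    """Return all position pairs (i, j), i < j, whose absolute values
    form a contiguous range. Brute force, O(n^2)."""
    n = len(perm)
    found = []
    for i in range(n):
        lo = hi = abs(perm[i])
        for j in range(i + 1, n):
            v = abs(perm[j])
            lo, hi = min(lo, v), max(hi, v)
            if hi - lo == j - i:  # j - i + 1 distinct values fill the range
                found.append((i, j))
    return found


def overlaps(a, b):
    """True if intervals a and b intersect but neither contains the other."""
    (a1, a2), (b1, b2) = a, b
    intersect = a1 <= b2 and b1 <= a2
    a_in_b = b1 <= a1 and a2 <= b2
    b_in_a = a1 <= b1 and b2 <= a2
    return intersect and not (a_in_b or b_in_a)


def breaks_no_common_interval(perm, i, j):
    """Assumed test: the reversal of perm[i..j] overlaps no common interval."""
    return not any(overlaps((i, j), c) for c in common_intervals(perm))


perm = [-2, -1, 3]
sorted_perm = apply_reversal(perm, 0, 1)   # one reversal sorts this example
ok = breaks_no_common_interval(perm, 0, 1)
bad = breaks_no_common_interval(perm, 1, 2)
```

In this example, reversing positions 0..1 sorts the permutation and stays inside the common interval formed by the values 1 and 2, while reversing positions 1..2 would cut across that interval.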

arXiv.org e-Print Archive

Crossref

Hal-Diderot

HAL-Polytechnique

Phylogenetic information complexity: Is testing a tree easier than finding it?

Author: Mossel Elchanan
Steel Mike
Szekely Laszlo
Publication date: 01/01/2008

Phylogenetic trees describe the evolutionary history of a group of present-day species from a common ancestor. These trees are typically reconstructed from aligned DNA sequence data. In this paper we analytically address the following question: is the amount of sequence data required to accurately reconstruct a tree significantly more than the amount required to test whether or not a candidate tree was the 'true' tree? By 'significantly', we mean that the two quantities behave the same way as a function of the number of species being considered. We prove that, for a certain type of model, the amount of information required is not significantly different; while for another type of model, the information required to test a tree is independent of the number of leaves, while that required to reconstruct it grows with this number. Our results combine probabilistic and combinatorial arguments. Comment: 15 pages, 3 figures

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

A simple fixed parameter tractable algorithm for computing the hybridization number of two (not necessarily binary) trees

Author: Kelk Steven
Piovesan Teresa
Publication date: 01/01/2012

Here we present a new fixed parameter tractable algorithm to compute the hybridization number r of two rooted, not necessarily binary phylogenetic trees on taxon set X in time $(6^r \cdot r!) \cdot \mathrm{poly}(n)$, where n=|X|. The novelty of this approach is its use of terminals, which are maximal elements of a natural partial order on X, and several insights from the softwired clusters literature. This yields a surprisingly simple and practical bounded-search algorithm and offers an alternative perspective on the underlying combinatorial structure of the hybridization number problem.

arXiv.org e-Print Archive

Maastricht University Research Portal

CiteSeerX

Crossref

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

The identifiability of tree topology for phylogenetic models, including covarion and mixture models

Author: Allman Elizabeth S.
Rhodes John A.
Publication date: 01/01/2005

For a model of molecular evolution to be useful for phylogenetic inference, the topology of evolutionary trees must be identifiable. That is, from a joint distribution the model predicts, it must be possible to recover the tree parameter. We establish tree identifiability for a number of phylogenetic models, including a covarion model and a variety of mixture models with a limited number of classes. The proof is based on the introduction of a more general model, allowing more states at internal nodes of the tree than at leaves, and the study of the algebraic variety formed by the joint distributions to which it gives rise. Tree identifiability is first established for this general model through the use of certain phylogenetic invariants. Comment: 20 pages, 1 figure

arXiv.org e-Print Archive

CiteSeerX

Phylogenetic mixtures on a single tree can mimic a tree of another topology

Author: Matsen Frederick A.
Steel Mike
Publication date: 01/01/2007

Phylogenetic mixtures model the inhomogeneous molecular evolution commonly observed in data. The performance of phylogenetic reconstruction methods where the underlying data is generated by a mixture model has stimulated considerable recent debate. Much of the controversy stems from simulations of mixture model data on a given tree topology for which reconstruction algorithms output a tree of a different topology; these findings were held up to show the shortcomings of particular tree reconstruction methods. In so doing, the underlying assumption was that mixture model data on one topology can be distinguished from data evolved on an unmixed tree of another topology given enough data and the "correct" method. Here we show that this assumption can be false. For biologists our results imply that, for example, the combined data from two genes whose phylogenetic trees differ only in terms of branch lengths can perfectly fit a tree of a different topology.

arXiv.org e-Print Archive

CiteSeerX

Analysis of top-swap shuffling for genome rearrangements

Author: Bhatnagar Nayantara
Caputo Pietro
Tetali Prasad
Vigoda Eric
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2006

We study Markov chains which model genome rearrangements. These models are useful for studying the equilibrium distribution of chromosomal lengths, and are used in methods for estimating genomic distances. The primary Markov chain studied in this paper is the top-swap Markov chain. The top-swap chain is a card-shuffling process with $n$ cards divided over $k$ decks, where the cards are ordered within each deck. A transition consists of choosing a random pair of cards, and if the cards lie in different decks, we cut each deck at the chosen card and exchange the tops of the two decks. We prove precise bounds on the relaxation time (inverse spectral gap) of the top-swap chain. In particular, we prove the relaxation time is $\Theta(n+k)$. This resolves an open question of Durrett. Comment: Published in the Annals of Applied Probability (http://www.imstat.org/aap/) at http://dx.doi.org/10.1214/105051607000000177 by the Institute of Mathematical Statistics (http://www.imstat.org)
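
The transition described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' construction: each deck is a list whose last element is the top card, the chosen card is assumed to travel with the top portion of its deck, and the pair of cards is drawn uniformly without replacement. The names `swap_tops` and `top_swap_step` are hypothetical.

```python
import random


def swap_tops(decks, d1, p1, d2, p2):
    """Cut deck d1 at position p1 and deck d2 at position p2, then
    exchange the top portions (assumption: the chosen card goes with the top)."""
    new = [list(deck) for deck in decks]
    top1, top2 = decks[d1][p1:], decks[d2][p2:]
    new[d1] = decks[d1][:p1] + top2
    new[d2] = decks[d2][:p2] + top1
    return new


def top_swap_step(decks, rng=random):
    """One transition: pick a random pair of cards; if they lie in
    different decks, exchange the tops, otherwise leave the state unchanged.
    Requires at least two cards in total."""
    positions = [(d, p) for d, deck in enumerate(decks) for p in range(len(deck))]
    (d1, p1), (d2, p2) = rng.sample(positions, 2)
    if d1 == d2:
        return [list(deck) for deck in decks]
    return swap_tops(decks, d1, p1, d2, p2)


decks = [[1, 2, 3], [4, 5]]
after = swap_tops(decks, 0, 1, 1, 1)  # cut at card 2 and at card 5
```

A transition never creates or destroys cards, so the multiset of cards across all decks is preserved from one state to the next, although individual decks may change length.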

arXiv.org e-Print Archive

CiteSeerX