Average-case analysis of perfect sorting by reversals (Journal Version)
Perfect sorting by reversals, a problem originating in computational
genomics, is the process of sorting a signed permutation to either the identity
or to the reversed identity permutation, by a sequence of reversals that do not
break any common interval. Bérard et al. (2007) make use of strong interval
trees to describe an algorithm for sorting signed permutations by reversals.
Combinatorial properties of this family of trees are essential to the algorithm
analysis. Here, we use the expected value of certain tree parameters to prove
that the average run-time of the algorithm is, at worst, polynomial, and
additionally, for sufficiently long permutations, the sorting algorithm runs in
polynomial time with probability one. Furthermore, our analysis of the subclass
of commuting scenarios yields precise results on the average length of a
reversal, and the average number of reversals.
Comment: A preliminary version of this work appeared in the proceedings of
Combinatorial Pattern Matching (CPM) 2009. See arXiv:0901.2847; Discrete
Mathematics, Algorithms and Applications, vol. 3(3), 201
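To make the definitions concrete, here is a minimal brute-force sketch that applies a signed reversal and checks whether it keeps every common interval of the starting permutation with the identity intact. It is not the strong-interval-tree algorithm of Bérard et al.; the helper names (`apply_reversal`, `common_intervals`, `is_perfect_step`) are hypothetical, and the quadratic enumeration of intervals is chosen for clarity rather than speed.

```python
def apply_reversal(perm, i, j):
    """Reverse positions i..j (inclusive) of a signed permutation and flip their signs."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


def common_intervals(perm):
    """Common intervals of `perm` and the identity, ignoring signs.

    A set of values qualifies when it fills consecutive positions in `perm`
    and also consists of consecutive integers. Singletons are skipped
    because no reversal can break them.
    """
    vals = [abs(x) for x in perm]
    found = set()
    for i in range(len(vals)):
        for j in range(i + 2, len(vals) + 1):
            block = vals[i:j]
            if max(block) - min(block) == len(block) - 1:
                found.add(frozenset(block))
    return found


def is_contiguous(perm, interval):
    """True if the values in `interval` occupy consecutive positions of `perm`."""
    positions = [k for k, x in enumerate(perm) if abs(x) in interval]
    return positions[-1] - positions[0] == len(positions) - 1


def is_perfect_step(perm, i, j, intervals):
    """True if reversing perm[i..j] leaves every interval in `intervals` contiguous."""
    after = apply_reversal(perm, i, j)
    return all(is_contiguous(after, c) for c in intervals)


start = [2, 1, 3]
keep = common_intervals(start)             # {1, 2} and {1, 2, 3}
print(is_perfect_step(start, 0, 1, keep))  # True: the result is [-1, -2, 3]
print(is_perfect_step(start, 1, 2, keep))  # False: {1, 2} is split in [2, -3, -1]
```

A reversal breaks a common interval exactly when the two overlap without either containing the other, so checking contiguity after the move is equivalent to the usual definition.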
Phylogenetic information complexity: Is testing a tree easier than finding it?
Phylogenetic trees describe the evolutionary history of a group of
present-day species from a common ancestor. These trees are typically
reconstructed from aligned DNA sequence data. In this paper we analytically
address the following question: is the amount of sequence data required to
accurately reconstruct a tree significantly more than the amount required to
test whether or not a candidate tree is the 'true' tree? By 'significantly',
we mean that the two quantities do not behave the same way as a function of the number
of species being considered. We prove that, for a certain type of model, the
amount of information required is not significantly different; while for
another type of model, the information required to test a tree is independent
of the number of leaves, while that required to reconstruct it grows with this
number. Our results combine probabilistic and combinatorial arguments.
Comment: 15 pages, 3 figures
A simple fixed parameter tractable algorithm for computing the hybridization number of two (not necessarily binary) trees
Here we present a new fixed parameter tractable algorithm to compute the
hybridization number r of two rooted, not necessarily binary phylogenetic trees
on taxon set X in time $(6^r \cdot r!) \cdot \mathrm{poly}(n)$, where $n = |X|$. The novelty of this
approach is its use of terminals, which are maximal elements of a natural
partial order on X, and several insights from the softwired clusters
literature. This yields a surprisingly simple and practical bounded-search
algorithm and offers an alternative perspective on the underlying combinatorial
structure of the hybridization number problem.
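The point of the running-time bound is that its super-polynomial part depends only on the hybridization number r, not on n. The small sketch below (the function name is hypothetical) tabulates that parameter-dependent factor 6^r · r! for small r; the poly(n) factor is left out because the abstract does not state its degree.

```python
from math import factorial


def parameter_factor(r):
    """The r-dependent factor 6^r * r! in the running-time bound."""
    return 6 ** r * factorial(r)


for r in range(1, 6):
    print(r, parameter_factor(r))
# 1 6
# 2 72
# 3 1296
# 4 31104
# 5 933120
```

This is the sense in which the problem is fixed parameter tractable: for any fixed r, the running time is polynomial in n.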
The identifiability of tree topology for phylogenetic models, including covarion and mixture models
For a model of molecular evolution to be useful for phylogenetic inference,
the topology of evolutionary trees must be identifiable. That is, it must be
possible to recover the tree parameter from any joint distribution the model
predicts. We establish tree identifiability for a number of phylogenetic
models, including a covarion model and a variety of mixture models with a
limited number of classes. The proof is based on the introduction of a more
general model, allowing more states at internal nodes of the tree than at
leaves, and the study of the algebraic variety formed by the joint
distributions to which it gives rise. Tree identifiability is first established
for this general model through the use of certain phylogenetic invariants.
Comment: 20 pages, 1 figure
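The proof above works through phylogenetic invariants of a more general model, which is beyond a short example. As a much simpler, swapped-in illustration of what identifiability means, the sketch below assumes the two-state symmetric (Cavender-Farris-Neyman) model on a four-leaf tree with split ab|cd. Pairwise mismatch probabilities predicted by the model are converted to additive distances, and the four-point condition then recovers the split. The edge labels, parameter values, and function names are hypothetical.

```python
import math

# Quartet ab|cd: leaves a, b hang off internal node u; c, d hang off v;
# "uv" is the internal edge. Values are per-edge flip probabilities in (0, 1/2).
PATHS = {
    ("a", "b"): ["a", "b"],
    ("c", "d"): ["c", "d"],
    ("a", "c"): ["a", "uv", "c"],
    ("a", "d"): ["a", "uv", "d"],
    ("b", "c"): ["b", "uv", "c"],
    ("b", "d"): ["b", "uv", "d"],
}


def mismatch_probs(flip):
    """P(two leaves show different states) under the symmetric two-state model."""
    probs = {}
    for pair, edges in PATHS.items():
        corr = 1.0
        for e in edges:
            corr *= 1 - 2 * flip[e]  # correlations multiply along the path
        probs[pair] = (1 - corr) / 2
    return probs


def recover_split(probs):
    """Four-point condition on the additive distance -1/2 * ln(1 - 2P)."""
    d = {pair: -0.5 * math.log(1 - 2 * p) for pair, p in probs.items()}
    sums = {
        "ab|cd": d[("a", "b")] + d[("c", "d")],
        "ac|bd": d[("a", "c")] + d[("b", "d")],
        "ad|bc": d[("a", "d")] + d[("b", "c")],
    }
    # When the internal edge has a positive flip probability, the true split
    # has the strictly smallest sum.
    return min(sums, key=sums.get)


flip = {"a": 0.1, "b": 0.2, "c": 0.15, "d": 0.05, "uv": 0.1}
print(recover_split(mismatch_probs(flip)))  # ab|cd
```

This only shows identifiability for a single, unmixed process; mixtures of such processes, the subject of the next entry, need separate treatment.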
Phylogenetic mixtures on a single tree can mimic a tree of another topology
Phylogenetic mixtures model the inhomogeneous molecular evolution commonly
observed in data. The performance of phylogenetic reconstruction methods where
the underlying data is generated by a mixture model has stimulated considerable
recent debate. Much of the controversy stems from simulations of mixture model
data on a given tree topology for which reconstruction algorithms output a tree
of a different topology; these findings were held up to show the shortcomings
of particular tree reconstruction methods. In so doing, the underlying
assumption was that mixture model data on one topology can be distinguished
from data evolved on an unmixed tree of another topology given enough data and
the "correct" method. Here we show that this assumption can be false. For
biologists, our results imply that, for example, the combined data from two
genes whose phylogenetic trees differ only in terms of branch lengths can
perfectly fit a tree of a different topology.
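Mechanically, a phylogenetic mixture is a weighted average of the site-pattern distributions produced by each component. The sketch below assumes a simple two-state symmetric model on the quartet ab|cd and mixes two components that share that topology but differ in branch lengths, as in the two-gene example above. It does not reproduce the paper's construction of a mixture that fits another topology; the function names and parameter values are illustrative only.

```python
from itertools import product


def pattern_probs(flip):
    """Site-pattern distribution for the symmetric two-state model on quartet ab|cd.

    Internal node u joins leaves a and b, internal node v joins c and d.
    The process starts at u with each state having probability 1/2;
    `flip` maps each edge to its probability of a state change.
    """
    def step(p, x, y):
        return p if x != y else 1 - p

    probs = {}
    for a, b, c, d in product((0, 1), repeat=4):
        total = 0.0
        for u, v in product((0, 1), repeat=2):  # sum over hidden internal states
            total += (0.5 * step(flip["uv"], u, v)
                      * step(flip["a"], u, a) * step(flip["b"], u, b)
                      * step(flip["c"], v, c) * step(flip["d"], v, d))
        probs[(a, b, c, d)] = total
    return probs


def mix(dists, weights):
    """Pointwise weighted average of site-pattern distributions (weights sum to 1)."""
    return {pattern: sum(w * dist[pattern] for dist, w in zip(dists, weights))
            for pattern in dists[0]}


# Two components on the same topology with different branch lengths.
gene1 = pattern_probs({"a": 0.05, "b": 0.3, "c": 0.05, "d": 0.3, "uv": 0.02})
gene2 = pattern_probs({"a": 0.3, "b": 0.05, "c": 0.3, "d": 0.05, "uv": 0.02})
mixed = mix([gene1, gene2], [0.5, 0.5])
print(round(sum(mixed.values()), 12))  # 1.0
```

Reconstruction methods only ever see a distribution like `mixed`, which is why the question of whether it can coincide with a single-tree distribution on another topology matters.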
Analysis of top-swap shuffling for genome rearrangements
We study Markov chains which model genome rearrangements. These models are
useful for studying the equilibrium distribution of chromosomal lengths, and
are used in methods for estimating genomic distances. The primary Markov chain
studied in this paper is the top-swap Markov chain. The top-swap chain is a
card-shuffling process with cards divided over decks, where the cards
are ordered within each deck. A transition consists of choosing a random pair
of cards, and if the cards lie in different decks, we cut each deck at the
chosen card and exchange the tops of the two decks. We prove precise bounds on
the relaxation time (inverse spectral gap) of the top-swap chain. In
particular, we prove the relaxation time is . This resolves an
open question of Durrett.
Comment: Published at http://dx.doi.org/10.1214/105051607000000177 in the
Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute
of Mathematical Statistics (http://www.imstat.org).
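A single top-swap transition can be sketched as follows. The abstract does not fix which side of a cut counts as the top, so this sketch assumes each deck is a list with index 0 as its top card, and that cutting at the chosen card leaves that card at the bottom of the top part. The function name and the injected random generator are assumptions for illustration.

```python
import random


def top_swap_step(decks, rng):
    """One top-swap transition on a list of decks (lists of distinct cards).

    Assumption: index 0 is the top of a deck, and the "top" cut off at a
    chosen card runs from the top down to and including that card.
    """
    where = {card: (k, pos) for k, deck in enumerate(decks) for pos, card in enumerate(deck)}
    x, y = rng.sample(list(where), 2)  # a uniformly random pair of distinct cards
    (i, pi), (j, pj) = where[x], where[y]
    new = [list(deck) for deck in decks]
    if i == j:
        return new  # both cards in the same deck: nothing moves
    top_i, bottom_i = decks[i][:pi + 1], decks[i][pi + 1:]
    top_j, bottom_j = decks[j][:pj + 1], decks[j][pj + 1:]
    new[i] = top_j + bottom_i  # exchange the tops of the two decks
    new[j] = top_i + bottom_j
    return new


rng = random.Random(0)
decks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
for _ in range(1000):
    decks = top_swap_step(decks, rng)
print(sorted(card for deck in decks for card in deck))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A transition only moves cards between decks, so the number of decks and the set of cards are preserved; under this convention no deck ever empties, because each exchanged top contains at least the chosen card.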