Maximum likelihood estimates of pairwise rearrangement distances
Accurate estimation of evolutionary distances between taxa is important for
many phylogenetic reconstruction methods. In the case of bacteria, distances
can be estimated using a range of different evolutionary models, from single
nucleotide polymorphisms to large-scale genome rearrangements. In the case of
sequence evolution, models (such as the Jukes-Cantor model and associated
metric) have been used to correct pairwise distances. Similar correction
methods for genome rearrangement processes are required to improve inference.
Current attempts at correction fall into three categories: empirical computational
studies, Bayesian/MCMC approaches, and combinatorial approaches. Here we
introduce a maximum likelihood estimator for the inversion distance between a
pair of genomes, using the group-theoretic approach to modelling inversions
introduced recently. This MLE functions as a corrected distance: in particular,
we show that because of the way sequences of inversions interact with each
other, it is quite possible for minimal distance and MLE distance to
differently order the distances of two genomes from a third. This has obvious
implications for the use of minimal distance in phylogeny reconstruction. The
work also tackles the above problem allowing free rotation of the genome.
Generally a frame of reference is locked, and all computation made accordingly.
This work incorporates the action of the dihedral group so that distance
estimates are free from any a priori frame of reference.
Comment: 21 pages, 7 figures. To appear in the Journal of Theoretical Biolog
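The abstract above mentions the Jukes-Cantor model as a correction for pairwise sequence distances. As a point of comparison only, here is a minimal sketch of that standard correction, d = -(3/4) ln(1 - (4/3) p), where p is the observed fraction of differing sites. It is not the maximum likelihood estimator for inversion distance that the paper introduces; the function name `jukes_cantor_distance` is an illustrative choice.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance from an observed mismatch fraction p.

    p is the proportion of aligned sites that differ between two sequences.
    The correction is undefined for p >= 0.75, where the sequences are
    saturated under this model.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    for observed in (0.05, 0.20, 0.50):
        corrected = jukes_cantor_distance(observed)
        print(f"observed {observed:.2f} -> corrected {corrected:.4f}")
```

The corrected value always exceeds the observed fraction for p > 0, because multiple substitutions at the same site hide part of the true change. The abstract argues for an analogous correction when the events are inversions rather than point substitutions.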
A Computational Method for the Rate Estimation of Evolutionary Transpositions
Genome rearrangements are evolutionary events that shuffle genomic
architectures. The most frequent genome rearrangements are reversals,
translocations, fusions, and fissions. While there are some more complex genome
rearrangements such as transpositions, they are rarely observed and believed to
constitute only a small fraction of genome rearrangements happening in the
course of evolution. The analysis of transpositions is further complicated by
the intractability of the underlying computational problems.
We propose a computational method for estimating the rate of transpositions
in evolutionary scenarios between genomes. We applied our method to a set of
mammalian genomes and estimated the transposition rate in mammalian evolution
to be around 0.26.
Comment: Proceedings of the 3rd International Work-Conference on
Bioinformatics and Biomedical Engineering (IWBBIO), 2015. (to appear
Average-case analysis of perfect sorting by reversals (Journal Version)
Perfect sorting by reversals, a problem originating in computational
genomics, is the process of sorting a signed permutation to either the identity
or to the reversed identity permutation, by a sequence of reversals that do not
break any common interval. Bérard et al. (2007) make use of strong interval
trees to describe an algorithm for sorting signed permutations by reversals.
Combinatorial properties of this family of trees are essential to the algorithm
analysis. Here, we use the expected value of certain tree parameters to prove
that the average run-time of the algorithm is at worst polynomial, and
additionally, for sufficiently long permutations, the sorting algorithm runs in
polynomial time with probability one. Furthermore, our analysis of the subclass
of commuting scenarios yields precise results on the average length of a
reversal, and the average number of reversals.
Comment: A preliminary version of this work appeared in the proceedings of
Combinatorial Pattern Matching (CPM) 2009. See arXiv:0901.2847; Discrete
Mathematics, Algorithms and Applications, vol. 3(3), 201
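To make the notion of a signed reversal concrete, the following sketch applies a reversal to a signed permutation and finds a shortest sorting scenario by breadth-first search. It is a brute-force illustration for very small permutations only. It is not the strong-interval-tree algorithm of Bérard et al., it does not enforce the "perfect" constraint of preserving common intervals, and it targets only the identity, not the reversed identity. The function names are illustrative.

```python
from collections import deque


def reverse_segment(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of every element in it."""
    flipped = tuple(-x for x in reversed(perm[i:j + 1]))
    return perm[:i] + flipped + perm[j + 1:]


def reversal_distance(perm):
    """Minimum number of signed reversals taking perm to (1, 2, ..., n).

    Plain breadth-first search over all signed permutations, so the cost
    grows exponentially with n. Intended for tiny examples only.
    """
    start = tuple(perm)
    n = len(start)
    target = tuple(range(1, n + 1))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == target:
            return dist
        for i in range(n):
            for j in range(i, n):
                nxt = reverse_segment(current, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("target not reachable; is the input a signed permutation?")


if __name__ == "__main__":
    example = (1, -3, -2, 4)
    print(reverse_segment(example, 1, 2))
    print(reversal_distance(example))
```

In the example, reversing the middle segment of (1, -3, -2, 4) restores the identity in one step. A perfect sorting scenario would additionally restrict which segments may be reversed.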
On the effective and automatic enumeration of polynomial permutation classes
We describe an algorithm, implemented in Python, which can enumerate any
permutation class with polynomial enumeration from a structural description of
the class. In particular, this allows us to find formulas for the number of
permutations of length n which can be obtained by a finite number of block
sorting operations (e.g., reversals, block transpositions, cut-and-paste
moves).
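As a small illustration of the quantity being enumerated, the sketch below counts by brute force the unsigned permutations of length n reachable from the identity with at most k reversals. This is not the structural algorithm the abstract describes, only an exhaustive check usable for small n; the function name `count_reachable_by_reversals` is hypothetical.

```python
def count_reachable_by_reversals(n, k):
    """Count permutations of 1..n reachable from the identity in at most k reversals.

    A reversal here flips a contiguous segment of length at least 2
    (unsigned model). The search is exhaustive, so keep n small.
    """
    identity = tuple(range(1, n + 1))
    seen = {identity}
    frontier = {identity}
    for _ in range(k):
        next_frontier = set()
        for perm in frontier:
            for i in range(n):
                for j in range(i + 1, n):
                    candidate = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
                    if candidate not in seen:
                        seen.add(candidate)
                        next_frontier.add(candidate)
        frontier = next_frontier
    return len(seen)


if __name__ == "__main__":
    for length in range(1, 7):
        print(length, count_reachable_by_reversals(length, 1))
```

For k = 1 the count is 1 + n(n-1)/2, since each of the n(n-1)/2 segments of length at least 2 yields a distinct non-identity permutation. That is a polynomial in n, which matches the kind of formula the abstract says can be derived automatically.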
Limited Lifespan of Fragile Regions in Mammalian Evolution
An important question in genome evolution is whether there exist fragile
regions (rearrangement hotspots) where chromosomal rearrangements are happening
over and over again. Although nearly all recent studies supported the existence
of fragile regions in mammalian genomes, the most comprehensive phylogenomic
study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some
doubts about their existence. We demonstrate that fragile regions are subject
to a "birth and death" process, implying that fragility has limited
evolutionary lifespan. This finding implies that fragile regions migrate to
different locations in different mammals, explaining why there exist only a few
chromosomal breakpoints shared between different lineages. The birth and death
of fragile regions phenomenon reinforces the hypothesis that rearrangements are
promoted by matching segmental duplications and suggests putative locations of
the currently active fragile regions in the human genome.
The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice-versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution.
Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201