15 research outputs found

    Maximum likelihood estimates of pairwise rearrangement distances

    Accurate estimation of evolutionary distances between taxa is important for many phylogenetic reconstruction methods. In the case of bacteria, distances can be estimated using a range of different evolutionary models, from single nucleotide polymorphisms to large-scale genome rearrangements. In the case of sequence evolution, models such as the Jukes-Cantor model and its associated metric have been used to correct pairwise distances. Similar correction methods for genome rearrangement processes are required to improve inference. Current attempts at correction fall into three categories: empirical computational studies, Bayesian/MCMC approaches, and combinatorial approaches. Here we introduce a maximum likelihood estimator (MLE) for the inversion distance between a pair of genomes, using the group-theoretic approach to modelling inversions introduced recently. This MLE functions as a corrected distance: in particular, we show that because of the way sequences of inversions interact with each other, it is quite possible for minimal distance and MLE distance to order the distances of two genomes from a third differently. This has obvious implications for the use of minimal distance in phylogeny reconstruction. The work also tackles the above problem while allowing free rotation of the genome. Generally a frame of reference is locked, and all computation is made accordingly. This work incorporates the action of the dihedral group so that distance estimates are free from any a priori frame of reference. Comment: 21 pages, 7 figures. To appear in the Journal of Theoretical Biology.
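
    The abstract above cites the Jukes-Cantor model as the sequence-evolution analogue of a corrected distance. As a rough illustration of what "correcting" a pairwise distance means, the sketch below applies the standard Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3) to an observed proportion p of differing sites. The function name `jukes_cantor_distance` is a hypothetical choice for this example; it is not the paper's inversion-distance estimator.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site).

    p is the observed proportion of sites that differ between two aligned
    sequences. The name and interface are illustrative assumptions.
    """
    if not 0.0 <= p < 0.75:
        # The logarithm's argument reaches zero at p = 0.75 (saturation).
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for p in (0.05, 0.3, 0.6):
        print(p, round(jukes_cantor_distance(p), 4))
```

    The corrected value always exceeds the observed proportion for p > 0, because repeated changes at the same site hide part of the true evolutionary distance. The abstract's argument is that minimal inversion distance can underestimate in a comparable way.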

    The expected number of inversions after n adjacent transpositions

    We give a new expression for the expected number of inversions in the product of n random adjacent transpositions in the symmetric group S_{m+1}. We then derive from this expression the asymptotic behaviour of this number when n scales with m in various ways. Our starting point is an equivalence, due to Eriksson et al., with a problem of weighted walks confined to a triangular area of the plane.
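
    The quantity studied in this abstract can be approximated numerically. The following is a minimal Monte Carlo sketch, assuming each of the m adjacent transpositions (i, i+1) in S_{m+1} is chosen uniformly at random at every step. The function names and the `trials` and `seed` parameters are hypothetical, and the code does not implement the paper's closed-form expression.

```python
import random


def count_inversions(perm):
    """Count pairs (i, j) with i < j and perm[i] > perm[j]."""
    size = len(perm)
    return sum(
        1
        for i in range(size)
        for j in range(i + 1, size)
        if perm[i] > perm[j]
    )


def mean_inversions(m, n, trials=2000, seed=0):
    """Estimate the expected inversion count after n random adjacent
    transpositions applied to the identity of S_{m+1}.

    Assumption: each step picks one of the m adjacent transpositions
    uniformly at random.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(m + 1))
        for _ in range(n):
            i = rng.randrange(m)  # swap positions i and i + 1
            perm[i], perm[i + 1] = perm[i + 1], perm[i]
        total += count_inversions(perm)
    return total / trials


if __name__ == "__main__":
    for steps in (1, 2, 10, 50):
        print(steps, mean_inversions(5, steps))
```

    Two sanity checks follow directly from the definitions: a single step always yields exactly one inversion, and after two steps the expectation is 2(1 - 1/m), since the second step undoes the first with probability 1/m.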

    Random induced subgraphs of Cayley graphs induced by transpositions

    In this paper we study random induced subgraphs of Cayley graphs of the symmetric group induced by an arbitrary minimal generating set of transpositions. A random induced subgraph of this Cayley graph is obtained by selecting permutations with independent probability $\lambda_n$. Our main result is that for any minimal generating set of transpositions, for probabilities $\lambda_n=\frac{1+\epsilon_n}{n-1}$ where $n^{-1/3+\delta}\le \epsilon_n<1$ and $\delta>0$, a random induced subgraph has a.s. a unique largest component of size $\wp(\epsilon_n)\frac{1+\epsilon_n}{n-1}n!$, where $\wp(\epsilon_n)$ is the survival probability of a specific branching process. Comment: 18 pages, 1 figure.
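
    To make the construction in this abstract concrete, here is a small sketch that builds a random induced subgraph of a Cayley graph of S_n and measures its largest component. It assumes the generating set is the n - 1 adjacent transpositions (one example of a minimal generating set of transpositions) and uses brute-force enumeration, so it is only practical for very small n. The function name and parameters are hypothetical, and the sketch says nothing about the asymptotic regime the paper analyses.

```python
import random
from itertools import permutations


def largest_component_size(n, prob, seed=0):
    """Largest connected component of a random induced subgraph of the
    Cayley graph of S_n generated by adjacent transpositions.

    Each permutation is kept independently with probability prob.
    """
    rng = random.Random(seed)
    kept = {p for p in permutations(range(n)) if rng.random() < prob}
    seen = set()
    best = 0
    for start in kept:
        if start in seen:
            continue
        # Depth-first search restricted to the kept vertices.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            p = stack.pop()
            size += 1
            for i in range(n - 1):
                q = list(p)
                q[i], q[i + 1] = q[i + 1], q[i]
                q = tuple(q)
                if q in kept and q not in seen:
                    seen.add(q)
                    stack.append(q)
        best = max(best, size)
    return best


if __name__ == "__main__":
    n = 5
    for eps in (0.0, 0.5, 0.9):
        lam = (1 + eps) / (n - 1)
        print(eps, largest_component_size(n, lam))
```

    With `prob` set to 1 every permutation is kept and the result is n!, since the full Cayley graph is connected.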

    Estimating true evolutionary distances under rearrangements, duplications, and losses

    Background: The rapidly increasing availability of whole-genome sequences has enabled the study of whole-genome evolution. Evolutionary mechanisms based on genome rearrangements have attracted much attention and given rise to many models; somewhat independently, the mechanisms of gene duplication and loss have seen much work. However, the two are not independent and thus require a unified treatment, which remains missing to date. Moreover, existing rearrangement models do not fit the dichotomy between most prokaryotic genomes (one circular chromosome) and most eukaryotic genomes (multiple linear chromosomes). Results: To handle rearrangements, gene duplications and losses, we propose a new evolutionary model and the corresponding method for estimating true evolutionary distance. Our model, inspired by the DCJ model, is simple and the first to respect the prokaryotic/eukaryotic structural dichotomy. Experimental results on a wide variety of genome structures demonstrate the very high accuracy and robustness of our distance estimator. Conclusions: We give the first robust, statistically based estimate of genomic pairwise distances based on rearrangements, duplications and losses, under a model that respects the structural dichotomy between prokaryotic and eukaryotic genomes. Accurate and robust estimates of true evolutionary distances should translate into much better phylogenetic reconstructions as well as more accurate genomic alignments, while our new model of genome rearrangements provides another refinement in simplicity and verisimilitude.

    Estimating true evolutionary distances under the DCJ model

    Motivation: Modern techniques can yield the ordering and strandedness of genes on each chromosome of a genome; such data already exist for hundreds of organisms. The evolutionary mechanisms through which the set of genes of an organism is altered and reordered are of great interest to systematists, evolutionary biologists, comparative genomicists and biomedical researchers. Perhaps the most basic concept in this area is that of evolutionary distance between two genomes: under a given model of genomic evolution, how many events most likely took place to account for the difference between the two genomes?

    Steps toward accurate reconstructions of phylogenies from gene-order data

    We report on our progress in reconstructing phylogenies from gene-order data. We have developed polynomial-time methods for estimating genomic distances that greatly improve the accuracy of trees obtained using the popular neighbor-joining method; we have also further improved the running time of our GRAPPA software suite through a combination of tighter bounding and better use of the bounds. We present new experimental results (extending those we presented at ISMB’01 and WABI’01) that demonstrate the accuracy and robustness of our distance estimators under a wide range of model conditions. Moreover, using the best of our distance estimators (EDE) in our GRAPPA software suite, along with more sophisticated bounding techniques, produced spectacular improvements in the already huge speedup: whereas our earlier experiments showed a one-million-fold speedup (when run on a 512-processor cluster), our latest experiments demonstrate a speedup of one hundred million. The combination of these various advances enabled us to conduct new phylogenetic analyses of a subset of the Campanulaceae family, confirming various conjectures about the relationships among members of the subset and confirming that inversion can be viewed as the principal mechanism of evolution for their chloroplast genome. We give representative results of the extensive experimentation we conducted on both real and simulated datasets in order to validate and characterize our approaches.

    Algorithmic approaches for genome rearrangement: a review
