26 research outputs found

    Lower bounding edit distances between permutations

    A number of fields, including the study of genome rearrangements and the design of interconnection networks, deal with the connected problems of sorting permutations in "as few moves as possible", using a given set of allowed operations, or computing the number of moves the sorting process requires, often referred to as the distance of the permutation. These operations often act on just one or two segments of the permutation, e.g. by reversing one segment or exchanging two segments. The cycle graph of the permutation to sort is a fundamental tool in the theory of genome rearrangements, and has proved useful in settling the complexity of many variants of the above problems. In this paper, we present an algebraic reinterpretation of the cycle graph of a permutation π as an even permutation π̄, and show how to reformulate our sorting problems in terms of particular factorisations of the latter permutation. Using our framework, we recover known results in a simple and unified way, and obtain a new lower bound on the prefix transposition distance (where a prefix transposition displaces the initial segment of a permutation), which is shown to outperform previous results. Moreover, we use our approach to improve the best known lower bound on the prefix transposition diameter from 2n/3 to ⌊3n/4⌋, and investigate a few relations between some statistics on π and π̄.
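    As an illustration of the operation named in this abstract, the following is a minimal sketch, not the paper's method: it applies a prefix transposition and finds the exact distance by brute-force breadth-first search, which is only feasible for very small permutations. The function names and the index convention (the prefix `perm[:i]` is moved to just after `perm[i:j]`) are assumptions made for this example.

```python
from collections import deque


def prefix_transposition(perm, i, j):
    """Move the prefix perm[:i] so that it lands right after perm[i:j].

    Assumed convention for this sketch: 0 < i < j <= len(perm).
    """
    if not 0 < i < j <= len(perm):
        raise ValueError("need 0 < i < j <= len(perm)")
    return perm[i:j] + perm[:i] + perm[j:]


def prefix_transposition_distance(perm):
    """Exact distance to the sorted permutation by exhaustive BFS.

    The search space has n! states, so this is for tiny inputs only.
    """
    start = tuple(perm)
    target = tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return dist[current]
        n = len(current)
        for i in range(1, n):
            for j in range(i + 1, n + 1):
                nxt = prefix_transposition(current, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
    return None  # not reached for permutations, kept for safety
```

    For instance, `prefix_transposition((1, 2, 3, 4, 5), 2, 4)` moves the prefix `(1, 2)` past `(3, 4)`. Lower bounds such as the one in the abstract matter precisely because this exhaustive search does not scale.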

    Breaking Good: Accounting For Fragility Of Genomic Regions In Rearrangement Distance Estimation

    Models of evolution by genome rearrangements are prone to two types of flaws: One is to ignore the diversity of susceptibility to breakage across genomic regions, and the other is to suppose that susceptibility values are given. Without necessarily supposing their precise localization, we call "solid" the regions that are improbably broken by rearrangements and "fragile" the regions outside solid ones. We propose a model of evolution by inversions where breakage probabilities vary across fragile regions and over time. It contains as a particular case the uniform breakage model on the nucleotidic sequence, where breakage probabilities are proportional to fragile region lengths. This is very different from the frequently used pseudouniform model where all fragile regions have the same probability to break. Estimations of rearrangement distances based on the pseudouniform model completely fail on simulations with the truly uniform model. On pairs of amniote genomes, we show that identifying coding genes with solid regions yields incoherent distance estimations, especially with the pseudouniform model, and to a lesser extent with the truly uniform model. This incoherence is solved when we coestimate the number of fragile regions with the rearrangement distance. The estimated number of fragile regions is surprisingly small, suggesting that a minority of regions are recurrently used by rearrangements. Estimations for several pairs of genomes at different divergence times are in agreement with a slowly evolvable colocalization of active genomic regions in the cell. Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) [2013/25084-2]; French Agence Nationale de la Recherche (ANR) [ANR-10-BINF-01-01]; ICT FP7 european programme EVOEVO.
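    The contrast between the two breakage models described in this abstract can be sketched as follows. This is only an illustration of the two definitions as stated there (probabilities proportional to fragile region lengths versus equal probabilities), not the authors' estimator; the function name and the region lengths in the usage note are hypothetical.

```python
def breakage_probabilities(lengths, model="uniform"):
    """Per-region breakage probabilities for a list of fragile region lengths.

    model="uniform": probability proportional to region length.
    model="pseudouniform": every region has the same probability.
    """
    if not lengths:
        raise ValueError("at least one fragile region is required")
    if model == "uniform":
        total = sum(lengths)
        return [length / total for length in lengths]
    if model == "pseudouniform":
        return [1 / len(lengths)] * len(lengths)
    raise ValueError(f"unknown model: {model}")
```

    With hypothetical lengths `[1, 3]`, the uniform model gives the longer region three times the breakage probability of the shorter one, whereas the pseudouniform model treats both alike.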

    Evolution of whole genomes through inversions: models and algorithms for duplicates, ancestors, and edit scenarios

    Advances in sequencing technology are yielding DNA sequence data at an alarming rate – a rate reminiscent of Moore's law. Biologists' abilities to analyze this data, however, have not kept pace. On the other hand, the discrete and mechanical nature of the cell life-cycle has been tantalizing to computer scientists. Thus in the 1980s, pioneers of the field now called Computational Biology began to uncover a wealth of computer science problems, some confronting modern biologists and some hidden in the annals of the biological literature. In particular, many interesting twists were introduced to classical string matching, sorting, and graph problems. One such problem, first posed in 1941 but rediscovered in the early 1980s, is that of sorting by inversions (also called reversals): given two permutations, find the minimum number of inversions required to transform one into the other, where an inversion inverts the order of a subpermutation. Indeed, many genomes have evolved mostly or only through inversions. Thus it becomes possible to trace evolutionary histories by inferring sequences of such inversions that led to today's genomes from a distant common ancestor. But unlike the classic edit distance problem where string editing was relatively simple, editing permutations in this way has proved to be more complex. In this dissertation, we extend the theory so as to make these edit distances more broadly applicable and faster to compute, and work towards more powerful tools that can accurately infer evolutionary histories. In particular, we present work that for the first time considers genomic distances between any pair of genomes, with no limitation on the number of occurrences of a gene. Next we show that there are conditions under which an ancestral genome (or one close to the true ancestor) can be reliably reconstructed. Finally we present new methodology that computes a minimum-length sequence of inversions to transform one permutation into another in, on average, O(n log n) steps, whereas the best worst-case algorithm to compute such a sequence uses O(n√n log n) steps.
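    To make the inversion operation in this abstract concrete, here is a small sketch for unsigned permutations of 1..n. The breakpoint count is a standard tool from the literature rather than something taken from this dissertation: an inversion changes at most two adjacencies, so half the number of breakpoints (rounded up) is a lower bound on the inversion distance. Function names and index conventions are assumptions made for this example.

```python
def inversion(perm, i, j):
    """Reverse the segment perm[i:j] (assumed convention: 0 <= i < j <= len(perm))."""
    if not 0 <= i < j <= len(perm):
        raise ValueError("need 0 <= i < j <= len(perm)")
    return perm[:i] + perm[i:j][::-1] + perm[j:]


def breakpoints(perm):
    """Count breakpoints of a permutation of 1..n framed by 0 and n + 1.

    A breakpoint is a pair of neighbours whose values are not consecutive.
    """
    framed = (0,) + tuple(perm) + (len(perm) + 1,)
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def inversion_distance_lower_bound(perm):
    """Each inversion removes at most two breakpoints, hence ceil(b / 2)."""
    return (breakpoints(perm) + 1) // 2
```

    For example, `(1, 4, 3, 2, 5)` has two breakpoints, and the single inversion `inversion((1, 4, 3, 2, 5), 1, 4)` sorts it, so the bound is tight in this case.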

    Models and Algorithms for Whole-Genome Evolution and their Use in Phylogenetic Inference

    The rapid accumulation of sequenced genomes offers the chance to resolve longstanding questions about the evolutionary histories, or phylogenies, of groups of organisms. The relatively rare occurrence of large-scale evolutionary events in a whole genome, events such as genome rearrangements, duplications and losses, enables us to extract a strong and robust phylogenetic signal from whole-genome data. The work presented in this dissertation focuses on models and algorithms for whole-genome evolution and their use in phylogenetic inference. We designed algorithms to estimate pairwise genomic distances from large-scale genomic changes. We refined the evolutionary models of whole-genome evolution. We also made use of these results to provide fast and accurate methods for phylogenetic inference that scale up, in both speed and accuracy, to modern high-resolution whole-genome data. We designed algorithms to estimate the true evolutionary distance between two genomes under genome rearrangements, and also under rearrangements plus gains and losses. We refined the evolutionary model to be the first mathematical model to preserve the structural dichotomy in genomic organization between most prokaryotes and most eukaryotes. Those models and associated distance estimators provide a basis for studying facets of possible mechanisms of evolution through simulation and application to real genomes. Phylogenetic analyses from whole-genome data have been limited to small collections of genomes and low-resolution data; they have also lacked an effective assessment of robustness. We developed an approach that combines our distance estimator, any standard distance-based reconstruction algorithm, and a novel bootstrapping method based on resampling genomic adjacencies. The resulting tool overcomes a serious and long-standing impediment to the use of whole-genome data in phylogenetic inference and provides results comparable in accuracy and robustness to distance-based methods for sequence data. Maximum-likelihood approaches have been successfully applied to phylogenetic inferences for aligned sequences, but such applications remain primitive for whole-genome data. We developed a maximum-likelihood approach to phylogenetic analysis from whole-genome data. In combination with our bootstrap scheme, this new approach yields the first reliable phylogenetic tool for the analysis of whole-genome data at the level of syntenic blocks.

    Breaking Good: Accounting for Fragility of Genomic Regions in Rearrangement Distance Estimation

    Models of evolution by genome rearrangements are prone to two types of flaws: One is to ignore the diversity of susceptibility to breakage across genomic regions, and the other is to suppose that susceptibility values are given. Without necessarily supposing their precise localization, we call “solid” the regions that are improbably broken by rearrangements and “fragile” the regions outside solid ones. We propose a model of evolution by inversions where breakage probabilities vary across fragile regions and over time. It contains as a particular case the uniform breakage model on the nucleotidic sequence, where breakage probabilities are proportional to fragile region lengths. This is very different from the frequently used pseudo uniform model where all fragile regions have the same probability to break. Estimations of rearrangement distances based on the pseudo uniform model completely fail on simulations with the truly uniform model. On pairs of amniote genomes, we show that identifying coding genes with solid regions yields incoherent distance estimations, especially with the pseudo uniform model, and to a lesser extent with the truly uniform model. This incoherence is solved when we coestimate the number of fragile regions with the rearrangement distance. The estimated number of fragile regions is surprisingly small, suggesting that a minority of regions are recurrently used by rearrangements. Estimations for several pairs of genomes at different divergence times are in agreement with a slowly evolvable colocalization of active genomic regions in the cell.

    Asymptotic behavior of some statistics in Ewens random permutations

    The purpose of this article is to present a general method to find limiting laws for some renormalized statistics on random permutations. The model considered here is Ewens sampling model, which generalizes uniform random permutations. We describe the asymptotic behavior of a large family of statistics, including the number of occurrences of any given dashed pattern. Our approach is based on the method of moments and relies on the following intuition: two events involving the images of different integers are almost independent. Comment: 32 pages: final version for EJP, produced by the author. An extended abstract of 12 pages, published in the proceedings of AofA 2012, is also available as version
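    For readers unfamiliar with the Ewens model mentioned in this abstract, a permutation with k cycles has probability proportional to θ^k, and θ = 1 recovers the uniform distribution. The sketch below samples such a permutation with the standard sequential construction often called the Chinese restaurant process. It is not taken from the article; the function names and the dictionary representation are assumptions made for this example.

```python
import random


def ewens_permutation(n, theta, rng=random):
    """Sample a permutation of 1..n from the Ewens distribution (theta > 0).

    Element k opens a new cycle with probability theta / (theta + k - 1);
    otherwise it is inserted right after a uniformly chosen earlier element.
    Returns a dict mapping each i to its image sigma(i).
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    sigma = {}
    for k in range(1, n + 1):
        if rng.random() < theta / (theta + k - 1):
            sigma[k] = k  # k starts its own cycle (always the case for k = 1)
        else:
            j = rng.randint(1, k - 1)  # earlier element, chosen uniformly
            sigma[k] = sigma[j]  # splice k into j's cycle, just after j
            sigma[j] = k
    return sigma


def cycle_count(sigma):
    """Number of cycles of a permutation given as a dict i -> sigma(i)."""
    seen = set()
    count = 0
    for start in sigma:
        if start not in seen:
            count += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = sigma[x]
    return count
```

    Passing a seeded generator, e.g. `ewens_permutation(10, 2.0, random.Random(0))`, makes a draw reproducible; larger values of θ favour permutations with more cycles.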