8 research outputs found

    Listing all sorting reversals in quadratic time

    Get PDF
    We describe an average-case O(n2) algorithm to list all reversals on a signed permutation π that, when applied to π, produce a permutation that is closer to the identity. This algorithm is optimal in the sense that, the time it takes to write the list is Ω(n2) in the worst case

    On the distribution of the number of cycles in the breakpoint graph of a random signed permutation

    Get PDF
    International audienceWe use the finite Markov chain embedding technique to obtain the distribution of the number of cycles in the breakpoint graph of a random uniform signed permutation. This further gives a very good approximation of the distribution of the reversal distance between two random genomes

    A framework for orthology assignment from gene rearrangement data

    Get PDF
    Abstract. Gene rearrangements have successfully been used in phylogenetic reconstruction and comparative genomics, but usually under the assumption that all genomes have the same gene content and that no gene is duplicated. While these assumptions allow one to work with organellar genomes, they are too restrictive when comparing nuclear genomes. The main challenge is how to deal with gene families, specifically, how to identify orthologs. While searching for orthologies is a common task in computational biology, it is usually done using sequence data. We approach that problem using gene rearrangement data, provide an optimization framework in which to phrase the problem, and present some preliminary theoretical results.

    Colored de Bruijn Graphs and the Genome Halving Problem

    Get PDF
    Breakpoint graph analysis is a key algorithmic technique in studies of genome rearrangements. However, breakpoint graphs are defined only for genomes without duplicated genes, thus limiting their applications in rearrangement analysis. We discuss a connection between the breakpoint graphs and de Bruijn graphs that leads to a generalization of the notion of breakpoint graph for genomes with duplicated genes. We further use the generalized breakpoint graphs to study the Genome Halving Problem (first introduced and solved by Nadia El-Mabrouk and David Sankoff). The El-Mabrouk-Sankoff algorithm is rather complex, and, in this paper, we present an alternative approach that is based on generalized breakpoint graphs. The generalized breakpoint graphs make the El-Mabrouk-Sankoff result more transparent and promise to be useful in future studies of genome rearrangements

    The distribution of cycles in breakpoint graphs of signed permutations

    Get PDF
    Breakpoint graphs are ubiquitous structures in the field of genome rearrangements. Their cycle decomposition has proved useful in computing and bounding many measures of (dis)similarity between genomes, and studying the distribution of those cycles is therefore critical to gaining insight on the distributions of the genomic distances that rely on it. We extend here the work initiated by Doignon and Labarre, who enumerated unsigned permutations whose breakpoint graph contains kk cycles, to signed permutations, and prove explicit formulas for computing the expected value and the variance of the corresponding distributions, both in the unsigned case and in the signed case. We also compare these distributions to those of several well-studied distances, emphasising the cases where approximations obtained in this way stand out. Finally, we show how our results can be used to derive simpler proofs of other previously known results

    Sobre modelos de rearranjo de genomas

    Get PDF
    Orientador: João MeidanisTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Rearranjo de genomas é o nome dado a eventos onde grandes blocos de DNA trocam de posição durante o processo evolutivo. Com a crescente disponibilidade de sequências completas de DNA, a análise desse tipo de eventos pode ser uma importante ferramenta para o entendimento da genômica evolutiva. Vários modelos matemáticos de rearranjo de genomas foram propostos ao longo dos últimos vinte anos. Nesta tese, desenvolvemos dois novos modelos. O primeiro foi proposto como uma definição alternativa ao conceito de distância de breakpoint. Essa distância é uma das mais simples medidas de rearranjo, mas ainda não há um consenso quanto à sua definição para o caso de genomas multi-cromossomais. Pevzner e Tesler deram uma definição em 2003 e Tannier et al. a definiram de forma diferente em 2008. Nesta tese, nós desenvolvemos uma outra alternativa, chamada de single-cut-or-join (SCJ). Nós mostramos que, no modelo SCJ, além da distância, vários problemas clássicos de rearranjo, como a mediana de rearranjo, genome halving e pequena parcimônia são fáceis, e apresentamos algoritmos polinomiais para eles. O segundo modelo que apresentamos é o formalismo algébrico por adjacências, uma extensão do formalismo algébrico proposto por Meidanis e Dias, que permite a modelagem de cromossomos lineares. Esta era a principal limitação do formalismo original, que só tratava de cromossomos circulares. Apresentamos algoritmos polinomiais para o cálculo da distância algébrica e também para encontrar cenários de rearranjo entre dois genomas. Também mostramos como calcular a distância algébrica através do grafo de adjacências, para facilitar a comparação com outras distâncias de rearranjo. Por fim, mostramos como modelar todas as operações clássicas de rearranjo de genomas utilizando o formalismo algébricoAbstract: Genome rearrangements are events where large blocks of DNA exchange places during evolution. With the growing availability of whole genome data, the analysis of these events can be a very important and promising tool for understanding evolutionary genomics. Several mathematical models of genome rearrangement have been proposed in the last 20 years. In this thesis, we propose two new rearrangement models. The first was introduced as an alternative definition of the breakpoint distance. The breakpoint distance is one of the most straightforward genome comparison measures, but when it comes to defining it precisely for multichromosomal genomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, and Tannier et al. defined it differently in 2008. In this thesis we provide yet another alternative, calling it single-cut-or-join (SCJ). We show that several genome rearrangement problems, such as genome median, genome halving and small parsimony, become easy for SCJ, and provide polynomial time algorithms for them. The second model we introduce is the Adjacency Algebraic Theory, an extension of the Algebraic Formalism proposed by Meidanis and Dias that allows the modeling of linear chromosomes, the main limitation of the original formalism, which could deal with circular chromosomes only. We believe that the algebraic formalism is an interesting alternative for solving rearrangement problems, with a different perspective that could complement the more commonly used combinatorial graph-theoretic approach. We present polynomial time algorithms to compute the algebraic distance and find rearrangement scenarios between two genomes. We show how to compute the rearrangement distance from the adjacency graph, for an easier comparison with other rearrangement distances. Finally, we show how all classic rearrangement operations can be modeled using the algebraic theoryDoutoradoCiência da ComputaçãoDoutor em Ciência da Computaçã

    Evolution of whole genomes through inversions:models and algorithms for duplicates, ancestors, and edit scenarios

    Get PDF
    Advances in sequencing technology are yielding DNA sequence data at an alarming rate – a rate reminiscent of Moore's law. Biologists' abilities to analyze this data, however, have not kept pace. On the other hand, the discrete and mechanical nature of the cell life-cycle has been tantalizing to computer scientists. Thus in the 1980s, pioneers of the field now called Computational Biology began to uncover a wealth of computer science problems, some confronting modern Biologists and some hidden in the annals of the biological literature. In particular, many interesting twists were introduced to classical string matching, sorting, and graph problems. One such problem, first posed in 1941 but rediscovered in the early 1980s, is that of sorting by inversions (also called reversals): given two permutations, find the minimum number of inversions required to transform one into the other, where an inversion inverts the order of a subpermutation. Indeed, many genomes have evolved mostly or only through inversions. Thus it becomes possible to trace evolutionary histories by inferring sequences of such inversions that led to today's genomes from a distant common ancestor. But unlike the classic edit distance problem where string editing was relatively simple, editing permutation in this way has proved to be more complex. In this dissertation, we extend the theory so as to make these edit distances more broadly applicable and faster to compute, and work towards more powerful tools that can accurately infer evolutionary histories. In particular, we present work that for the first time considers genomic distances between any pair of genomes, with no limitation on the number of occurrences of a gene. Next we show that there are conditions under which an ancestral genome (or one close to the true ancestor) can be reliably reconstructed. Finally we present new methodology that computes a minimum-length sequence of inversions to transform one permutation into another in, on average, O(n log n) steps, whereas the best worst-case algorithm to compute such a sequence uses O(n√n log n) steps
    corecore