1,112 research outputs found

    On the Inversion-Indel Distance

    Willing E, Zaccaria S, Dias Vieira Braga M, Stoye J. On the Inversion-Indel Distance. BMC Bioinformatics. 2013;14(Suppl 15: Proc. of RECOMB-CG 2013):S3. Background: The inversion distance, that is, the distance between two unichromosomal genomes with the same content allowing only inversions of DNA segments, can be computed thanks to a pioneering approach of Hannenhalli and Pevzner in 1995. In 2000, El-Mabrouk extended the inversion model to allow the comparison of unichromosomal genomes with unequal contents, thus allowing insertions and deletions of DNA segments besides inversions. However, an exact algorithm was presented only for the case in which we have insertions alone and no deletions (or vice versa), while a heuristic was provided for the symmetric case, which allows both insertions and deletions and is called the inversion-indel distance. In 2005, Yancopoulos, Attie and Friedberg started a new branch of research by introducing the generic double cut and join (DCJ) operation, which can represent several genome rearrangements (including inversions). Among others, the DCJ model gave rise to two important results. First, it has been shown that the inversion distance can be computed in a simpler way with the help of the DCJ operation. Second, the DCJ operation originated the DCJ-indel distance, which allows the comparison of genomes with unequal contents, considering DCJ, insertions and deletions, and can be computed in linear time. Results: In the present work we put these two results together to solve an open problem, showing that, when the graph that represents the relation between the two compared genomes has no bad components, the inversion-indel distance is equal to the DCJ-indel distance. We also give a lower and an upper bound for the inversion-indel distance in the presence of bad components.

    Kernelization of Whitney Switches

    A fundamental theorem of Whitney from 1933 asserts that 2-connected graphs $G$ and $H$ are 2-isomorphic, or equivalently, their cycle matroids are isomorphic, if and only if $G$ can be transformed into $H$ by a series of operations called Whitney switches. In this paper we consider the quantitative question arising from Whitney's theorem: Given two 2-isomorphic graphs, can we transform one into another by applying at most $k$ Whitney switches? This problem is already NP-complete for cycles, and we investigate its parameterized complexity. We show that the problem admits a kernel of size $\mathcal{O}(k)$ and thus is fixed-parameter tractable when parameterized by $k$.

    Evolution of whole genomes through inversions: models and algorithms for duplicates, ancestors, and edit scenarios

    Advances in sequencing technology are yielding DNA sequence data at an alarming rate, a rate reminiscent of Moore's law. Biologists' abilities to analyze this data, however, have not kept pace. On the other hand, the discrete and mechanical nature of the cell life-cycle has been tantalizing to computer scientists. Thus in the 1980s, pioneers of the field now called Computational Biology began to uncover a wealth of computer science problems, some confronting modern biologists and some hidden in the annals of the biological literature. In particular, many interesting twists were introduced to classical string matching, sorting, and graph problems. One such problem, first posed in 1941 but rediscovered in the early 1980s, is that of sorting by inversions (also called reversals): given two permutations, find the minimum number of inversions required to transform one into the other, where an inversion inverts the order of a subpermutation. Indeed, many genomes have evolved mostly or only through inversions. Thus it becomes possible to trace evolutionary histories by inferring sequences of such inversions that led to today's genomes from a distant common ancestor. But unlike the classic edit distance problem, where string editing was relatively simple, editing permutations in this way has proved to be more complex. In this dissertation, we extend the theory so as to make these edit distances more broadly applicable and faster to compute, and work towards more powerful tools that can accurately infer evolutionary histories. In particular, we present work that for the first time considers genomic distances between any pair of genomes, with no limitation on the number of occurrences of a gene. Next, we show that there are conditions under which an ancestral genome (or one close to the true ancestor) can be reliably reconstructed. Finally, we present new methodology that computes a minimum-length sequence of inversions to transform one permutation into another in, on average, O(n log n) steps, whereas the best worst-case algorithm to compute such a sequence uses O(n√n log n) steps.
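The sorting-by-inversions problem stated in this abstract can be illustrated with a small brute-force search. This is only a sketch under stated assumptions: it treats permutations as unsigned, explores all inversions breadth-first, and takes exponential time, so it is not the O(n log n) or O(n√n log n) method the dissertation refers to. The function name `inversion_distance` is hypothetical.

```python
from collections import deque


def inversion_distance(source, target):
    """Minimum number of inversions that turn `source` into `target`.

    Illustrative brute force (breadth-first search over all unsigned
    permutations); only practical for very short permutations.
    """
    source, target = tuple(source), tuple(target)
    if sorted(source) != sorted(target):
        raise ValueError("both permutations must contain the same elements")
    if source == target:
        return 0

    n = len(source)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        # An inversion reverses the order of the segment perm[i:j];
        # segments of length 1 change nothing, so j starts at i + 2.
        for i in range(n - 1):
            for j in range(i + 2, n + 1):
                nxt = perm[:i] + perm[i:j][::-1] + perm[j:]
                if nxt == target:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    # Not reached: inversions of length 2 already generate every permutation.
    raise AssertionError("target not reachable")
```

For example, `inversion_distance([3, 1, 2], [1, 2, 3])` needs two inversions, since no single segment reversal sorts `3 1 2`.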

    Genome Rearrangement Problems

    Various global rearrangements of permutations, such as reversals and transpositions, have recently become of interest because of their applications in computational molecular biology. A reversal is an operation that reverses the order of a substring of a permutation. A transposition is an operation that swaps two adjacent substrings of a permutation. The problem of determining the smallest number of reversals required to transform a given permutation into the identity permutation is called sorting by reversals. Similar problems can be defined for transpositions and other global rearrangements. Related to sorting by reversals is the problem of establishing the reversal diameter. The reversal diameter of Sn (the symmetric group on n elements) is the maximum number of reversals required to sort a permutation of length n. Of course, diameter problems can be posed for other global rearrangements. These various problems are of interest because the permutations can be used to represent sequences of genes in chromosomes, and the global rearrangements then represent evolutionary events. As a result, we call these problems genome rearrangement problems. Genome rearrangement problems seem to be unlike previously studied algorithmic problems on sequences, so new methods have had to be developed to deal with them. These methods predominantly employ graphs to model permutation structure. However, even using these methods, often a genome rearrangement problem has no obvious polynomial-time algorithm, and in some cases can be shown to be NP-hard. For example, the problem of sorting by reversals is NP-hard, whereas the computational complexity of sorting by transpositions is open. For problems like these, it is natural to seek polynomial-time approximation algorithms that achieve an approximation guarantee. 
In this thesis, we study several genome rearrangement problems as interesting and challenging algorithmic problems in their own right, including some problems for which the global rearrangement has no immediate biological equivalent. For example, we define a block-interchange to be a rearrangement that swaps any two substrings of the permutation. We examine, in particular, how the graph theoretic models relate to the genome rearrangement problems that we study. The major new results contained in this thesis are as follows: We present a 3/2-approximation algorithm for sorting by reversals. This is the best known approximation algorithm for the problem, and improves upon the 7/4 approximation bound of the previous best algorithm. We give a polynomial-time algorithm for a significant special case of sorting by reversals, thereby disproving a conjecture of Kececioglu and Sankoff, who had suggested that this special case was likely to be NP-hard. We analyse the structure of the so-called cycle graph of a permutation in the context of sorting by transpositions, and thereby gain a deeper insight into this problem. Among the consequences are: a tighter lower bound for the problem, a simpler 3/2-approximation algorithm than had previously been described, and algorithms that, in empirical tests, almost always find the exact transposition distance of random permutations. We introduce a natural generalisation of sorting by transpositions called sorting by block-interchanges, and present a polynomial-time algorithm for this problem. We initiate the study of analogous problems on strings over a fixed length alphabet. We establish upper and lower bounds and diameter results for the problems over a binary alphabet. We also prove that the problems analogous to sorting by reversals and sorting by block-interchanges are NP-hard. (Abstract shortened by ProQuest.)
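The three rearrangements defined in this abstract (reversal, transposition, block-interchange) can be sketched as operations on a Python list. The function names and the half-open index convention (`perm[i:j]` covers positions i to j-1) are assumptions made for illustration; the thesis itself may use a different notation.

```python
def reversal(perm, i, j):
    """Reverse the order of the substring perm[i:j]."""
    assert 0 <= i <= j <= len(perm)
    return perm[:i] + perm[i:j][::-1] + perm[j:]


def transposition(perm, i, j, k):
    """Swap the two adjacent substrings perm[i:j] and perm[j:k]."""
    assert 0 <= i <= j <= k <= len(perm)
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def block_interchange(perm, i, j, k, l):
    """Swap the substrings perm[i:j] and perm[k:l]; perm[j:k] stays in between."""
    assert 0 <= i <= j <= k <= l <= len(perm)
    return perm[:i] + perm[k:l] + perm[j:k] + perm[i:j] + perm[l:]
```

With `j == k` the two blocks are adjacent and `block_interchange` reduces to `transposition`, which matches the abstract's description of block-interchanges as a generalisation of transpositions.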

    Rearranjo de genomas: algoritmos e complexidade (Genome rearrangement: algorithms and complexity)

    This thesis addresses genome rearrangement problems for the following events: transposition, breakpoint, block interchange, short block move, and restricted multi-break. We consider the sorting, closest permutation, and diameter problems, and present approximation algorithms, NP-completeness proofs, and properties of these problems. For sorting by transpositions, which is NP-complete, several approximation algorithms have been proposed based on the graph called the reality and desire diagram. Through a case analysis of the cycles of this graph, we propose a new algorithm that achieves the best results known so far: an approximation ratio of 1.375 and O(n log n) running time. Although sorting by transpositions is NP-complete, there are several metrics whose sorting problems are polynomial or open. In such cases, an interesting problem arises: given a set of input permutations, find a permutation whose maximum distance to the permutations in the set is at most some value. This is the closest permutation problem. We show that, for the breakpoint and block-interchange distances, which are computable in polynomial time, the closest permutation problem is NP-complete. To explore properties of operations that restrict or generalize others, we study the short block move operation and propose the restricted multi-break operation. For short block moves, we show tractable classes of permutations and properties of the permutation graph, and we show that the closest permutation problem is NP-complete. For restricted multi-breaks, we study two versions: one in which the number of non-reversible blocks is bounded by a constant, and another in which the number of non-reversible blocks is arbitrary. We prove tight bounds for the distance and diameter problems in both versions.