47 research outputs found

    Lower bounding edit distances between permutations

    A number of fields, including the study of genome rearrangements and the design of interconnection networks, deal with the connected problems of sorting permutations in "as few moves as possible", using a given set of allowed operations, or computing the number of moves the sorting process requires, often referred to as the distance of the permutation. These operations often act on just one or two segments of the permutation, e.g. by reversing one segment or exchanging two segments. The cycle graph of the permutation to sort is a fundamental tool in the theory of genome rearrangements, and has proved useful in settling the complexity of many variants of the above problems. In this paper, we present an algebraic reinterpretation of the cycle graph of a permutation π as an even permutation π̄, and show how to reformulate our sorting problems in terms of particular factorisations of the latter permutation. Using our framework, we recover known results in a simple and unified way, and obtain a new lower bound on the prefix transposition distance (where a prefix transposition displaces the initial segment of a permutation), which is shown to outperform previous results. Moreover, we use our approach to improve the best known lower bound on the prefix transposition diameter from 2n/3 to ⌊3n/4⌋, and investigate a few relations between some statistics on π and π̄.

    Sorting by Prefix Block-Interchanges

    We initiate the study of sorting permutations using prefix block-interchanges, which exchange any prefix of a permutation with another non-intersecting interval. The goal is to transform a given permutation into the identity permutation using as few such operations as possible. We give a 2-approximation algorithm for this problem, show how to obtain improved lower and upper bounds on the corresponding distance, and determine the largest possible value for that distance.

    Genome Rearrangement Problems

    Various global rearrangements of permutations, such as reversals and transpositions, have recently become of interest because of their applications in computational molecular biology. A reversal is an operation that reverses the order of a substring of a permutation. A transposition is an operation that swaps two adjacent substrings of a permutation. The problem of determining the smallest number of reversals required to transform a given permutation into the identity permutation is called sorting by reversals. Similar problems can be defined for transpositions and other global rearrangements. Related to sorting by reversals is the problem of establishing the reversal diameter. The reversal diameter of Sn (the symmetric group on n elements) is the maximum number of reversals required to sort a permutation of length n. Of course, diameter problems can be posed for other global rearrangements. These various problems are of interest because the permutations can be used to represent sequences of genes in chromosomes, and the global rearrangements then represent evolutionary events. As a result, we call these problems genome rearrangement problems. Genome rearrangement problems seem to be unlike previously studied algorithmic problems on sequences, so new methods have had to be developed to deal with them. These methods predominantly employ graphs to model permutation structure. However, even using these methods, often a genome rearrangement problem has no obvious polynomial-time algorithm, and in some cases can be shown to be NP-hard. For example, the problem of sorting by reversals is NP-hard, whereas the computational complexity of sorting by transpositions is open. For problems like these, it is natural to seek polynomial-time approximation algorithms that achieve an approximation guarantee. 
In this thesis, we study several genome rearrangement problems as interesting and challenging algorithmic problems in their own right, including some problems for which the global rearrangement has no immediate biological equivalent. For example, we define a block-interchange to be a rearrangement that swaps any two substrings of the permutation. We examine, in particular, how the graph-theoretic models relate to the genome rearrangement problems that we study. The major new results contained in this thesis are as follows. We present a 3/2-approximation algorithm for sorting by reversals. This is the best known approximation algorithm for the problem, and improves upon the 7/4 approximation bound of the previous best algorithm. We give a polynomial-time algorithm for a significant special case of sorting by reversals, thereby disproving a conjecture of Kececioglu and Sankoff, who had suggested that this special case was likely to be NP-hard. We analyse the structure of the so-called cycle graph of a permutation in the context of sorting by transpositions, and thereby gain a deeper insight into this problem. Among the consequences are a tighter lower bound for the problem, a simpler 3/2-approximation algorithm than had previously been described, and algorithms that, in empirical tests, almost always find the exact transposition distance of random permutations. We introduce a natural generalisation of sorting by transpositions called sorting by block-interchanges, and present a polynomial-time algorithm for this problem. We initiate the study of analogous problems on strings over a fixed-length alphabet. We establish upper and lower bounds and diameter results for the problems over a binary alphabet. We also prove that the problems analogous to sorting by reversals and sorting by block-interchanges are NP-hard. (Abstract shortened by ProQuest.)
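
The three operations defined in this abstract (reversal, transposition, block-interchange) can be illustrated with a minimal Python sketch. The function names and the 0-based, half-open index convention are assumptions made for this illustration only; they are not taken from the thesis.

```python
from typing import List


def reversal(p: List[int], i: int, j: int) -> List[int]:
    """Reverse the substring p[i:j] (0-based, half-open; assumed convention)."""
    return p[:i] + p[i:j][::-1] + p[j:]


def transposition(p: List[int], i: int, j: int, k: int) -> List[int]:
    """Swap the two adjacent substrings p[i:j] and p[j:k], with i < j < k."""
    return p[:i] + p[j:k] + p[i:j] + p[k:]


def block_interchange(p: List[int], i: int, j: int, k: int, l: int) -> List[int]:
    """Swap the substrings p[i:j] and p[k:l], with i < j <= k < l.

    The substring between them, p[j:k], stays in the middle.
    When j == k this reduces to a transposition.
    """
    return p[:i] + p[k:l] + p[j:k] + p[i:j] + p[l:]


if __name__ == "__main__":
    p = [1, 2, 3, 4, 5]
    print(reversal(p, 1, 4))
    print(transposition(p, 0, 2, 5))
    print(block_interchange(p, 0, 1, 3, 5))
```

Seen this way, a transposition is the special case of a block-interchange in which the two swapped substrings are adjacent, which matches the abstract's description of block-interchanges as a generalisation.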

    Algorithmic approaches for genome rearrangement: a review


    The Problem of Sorting Permutations Using Prefix and Suffix Rearrangements (O problema da ordenação de permutações usando rearranjos de prefixos e sufixos)

    Advisor: Zanoni Dias. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
Abstract: The goal of the Pancake Flipping problem is to sort a stack of pancakes of different sizes using as few operations as possible. The allowed operation is called a prefix reversal and, when applied, flips the top of the stack of pancakes. This problem is an interesting combinatorial problem in its own right, but it also has applications in computational biology. Given two genomes that share the same genes, and assuming that each gene appears only once per genome, we can represent them as permutations (stacks of pancakes are also represented by permutations). We can then compare the genomes by figuring out how one was transformed into the other through the application of genome rearrangements, which are large-scale mutations. Reversals and transpositions are the most commonly studied types of genome rearrangements, and a prefix reversal (or prefix transposition) is a reversal (or transposition) that is restricted to the beginning of the permutation. When the rearrangement is restricted to the end of the permutation, we say it is a suffix rearrangement. A problem of sorting permutations by rearrangements is, therefore, the problem of finding a minimum-cost sequence of rearrangements that sorts a given permutation.
The traditional approach assumes that all rearrangements have the same unit cost, in which case the goal is to find the minimum number of rearrangements needed to sort the permutation. Numerous efforts have been made in recent years following this approach. On the other hand, a long rearrangement (which is in fact a mutation) is more likely to disturb the organism. Therefore, weights based on the length of the segment involved may play an important role in the evolutionary process. We call this the length-weighted approach, and its goal is to find a sequence of rearrangements whose total cost (the sum of the costs of the individual rearrangements, each of which depends on its length) is minimum. In this thesis we present the first results on problems of sorting permutations by prefix and suffix reversals and transpositions, considering both the traditional and the length-weighted approaches. For the traditional approach, we considered a total of 10 problems and developed new results for 6 of them. For the length-weighted approach, we considered a total of 13 problems and developed new results for all of them.
Doctorate in Computer Science (degree: Doutora em Ciência da Computação). Grant numbers: 140017/2013-5, 2013/01172-0. Funding: FAPESP, CNP
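
To make the two cost models concrete, here is a minimal Python sketch of prefix reversals together with a simple, non-optimal pancake sort. It is not an algorithm from the thesis. The cost function `k ** alpha` is a hypothetical choice for illustration, since the abstract only says that the cost depends on the length of the segment; `alpha = 0` recovers the traditional unit-cost model.

```python
from typing import List, Tuple


def prefix_reversal(p: List[int], k: int) -> List[int]:
    """Flip the first k elements of p (the top k pancakes)."""
    return p[:k][::-1] + p[k:]


def naive_pancake_sort(p: List[int]) -> Tuple[List[int], List[int]]:
    """Sort p with prefix reversals; return the result and the flip lengths.

    Simple strategy (not optimal): bring the largest unsorted element
    to the front, then flip it into its final position.
    """
    p = list(p)
    flips: List[int] = []
    for size in range(len(p), 1, -1):
        pos = p.index(max(p[:size]))
        if pos == size - 1:
            continue  # already in its final position
        if pos != 0:
            p = prefix_reversal(p, pos + 1)
            flips.append(pos + 1)
        p = prefix_reversal(p, size)
        flips.append(size)
    return p, flips


def total_cost(flips: List[int], alpha: float = 1.0) -> float:
    """Hypothetical length-weighted cost: a flip of length k costs k ** alpha."""
    return sum(k ** alpha for k in flips)


if __name__ == "__main__":
    result, flips = naive_pancake_sort([3, 1, 4, 2])
    print(result, flips)
    print(total_cost(flips, alpha=0.0), total_cost(flips, alpha=1.0))
```

Under the unit-cost model only the number of flips matters, whereas under a length-weighted model a sequence with more but shorter flips can be cheaper, which is why the two approaches lead to different optimisation problems.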

    Genome Rearrangements: A Collection of Papers (Rearranjo de genomas: uma coletânea de artigos)

    Advisor: João Meidanis. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Nowadays, a huge amount of genetic information is publicly available. The current challenge in genomics is to process this information in order to obtain relevant biological conclusions. One way of structuring this information is through genome comparison, which seeks similarities and differences among the genomes of two or more organisms. In this context, the area of Genome Rearrangements has received considerable attention lately. One way of comparing genomes is given by the rearrangement distance, which is determined by the minimum number of rearrangement events that can explain the differences between two genomes. The main studies of rearrangement distance involve reversal and transposition events. The present collection is composed of eight articles containing several important results on Genome Rearrangements. These papers were presented at six conferences, one national and five international. Two of these works will be published in important international journals, and another appeared as a book chapter. Our main contributions can be divided into two groups: a new algebraic formalism and a series of results involving the transposition event. The new algebraic theory relates genome rearrangement theory to the theory of permutation groups. Our intention was to establish an algebraic formalism that simplifies the derivation of new results, which until now have relied heavily on the construction of diagrams.
We studied the transposition event in several ways. Besides presenting results on the transposition distance between a permutation and its inverse, we also studied the rearrangement problem involving transpositions and reversals simultaneously, constructing approximation algorithms and proposing a conjecture on the diameter. We used the algebraic formalism to show that it is possible to determine the fusion, fission, and transposition distance in polynomial time. This is the first known polynomial-time result for a rearrangement problem involving the transposition event. Finally, we introduced two new rearrangement problems: the syntenic distance problem involving fusions and fissions, and the prefix transposition problem. For both problems we present significant results that advance knowledge in this area.
Doctorate (degree: Doutor em Ciência da Computação).