7 research outputs found

    Genome Rearrangements: Algorithms and Complexity (Rearranjo de genomas: algoritmos e complexidade)

    This thesis addresses genome rearrangement problems for transpositions, breakpoints, block interchanges, short block moves, and restricted multi breaks. We consider the sorting, closest permutation, and diameter problems, and we present approximation algorithms, NP-completeness proofs, and properties of these problems. Sorting by transpositions is NP-complete, and several approximation algorithms for it have been proposed based on a graph called the reality and desire diagram. Through a case analysis of the cycles of this graph, we propose a new algorithm that achieves both the best approximation ratio known so far, 1.375, and an O(n log n) running time. Although sorting by transpositions is NP-complete, the sorting problems for several other metrics are polynomial or still open. For these metrics an interesting problem arises: given a set of input permutations, find a permutation whose maximum distance to every permutation in the set is at most a given value. This is the closest permutation problem. We show that for the breakpoint and block interchange distances, both of which can be computed in polynomial time, the closest permutation problem is NP-complete. To explore properties of operations that restrict or generalize others, we study the short block move operation and propose the restricted multi break operation. For short block moves, we show tractable classes of permutations (classes with exact distances), properties of the permutation graph, and that the closest permutation problem is NP-complete. For restricted multi breaks, we study two versions: one in which the number of non-reversible blocks is bounded by a constant, and one in which it is arbitrary. We prove tight bounds on the distance and the diameter for both versions.
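The operations and the closest permutation problem described above can be made concrete with a small Python sketch. It assumes unsigned permutations of 1..n framed by 0 and n + 1, and it counts a breakpoint for every consecutive pair (a, b) with b != a + 1; this is one common convention, and the thesis may define the breakpoint distance differently. The closest permutation search is plain brute force over all n! candidates, so it only suits tiny inputs. All function names are hypothetical, and the code does not implement the 1.375-approximation; it only shows the standard breakpoint lower bound for sorting by transpositions.

```python
from itertools import permutations
from math import ceil


def transposition(perm, i, j, k):
    """Exchange the adjacent blocks perm[i:j] and perm[j:k] (0-indexed, 0 <= i < j < k <= n)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def block_interchange(perm, i, j, k, l):
    """Exchange the blocks perm[i:j] and perm[k:l], which need not be adjacent (i < j <= k < l)."""
    return perm[:i] + perm[k:l] + perm[j:k] + perm[i:j] + perm[l:]


def adjacencies(perm):
    """Ordered pairs of consecutive elements in the framed sequence 0, perm, n + 1."""
    ext = (0,) + tuple(perm) + (len(perm) + 1,)
    return set(zip(ext, ext[1:]))


def breakpoint_distance(p, q):
    """Number of adjacencies of p that are missing from q (assumed convention, see lead-in)."""
    return len(adjacencies(p) - adjacencies(q))


def breakpoints(perm):
    """Breakpoints relative to the identity: consecutive pairs (a, b) with b != a + 1."""
    identity = tuple(range(1, len(perm) + 1))
    return breakpoint_distance(perm, identity)


def transposition_lower_bound(perm):
    # A transposition cuts the sequence in three places, so it removes at most
    # three breakpoints; hence at least ceil(b / 3) transpositions are needed.
    return ceil(breakpoints(perm) / 3)


def closest_permutation(inputs):
    """Brute force: return (center, radius) minimizing the maximum breakpoint distance."""
    n = len(inputs[0])
    best = min(permutations(range(1, n + 1)),
               key=lambda c: max(breakpoint_distance(c, s) for s in inputs))
    return best, max(breakpoint_distance(best, s) for s in inputs)
```

For example, the reversed permutation (3, 2, 1) has four breakpoints, so at least two transpositions are needed to sort it.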

    The Problem of Sorting Permutations Using Prefix and Suffix Rearrangements (O problema da ordenação de permutações usando rearranjos de prefixos e sufixos)

    Advisor: Zanoni Dias. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: The goal of the Pancake Flipping problem is to sort a stack of pancakes of distinct sizes using as few operations as possible. The allowed operation is called a prefix reversal and, when applied, flips the top of the stack. The problem is interesting from a combinatorial point of view in its own right, but it also has applications in computational biology. Given two genomes that share the same genes, and assuming that each gene appears only once per genome, we can represent them as permutations (stacks of pancakes are also represented as permutations). We can then compare the genomes by determining how one was transformed into the other through genome rearrangements, which are large-scale mutations. Reversals and transpositions are the most commonly studied types of genome rearrangement, and a prefix reversal (or prefix transposition) is a reversal (or transposition) restricted to the beginning of the permutation. When the rearrangement is restricted to the end of the permutation, we call it a suffix rearrangement. A problem of sorting permutations by rearrangements is, therefore, the problem of finding a minimum-cost sequence of rearrangements that sorts a given permutation.
The traditional approach assigns the same unit cost to every rearrangement, so the goal is to find the minimum number of rearrangements needed to sort the permutation. Numerous efforts have been made under this approach in recent years. On the other hand, a long rearrangement (which is, after all, a mutation) is more likely to disrupt the organism. Therefore, weights based on the length of the affected segment may play an important role in the evolutionary process. We call this the length-weighted approach; its goal is to find a sequence of rearrangements whose total cost (the sum of the costs of the individual rearrangements, each of which depends on its length) is minimum. In this thesis we present the first results on sorting permutations by prefix and suffix reversals and transpositions under both the traditional and the length-weighted approaches. For the traditional approach, we considered a total of 10 problems and obtained new results for 6 of them. For the length-weighted approach, we considered a total of 13 problems and obtained new results for all of them.
    Degree: Doctorate in Computer Science. Funding: FAPESP, CNP (grants 140017/2013-5 and 2013/01172-0).
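The prefix and suffix operations, and the difference between unit and length-weighted costs, can be illustrated with a minimal Python sketch that treats permutations as tuples. The greedy pancake sort below is the classic textbook strategy (bring the largest unsorted element to the top, then flip it into place, using at most 2n - 3 flips); it is not one of the algorithms from the thesis and is not optimal in general. The cost function l ** alpha is an assumed example of a length-weighted cost, and all function names are hypothetical.

```python
def prefix_reversal(perm, k):
    """Reverse the first k elements (flip the top k pancakes)."""
    return perm[:k][::-1] + perm[k:]


def suffix_reversal(perm, k):
    """Reverse the last k elements."""
    n = len(perm)
    return perm[:n - k] + perm[n - k:][::-1]


def prefix_transposition(perm, j, k):
    """Move the prefix perm[:j] so that it ends at position k (0 < j < k <= n)."""
    return perm[j:k] + perm[:j] + perm[k:]


def greedy_pancake_sort(perm):
    """Sort with prefix reversals only; returns (sorted perm, list of flip lengths)."""
    perm = tuple(perm)
    flips = []
    for size in range(len(perm), 1, -1):
        pos = perm.index(max(perm[:size]))
        if pos == size - 1:
            continue  # largest unsorted element is already in place
        if pos > 0:
            perm = prefix_reversal(perm, pos + 1)  # bring it to the top
            flips.append(pos + 1)
        perm = prefix_reversal(perm, size)  # flip it down into place
        flips.append(size)
    return perm, flips


def length_weighted_cost(flip_lengths, alpha=1.0):
    """Total cost when a rearrangement of length l costs l ** alpha (assumed cost model)."""
    return sum(length ** alpha for length in flip_lengths)
```

Under the unit-cost model, `greedy_pancake_sort((3, 1, 2))` uses two flips (of lengths 3 and 2). Under the assumed length-weighted model with alpha = 1, the same sequence costs 5, so a longer sequence of shorter flips could be cheaper.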

    Models and Algorithms for Sorting Permutations with Tandem Duplication and Random Loss

    A central topic of evolutionary biology is the inference of phylogeny, i.e., the evolutionary history of species. A powerful tool for inferring such phylogenetic relationships is the arrangement of genes in mitochondrial genomes. The rationale is that these gene arrangements are subject to different types of mutations over the course of evolution. Hence, a high similarity in gene arrangement between two species indicates a close evolutionary relationship. Metazoan mitochondrial gene arrangements are particularly well suited for such phylogenetic studies because they are available for a wide range of species, and their gene content is almost invariant and usually free of duplicates. Given these properties, gene arrangements of mitochondrial genomes are modeled by permutations in which each element represents a gene, i.e., a specific genetic sequence. The mutations that shape the gene arrangement of genomes are then represented by operations that rearrange the elements of permutations, so-called genome rearrangements, which bridge the gap between evolutionary biology and optimization. Many problems of phylogeny inference can be formulated as challenging combinatorial optimization problems, which makes this research area especially interesting for computer scientists. The most prominent examples of such optimization problems are the sorting problem and the distance problem. The sorting problem asks for a minimum-length sequence of rearrangements that transforms one given permutation into another, i.e., a hypothetical scenario of gene order evolution, whereas the distance problem asks only for the length of such a sequence. This minimum length is called the distance and is used as a (dis)similarity measure that quantifies evolutionary relatedness. Most evolutionary changes in the gene arrangements of mitochondrial genomes can be explained by the tandem duplication random loss (TDRL) genome rearrangement model.
A TDRL consists of a tandem duplication of a consecutive set of genes followed by the random loss of one copy of each duplicated gene. Despite the importance of the TDRL rearrangement in mitochondrial evolution, its combinatorial properties have rarely been studied. In addition, genome rearrangement models that include all types of rearrangement relevant for mitochondrial genomes, i.e., inversions, transpositions, inverse transpositions, and TDRLs, while remaining computationally tractable are rare. Nevertheless, especially for metazoan gene arrangements, the TDRL rearrangement should be considered when reconstructing phylogeny. Since a better understanding of the TDRL model is indispensable for the study of mitochondrial gene arrangements, the central theme of this thesis is to broaden the horizon of TDRL genome rearrangements with respect to mitochondrial genome evolution. To this end, the thesis provides combinatorial properties of the TDRL model and its variants, as well as efficient methods for plausibly reconstructing rearrangement scenarios between gene arrangements. The proposed methods consider all types of genome rearrangements that predominantly occur during mitochondrial evolution. More precisely, the main contributions of this thesis are as follows. The distance problem and the sorting problem for the TDRL model are further examined with respect to circular permutations, a formal concept that reflects the circular structure of mitochondrial genomes; as a result, a closed formula for the distance is provided. Recently, evidence has been found for a variant of the TDRL rearrangement model in which the duplicated set of genes is additionally inverted. Initiating the algorithmic study of this new rearrangement model on a certain type of permutations, the thesis proposes a closed formula that solves the distance problem, as well as a quasilinear-time algorithm that solves the corresponding sorting problem.
The assumption that only one type of genome rearrangement occurred during the evolution of certain gene arrangements is most likely unrealistic; e.g., at least three types of rearrangements in addition to the TDRL have to be considered for the evolution of metazoan mitochondrial genomes. Therefore, this thesis takes three different biologically motivated constraints into account in order to produce plausible evolutionary rearrangement scenarios. The first constraint extends the considered set of genome rearrangements to a model that covers all four common types of mitochondrial genome rearrangements. For this 4-type model, a sharp lower bound and several close additive upper bounds on the distance are developed. As a byproduct, the thesis provides a polynomial-time approximation algorithm for the corresponding sorting problem that is guaranteed to compute pairwise rearrangement scenarios whose length exceeds that of a minimum-length scenario by at most two rearrangement operations. The second constraint is the relative frequency of the different types of rearrangements occurring during evolution. This frequency is modeled by a weighting scheme on the 4-type model in which every rearrangement is weighted according to its type. The resulting NP-hard sorting problem is then solved by means of a polynomial-size integer linear program. The third constraint is that certain subsets of genes are often found in close proximity in the gene arrangements of many different species. This observation is reflected by requiring rearrangement scenarios to preserve certain groups of genes, which are modeled as common intervals of permutations. To solve the sorting problem under all three biologically motivated constraints, the exact dynamic programming algorithm CREx2 is proposed. CREx2 runs in linear time for a large class of problem instances.
For the remaining instances, two versions of CREx2 are provided: the first computes exact solutions but has exponential running time in the worst case, and the second computes approximate solutions efficiently. CREx2 is evaluated in an empirical study on simulated and real biological mitochondrial gene arrangements.
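Two notions from this abstract, the TDRL operation and common intervals, can be sketched in a few lines of Python. The sketch assumes linear (not circular) permutations stored as tuples and encodes the random loss as one hypothetical boolean flag per duplicated gene saying which copy survives. The common-interval routine is a naive quadratic scan, not the algorithm used in CREx2, and all names are hypothetical.

```python
def tdrl(perm, i, j, keep_first):
    """Apply a tandem duplication random loss (TDRL) to the segment perm[i:j].

    keep_first[t] is True if perm[i + t] survives in the first copy and False if it
    survives in the second copy. Duplicating the segment in tandem and then deleting
    one copy of each gene leaves the first-copy survivors followed by the second-copy
    survivors, each group in its original relative order.
    """
    seg = perm[i:j]
    if len(keep_first) != len(seg):
        raise ValueError("keep_first needs one flag per element of the segment")
    first = tuple(x for x, keep in zip(seg, keep_first) if keep)
    second = tuple(x for x, keep in zip(seg, keep_first) if not keep)
    return tuple(perm[:i]) + first + second + tuple(perm[j:])


def common_intervals(p, q):
    """Return every interval of p (length >= 2) whose elements are also contiguous in q.

    The elements of p[i..j] occupy j - i + 1 distinct positions in q, so they are
    contiguous there exactly when (max position - min position) == j - i.
    """
    pos_q = {x: idx for idx, x in enumerate(q)}
    found = []
    for i in range(len(p)):
        lo = hi = pos_q[p[i]]
        for j in range(i + 1, len(p)):
            lo = min(lo, pos_q[p[j]])
            hi = max(hi, pos_q[p[j]])
            if hi - lo == j - i:
                found.append(tuple(p[i:j + 1]))
    return found
```

For example, `tdrl((1, 2, 3, 4, 5), 0, 5, (True, False, True, False, True))` yields `(1, 3, 5, 2, 4)`: genes 1, 3, 5 survive in the first copy and genes 2, 4 in the second.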

    Genome Evolution by Local and Global Mutations: An Alignment Approach (Évolution des génomes par mutations locales et globales : une approche d'alignement)

    During evolution, genomes accumulate mutations that may affect the genome at different levels, ranging from a single base to the overall gene content. Global mutations affecting gene content and organization are mainly duplications, losses, and rearrangements. By comparing gene orders, it is possible to infer the most frequent events, the gene content of ancestral genomes, and the evolutionary histories that led to the observed gene orders. In this thesis, we develop new algorithmic methods based on an alignment approach for comparing two genomes, or a set of genomes related by a phylogenetic tree, represented as gene orders and accounting for global mutations. We study the theoretical complexity and the approximability of several variants of the problem of aligning two gene orders by duplications and losses. We then review the existing exact exponential-time algorithms and develop efficient heuristics for these problems. First, we developed DLAlign, a quadratic-time heuristic for aligning two gene orders by duplications and losses. Then we developed OrthoAlign, a generalization of DLAlign that accounts for most genome-wide evolutionary events: duplications, losses, rearrangements, and substitutions. We also study the phylogenetic alignment problem. First, we adapt OrthoAlign to infer ancestral genomes at the internal nodes of a given phylogenetic tree. Finally, we developed MultiOrthoAlign, a more robust heuristic based on the median problem, for inferring the ancestral genomes and evolutionary histories of extant genomes labeling the leaves of a phylogenetic tree.
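To make the idea of aligning two gene orders concrete, here is a deliberately simplified Python sketch. It is not DLAlign or OrthoAlign: it ignores duplications, rearrangements, and substitutions, matches equal gene labels in order, and explains every unmatched gene as a loss in the other lineage, which reduces the problem to a longest common subsequence dynamic program. Gene orders are assumed to be strings or sequences of gene-family labels (duplicates allowed); the function name is hypothetical.

```python
def align_losses_only(a, b):
    """Align two gene orders, explaining every unmatched gene as a loss.

    Returns a list of columns (gene_in_a, gene_in_b); None marks a gene lost
    in that genome. The number of matched columns is a longest common subsequence.
    """
    n, m = len(a), len(b)
    # dp[i][j] = length of an LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Trace back from the start, preferring matches whenever the labels agree.
    cols, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            cols.append((a[i], b[j]))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            cols.append((a[i], None))  # a[i] was lost in genome b
            i += 1
        else:
            cols.append((None, b[j]))  # b[j] was lost in genome a
            j += 1
    cols.extend((x, None) for x in a[i:])
    cols.extend((None, y) for y in b[j:])
    return cols
```

This runs in O(nm) time. The heuristics described above go further by also deciding whether an unmatched gene copy is better explained as a duplication than as a loss, which this sketch does not attempt.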

    Sorting Finite Sequences by Reversals Using Conjugation in Permutation Groups (Ordenação de sequências finitas por reversões usando conjugações em grupos de permutações)

    Master's dissertation, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2012. The problem of sorting finite sequences by reversals is studied in this work using a formal, algebraic approach based on a permutation group in which the group operation and conjugation are used to implement reversals. The approach is general in that it handles generic sequences, which can represent structures used to specify a wide variety of relevant problems in computer science and mathematics, including genomes.
A group structure was established on a set of families of sequences so that a reversal can be applied to one of these families through conjugation, in which the family and a certain permutation, acting as the conjugator, are the operands. Moreover, the same reversal can be applied through the group operation, using as operands a special conjugator of the family and the same permutation that served as the conjugator above. The work proposes a different method of computation, leading to a different way of thinking about the reversal distance problem that may help answer open questions in this area. Computer programs that use reversals can be adapted to this method.
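The general idea of treating reversals as group elements can be illustrated with a small Python sketch. It is a generic illustration, not the dissertation's construction (the families of sequences and their special conjugator are not reproduced). A reversal of positions i..j is modeled as a permutation of positions, a sequence is rearranged by reading it through that permutation, and conjugating a reversal by a cyclic shift g (computing g ∘ ρ ∘ g⁻¹) yields the same reversal translated along the sequence, provided the translated segment does not wrap around. All names are hypothetical.

```python
def compose(p, q):
    """Group operation: compose(p, q)[x] = p[q[x]] (apply q first, then p)."""
    return tuple(p[q[x]] for x in range(len(q)))


def inverse(p):
    """Inverse permutation: inverse(p)[p[x]] = x."""
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def reversal(n, i, j):
    """Permutation of positions 0..n-1 that reverses positions i..j (inclusive)."""
    return tuple(i + j - x if i <= x <= j else x for x in range(n))


def shift(n, k):
    """Cyclic shift of positions by k."""
    return tuple((x + k) % n for x in range(n))


def conjugate(g, p):
    """Conjugation of p by g: g o p o g^-1."""
    return compose(g, compose(p, inverse(g)))


def apply(seq, p):
    """Rearrange seq by reading it through p: result[x] = seq[p[x]]."""
    return tuple(seq[p[x]] for x in range(len(p)))
```

For instance, `apply("ABCDEF", reversal(6, 1, 3))` gives the letters A, D, C, B, E, F, and `conjugate(shift(6, 2), reversal(6, 1, 3))` equals `reversal(6, 3, 5)`. Every reversal is also an involution, so composing it with itself gives the identity.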

    Comparing and Aggregating Partial Orders (Vergleichen und Aggregieren von partiellen Ordnungen)

    Comparing and aggregating information is a central area in the analysis of voting systems, in which the differing opinions of voters about a set of candidates must be aggregated into an election result that is as fair as possible. In most political elections, each voter chooses a single candidate by marking a cross. In addition, rank aggregation problems are studied as a variant of voting systems. In these, each voter expresses their opinion as a total order over the set of candidates, which represents their often complex opinion more precisely than the choice of a single favorite candidate. The result of a rank aggregation problem is then also a total order of the candidates, namely one with the smallest distance to the voters' opinions. Among other measures, Kendall's tau distance and Spearman's footrule distance have become established as distances between two total orders. Modern applications of rank aggregation in machine learning, artificial intelligence, bioinformatics, and above all in various areas of the World Wide Web have brought known but so far little-studied aspects into the focus of research. On the one hand, the algorithmic complexity of rank aggregation problems is gaining importance. On the other hand, many of these applications involve incomplete "voter opinions" with tied or incomparable candidates, so total orders are no longer suitable to represent them. This thesis takes up both aspects and studies the algorithmic complexity of rank aggregation problems in which voter opinions are represented by weak or partial orders instead of total orders. To this end, Kendall's tau distance and Spearman's footrule distance are generalized in several natural ways.
It turns out that even computing the distance between two orders can now be algorithmically hard. The generalized versions of Kendall's tau distance and Spearman's footrule distance can still be computed efficiently for weak orders. As soon as partial orders are considered, however, the problems become NP-complete and are therefore presumably no longer efficiently solvable; for this case, results on the approximability and the parameterized complexity of the problems are presented. The complexity of the rank aggregation problems themselves also increases: variants that are efficiently solvable for total orders become NP-complete for weak orders, while some variants that are NP-complete for total orders lie outside the complexity class NP for partial orders. The thesis concludes with an outlook on open problems.
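The baseline that this thesis generalizes, Kendall's tau and Spearman's footrule between two total orders and rank aggregation as a distance-minimizing total order, can be sketched in Python. The sketch assumes each vote is a total order given as a sequence of candidates, best first; it does not cover the generalized distances for weak or partial orders studied in the thesis. Aggregation is brute force over all orderings, so it is exponential and only suits tiny examples; with Kendall's tau as the distance it computes a Kemeny-optimal ranking. Function names are hypothetical.

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Number of candidate pairs ordered differently by two total orders."""
    pos1 = {c: i for i, c in enumerate(r1)}
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0)


def footrule(r1, r2):
    """Sum over candidates of the absolute difference between their positions."""
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(abs(i - pos2[c]) for i, c in enumerate(r1))


def aggregate(votes, dist=kendall_tau):
    """Brute force: a total order minimizing the summed distance to all votes."""
    candidates = votes[0]
    return min(permutations(candidates),
               key=lambda r: sum(dist(r, v) for v in votes))
```

With votes `["abc", "abc", "bca"]`, the ranking a, b, c has total Kendall distance 2 and is the unique optimum. Once voters may leave candidates tied or incomparable, these definitions no longer apply directly, which is where the generalizations described above come in.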