348 research outputs found

    Rearranjo de genomas: algoritmos e complexidade (Genome rearrangement: algorithms and complexity)

    This thesis studies genome rearrangement problems defined by the following events: transposition, breakpoint, block interchange, short block move, and restricted multi-break. We consider the sorting, closest permutation, and diameter problems, and we present approximation algorithms, NP-completeness proofs, and structural properties for them. Sorting by transpositions is NP-complete, and several approximation algorithms for it have been proposed based on a graph called the reality and desire diagram. Through a case analysis of the cycles of this graph, we propose a new algorithm that attains the best results known so far: an approximation ratio of 1.375 and a running time of O(n log n). Although sorting by transpositions is NP-complete, for several other metrics the sorting problem is polynomial or its complexity is open. In such cases an interesting problem arises: find a permutation whose maximum distance to the permutations of a given input set is at most some value. This is the closest permutation problem. We show that it is NP-complete for the breakpoint and block interchange distances, both of which can be computed in polynomial time. To explore properties of operations that restrict or generalize others, we study the short block move operation and propose the restricted multi-break operation. For short block moves, we identify tractable classes of permutations, establish properties of the permutation graph, and show that the closest permutation problem is NP-complete. For restricted multi-breaks, we study two versions: one in which the number of non-reversible blocks is bounded by a constant, and one in which it is arbitrary. We prove tight bounds for the distance and diameter problems in both versions.
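    As a small illustration of the kind of quantity used when sorting by transpositions, the sketch below counts the breakpoints of an unsigned permutation. It is not taken from the thesis: the function name `breakpoints` is hypothetical, and it assumes the common convention in which the permutation is framed by 0 and n + 1 and a breakpoint is any adjacent pair that is not of the form (x, x + 1). Because a transposition touches three adjacencies, it can remove at most three breakpoints, which yields the standard lower bound of breakpoints / 3 on the transposition distance.

```python
from math import ceil


def breakpoints(perm):
    """Count breakpoints of an unsigned permutation of 1..n.

    Assumed convention (not from the thesis): frame the permutation
    with 0 and n + 1, and count adjacent pairs (a, b) with b != a + 1.
    """
    n = len(perm)
    extended = [0] + list(perm) + [n + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


def transposition_lower_bound(perm):
    """Standard lower bound: one transposition removes at most 3 breakpoints."""
    return ceil(breakpoints(perm) / 3)


if __name__ == "__main__":
    print(breakpoints([1, 2, 3]))
    print(breakpoints([3, 1, 2]))
    print(transposition_lower_bound([3, 1, 2]))
```

    The identity permutation has no breakpoints, so the bound is 0 exactly when the permutation is already sorted.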

    Sobre modelos de rearranjo de genomas (On genome rearrangement models)

    Advisor: João Meidanis. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Genome rearrangements are events in which large blocks of DNA exchange places during evolution. With the growing availability of whole-genome data, the analysis of these events can be an important tool for understanding evolutionary genomics. Several mathematical models of genome rearrangement have been proposed over the last 20 years. In this thesis, we propose two new rearrangement models. The first was introduced as an alternative definition of the breakpoint distance. The breakpoint distance is one of the simplest genome comparison measures, but there is still no consensus on how to define it for multichromosomal genomes: Pevzner and Tesler gave a definition in a 2003 paper, and Tannier et al. defined it differently in 2008. We provide yet another alternative, called single-cut-or-join (SCJ). We show that, in addition to the distance, several classic genome rearrangement problems, such as genome median, genome halving, and small parsimony, become easy under SCJ, and we provide polynomial-time algorithms for them. The second model is the Adjacency Algebraic Theory, an extension of the Algebraic Formalism proposed by Meidanis and Dias. It allows linear chromosomes to be modeled, which removes the main limitation of the original formalism, which could handle circular chromosomes only. We believe that the algebraic formalism is an interesting alternative for solving rearrangement problems, offering a perspective that complements the more commonly used combinatorial, graph-theoretic approach. We present polynomial-time algorithms to compute the algebraic distance and to find rearrangement scenarios between two genomes. We also show how to compute the algebraic distance from the adjacency graph, which makes it easier to compare with other rearrangement distances. Finally, we show how all classic rearrangement operations can be modeled using the algebraic theory.
    Degree: Doctorate in Computer Science (Doutorado, Ciência da Computação).
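    To make the single-cut-or-join idea concrete, here is a minimal sketch of the SCJ distance as it is usually stated in the literature: each genome is represented as a set of adjacencies, and the distance is the number of adjacencies that must be cut plus the number that must be joined. The helper names `adj` and `scj_distance` and the extremity labels such as "a_h" (head of gene a) are assumptions made for this illustration and are not taken from the thesis.

```python
def adj(x, y):
    """An adjacency is an unordered pair of gene extremities (hypothetical labels)."""
    return frozenset((x, y))


def scj_distance(genome_a, genome_b):
    """SCJ distance between two genomes given as sets of adjacencies.

    Cuts remove adjacencies found only in genome_a; joins create
    adjacencies found only in genome_b.
    """
    cuts = len(genome_a - genome_b)
    joins = len(genome_b - genome_a)
    return cuts + joins


if __name__ == "__main__":
    # Linear chromosome (a, b, c): a_h-b_t and b_h-c_t are adjacent.
    genome_a = {adj("a_h", "b_t"), adj("b_h", "c_t")}
    # Linear chromosome (a, -b, c): gene b is reversed.
    genome_b = {adj("a_h", "b_h"), adj("b_t", "c_t")}
    print(scj_distance(genome_a, genome_b))
```

    In this example the two genomes share no adjacency, so reversing gene b costs two cuts and two joins under this model.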

    Breaking Good: Accounting For Fragility Of Genomic Regions In Rearrangement Distance Estimation

    Models of evolution by genome rearrangements are prone to two types of flaws: one is to ignore the diversity of susceptibility to breakage across genomic regions, and the other is to suppose that susceptibility values are given. Without necessarily supposing their precise localization, we call "solid" the regions that are unlikely to be broken by rearrangements and "fragile" the regions outside solid ones. We propose a model of evolution by inversions where breakage probabilities vary across fragile regions and over time. It contains as a particular case the uniform breakage model on the nucleotide sequence, where breakage probabilities are proportional to fragile region lengths. This is very different from the frequently used pseudouniform model, where all fragile regions have the same probability of breaking. Estimations of rearrangement distances based on the pseudouniform model fail completely on simulations with the truly uniform model. On pairs of amniote genomes, we show that identifying coding genes with solid regions yields incoherent distance estimations, especially with the pseudouniform model and to a lesser extent with the truly uniform model. This incoherence is resolved when we coestimate the number of fragile regions with the rearrangement distance. The estimated number of fragile regions is surprisingly small, suggesting that a minority of regions are recurrently used by rearrangements. Estimations for several pairs of genomes at different divergence times are in agreement with a slowly evolvable colocalization of active genomic regions in the cell.
    Volume 8, issue 5, pages 1427-1439.
    Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) [2013/25084-2]; French Agence Nationale de la Recherche (ANR) [ANR-10-BINF-01-01]; ICT FP7 European programme EVOEVO.

    Breaking Good: Accounting for Fragility of Genomic Regions in Rearrangement Distance Estimation

    Models of evolution by genome rearrangements are prone to two types of flaws: one is to ignore the diversity of susceptibility to breakage across genomic regions, and the other is to suppose that susceptibility values are given. Without necessarily supposing their precise localization, we call "solid" the regions that are improbably broken by rearrangements and "fragile" the regions outside solid ones. We propose a model of evolution by inversions where breakage probabilities vary across fragile regions and over time. It contains as a particular case the uniform breakage model on the nucleotidic sequence, where breakage probabilities are proportional to fragile region lengths. This is very different from the frequently used pseudouniform model where all fragile regions have the same probability to break. Estimations of rearrangement distances based on the pseudouniform model completely fail on simulations with the truly uniform model. On pairs of amniote genomes, we show that identifying coding genes with solid regions yields incoherent distance estimations, especially with the pseudouniform model, and to a lesser extent with the truly uniform model. This incoherence is solved when we coestimate the number of fragile regions with the rearrangement distance. The estimated number of fragile regions is surprisingly small, suggesting that a minority of regions are recurrently used by rearrangements. Estimations for several pairs of genomes at different divergence times are in agreement with a slowly evolvable colocalization of active genomic regions in the cell.

    The Divide-and-Conquer Subgoal-Ordering Algorithm for Speeding up Logic Inference

    It is common to view programs as a combination of logic and control: the logic part defines what the program must do, and the control part defines how to do it. The Logic Programming paradigm was developed with the intention of separating the logic from the control. Recently, extensive research has been conducted on the automatic generation of control for logic programs, but only a few of these works addressed generating control in order to improve the efficiency of logic programs. In this paper we present a novel algorithm for automatically finding lowest-cost subgoal orderings. The algorithm uses a divide-and-conquer strategy: the given set of subgoals is partitioned into smaller sets based on the co-occurrence of free variables, and the subsets are ordered recursively and then merged, yielding a provably optimal order. We experimentally demonstrate the utility of the algorithm by testing it in several domains, and we discuss how it could cooperate with other existing methods.
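    The partitioning step mentioned in this abstract can be pictured with the following sketch, which groups subgoals that are linked, directly or transitively, by shared free variables. It is only one plausible reading of "co-occurrence of free variables" and is not the paper's algorithm: the function name `partition_by_shared_variables`, the dictionary input format, and the use of union-find are all assumptions made for illustration, and the cost-based ordering and merging steps are not shown.

```python
def partition_by_shared_variables(subgoals):
    """Group subgoals connected through shared free variables.

    subgoals: dict mapping a subgoal name to the set of its free variables
    (a hypothetical input format). Returns a sorted list of sorted groups.
    """
    parent = {goal: goal for goal in subgoals}

    def find(goal):
        # Follow parent links to the group representative, halving paths.
        while parent[goal] != goal:
            parent[goal] = parent[parent[goal]]
            goal = parent[goal]
        return goal

    first_user = {}  # variable -> first subgoal seen that mentions it
    for goal, variables in subgoals.items():
        for var in variables:
            if var in first_user:
                # Shared variable: merge the two groups.
                parent[find(goal)] = find(first_user[var])
            else:
                first_user[var] = goal

    groups = {}
    for goal in subgoals:
        groups.setdefault(find(goal), set()).add(goal)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    example = {
        "p": {"X", "Y"},
        "q": {"Y"},
        "r": {"Z"},
        "s": {"Z", "W"},
        "t": set(),
    }
    print(partition_by_shared_variables(example))
```

    Groups that share no free variable do not constrain each other's bindings, which is the intuition for treating them as separate subproblems in a divide-and-conquer scheme.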