255 research outputs found

    Sorting signed permutations by reversals, revisited

    Abstract: The problem of sorting signed permutations by reversals (SBR) is a fundamental problem in computational molecular biology. The goal is, given a signed permutation, to find a shortest sequence of reversals that transforms it into the positive identity permutation, where a reversal is the operation of taking a segment of the permutation, reversing it, and flipping the signs of its elements. In this paper we describe a randomized algorithm for SBR. The algorithm tries to sort the permutation by repeatedly performing a random oriented reversal. This process is in fact a random walk on the graph where permutations are the nodes and an arc from π to π′ corresponds to an oriented reversal that transforms π to π′. We show that if this random walk stops at the identity permutation, then we have found a shortest sequence. We give empirical evidence that this process indeed succeeds with high probability on a random permutation. To implement our algorithm we describe a data structure that maintains a permutation and allows drawing an oriented reversal uniformly at random and performing it in sub-linear time. With this data structure we can implement the random walk in O(n^{3/2} log n) time, thus obtaining an algorithm for SBR that almost always runs in sub-quadratic time. The data structures we present may also be of independent interest for developing other algorithms for SBR, and for other problems. Finally, we present the first efficient parallel algorithm for SBR. We obtain this result by developing a fast, parallelizable implementation of the recent algorithm of Bergeron (Proceedings of CPM, 2001, pp. 106–117) for sorting signed permutations by reversals. Our implementation runs in O(n^2 log n) time on a regular RAM, and in O(n log n) time on a PRAM using n processors.

    Sobre modelos de rearranjo de genomas

    Advisor: João Meidanis. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Genome rearrangements are events where large blocks of DNA exchange places during evolution. With the growing availability of whole genome data, the analysis of these events can be a very important and promising tool for understanding evolutionary genomics. Several mathematical models of genome rearrangement have been proposed in the last 20 years. In this thesis, we propose two new rearrangement models. The first was introduced as an alternative definition of the breakpoint distance. The breakpoint distance is one of the most straightforward genome comparison measures, but when it comes to defining it precisely for multichromosomal genomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, and Tannier et al. defined it differently in 2008. In this thesis we provide yet another alternative, calling it single-cut-or-join (SCJ). We show that several genome rearrangement problems, such as genome median, genome halving and small parsimony, become easy for SCJ, and provide polynomial time algorithms for them. The second model we introduce is the Adjacency Algebraic Theory, an extension of the Algebraic Formalism proposed by Meidanis and Dias that allows the modeling of linear chromosomes. This was the main limitation of the original formalism, which could deal with circular chromosomes only. We believe that the algebraic formalism is an interesting alternative for solving rearrangement problems, with a different perspective that could complement the more commonly used combinatorial graph-theoretic approach. We present polynomial time algorithms to compute the algebraic distance and find rearrangement scenarios between two genomes. We show how to compute the rearrangement distance from the adjacency graph, for an easier comparison with other rearrangement distances. Finally, we show how all classic rearrangement operations can be modeled using the algebraic theory.
    Degree: Doctorate in Computer Science (Doutor em Ciência da Computação).
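The abstract names the single-cut-or-join (SCJ) model without spelling out its distance. As a rough illustration only, the Python sketch below assumes the commonly cited formulation: a genome is a set of adjacencies between gene extremities, a cut removes one adjacency, a join adds one, and the distance counts the adjacencies present in exactly one of the two genomes. It further assumes linear chromosomes, equal gene content, and no duplicated genes. The names `adjacencies` and `scj_distance` are hypothetical.

```python
def adjacencies(genome):
    """Return the set of adjacencies of a genome.

    A genome is a list of linear chromosomes; each chromosome is a list of
    signed integers (genes). Each gene g has a tail (g, "t") and a head
    (g, "h"); a forward gene reads tail-to-head, a reversed gene head-to-tail.
    """
    adj = set()
    for chromosome in genome:
        for left, right in zip(chromosome, chromosome[1:]):
            # Right end of `left`: head if forward, tail if reversed.
            a = (abs(left), "h" if left > 0 else "t")
            # Left end of `right`: tail if forward, head if reversed.
            b = (abs(right), "t" if right > 0 else "h")
            adj.add(frozenset((a, b)))
    return adj


def scj_distance(genome_a, genome_b):
    """Count cuts (adjacencies only in A) plus joins (adjacencies only in B)."""
    a, b = adjacencies(genome_a), adjacencies(genome_b)
    return len(a - b) + len(b - a)


if __name__ == "__main__":
    # Reversing gene 2 breaks two adjacencies and creates two new ones.
    print(scj_distance([[1, 2, 3]], [[1, -2, 3]]))
```

Under this formulation the distance is a symmetric set difference, so it can be computed in time linear in the number of genes, which is in line with the abstract's claim that distance problems become easy in the SCJ model.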

    Sorting by reversals, block interchanges, tandem duplications, and deletions

    Abstract
    Background: Finding sequences of evolutionary operations that transform one genome into another is a classic problem in comparative genomics. While most of the genome rearrangement algorithms assume that there is exactly one copy of each gene in both genomes, this does not reflect the biological reality very well – most of the studied genomes contain duplicated gene content, which has to be removed before applying those algorithms. However, dealing with unequal gene content is a very challenging task, and only few algorithms allow operations like duplications and deletions. Almost all of these algorithms restrict these operations to have a fixed size.
    Results: In this paper, we present a heuristic algorithm to sort an ancestral genome (with unique gene content) into a genome of a descendant (with arbitrary gene content) by reversals, block interchanges, tandem duplications, and deletions, where tandem duplications and deletions are of arbitrary size.
    Conclusion: Experimental results show that our algorithm finds sorting sequences that are close to an optimal sorting sequence when the ancestor and the descendant are closely related. The quality of the results decreases when the genomes get more diverged or the genome size increases. Nevertheless, the calculated distances give a good approximation of the true evolutionary distances.

    On the Inversion-Indel Distance

    Willing E, Zaccaria S, Dias Vieira Braga M, Stoye J. On the Inversion-Indel Distance. BMC Bioinformatics. 2013;14(Suppl 15: Proc. of RECOMB-CG 2013): S3.
    Background: The inversion distance, that is the distance between two unichromosomal genomes with the same content allowing only inversions of DNA segments, can be computed thanks to a pioneering approach of Hannenhalli and Pevzner in 1995. In 2000, El-Mabrouk extended the inversion model to allow the comparison of unichromosomal genomes with unequal contents, thus insertions and deletions of DNA segments besides inversions. However, an exact algorithm was presented only for the case in which we have insertions alone and no deletion (or vice versa), while a heuristic was provided for the symmetric case, that allows both insertions and deletions and is called the inversion-indel distance. In 2005, Yancopoulos, Attie and Friedberg started a new branch of research by introducing the generic double cut and join (DCJ) operation, that can represent several genome rearrangements (including inversions). Among others, the DCJ model gave rise to two important results. First, it has been shown that the inversion distance can be computed in a simpler way with the help of the DCJ operation. Second, the DCJ operation originated the DCJ-indel distance, that allows the comparison of genomes with unequal contents, considering DCJ, insertions and deletions, and can be computed in linear time.
    Results: In the present work we put these two results together to solve an open problem, showing that, when the graph that represents the relation between the two compared genomes has no bad components, the inversion-indel distance is equal to the DCJ-indel distance. We also give a lower and an upper bound for the inversion-indel distance in the presence of bad components.

    Reconstructing the Genomic Architecture of Mammalian Ancestors Using Multispecies Comparative Maps

    Rapidly developing comparative gene maps in selected mammal species are providing an opportunity to reconstruct the genomic architecture of mammalian ancestors and study rearrangements that transformed this ancestral genome into existing mammalian genomes. Here, the recently developed Multiple Genome Rearrangement (MGR) algorithm is applied to human, mouse, cat and cattle comparative maps (with 311-470 shared markers) to impute the ancestral mammalian genome. Reconstructed ancestors consist of 70-100 conserved segments shared across the genomes that have been exchanged by rearrangement events along the ordinal lineages leading to modern species genomes. Genomic distances between species, dominated by inversions (reversals) and translocations, are presented in a first multispecies attempt using ordered mapping data to reconstruct the evolutionary exchanges that preceded modern placental mammal genomes

    A Phylogenomic Study of Human, Dog, and Mouse

    In recent years the phylogenetic relationship of mammalian orders has been addressed in a number of molecular studies. These analyses have frequently yielded inconsistent results with respect to some basal ordinal relationships. For example, the relative placement of primates, rodents, and carnivores has differed in various studies. Here, we attempt to resolve this phylogenetic problem by using data from completely sequenced nuclear genomes to base the analyses on the largest possible amount of data. To minimize the risk of reconstruction artifacts, the trees were reconstructed under different criteria—distance, parsimony, and likelihood. For the distance trees, distance metrics that measure independent phenomena (amino acid replacement, synonymous substitution, and gene reordering) were used, as it is highly improbable that all of the trees would be affected the same way by any reconstruction artifact. In contradiction to the currently favored classification, our results based on full-genome analysis of the phylogenetic relationship between human, dog, and mouse yielded overwhelming support for a primate–carnivore clade with the exclusion of rodents

    Applications of heuristic search on phylogeny reconstruction problems

    Phylogenies or evolutionary trees for a given family of species show the evolutionary relationships between these species. The leaves denote the given species, the internal nodes denote their common ancestors and the edges denote the genetic relationships. Species can be identified by their whole genomes and the evolutionary relations between species can be measured by the number of rearrangement events (i.e. mutations) that transform one genome into another. One approach to infer phylogeny from genomic data is by solving median genome problems for three genomes, or the genome rearrangement problem for pairs of genomes, while trying to minimize the total evolutionary distance among the given species. In this thesis, we have developed and implemented two search-based algorithms for the phylogeny reconstruction problem, based on solving median genome problems for circular genomes of the same length without gene duplication. To show the applicability and effectiveness of our algorithms, we have tested them with randomly generated instances and two real data sets: mitochondrial genomes of Metazoa and chloroplast genomes of Campanulaceae.