543 research outputs found

    Reconstructing the Genomic Architecture of Mammalian Ancestors Using Multispecies Comparative Maps

    Rapidly developing comparative gene maps in selected mammal species are providing an opportunity to reconstruct the genomic architecture of mammalian ancestors and study rearrangements that transformed this ancestral genome into existing mammalian genomes. Here, the recently developed Multiple Genome Rearrangement (MGR) algorithm is applied to human, mouse, cat and cattle comparative maps (with 311-470 shared markers) to impute the ancestral mammalian genome. Reconstructed ancestors consist of 70-100 conserved segments shared across the genomes that have been exchanged by rearrangement events along the ordinal lineages leading to modern species genomes. Genomic distances between species, dominated by inversions (reversals) and translocations, are presented in a first multispecies attempt using ordered mapping data to reconstruct the evolutionary exchanges that preceded modern placental mammal genomes.
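The MGR algorithm itself is not reproduced here. As a much simpler illustration of how gene-order differences are quantified, the sketch below counts breakpoints between a signed gene order and the identity order on a single chromosome. Because one inversion (reversal) cuts only two adjacencies, it removes at most two breakpoints, so half the breakpoint count, rounded up, is a lower bound on the reversal distance. The function names `signed_breakpoints` and `reversal_lower_bound` are hypothetical, and the single-chromosome, signed-marker setting is an assumption, not the multispecies, multichromosomal setting of the paper.

```python
def signed_breakpoints(perm):
    """Count breakpoints of a signed permutation of 1..n relative to the identity.

    The permutation is framed with 0 and n+1. A neighbouring pair (a, b) is an
    adjacency of the identity exactly when b - a == 1; this also accepts an
    inverted pair such as (-(k+1), -k).
    """
    n = len(perm)
    framed = [0] + list(perm) + [n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reversal_lower_bound(perm):
    """ceil(b / 2): one reversal removes at most two breakpoints."""
    b = signed_breakpoints(perm)
    return (b + 1) // 2


print(signed_breakpoints([1, -3, -2, 4]), reversal_lower_bound([1, -3, -2, 4]))
```

For `[1, -3, -2, 4]` there are two breakpoints, and reversing the segment `-3, -2` restores the identity, so the bound is tight in that case; in general the true reversal distance can be larger.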

    Balanced Vertices in Trees and a Simpler Algorithm to Compute the Genomic Distance

    This paper provides a short and transparent solution for the covering cost of white-grey trees, which play a crucial role in the algorithm of Bergeron et al. to compute the rearrangement distance between two multichromosomal genomes in linear time (Theor. Comput. Sci., 410:5300-5316, 2009). In the process it introduces a new center notion for trees, which seems to be interesting in its own right. Comment: 6 pages, submitted.

    The Tandem Duplication Distance Is NP-Hard

    In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment; this can be represented as the string operation AXB → AXXB. Tandem exon duplications have been found in many species such as human, fly or worm, and have been widely studied in computational biology. The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet, compute the shortest sequence of tandem duplications required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that tandem duplications have received much attention ever since. In this paper, we prove that this problem is NP-hard, settling the 16-year-old open problem. We further show that this hardness holds even if all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. One of the tools we develop for the reduction is a new problem called the Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.
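The tandem duplication operation AXB → AXXB and the TD distance can be made concrete with a brute-force breadth-first search. This is only a sketch for very short strings: its running time is exponential, which is in line with the NP-hardness result rather than a way around it. The function names `tandem_duplications` and `td_distance` are hypothetical; the only pruning used is that a duplication never shortens a string, so candidates longer than T can be discarded.

```python
from collections import deque


def tandem_duplications(s):
    """Yield every string obtainable from s by one tandem duplication AXB -> AXXB."""
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            # X = s[i:j] is non-empty and is copied right after itself.
            yield s[:j] + s[i:j] + s[j:]


def td_distance(s, t):
    """Smallest number of tandem duplications turning s into t, or None if impossible.

    Breadth-first search over strings no longer than t (exponential in the worst case).
    """
    if s == t:
        return 0
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        cur, d = queue.popleft()
        for nxt in tandem_duplications(cur):
            if len(nxt) > len(t) or nxt in seen:
                continue
            if nxt == t:
                return d + 1
            seen.add(nxt)
            queue.append((nxt, d + 1))
    return None


print(td_distance("ab", "aabb"), td_distance("ab", "ba"))
```

For example, `"ab"` reaches `"aabb"` in two steps (duplicate `a`, then `b`), while `"ba"` is unreachable because duplications never reorder characters.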

    Approximating the double-cut-and-join distance between unsigned genomes

    In this paper we study the problem of sorting unsigned genomes by double-cut-and-join operations, where genomes allow a mix of linear and circular chromosomes to be present. First, we formulate an equivalent optimization problem, called maximum cycle/path decomposition, which is aimed at finding a largest collection of edge-disjoint cycles/AA-paths/AB-paths in a breakpoint graph. Then, we show that the problem of finding a largest collection of edge-disjoint cycles/AA-paths/AB-paths of length no more than l can be reduced to the well-known degree-bounded k-set packing problem with k = 2l. Finally, a polynomial-time approximation algorithm for the problem of sorting unsigned genomes by double-cut-and-join operations is devised, which achieves the approximation ratio for any positive Δ. For the restricted variation where each genome contains only one linear chromosome, the approximation ratio can be further improved t
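For signed genomes the cycle/path decomposition of the adjacency (breakpoint) graph is unique, and the DCJ distance follows from the well-known formula d = N − (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd-length paths (the AB-paths). The difficulty addressed in the paper arises for unsigned genomes, where a decomposition must be chosen. The sketch below covers only the signed case. It assumes both genomes share the same gene set without duplicates, and its function names and input format (a list of `(genes, is_circular)` chromosomes with signed integer genes) are hypothetical.

```python
from collections import defaultdict


def gene_ends(g):
    """(left, right) extremities of signed gene g in reading direction."""
    return ((g, "t"), (g, "h")) if g > 0 else ((-g, "h"), (-g, "t"))


def vertices(genome):
    """Adjacencies (2 extremities) and telomeres (1 extremity) of a genome."""
    verts = []
    for genes, circular in genome:
        ends = [gene_ends(g) for g in genes]
        for (_, right), (left, _) in zip(ends, ends[1:]):
            verts.append(frozenset({right, left}))
        if circular:
            verts.append(frozenset({ends[-1][1], ends[0][0]}))
        else:
            verts.append(frozenset({ends[0][0]}))   # left telomere
            verts.append(frozenset({ends[-1][1]}))  # right telomere
    return verts


def dcj_distance(a, b):
    """DCJ distance N - (C + I/2) between two signed genomes on the same genes."""
    va, vb = vertices(a), vertices(b)
    n_genes = sum(len(v) for v in va) // 2
    in_b = {x: j for j, v in enumerate(vb) for x in v}
    # Adjacency graph as a multigraph: one edge per shared extremity.
    graph = defaultdict(list)
    for i, v in enumerate(va):
        for x in v:
            graph[("A", i)].append(("B", in_b[x]))
            graph[("B", in_b[x])].append(("A", i))
    seen, cycles, odd_paths = set(), 0, 0
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # collect one connected component
            u = stack.pop()
            comp.append(u)
            for w in graph[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        edges = sum(len(graph[u]) for u in comp) // 2
        if any(len(graph[u]) == 1 for u in comp):  # contains a telomere: a path
            odd_paths += edges % 2
        else:
            cycles += 1
    # The number of odd paths is always even, so the result is an integer.
    return n_genes - (cycles + odd_paths // 2)


print(dcj_distance([([1, 2, 3], False)], [([1, -2, 3], False)]))
```

Inverting gene 2 in `[1, 2, 3]` yields one cycle and two odd paths, so d = 3 − (1 + 1) = 1; a reciprocal translocation or the linearization of a circular chromosome likewise costs one DCJ.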

    Multichromosomal median and halving problems under different genomic distances

    Background: Genome median and genome halving are combinatorial optimization problems that aim at reconstructing ancestral genomes as well as the evolutionary events leading from the ancestor to extant species. Exploring complexity issues is a first step towards devising efficient algorithms. The complexity of the median problem for unichromosomal genomes (permutations) has been settled for both the breakpoint distance and the reversal distance. Although the multichromosomal case has often been assumed to be a simple generalization of the unichromosomal case, it is also a relaxation, so its complexity does not follow from existing results and is open for all distances. Results: We settle here the complexity of several genome median and halving problems, including a surprising polynomial result for the breakpoint median and guided halving problems in genomes with circular and linear chromosomes, showing that the multichromosomal problem is actually easier than the unichromosomal problem. Still other variants of these problems are NP-complete, including the DCJ double distance problem, previously mentioned as an open question. We list the remaining open problems. Conclusion: This theoretical study clears up a wide swathe of the algorithmic study of genome rearrangements with multiple multichromosomal genomes.

    On Genome Rearrangement Models (Sobre modelos de rearranjo de genomas)

    Advisor: João Meidanis. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação. Abstract: Genome rearrangements are events where large blocks of DNA exchange places during evolution. With the growing availability of whole genome data, the analysis of these events can be a very important and promising tool for understanding evolutionary genomics. Several mathematical models of genome rearrangement have been proposed in the last 20 years. In this thesis, we propose two new rearrangement models. The first was introduced as an alternative definition of the breakpoint distance. The breakpoint distance is one of the most straightforward genome comparison measures, but when it comes to defining it precisely for multichromosomal genomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, and Tannier et al. defined it differently in 2008. In this thesis we provide yet another alternative, calling it single-cut-or-join (SCJ). We show that several genome rearrangement problems, such as genome median, genome halving and small parsimony, become easy for SCJ, and provide polynomial time algorithms for them. The second model we introduce is the Adjacency Algebraic Theory, an extension of the Algebraic Formalism proposed by Meidanis and Dias that allows the modeling of linear chromosomes; the lack of such support was the main limitation of the original formalism, which could deal with circular chromosomes only. We believe that the algebraic formalism is an interesting alternative for solving rearrangement problems, with a different perspective that could complement the more commonly used combinatorial graph-theoretic approach. We present polynomial time algorithms to compute the algebraic distance and find rearrangement scenarios between two genomes. We show how to compute the algebraic distance from the adjacency graph, for an easier comparison with other rearrangement distances. Finally, we show how all classic rearrangement operations can be modeled using the algebraic theory. Doctorate program in Computer Science; degree: Doctor of Computer Science.
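The SCJ model can be illustrated by treating a genome as a set of adjacencies, i.e. unordered pairs of gene extremities, with telomeres left implicit. The SCJ distance between genomes A and B is then commonly written as |A \ B| + |B \ A|: every adjacency of A absent from B must be cut, and every adjacency of B absent from A must be joined. The sketch assumes signed genes, a shared gene set and a hypothetical `(genes, is_circular)` chromosome format; the function names are illustrative only and are not taken from the thesis.

```python
def gene_ends(g):
    """(left, right) extremities of signed gene g in reading direction."""
    return ((g, "t"), (g, "h")) if g > 0 else ((-g, "h"), (-g, "t"))


def adjacency_set(genome):
    """Set of adjacencies (unordered extremity pairs); telomeres are implicit."""
    adj = set()
    for genes, circular in genome:
        ends = [gene_ends(g) for g in genes]
        pairs = list(zip(ends, ends[1:]))
        if circular:
            pairs.append((ends[-1], ends[0]))  # close the circle
        for (_, right), (left, _) in pairs:
            adj.add(frozenset({right, left}))
    return adj


def scj_distance(a, b):
    """Cuts of adjacencies only in a plus joins of adjacencies only in b."""
    A, B = adjacency_set(a), adjacency_set(b)
    return len(A - B) + len(B - A)


print(scj_distance([([1, 2, 3], False)], [([1, -2, 3], False)]))
```

Inverting gene 2 in `[1, 2, 3]` replaces two adjacencies, so it costs two cuts and two joins, giving an SCJ distance of 4.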

    A Linear Time Algorithm for an Extended Version of the Breakpoint Double Distance
