7 research outputs found

    Estimating true evolutionary distances under the DCJ model

    Get PDF
    Motivation: Modern techniques can yield the ordering and strandedness of genes on each chromosome of a genome; such data already exists for hundreds of organisms. The evolutionary mechanisms through which the set of the genes of an organism is altered and reordered are of great interest to systematists, evolutionary biologists, comparative genomicists and biomedical researchers. Perhaps the most basic concept in this area is that of evolutionary distance between two genomes: under a given model of genomic evolution, how many events most likely took place to account for the difference between the two genomes

    Guided genome halving: provably optimal solutions provide good insights into the preduplication ancestral genome of Saccharomyces cerevisiae.

    Get PDF
    International audienceWe present theoretical and practical advances on the Guided Genome Halving problem, a combinatorial optimisation problem which aims at proposing ancestral configurations of extant genomes when one of them has undergone a whole genome duplication. We provide a lower bound on the optimal solution, devise a heuristic algorithm based on subgraph identification, and apply it to yeast gene order data. On some instances, the computation of the bound yields a proof that the obtained solutions are optimal. We analyse a set of optimal solutions, compare them with a manually curated standard ancestor, showing that on yeast data, results coming from different methodologies are largely convergent: the optimal solutions are distant of at most one rearrangement from the reference

    Evolution of whole genomes through inversions:models and algorithms for duplicates, ancestors, and edit scenarios

    Get PDF
    Advances in sequencing technology are yielding DNA sequence data at an alarming rate – a rate reminiscent of Moore's law. Biologists' abilities to analyze this data, however, have not kept pace. On the other hand, the discrete and mechanical nature of the cell life-cycle has been tantalizing to computer scientists. Thus in the 1980s, pioneers of the field now called Computational Biology began to uncover a wealth of computer science problems, some confronting modern Biologists and some hidden in the annals of the biological literature. In particular, many interesting twists were introduced to classical string matching, sorting, and graph problems. One such problem, first posed in 1941 but rediscovered in the early 1980s, is that of sorting by inversions (also called reversals): given two permutations, find the minimum number of inversions required to transform one into the other, where an inversion inverts the order of a subpermutation. Indeed, many genomes have evolved mostly or only through inversions. Thus it becomes possible to trace evolutionary histories by inferring sequences of such inversions that led to today's genomes from a distant common ancestor. But unlike the classic edit distance problem where string editing was relatively simple, editing permutation in this way has proved to be more complex. In this dissertation, we extend the theory so as to make these edit distances more broadly applicable and faster to compute, and work towards more powerful tools that can accurately infer evolutionary histories. In particular, we present work that for the first time considers genomic distances between any pair of genomes, with no limitation on the number of occurrences of a gene. Next we show that there are conditions under which an ancestral genome (or one close to the true ancestor) can be reliably reconstructed. Finally we present new methodology that computes a minimum-length sequence of inversions to transform one permutation into another in, on average, O(n log n) steps, whereas the best worst-case algorithm to compute such a sequence uses O(n√n log n) steps

    Phylogenetic Reconstruction from Complete Gene Orders of Whole Genomes

    No full text
    Rapidly increasing numbers of organisms have been completely sequenced and most of their genes identified; homologies among these genes are also getting established. It thus has become possible to represent whole genomes as ordered lists of gene identifiers and to study the evolution of these entities through computational means, in systematics as well as in comparative genomics. While dealing with rearrangements is nontrivial, the biggest stumbling block remains gene duplication and losses, leading to considerable difficulties in determining orthologs among gene families—all the more since orthology determination has a direct impact on the selection of rearrangements. None of the existing phylogenetic reconstruction methods that use gene orders is able to exploit the information present in complete gene families—most assume singleton families and equal gene content, limiting the evolutionary operations to rearrangements, while others make it so by eliminating nonshared genes and selecting one exemplar from each gene family. In this work, we leverage our past work on genomic distances, on tight bounding of parsimony scores through linear programming, and on divide-and-conquer methods for large-scale reconstruction to build the first computational approach to phylogenetic reconstruction from complete gene order data, taking into account not only rearrangements, but also duplication an

    PHYLOGENETIC RECONSTRUCTION FROM COMPLETE GENE ORDERS OF WHOLE GENOMES

    No full text
    corecore