23 research outputs found

    Comparative genomics: multiple genome rearrangement and efficient algorithm development

    Multiple genome rearrangement by signed reversal is discussed: for a collection of genomes represented by signed permutations, reconstruct their evolutionary history using signed reversals, i.e. find a bifurcating tree in which the sampled genomes are assigned to leaf nodes and ancestral genomes (i.e. signed permutations) are hypothesized at internal nodes such that the total reversal distance summed over all edges of the tree is minimized. This is equivalent to finding an optimal Steiner tree that connects the given genomes by signed reversal paths. The key to the problem is to reconstruct all optimal Steiner nodes (ancestral genomes). The problem is NP-hard, so in practice it is tackled with efficient approximation algorithms. Various algorithms and programs have been designed to solve the problem, such as BPAnalysis, GRAPPA, the grid search algorithm, and the MGR greedy split algorithm (Chapter 1). However, they may have high computational costs or low inference accuracy. In this thesis, several new algorithms are developed, including a nearest path search algorithm (Chapter 2), a neighbor-perturbing algorithm (Chapter 3), a branch-and-bound algorithm (Chapter 3), a perturbing-improving algorithm (Chapter 4), and a partitioning algorithm (Chapter 5). With theoretical proofs, computer simulations, and biological applications, these algorithms are shown to be 2-approximation algorithms and more efficient than the existing algorithms.
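
The objective described in this abstract can be illustrated with a minimal Python sketch. It is not any of the thesis algorithms: it only shows what a signed reversal does to a signed permutation and how a candidate tree is scored by summing reversal distances over its edges. The function names (`reverse`, `reversal_distance`, `tree_score`) are hypothetical, and the distance is computed by brute-force breadth-first search, which is only feasible for very small genomes.

```python
from collections import deque

def reverse(perm, i, j):
    """Signed reversal of perm[i..j] (inclusive): reverse the order and flip the signs."""
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]

def reversal_distance(src, dst):
    """Exact signed reversal distance by breadth-first search (tiny genomes only)."""
    src, dst = tuple(src), tuple(dst)
    if src == dst:
        return 0
    n = len(src)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        perm, dist = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                nxt = reverse(perm, i, j)
                if nxt == dst:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("genomes must be signed permutations of the same genes")

def tree_score(edges, genomes):
    """Total reversal distance summed over all edges of a tree.

    edges   -- iterable of (node, node) pairs
    genomes -- dict mapping every node (leaf or internal) to a signed permutation
    """
    return sum(reversal_distance(genomes[u], genomes[v]) for u, v in edges)

# Hypothetical toy instance: three leaf genomes and one hypothesized ancestor M.
genomes = {
    "A": (1, 2, 3),
    "B": (1, -2, 3),
    "C": (-2, -1, 3),
    "M": (1, 2, 3),
}
edges = [("M", "A"), ("M", "B"), ("M", "C")]
score = tree_score(edges, genomes)
```

In this toy instance the ancestor M plays the role of a Steiner node; choosing a different signed permutation for M changes the score, which is the quantity the problem asks to minimize.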

    A fast algorithm for the multiple genome rearrangement problem with weighted reversals and transpositions

    Background: Due to recent progress in genome sequencing, more and more data for phylogenetic reconstruction based on rearrangement distances between genomes are becoming available. However, this phylogenetic reconstruction is a very challenging task. For the simplest distance measures (the breakpoint distance and the reversal distance), the problem is NP-hard even if one considers only three genomes. Results: In this paper, we present a new heuristic algorithm that directly constructs a phylogenetic tree w.r.t. the weighted reversal and transposition distance. Experimental results on previously published datasets show that constructing phylogenetic trees in this way results in better trees than constructing the trees w.r.t. the reversal distance and then recalculating the weight of the trees with the weighted reversal and transposition distance. An implementation of the algorithm can be obtained from the authors. Conclusion: The possibility of creating phylogenetic trees directly w.r.t. the weighted reversal and transposition distance results in biologically more realistic scenarios. Our algorithm can solve today's most challenging biological datasets in a reasonable amount of time.
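
As a rough illustration of the simplest distance measure this abstract mentions, the following Python sketch computes the breakpoint distance between two signed linear genomes. It is not the paper's heuristic and does not cover the weighted reversal and transposition distance. It assumes genes are labeled 1..n and that each genome is framed by the artificial markers 0 and n+1; the function names are hypothetical.

```python
def adjacencies(genome):
    """Set of adjacencies of a framed signed genome, in both reading directions."""
    framed = [0] + list(genome) + [len(genome) + 1]
    adj = set()
    for a, b in zip(framed, framed[1:]):
        adj.add((a, b))    # read left to right
        adj.add((-b, -a))  # the same adjacency read from the other strand
    return adj

def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that are not conserved in g2."""
    adj2 = adjacencies(g2)
    framed = [0] + list(g1) + [len(g1) + 1]
    return sum((a, b) not in adj2 for a, b in zip(framed, framed[1:]))

# Flipping the sign of gene 2 breaks the two adjacencies around it.
d = breakpoint_distance((1, 2, 3), (1, -2, 3))
```

The breakpoint distance for a pair of genomes is computed in linear time; the hardness stated in the abstract concerns the tree (median) problem, not the pairwise distance.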

    Genomic distances under deletions and insertions

    As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multiset of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions (or conversely, a limited form of insertions which forbids duplications). In this paper, we extend El-Mabrouk's work to handle duplications as well as insertions and present an alternate framework for computing (near) minimal edit sequences involving insertions, deletions, and inversions. We derive an error bound for our polynomial-time distance computation under various assumptions and present preliminary experimental results that suggest that performance in practice may be excellent, within a few percent of the actual distance

    A Unifying Model of Genome Evolution Under Parsimony

    We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describe all parsimonious interpretations of G, and this set can be explored with a few sampling moves. Comment: 52 pages, 24 figures

    A cubic algorithm for the generalized rank median of three genomes

    The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable ones has been that of finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely, the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes can be computed exactly in polynomial time O(n^ω), where ω ≤ 3, with respect to this distance, when the median is allowed to be an arbitrary orthogonal matrix. Results: We define the five fundamental subspaces depending on three input genomes, and use their properties to show that a particular action on each of these subspaces produces a median. In the process we introduce the notion of M-stable subspaces. We also show that the median found by our algorithm is always orthogonal, symmetric, and conserves any adjacencies or telomeres present in at least 2 out of 3 input genomes. Conclusions: We test our method on both simulated and real data. We find that the majority of the realistic inputs result in genomic outputs, and for those that do not, our two heuristics perform well in terms of reconstructing a genomic matrix attaining a score close to the lower bound, while running in a reasonable amount of time. We conclude that the rank distance is not only theoretically intriguing, but also practically useful for median-finding, and potentially ancestral genome reconstruction. Funding: FUNDAÇÃO DE AMPARO À PESQUISA DO ESTADO DE SÃO PAULO - FAPESP 2016/01511-
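
The rank distance named in this abstract can be sketched in a few lines of Python. The abstract does not spell out the matrix representation, so the encoding below is an assumption: gene extremities are indexed 0..n-1, an adjacency between extremities i and j sets entries (i, j) and (j, i) to 1, and a telomere i sets the diagonal entry (i, i) to 1, giving a symmetric permutation matrix. The distance is then taken as the rank of the difference of two such matrices. The helper names are hypothetical, and the code does not attempt the median algorithm itself.

```python
from fractions import Fraction

def genome_matrix(n_extremities, adjacencies):
    """Symmetric 0/1 matrix of a genome (assumed encoding, see lead-in)."""
    m = [[0] * n_extremities for _ in range(n_extremities)]
    for i, j in adjacencies:
        m[i][j] = m[j][i] = 1
    for i in range(n_extremities):
        if not any(m[i]):
            m[i][i] = 1  # an extremity with no neighbour is a telomere
    return m

def matrix_rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            if factor:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def rank_distance(a, b):
    """Rank of the difference of two genome matrices."""
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return matrix_rank(diff)

# Two genes, extremities indexed 0: 1t, 1h, 2t, 2h (an illustrative convention).
genome_a = genome_matrix(4, [(1, 2)])  # chromosome "1 2":  1h adjacent to 2t
genome_b = genome_matrix(4, [(1, 3)])  # chromosome "1 -2": 1h adjacent to 2h
genome_c = genome_matrix(4, [])        # genes 1 and 2 on separate chromosomes
d_ab = rank_distance(genome_a, genome_b)
d_ac = rank_distance(genome_a, genome_c)
```

Under this encoding, the inversion separating `genome_a` from `genome_b` has rank distance 2, while the single cut separating `genome_a` from `genome_c` has rank distance 1.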

    Inferring genome-scale rearrangement phylogeny and ancestral gene order: a Drosophila case study

    A simple, fast, and biologically-inspired computational approach to infer genome-scale rearrangement phylogeny and ancestral gene order has been developed and applied to eight Drosophila genomes, providing insights into evolutionary chromosomal dynamics

    Steps toward accurate reconstructions of phylogenies from gene-order data

    We report on our progress in reconstructing phylogenies from gene-order data. We have developed polynomial-time methods for estimating genomic distances that greatly improve the accuracy of trees obtained using the popular neighbor-joining method; we have also further improved the running time of our GRAPPA software suite through a combination of tighter bounding and better use of the bounds. We present new experimental results (that extend those we presented at ISMB’01 and WABI’01) that demonstrate the accuracy and robustness of our distance estimators under a wide range of model conditions. Moreover, using the best of our distance estimators (EDE) in our GRAPPA software suite, along with more sophisticated bounding techniques, produced spectacular improvements in the already huge speedup: whereas our earlier experiments showed a one-million-fold speedup (when run on a 512-processor cluster), our latest experiments demonstrate a speedup of one hundred million. The combination of these various advances enabled us to conduct new phylogenetic analyses of a subset of the Campanulaceae family, confirming various conjectures about the relationships among members of the subset and confirming that inversion can be viewed as the principal mechanism of evolution for their chloroplast genome. We give representative results of the extensive experimentation we conducted on both real and simulated datasets in order to validate and characterize our approaches

    A Hierarchical Framework for Phylogenetic and Ancestral Genome Reconstruction on Whole Genome Data

    Gene order evolves through events such as rearrangements, duplications, and losses, which can change both the order and the content of the genome over the long history of genome evolution. The recent accumulation of genomic sequences gives researchers the chance to address long-standing problems concerning the phylogenies, or evolutionary histories, of sets of species, as well as ancestral genomic content and gene orders. Over the past few years, such problems have proven so interesting that a large number of algorithms, following different standards, have been proposed in the attempt to resolve them. The work presented in this dissertation focuses on algorithms and models for whole-genome evolution and their applications in phylogeny and ancestor inference from gene order. We developed a flexible ancestor reconstruction method (FARM) within the framework of maximum likelihood and weighted maximum matching. We designed a binary-encoding-based framework to reconstruct the evolutionary history of whole-genome gene orders. We developed algorithms to estimate and predict missing adjacencies during ancestral reconstruction, in order to restore gene order when the leaf genomes are far from each other. We developed a pipeline involving maximum likelihood, weighted maximum matching, and variable-length binary encoding for the estimation of ancestral gene content, to reconstruct ancestral genomes under various evolutionary models, including genome rearrangements, additions, losses, and duplications, with high accuracy and low running time. Phylogenetic analyses of whole-genome data have been limited to small collections of genomes and low-resolution data, or to data without massive duplications. We designed a maximum-likelihood approach to phylogenetic analysis (VLWD), based on variable-length binary encoding, that reconstructs phylogenies from whole-genome data with improved accuracy and can handle whole-genome data such as triploids and tetraploids. Maximum-likelihood-based approaches have been applied to ancestral reconstruction but remain primitive for whole-genome data. We developed a hierarchical framework for ancestral reconstruction that uses variable-length binary encoding for content estimation, then adjacency fixing and missing-adjacency prediction for adjacency collection, and finally weighted maximum matching for gene order assembly. This substantially improves the performance of ancestral gene order reconstruction. We designed a series of experiments to validate these methods and compared the results with the most recent comparable methods. According to the results, the methods are fast and accurate.