15 research outputs found

    Comparative genomics: multiple genome rearrangement and efficient algorithm development

    This thesis addresses multiple genome rearrangement by signed reversal: given a collection of genomes represented as signed permutations, reconstruct their evolutionary history using signed reversals. That is, find a bifurcating tree in which the sampled genomes are assigned to leaf nodes and ancestral genomes (also signed permutations) are hypothesized at internal nodes, such that the total reversal distance summed over all edges of the tree is minimized. This is equivalent to finding an optimal Steiner tree that connects the given genomes by signed reversal paths, and the key to the problem is reconstructing all optimal Steiner nodes (ancestral genomes). The problem is NP-hard, so in practice it is approached with efficient approximation algorithms. Various algorithms and programs have been designed for it, such as BPAnalysis, GRAPPA, the grid search algorithm, and the MGR greedy split algorithm (Chapter 1). However, they can be computationally expensive or have low inference accuracy. In this thesis, several new algorithms are developed, including the nearest path search algorithm (Chapter 2), the neighbor-perturbing algorithm (Chapter 3), the branch-and-bound algorithm (Chapter 3), the perturbing-improving algorithm (Chapter 4), and the partitioning algorithm (Chapter 5). Through theoretical proofs, computer simulations, and biological applications, these algorithms are shown to be 2-approximation algorithms and more efficient than the existing algorithms.
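
    To make the objective in this abstract concrete, the following is a minimal brute-force sketch, not any of the thesis's algorithms. It assumes genomes are tuples of signed integers; the function names `signed_reversal`, `reversal_distance`, and `tree_cost` are hypothetical. The distance is found by breadth-first search, which is only feasible for a handful of genes.

```python
from collections import deque


def signed_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of each element in it."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distance(source, target):
    """Exact signed reversal distance by breadth-first search (tiny inputs only)."""
    source, target = tuple(source), tuple(target)
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("genomes must contain the same genes")
    n = len(source)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        # Try every possible reversal of a contiguous segment.
        for i in range(n):
            for j in range(i, n):
                nxt = signed_reversal(perm, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise AssertionError("unreachable: reversals connect all signed permutations")


def tree_cost(edges, genomes):
    """Total reversal distance summed over all edges of a labeled tree."""
    return sum(reversal_distance(genomes[u], genomes[v]) for u, v in edges)


# Hypothetical example: three leaves joined to one hypothesized ancestor.
genomes = {
    "ancestor": (1, 2, 3),
    "leaf_a": (1, -2, 3),
    "leaf_b": (-2, -1, 3),
    "leaf_c": (1, 2, -3),
}
edges = [("ancestor", "leaf_a"), ("ancestor", "leaf_b"), ("ancestor", "leaf_c")]
print(tree_cost(edges, genomes))
```

    The Steiner-tree formulation asks for the internal-node genomes that minimize `tree_cost`; the sketch only scores one fixed assignment.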

    Inversion-based genomic signatures

    Background. Reconstructing complete ancestral genomes (at least in terms of their gene inventory and arrangement) is attracting much interest due to the rapidly increasing availability of whole genome sequences. While modest successes have been reported for mammalian and even vertebrate genomes, more divergent groups continue to pose a stiff challenge, mostly because current models of genomic evolution support too many choices. Results. We describe a novel type of genomic signature based on rearrangements that characterizes evolutionary changes that must be common to all minimal rearrangement scenarios; by focusing on global patterns of rearrangements, such signatures bypass individual variations and sharply restrict the search space. We present the results of extensive simulation studies demonstrating that these signatures can be used to reconstruct accurate ancestral genomes and phylogenies even for widely divergent collections. Conclusion. Focusing on genome triples rather than genome pairs unleashes the full power of evolutionary analysis. Our genomic signature captures shared evolutionary events and thus can form the basis of a robust analysis and reconstruction of evolutionary history.

    Elucidating Genome Structure Evolution by Analysis of Isoapostatic Gene Clusters using Statistics of Variance of Gene Distances

    Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In related genomes, clusters of homologous gene pairs serve as evidence for candidate homologous regions, which make up the genomic core. Previous studies on the structural organization of bacterial genomes revealed that the basic backbone of the genomic core is interrupted by genomic islands. Here, we applied statistics that use the variance of distances as a measure to classify conserved genes within a set of genomes according to their “isoapostatic” relationship, in which genes keep nearly identical distances from one another. The results of variance statistics analysis of cyanobacterial genomes including Prochlorococcus, Synechococcus, and Anabaena indicated that the conserved genes are classified into several groups called “virtual linkage groups (VLGs)” according to the positional conservation of their orthologs over the genomes analyzed. The VLGs were used to define the mosaic domain structure of the genomic core. The current model of mosaic genomic domains can explain the global evolution of the genomic core of cyanobacteria. It also visualizes islands of lateral gene transfer. The stability and the robustness of the variance statistics are discussed. This method will also be useful in deciphering the structural organization of genomes in other groups of bacteria.

    A fast algorithm for the multiple genome rearrangement problem with weighted reversals and transpositions

    Background. Due to recent progress in genome sequencing, more and more data for phylogenetic reconstruction based on rearrangement distances between genomes are becoming available. However, this phylogenetic reconstruction is a very challenging task. For the simplest distance measures (the breakpoint distance and the reversal distance), the problem is NP-hard even if one considers only three genomes. Results. In this paper, we present a new heuristic algorithm that directly constructs a phylogenetic tree with respect to the weighted reversal and transposition distance. Experimental results on previously published datasets show that constructing phylogenetic trees in this way results in better trees than constructing the trees with respect to the reversal distance and then recalculating the weight of the trees with the weighted reversal and transposition distance. An implementation of the algorithm can be obtained from the authors. Conclusion. The possibility of creating phylogenetic trees directly with respect to the weighted reversal and transposition distance results in biologically more realistic scenarios. Our algorithm can solve today's most challenging biological datasets in a reasonable amount of time.
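
    As an illustration of the simpler of the two distance measures named in this abstract, here is a small sketch of the breakpoint distance. It is not the paper's heuristic. It assumes linear, single-chromosome genomes written as signed permutations of the genes 1..n, framed by the artificial markers 0 and n+1; the name `breakpoint_distance` is hypothetical.

```python
def breakpoint_distance(a, b):
    """Count adjacencies of genome a that are not conserved in genome b.

    Both genomes are signed permutations of 1..n (linear, one chromosome).
    An adjacency (x, y) is conserved if b contains x, y or -y, -x consecutively.
    """
    n = len(a)
    if sorted(map(abs, a)) != list(range(1, n + 1)) or sorted(map(abs, b)) != list(range(1, n + 1)):
        raise ValueError("both genomes must be signed permutations of 1..n")
    # Frame each genome with fixed end markers so the ends count as adjacencies.
    ext_a = [0] + list(a) + [n + 1]
    ext_b = [0] + list(b) + [n + 1]
    adjacencies_b = set()
    for x, y in zip(ext_b, ext_b[1:]):
        adjacencies_b.add((x, y))
        adjacencies_b.add((-y, -x))  # same adjacency read from the other strand
    return sum((x, y) not in adjacencies_b for x, y in zip(ext_a, ext_a[1:]))


print(breakpoint_distance([1, -2, 3], [1, 2, 3]))
```

    Computing the distance for a pair is a single linear pass; the hardness stated in the abstract concerns choosing ancestral genomes for three or more inputs, not this pairwise count.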

    The Distance and Median Problems in the Single-Cut-Or-Join Model with Single-Gene Duplications

    Background. In the field of genome rearrangement algorithms, models accounting for gene duplication often lead to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, the problem is NP-complete for most extensions of these models accounting for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and the Small Parsimony problem, are intractable for most duplication-free models, with some exceptions, for example the Single-Cut-or-Join (SCJ) model. Results. We introduce a variant of the SCJ distance that accounts for duplicated genes, in the context of directed evolution from an ancestral genome to a descendant genome where orthology relations between ancestral genes and their descendants are known. Our model includes two duplication mechanisms: single-gene tandem duplication and the creation of single-gene circular chromosomes. We prove that in this model, computing the directed distance and a parsimonious evolutionary scenario in terms of SCJ and single-gene duplication events can be done in linear time. We also show that the directed median problem is tractable for this distance, while the rooted median problem, where we assume that one of the given genomes is ancestral to the median, is NP-complete. We also describe an Integer Linear Program for solving this problem. We evaluate the directed distance and rooted median algorithms on simulated data. Conclusion. Our results provide a simple genome rearrangement model, extending the SCJ model to account for single-gene duplications, for which we prove a mix of tractability and hardness results. For the NP-complete rooted median problem, we design a simple Integer Linear Program. Our publicly available implementation of these algorithms for the directed distance and median problems allows these problems to be solved efficiently on large instances.
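
    For orientation, the sketch below covers only the duplication-free SCJ model that this abstract builds on, not the paper's variant with duplications. In that base model a genome is a set of adjacencies between gene extremities, the distance is the size of the symmetric difference of two adjacency sets, and the adjacencies present in more than half of the input genomes form a median. The input encoding and the names `adjacencies`, `scj_distance`, and `scj_median` are assumptions made for illustration.

```python
from collections import Counter


def adjacencies(chromosomes):
    """Adjacency set of a genome.

    chromosomes: list of (genes, is_circular), genes being signed integers.
    Each gene g has a tail (g, "t") and a head (g, "h"); a forward gene is
    read tail-to-head, a reversed gene head-to-tail.
    """
    adj = set()
    for genes, is_circular in chromosomes:
        ends = []  # (left extremity, right extremity) of each gene in order
        for g in genes:
            tail, head = (abs(g), "t"), (abs(g), "h")
            ends.append((tail, head) if g > 0 else (head, tail))
        for (_, right), (left, _) in zip(ends, ends[1:]):
            adj.add(frozenset((right, left)))
        if is_circular and ends:
            adj.add(frozenset((ends[-1][1], ends[0][0])))
    return adj


def scj_distance(genome_a, genome_b):
    """Cuts plus joins needed: adjacencies found in exactly one genome."""
    return len(adjacencies(genome_a) ^ adjacencies(genome_b))


def scj_median(genomes):
    """Adjacencies occurring in strictly more than half of the genomes."""
    counts = Counter(adj for genome in genomes for adj in adjacencies(genome))
    return {adj for adj, count in counts.items() if 2 * count > len(genomes)}


genome_a = [([1, 2, 3], False)]
genome_b = [([1, -2, 3], False)]
print(scj_distance(genome_a, genome_b))
```

    Inverting gene 2 breaks two adjacencies and creates two new ones, which is why the example costs two cuts and two joins.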

    Algorithmic approaches for genome rearrangement: a review
