15 research outputs found
Comparative genomics: multiple genome rearrangement and efficient algorithm development
Multiple genome rearrangement by signed reversal is discussed: For a collection of genomes represented by signed permutations, reconstruct their evolutionary history by using signed reversals, i.e. find a bifurcating tree where sampled genomes are assigned to leaf nodes and ancestral genomes (i.e. signed permutations) are hypothesized at internal nodes such that the total reversal distance summed over all edges of the tree is minimized. It is equivalent to finding an optimal Steiner tree that connects the given genomes by signed reversal paths. The key for the problem is to reconstruct all optimal Steiner nodes/ancestral genomes.;The problem is NP-hard and can only be solved by efficient approximation algorithms. Various algorithms/programs have been designed to solve the problem, such as BPAnalysis, GRAPPA, grid search algorithm, MGR greedy split algorithm (Chapter 1). However, they may have expensive computational costs or low inference accuracy. In this thesis, several new algorithms are developed, including nearest path search algorithm (Chapter 2), neighbor-perturbing algorithm (Chapter 3), branch-and-bound algorithm (Chapter 3), perturbing-improving algorithm (Chapter 4), partitioning algorithm (Chapter 5), etc. With theoretical proofs, computer simulations, and biological applications, these algorithms are shown to be 2-approximation algorithms and more efficient than the existing algorithms
Inversion-based genomic signatures
<p>Abstract</p> <p>Background</p> <p>Reconstructing complete ancestral genomes (at least in terms of their gene inventory and arrangement) is attracting much interest due to the rapidly increasing availability of whole genome sequences. While modest successes have been reported for mammalian and even vertebrate genomes, more divergent groups continue to pose a stiff challenge, mostly because current models of genomic evolution support too many choices.</p> <p>Results</p> <p>We describe a novel type of genomic signature based on rearrangements that characterizes evolutionary changes that must be common to all minimal rearrangement scenarios; by focusing on global patterns of rearrangements, such signatures bypass individual variations and sharply restrict the search space. We present the results of extensive simulation studies demonstrating that these signatures can be used to reconstruct accurate ancestral genomes and phylogenies even for widely divergent collections.</p> <p>Conclusion</p> <p>Focusing on genome triples rather than genomes pairs unleashes the full power of evolutionary analysis. Our genomic signature captures shared evolutionary events and thus can form the basis of a robust analysis and reconstruction of evolutionary history.</p
Elucidating Genome Structure Evolution by Analysis of Isoapostatic Gene Clusters using Statistics of Variance of Gene Distances
Identifying genomic regions that descended from a common ancestor is important
for understanding the function and evolution of genomes. In related genomes,
clusters of homologous gene pairs serve as evidence for candidate homologous
regions, which make up genomic core. Previous studies on the structural
organization of bacterial genomes revealed that basic backbone of genomic core
is interrupted by genomic islands. Here, we applied statistics using variance of
distances as a measure to classify conserved genes within a set of genomes
according to their “isoapostatic” relationship, which keeps
nearly identical distances of genes. The results of variance statistics analysis
of cyanobacterial genomes including Prochlorococcus,
Synechococcus, and Anabaena indicated that
the conserved genes are classified into several groups called “virtual
linkage groups (VLGs)” according to their positional conservation of
orthologs over the genomes analyzed. The VLGs were used to define mosaic domain
structure of the genomic core. The current model of mosaic genomic domains can
explain global evolution of the genomic core of cyanobacteria. It also
visualizes islands of lateral gene transfer. The stability and the robustness of
the variance statistics are discussed. This method will also be useful in
deciphering the structural organization of genomes in other groups of
bacteria
A fast algorithm for the multiple genome rearrangement problem with weighted reversals and transpositions
<p>Abstract</p> <p>Background</p> <p>Due to recent progress in genome sequencing, more and more data for phylogenetic reconstruction based on rearrangement distances between genomes become available. However, this phylogenetic reconstruction is a very challenging task. For the most simple distance measures (the breakpoint distance and the reversal distance), the problem is NP-hard even if one considers only three genomes.</p> <p>Results</p> <p>In this paper, we present a new heuristic algorithm that directly constructs a phylogenetic tree w.r.t. the weighted reversal and transposition distance. Experimental results on previously published datasets show that constructing phylogenetic trees in this way results in better trees than constructing the trees w.r.t. the reversal distance, and recalculating the weight of the trees with the weighted reversal and transposition distance. An implementation of the algorithm can be obtained from the authors.</p> <p>Conclusion</p> <p>The possibility of creating phylogenetic trees directly w.r.t. the weighted reversal and transposition distance results in biologically more realistic scenarios. Our algorithm can solve today's most challenging biological datasets in a reasonable amount of time.</p
The Distance and Median Problems in the Single-Cut-Or-Join Model with Single-Gene Duplications
Background.
In the field of genome rearrangement algorithms, models accounting for gene duplication lead often to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, the problem is NP-complete for most extensions of these models accounting for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and the Small Parsimony problem, are intractable for most duplication-free models, with some exceptions, for example the Single-Cut-or-Join (SCJ) model.
Results.
We introduce a variant of the SCJ distance that accounts for duplicated genes, in the context of directed evolution from an ancestral genome to a descendant genome where orthology relations between ancestral genes and their descendant are known. Our model includes two duplication mechanisms: single-gene tandem duplication and the creation of single-gene circular chromosomes. We prove that in this model, computing the directed distance and a parsimonious evolutionary scenario in terms of SCJ and single-gene duplication events can be done in linear time. We also show that the directed median problem is tractable for this distance, while the rooted median problem, where we assume that one of the given genomes is ancestral to the median, is NP-complete. We also describe an Integer Linear Program for solving this problem. We evaluate the directed distance and rooted median algorithms on simulated data.
Conclusion.
Our results provide a simple genome rearrangement model, extending the SCJ model to account for single-gene duplications, for which we prove a mix of tractability and hardness results. For the NP-complete rooted median problem, we design a simple Integer Linear Program. Our publicly available implementation of these algorithms for the directed distance and median problems allow to solve efficiently these problems on large instances