Infoscience - École polytechnique fédérale de Lausanne

Inversion-based genomic signatures

Author: Moret Bernard ME
Swenson Krister M
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Reconstructing complete ancestral genomes (at least in terms of their gene inventory and arrangement) is attracting much interest due to the rapidly increasing availability of whole-genome sequences. While modest successes have been reported for mammalian and even vertebrate genomes, more divergent groups continue to pose a stiff challenge, mostly because current models of genomic evolution support too many choices. Results: We describe a novel type of genomic signature based on rearrangements that characterizes evolutionary changes that must be common to all minimal rearrangement scenarios; by focusing on global patterns of rearrangements, such signatures bypass individual variations and sharply restrict the search space. We present the results of extensive simulation studies demonstrating that these signatures can be used to reconstruct accurate ancestral genomes and phylogenies even for widely divergent collections. Conclusion: Focusing on genome triples rather than genome pairs unleashes the full power of evolutionary analysis. Our genomic signature captures shared evolutionary events and thus can form the basis of a robust analysis and reconstruction of evolutionary history.
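The central notion here, evolutionary changes shared by every minimal rearrangement scenario, can be illustrated on toy data. The sketch below makes simplifying assumptions that are not taken from the paper: genomes are single linear chromosomes written as tuples of signed integers, scenarios are shortest sequences of signed inversions found by exhaustive breadth-first search, and a "shared change" is read as an intermediate genome that every shortest scenario between two genomes passes through. It is not the authors' signature construction (which works on genome triples), and all function names are hypothetical.

```python
from collections import deque

def invert(genome, i, j):
    """Apply a signed inversion to positions i..j (inclusive): reverse the segment and flip signs."""
    return genome[:i] + tuple(-g for g in reversed(genome[i:j + 1])) + genome[j + 1:]

def neighbors(genome):
    """Yield every genome reachable by one signed inversion (linear, single chromosome)."""
    n = len(genome)
    for i in range(n):
        for j in range(i, n):
            yield invert(genome, i, j)

def bfs(source):
    """Breadth-first search over all signed permutations of the genes in source.

    Returns (dist, paths): the inversion distance from source and the number of
    distinct shortest inversion scenarios from source, for every genome.
    """
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        g = queue.popleft()
        for h in neighbors(g):
            if h not in dist:  # first time h is seen: it lies one level further out
                dist[h] = dist[g] + 1
                paths[h] = paths[g]
                queue.append(h)
            elif dist[h] == dist[g] + 1:  # another shortest route into h
                paths[h] += paths[g]
    return dist, paths

def common_intermediates(a, b):
    """Genomes other than a and b that lie on every shortest inversion scenario from a to b."""
    dist_a, paths_a = bfs(a)
    dist_b, paths_b = bfs(b)
    d, total = dist_a[b], paths_a[b]
    # Inversions are self-inverse, so the number of shortest scenarios through g
    # is paths_a[g] * paths_b[g]; g is unavoidable when that equals the total.
    return sorted(
        g for g in dist_a
        if g not in (a, b)
        and dist_a[g] + dist_b[g] == d
        and paths_a[g] * paths_b[g] == total
    )
```

The search visits all n!·2^n signed permutations, so it only works for a handful of genes; its purpose is to make "common to all minimal scenarios" concrete, not to scale.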

CiteSeerX

Gene rearrangement analysis and ancestral order inference from chloroplast genomes with inverted repeat

Author: Cui Liying
dePamphilis Claude W
Moret Bernard ME
Tang Jijun
Yue Feng
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Genome evolution is shaped not only by nucleotide substitutions, but also by structural changes including gene and genome duplications, insertions, deletions and gene order rearrangements. The most popular methods for reconstructing phylogeny from genome rearrangements include GRAPPA and MGR. However, these methods are limited to cases where equal gene content or few deletions can be assumed. Since conserved duplicated regions are present in many chloroplast genomes, the inference of inverted repeats is needed in chloroplast phylogeny analysis and ancestral genome reconstruction. Results: We extend GRAPPA and develop a new method, GRAPPA-IR, to handle chloroplast genomes. A test of GRAPPA-IR using divergent chloroplast genomes from land plants and green algae recovers a phylogeny congruent with prior studies, while analyses that do not consider IR structure fail to obtain the accepted topology. Our extensive simulation study also confirms that GRAPPA-IR has better accuracy than the existing methods. Conclusions: Tests on biological and simulated datasets show that GRAPPA-IR can accurately recover the genome phylogeny as well as ancestral gene orders. Close analysis of the ancestral genome structure suggests that genome rearrangement in chloroplasts is probably limited by inverted repeats with a conserved core region. In addition, the boundaries of inverted repeats are hot spots for gene duplications or deletions. The new GRAPPA-IR is available from http://phylo.cse.sc.ed
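In gene-order terms, an inverted repeat is a block of genes that occurs a second time as if read from the opposite strand: same genes, reverse order, opposite signs. The brute-force sketch below finds the longest such pair of non-overlapping copies in a signed gene order. It is an illustration only, under stated assumptions: genes are signed integers, the gene order is treated as linear (chloroplast genomes are in fact circular), and this is not how GRAPPA-IR identifies or handles inverted repeats; the function names are hypothetical.

```python
def reverse_complement(segment):
    """Read a signed gene segment from the other strand: reverse the order and flip every sign."""
    return [-g for g in reversed(segment)]

def longest_inverted_repeat(genome):
    """Return (i, j, k) such that genome[j:j+k] is the inverted copy of genome[i:i+k],
    the two copies do not overlap (i + k <= j), and k is as large as possible.
    Returns None if the gene order contains no inverted repeat at all.
    Exhaustive search, so only suitable for small gene orders."""
    n = len(genome)
    for k in range(n // 2, 0, -1):            # try the longest repeat length first
        for i in range(n - 2 * k + 1):
            first = list(genome[i:i + k])
            for j in range(i + k, n - k + 1):  # second copy starts after the first one ends
                if reverse_complement(genome[j:j + k]) == first:
                    return i, j, k
    return None

# A chloroplast-like toy order: genes 10, 11, 12 form IRa, and -12, -11, -10 form IRb.
print(longest_inverted_repeat([1, 2, 10, 11, 12, 3, 4, -12, -11, -10]))
```

The example prints (2, 7, 3): the block starting at position 2 reappears, inverted, at position 7.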

Scholar Commons - Institutional Repository of the University of South Carolina

Heuristics for the inversion median problem

Author: A Bergeron
A Caprara
A Siepel
A Sturtevant
A Xu
Andrew Wei Xu
B Moret
Bernard ME Moret
D Bader
E Tannier
G Bourque
G Fertin
J Palmer
K Swenson
Krister M Swenson
M Kothari
Vaibhav Rajan
W Day
Yu Lin
Publication venue: BioMed Central
Publication date: 14/10/2009

Background: The study of genome rearrangements has become a mainstay of phylogenetics and comparative genomics. Fundamental in such a study is the median problem: given three gene arrangements, find a fourth that minimizes the sum of the evolutionary distances between itself and the given three. Many exact algorithms and heuristics have been developed for the inversion median problem, of which the best known is MGR. Results: We present a unifying framework for median heuristics, which enables us to clarify existing strategies and to place them in a partial ordering. Analysis of this framework leads to a new insight: the best strategies continue to refer to the input data rather than just to updated estimates. Using this insight, we develop a new heuristic for inversion medians that uses input data to the end of its computation and leverages our previous work with DCJ medians. Finally, we present the results of extensive experimentation showing that our new heuristic outperforms all others in accuracy and, especially, in running time: the heuristic typically returns solutions within 1% of optimal and runs in seconds to minutes even on genomes with 25,000 genes, whereas MGR can take days on instances of 200 genes and cannot be used beyond 1,000 genes. Conclusions: Finding good rearrangement medians, in particular inversion medians, had long been regarded as the computational bottleneck in whole-genome studies. Our new heuristic for inversion medians, ASM, which dominates all others in our framework, puts that issue to rest by providing near-optimal solutions within seconds to minutes on even the largest genomes.
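To make the objective concrete, the sketch below solves the inversion median problem exactly for toy genomes by exhaustive search: one breadth-first search from each of the three inputs gives the inversion distance to every signed permutation, and the median is a genome minimizing the summed distance. This is the definition that heuristics approximate, not ASM or MGR; genomes are assumed to be single linear chromosomes with identical gene content, written as tuples of signed integers, and the helper names are hypothetical.

```python
from collections import deque

def invert(genome, i, j):
    """Reverse positions i..j (inclusive) and flip their signs."""
    return genome[:i] + tuple(-g for g in reversed(genome[i:j + 1])) + genome[j + 1:]

def inversion_distances(source):
    """Exact inversion distance from source to every signed permutation of its genes (BFS)."""
    n = len(source)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        g = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                h = invert(g, i, j)
                if h not in dist:
                    dist[h] = dist[g] + 1
                    queue.append(h)
    return dist

def inversion_median(a, b, c):
    """Return (median, score) where median minimizes d(m, a) + d(m, b) + d(m, c).
    Ties are broken by tuple order so that the result is deterministic."""
    da, db, dc = inversion_distances(a), inversion_distances(b), inversion_distances(c)
    score = {g: da[g] + db[g] + dc[g] for g in da}
    median = min(score, key=lambda g: (score[g], g))
    return median, score[median]
```

By the triangle inequality, any median scores at least half the sum of the three pairwise distances, which gives a quick lower bound for checking heuristic output. The search space grows as n!·2^n, which is why practical work relies on heuristics.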

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Infoscience - École polytechnique fédérale de Lausanne

Estimating true evolutionary distances under rearrangements, duplications, and losses

Author: A Bergeron
A Ouangraoua
A Rokas
B Moret
Bernard ME Moret
D Sankoff
D Swofford
G Fertin
K Swenson
Krister M Swenson
LS Wang
N El-Mabrouk
S Yancopoulos
Vaibhav Rajan
Y Lin
Yu Lin
Publication venue: BioMed Central
Publication date: 14/10/2009

Background: The rapidly increasing availability of whole-genome sequences has enabled the study of whole-genome evolution. Evolutionary mechanisms based on genome rearrangements have attracted much attention and given rise to many models; somewhat independently, the mechanisms of gene duplication and loss have seen much work. However, the two are not independent and thus require a unified treatment, which remains missing to date. Moreover, existing rearrangement models do not fit the dichotomy between most prokaryotic genomes (one circular chromosome) and most eukaryotic genomes (multiple linear chromosomes). Results: To handle rearrangements, gene duplications and losses, we propose a new evolutionary model and the corresponding method for estimating true evolutionary distance. Our model, inspired by the DCJ model, is simple and the first to respect the prokaryotic/eukaryotic structural dichotomy. Experimental results on a wide variety of genome structures demonstrate the very high accuracy and robustness of our distance estimator. Conclusions: We give the first robust, statistically based estimate of genomic pairwise distances based on rearrangements, duplications and losses, under a model that respects the structural dichotomy between prokaryotic and eukaryotic genomes. Accurate and robust estimates of true evolutionary distances should translate into much better phylogenetic reconstructions as well as more accurate genomic alignments, while our new model of genome rearrangements provides another refinement in simplicity and verisimilitude.
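The model above builds on double-cut-and-join (DCJ). As a point of reference only, the sketch below computes the classical DCJ distance between two genomes that contain the same genes, each exactly once, using the adjacency-graph formula d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths. It accepts any mix of linear and circular chromosomes but models neither duplications nor losses, and it returns a minimum (parsimony) distance rather than the true-distance estimate described in the paper. The genome encoding (a list of (genes, is_circular) pairs) and the function names are assumptions.

```python
def extremities(gene):
    """Extremities of a signed gene in reading order: +g is (tail, head), -g is (head, tail)."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))

def partner_map(genome):
    """Map every gene extremity to the extremity it is adjacent to, or to None at a telomere."""
    partner = {}
    for genes, circular in genome:
        ends = [extremities(g) for g in genes]
        for (_, right), (left, _) in zip(ends, ends[1:]):
            partner[right], partner[left] = left, right
        first, last = ends[0][0], ends[-1][1]
        if circular:  # the chromosome closes on itself
            partner[first], partner[last] = last, first
        else:         # both ends are telomeres
            partner[first], partner[last] = None, None
    return partner

def dcj_distance(a, b):
    """Classical DCJ distance N - (C + I/2) for genomes with identical, unduplicated gene content."""
    pa, pb = partner_map(a), partner_map(b)
    if set(pa) != set(pb):
        raise ValueError("both genomes must contain the same genes, each exactly once")
    parent = {x: x for x in pa}

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Connected groups of extremities correspond to components of the adjacency graph.
    for partner in (pa, pb):
        for x, y in partner.items():
            if y is not None:
                parent[find(x)] = find(y)

    # Per component, count extremities that are telomeres in genome a and in genome b.
    telomeres = {}
    for x in pa:
        counts = telomeres.setdefault(find(x), [0, 0])
        counts[0] += int(pa[x] is None)
        counts[1] += int(pb[x] is None)
    cycles = sum(1 for ta, tb in telomeres.values() if ta == 0 and tb == 0)
    odd_paths = sum(1 for ta, tb in telomeres.values() if ta == 1 and tb == 1)
    return len(pa) // 2 - (cycles + odd_paths // 2)
```

A component with no telomeres is a cycle; a component ending in one telomere of each genome is an odd path. For example, one inversion, one fission or one linearization of a circular chromosome each yields distance 1.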

CiteSeerX

Infoscience - École polytechnique fédérale de Lausanne

Refining transcriptional regulatory networks using network evolutionary models and gene histories

Author: A Bhan
A Crombach
A Stark
A Tanay
AL Barabási
Bernard ME Moret
BME Moret
C Roth
CT Harbison
D Durand
DM Hillis
G Bourque
J Kim
J Yu
KP Murphy
L Arvestad
M Kanehisa
MM Babu
N Friedman
R Wang
RDM Page
S Liang
SA Teichmann
SY Kim
T Akutsu
T Chen
T Pupko
X Zhang
Xiuwei Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Computational inference of transcriptional regulatory networks remains a challenging problem, in part due to the lack of strong network models. In this paper, we present evolutionary approaches to improve the inference of regulatory networks for a family of organisms by developing an evolutionary model for these networks and taking advantage of established phylogenetic relationships among these organisms. In previous work, we used a simple evolutionary model and provided extensive simulation results showing that phylogenetic information, combined with such a model, could be used to gain significant improvements on the performance of current inference algorithms. Results: In this paper, we extend the evolutionary model to take into account gene duplications and losses, which are viewed as major drivers in the evolution of regulatory networks. We show how to adapt our evolutionary approach to this new model and provide detailed simulation results, which show significant improvement over the reference network inference algorithms. Different evolutionary histories for gene duplications and losses are studied, showing that our adapted approach is feasible under a broad range of conditions. We also provide results on biological data (cis-regulatory modules for 12 species of Drosophila), confirming our simulation results.
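To make "an evolutionary model for these networks" with duplications and losses concrete, the sketch below simulates one branch under a deliberately simple, hypothetical model: a network is a set of genes plus directed regulator-to-target edges; each gene is duplicated with probability p_dup (the copy inherits all incoming and outgoing edges of its parent), then lost with probability p_loss (taking its edges with it), and finally every ordered pair of surviving genes independently gains or loses its edge with probability p_flip. The parameter names and the order of the three steps are assumptions for illustration, not the model used in the paper.

```python
import random
from itertools import count

def evolve_branch(genes, edges, p_flip, p_dup, p_loss, rng, new_ids):
    """Evolve a regulatory network along one branch (hypothetical toy model).

    genes: set of gene ids; edges: set of (regulator, target) pairs;
    rng: a random.Random instance; new_ids: iterator yielding fresh gene ids.
    """
    genes, edges = set(genes), set(edges)
    # 1. Duplication: the copy inherits every edge into and out of its parent.
    for g in sorted(genes):
        if rng.random() < p_dup:
            copy = next(new_ids)
            genes.add(copy)
            for regulator, target in list(edges):
                if regulator == g:
                    edges.add((copy, target))
                if target == g:
                    edges.add((regulator, copy))
    # 2. Loss: a lost gene disappears together with all of its edges.
    for g in sorted(genes):
        if rng.random() < p_loss:
            genes.discard(g)
    edges = {(r, t) for r, t in edges if r in genes and t in genes}
    # 3. Edge gain/loss: each ordered pair (self-loops included) flips independently.
    for r in sorted(genes):
        for t in sorted(genes):
            if rng.random() < p_flip:
                edges ^= {(r, t)}
    return genes, edges

# Example: three successive branches starting from a three-gene network.
rng = random.Random(1)
fresh_ids = count(100)
genes, edges = {0, 1, 2}, {(0, 1), (1, 2)}
for _ in range(3):
    genes, edges = evolve_branch(genes, edges, p_flip=0.05, p_dup=0.2, p_loss=0.1,
                                 rng=rng, new_ids=fresh_ids)
```

Simulating networks along a known phylogeny in this way is what produces the reference data against which inference methods can be scored.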

Infoscience - École polytechnique fédérale de Lausanne

Bootstrapping phylogenies inferred from rearrangement data

Author: Lin Yu
Moret Bernard ME
Rajan Vaibhav
Publication venue: BMC
Publication date: 01/01/2012

Background: Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well-established probabilistic models. Results: We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Conclusions: Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its support values follow a similar scale and its receiver-operating characteristics are nearly identical, indicating that it provides similar levels of sensitivity and specificity. Thus our assessment method makes it possible to conduct phylogenetic analyses on whole genomes with the same degree of confidence as for analyses on aligned sequences. Extensions to search-based inference methods such as maximum parsimony and maximum likelihood are possible, but remain to be thoroughly tested.
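Two generic ingredients of such an assessment can be sketched. The first function below draws a gene-deletion jackknife replicate from a set of gene orders (the same random subset of genes is kept in every genome, and the kept genes retain their relative order and signs); the second computes support values as the percentage of replicate trees that contain each bipartition of the reference tree. This is an illustration under assumptions, not the authors' combined jackknife/bootstrap procedure: gene orders are lists of signed integers, trees are supplied directly as lists of bipartitions (tree building is left out), and all names are hypothetical.

```python
import random

def jackknife_gene_orders(genomes, keep_fraction, rng):
    """Keep a random subset of genes (the same subset in every genome), preserving order and signs."""
    all_genes = sorted({abs(g) for order in genomes.values() for g in order})
    k = max(1, round(keep_fraction * len(all_genes)))
    kept = set(rng.sample(all_genes, k))
    return {name: [g for g in order if abs(g) in kept] for name, order in genomes.items()}

def canonical(split, taxa):
    """Write a bipartition as the side that does not contain a fixed reference taxon."""
    side = frozenset(split)
    return frozenset(taxa) - side if min(taxa) in side else side

def support_values(reference_splits, replicate_trees, taxa):
    """Percentage of replicate trees (each a list of bipartitions) containing each reference bipartition."""
    replicates = [{canonical(s, taxa) for s in tree} for tree in replicate_trees]
    support = {}
    for split in reference_splits:
        key = canonical(split, taxa)
        support[key] = 100.0 * sum(key in rep for rep in replicates) / len(replicates)
    return support
```

Canonicalizing each bipartition to the side without the reference taxon ensures that {A, B} and its complement {C, D, E} are counted as the same branch.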