931 research outputs found
Efficient Exploration of the Space of Reconciled Gene Trees
Gene trees record the combination of gene level events, such as duplication,
transfer and loss, and species level events, such as speciation and extinction.
Gene tree-species tree reconciliation methods model these processes by drawing
gene trees into the species tree using a series of gene and species level
events. The reconstruction of gene trees based on sequence alone almost always
involves choosing between statistically equivalent or weakly distinguishable
relationships that could be much better resolved based on a putative species
tree. To exploit this potential for accurate reconstruction of gene trees the
space of reconciled gene trees must be explored according to a joint model of
sequence evolution and gene tree-species tree reconciliation.
Here we present amalgamated likelihood estimation (ALE), a probabilistic
approach to exhaustively explore all reconciled gene trees that can be
amalgamated as a combination of clades observed in a sample of trees. We
implement ALE in the context of a reconciliation model, which allows for the
duplication, transfer and loss of genes. We use ALE to efficiently approximate
the sum of the joint likelihood over amalgamations and to find the reconciled
gene tree that maximizes the joint likelihood.
We demonstrate using simulations that gene trees reconstructed using the
joint likelihood are substantially more accurate than those reconstructed using
sequence alone. Using realistic topologies, branch lengths and alignment sizes,
we demonstrate that ALE produces more accurate gene trees even if the model of
sequence evolution is greatly simplified. Finally, examining 1099 gene families
from 36 cyanobacterial genomes we find that joint likelihood-based inference
results in a striking reduction in apparent phylogenetic discord, with 24%, 59%
and 46% percent reductions in the mean numbers of duplications, transfers and
losses.Comment: Manuscript accepted pending revision in Systematic Biolog
A Bayesian Approach for Fast and Accurate Gene Tree Reconstruction
Supplementary tables S1, sections 2.1–2.3, and figures S1–S11 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).Recent sequencing and computing advances have enabled phylogenetic analyses to expand to both entire genomes and large clades, thus requiring more efficient and accurate methods designed specifically for the phylogenomic context. Here, we present SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss (DL) rates, speciation times, and correlated substitution rate variation across both species and loci. We have implemented and applied this method on two clades of fully sequenced species, 12 Drosophila and 16 fungal genomes as well as simulated phylogenies and find dramatic improvements in reconstruction accuracy as compared with the most popular existing methods, including those that take the species tree into account. We find that reconstruction inaccuracies of traditional phylogenetic methods overestimate the number of DL events by as much as 2–3-fold, whereas our method achieves significantly higher accuracy. We feel that the results and methods presented here will have many important implications for future investigations of gene evolution.National Science Foundation (U.S.) (CAREER award NSF 0644282
Evolution through segmental duplications and losses : A Super-Reconciliation approach
The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem which consists in inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree, to a set of trees, accounting for segmental duplications and losses. Existency of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existency also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess time efficiency of the former exponential time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes
Evolution at the Subgene Level: Domain Rearrangements in the Drosophila Phylogeny
Supplementary sections 1–13, tables S1–S10, and figures S1–S9 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.National Science Foundation (U.S.) (Graduate Research Fellowship)National Science Foundation (U.S.) (CAREER award NSF 0644282
Reconciliation Revisited: Handling Multiple Optima when Reconciling with Duplication, Transfer, and Loss
Phylogenetic tree reconciliation is a powerful approach for inferring evolutionary events like gene duplication, horizontal gene transfer, and gene loss, which are fundamental to our understanding of molecular evolution. While duplication–loss (DL) reconciliation leads to a unique maximum-parsimony solution, duplication-transfer-loss (DTL) reconciliation yields a multitude of optimal solutions, making it difficult to infer the true evolutionary history of the gene family. This problem is further exacerbated by the fact that different event cost assignments yield different sets of optimal reconciliations. Here, we present an effective, efficient, and scalable method for dealing with these fundamental problems in DTL reconciliation. Our approach works by sampling the space of optimal reconciliations uniformly at random and aggregating the results. We show that even gene trees with only a few dozen genes often have millions of optimal reconciliations and present an algorithm to efficiently sample the space of optimal reconciliations uniformly at random in O(mn[superscript 2]) time per sample, where m and n denote the number of genes and species, respectively. We use these samples to understand how different optimal reconciliations vary in their node mappings and event assignments and to investigate the impact of varying event costs. We apply our method to a biological dataset of approximately 4700 gene trees from 100 taxa and observe that 93% of event assignments and 73% of mappings remain consistent across different multiple optima. Our analysis represents the first systematic investigation of the space of optimal DTL reconciliations and has many important implications for the study of gene family evolution.National Science Foundation (U.S.) (CAREER Award 0644282)National Institutes of Health (U.S.) (Grant RC2 HG005639)National Science Foundation (U.S.). Assembling the Tree of Life (Program) (Grant 0936234
Reconstruction of ancestral chromosome architecture and gene repertoire reveals principles of genome evolution in a model yeast genus
International audienceReconstructing genome history is complex but necessary to reveal quantitative principles governing genome evolution. Such reconstruction requires recapitulating into a single evolutionary framework the evolution of genome architecture and gene repertoire. Here, we reconstructed the genome history of the genus Lachancea that appeared to cover a continuous evolutionary range from closely related to more diverged yeast species. Our approach integrated the generation of a high-quality genome data set; the development of AnChro, a new algorithm for reconstructing ancestral genome architecture; and a comprehensive analysis of gene repertoire evolution. We found that the ancestral genome of the genus Lachancea contained eight chromosomes and about 5173 protein-coding genes. Moreover, we characterized 24 horizontal gene transfers and 159 putative gene creation events that punctuated species diversification. We retraced all chromosomal rearrangements, including gene losses, gene duplications, chromosomal inversions and translocations at single gene resolution. Gene duplications outnumbered losses and balanced rearrangements with 1503, 929, and 423 events, respectively. Gene content variations between extant species are mainly driven by differential gene losses, while gene duplications remained globally constant in all lineages. Remarkably, we discovered that balanced chromosomal rearrangements could be responsible for up to 14% of all gene losses by disrupting genes at their breakpoints. Finally, we found that nonsynonymous substitutions reached fixation at a coordinated pace with chromosomal inversions, translocations, and duplications, but not deletions. Overall, we provide a granular view of genome evolution within an entire eukaryotic genus, linking gene content, chromosome rearrangements , and protein divergence into a single evolutionary framework
Unifying Parsimonious Tree Reconciliation
Evolution is a process that is influenced by various environmental factors,
e.g. the interactions between different species, genes, and biogeographical
properties. Hence, it is interesting to study the combined evolutionary history
of multiple species, their genes, and the environment they live in. A common
approach to address this research problem is to describe each individual
evolution as a phylogenetic tree and construct a tree reconciliation which is
parsimonious with respect to a given event model. Unfortunately, most of the
previous approaches are designed only either for host-parasite systems, for
gene tree/species tree reconciliation, or biogeography. Hence, a method is
desirable, which addresses the general problem of mapping phylogenetic trees
and covering all varieties of coevolving systems, including e.g., predator-prey
and symbiotic relationships. To overcome this gap, we introduce a generalized
cophylogenetic event model considering the combinatorial complete set of local
coevolutionary events. We give a dynamic programming based heuristic for
solving the maximum parsimony reconciliation problem in time O(n^2), for two
phylogenies each with at most n leaves. Furthermore, we present an exact
branch-and-bound algorithm which uses the results from the dynamic programming
heuristic for discarding partial reconciliations. The approach has been
implemented as a Java application which is freely available from
http://pacosy.informatik.uni-leipzig.de/coresym.Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013
Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss
Motivation: Gene family evolution is driven by evolutionary events such as speciation, gene duplication, horizontal gene transfer and gene loss, and inferring these events in the evolutionary history of a given gene family is a fundamental problem in comparative and evolutionary genomics with numerous important applications. Solving this problem requires the use of a reconciliation framework, where the input consists of a gene family phylogeny and the corresponding species phylogeny, and the goal is to reconcile the two by postulating speciation, gene duplication, horizontal gene transfer and gene loss events. This reconciliation problem is referred to as duplication-transfer-loss (DTL) reconciliation and has been extensively studied in the literature. Yet, even the fastest existing algorithms for DTL reconciliation are too slow for reconciling large gene families and for use in more sophisticated applications such as gene tree or species tree reconstruction
- …