410 research outputs found
Efficient Exploration of the Space of Reconciled Gene Trees
Gene trees record the combination of gene level events, such as duplication,
transfer and loss, and species level events, such as speciation and extinction.
Gene tree-species tree reconciliation methods model these processes by drawing
gene trees into the species tree using a series of gene and species level
events. The reconstruction of gene trees based on sequence alone almost always
involves choosing between statistically equivalent or weakly distinguishable
relationships that could be much better resolved based on a putative species
tree. To exploit this potential for accurate reconstruction of gene trees the
space of reconciled gene trees must be explored according to a joint model of
sequence evolution and gene tree-species tree reconciliation.
Here we present amalgamated likelihood estimation (ALE), a probabilistic
approach to exhaustively explore all reconciled gene trees that can be
amalgamated as a combination of clades observed in a sample of trees. We
implement ALE in the context of a reconciliation model, which allows for the
duplication, transfer and loss of genes. We use ALE to efficiently approximate
the sum of the joint likelihood over amalgamations and to find the reconciled
gene tree that maximizes the joint likelihood.
We demonstrate using simulations that gene trees reconstructed using the
joint likelihood are substantially more accurate than those reconstructed using
sequence alone. Using realistic topologies, branch lengths and alignment sizes,
we demonstrate that ALE produces more accurate gene trees even if the model of
sequence evolution is greatly simplified. Finally, examining 1099 gene families
from 36 cyanobacterial genomes we find that joint likelihood-based inference
results in a striking reduction in apparent phylogenetic discord, with 24%, 59%
and 46% percent reductions in the mean numbers of duplications, transfers and
losses.Comment: Manuscript accepted pending revision in Systematic Biolog
The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice-versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution.Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201
Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs
Abstract Background Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, be they phylogenetic relationships, substitution rates or ancestral states; it is also crucial to simulate realistic data sets. Such simulation procedures are needed to estimate the null-distribution of complex statistics, an approach referred to as parametric bootstrapping, and are also used to test the quality of phylogenetic reconstruction programs. It has often been observed that homologous sequences can vary widely in their nucleotide or amino-acid compositions, revealing that sequence evolution has changed importantly among lineages, and may therefore be most appropriately approached through non-homogeneous models. Several programs implementing such models have been developed, but they are limited in their possibilities: only a few particular models are available for likelihood optimization, and data sets cannot be easily generated using the resulting estimated parameters. Results We hereby present a general implementation of non-homogeneous models of substitutions. It is available as dedicated classes in the Bio++ libraries and can hence be used in any C++ program. Two programs that use these classes are also presented. The first one, Bio++ Maximum Likelihood (BppML), estimates parameters of any non-homogeneous model and the second one, Bio++ Sequence Generator (BppSeqGen), simulates the evolution of sequences from these models. These programs allow the user to describe non-homogeneous models through a property file with a simple yet powerful syntax, without any programming required. Conclusion We show that the general implementation introduced here can accommodate virtually any type of non-homogeneous models of sequence evolution, including heterotachous ones, while being computer efficient. We furthermore illustrate the use of such general models for parametric bootstrapping, using tests of non-homogeneity applied to an already published ribosomal RNA data set.</p
Genome-scale phylogenetic analysis finds extensive gene transfer among Fungi
Although the role of lateral gene transfer is well recognized in the
evolution of bacteria, it is generally assumed that it has had less influence
among eukaryotes. To explore this hypothesis we compare the dynamics of genome
evolution in two groups of organisms: Cyanobacteria and Fungi. Ancestral
genomes are inferred in both clades using two types of methods. First, Count, a
gene tree unaware method that models gene duplications, gains and losses to
explain the observed numbers of genes present in a genome. Second, ALE, a more
recent gene tree-aware method that reconciles gene trees with a species tree
using a model of gene duplication, loss, and transfer. We compare their merits
and their ability to quantify the role of transfers, and assess the impact of
taxonomic sampling on their inferences. We present what we believe is
compelling evidence that gene transfer plays a significant role in the
evolution of Fungi
Probabilistic Graphical Model Representation in Phylogenetics
Recent years have seen a rapid expansion of the model space explored in
statistical phylogenetics, emphasizing the need for new approaches to
statistical model representation and software development. Clear communication
and representation of the chosen model is crucial for: (1) reproducibility of
an analysis, (2) model development and (3) software design. Moreover, a
unified, clear and understandable framework for model representation lowers the
barrier for beginners and non-specialists to grasp complex phylogenetic models,
including their assumptions and parameter/variable dependencies.
Graphical modeling is a unifying framework that has gained in popularity in
the statistical literature in recent years. The core idea is to break complex
models into conditionally independent distributions. The strength lies in the
comprehensibility, flexibility, and adaptability of this formalism, and the
large body of computational work based on it. Graphical models are well-suited
to teach statistical models, to facilitate communication among phylogeneticists
and in the development of generic software for simulation and statistical
inference.
Here, we provide an introduction to graphical models for phylogeneticists and
extend the standard graphical model representation to the realm of
phylogenetics. We introduce a new graphical model component, tree plates, to
capture the changing structure of the subgraph corresponding to a phylogenetic
tree. We describe a range of phylogenetic models using the graphical model
framework and introduce modules to simplify the representation of standard
components in large and complex models. Phylogenetic model graphs can be
readily used in simulation, maximum likelihood inference, and Bayesian
inference using, for example, Metropolis-Hastings or Gibbs sampling of the
posterior distribution
Accounting for horizontal gene transfers explains conflicting hypotheses regarding the position of aquificales in the phylogeny of Bacteria
<p>Abstract</p> <p>Background</p> <p>Despite a large agreement between ribosomal RNA and concatenated protein phylogenies, the phylogenetic tree of the bacterial domain remains uncertain in its deepest nodes. For instance, the position of the hyperthermophilic Aquificales is debated, as their commonly observed position close to Thermotogales may proceed from horizontal gene transfers, long branch attraction or compositional biases, and may not represent vertical descent. Indeed, another view, based on the analysis of rare genomic changes, places Aquificales close to epsilon-Proteobacteria.</p> <p>Results</p> <p>To get a whole genome view of <it>Aquifex </it>relationships, all trees containing sequences from <it>Aquifex </it>in the HOGENOM database were surveyed. This study revealed that <it>Aquifex </it>is most often found as a neighbour to Thermotogales. Moreover, informational genes, which appeared to be less often transferred to the <it>Aquifex </it>lineage than non-informational genes, most often placed Aquificales close to Thermotogales. To ensure these results did not come from long branch attraction or compositional artefacts, a subset of carefully chosen proteins from a wide range of bacterial species was selected for further scrutiny. Among these genes, two phylogenetic hypotheses were found to be significantly more likely than the others: the most likely hypothesis placed Aquificales as a neighbour to Thermotogales, and the second one with epsilon-Proteobacteria. We characterized the genes that supported each of these two hypotheses, and found that differences in rates of evolution or in amino-acid compositions could not explain the presence of two incongruent phylogenetic signals in the alignment. Instead, evidence for a large Horizontal Gene Transfer between Aquificales and epsilon-Proteobacteria was found.</p> <p>Conclusion</p> <p>Methods based on concatenated informational proteins and methods based on character cladistics led to different conclusions regarding the position of Aquificales because this lineage has undergone many horizontal gene transfers. However, if a tree of vertical descent can be reconstructed for Bacteria, our results suggest Aquificales should be placed close to Thermotogales.</p
RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language.
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.]
The inference of gene trees with species trees.
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution
Nonadaptive evolution of mitochondrial genome size
Genomes vary greatly in size and complexity, and identifying the evolutionary forces that have generated this variation remains a major goal in biology. A controversial proposal is that most changes in genome size are initially deleterious and therefore are linked to episodes of decrease in effective population sizes. Support for this hypothesis comes from large-scale comparative analyses, but vanishes when phylogenetic nonindependence is taken into account. Another approach to test this hypothesis involves analyzing sequence evolution among clades where duplications have recently fixed. Here we show that episodes of fixation of duplications in mitochondrial genomes of the gecko Heteronotia binoei (two independent clades) and of mantellid frogs (five distinct branches) coincide with reductions in the ability of selection to purge slightly deleterious mutations. Our results support the idea that genome complexity can arise through nonadaptive processes in tetrapods. © 2011 The Author(s). Evolution © 2011 The Society for the Study of Evolution
Recommended from our members
A Phylogenomic Approach to Vertebrate Phylogeny Supports a Turtle-Archosaur Affinity and a Possible Paraphyletic Lissamphibia
In resolving the vertebrate tree of life, two fundamental questions remain: 1) what is the phylogenetic position of turtles within amniotes, and 2) what are the relationships between the three major lissamphibian (extant amphibian) groups? These relationships have historically been difficult to resolve, with five different hypotheses proposed for turtle placement, and four proposed branching patterns within Lissamphibia. We compiled a large cDNA/EST dataset for vertebrates (75 genes for 129 taxa) to address these outstanding questions. Gene-specific phylogenetic analyses revealed a great deal of variation in preferred topology, resulting in topologically ambiguous conclusions from the combined dataset. Due to consistent preferences for the same divergent topologies across genes, we suspected systematic phylogenetic error as a cause of some variation. Accordingly, we developed and tested a novel statistical method that identifies sites that have a high probability of containing biased signal for a specific phylogenetic relationship. After removing putatively biased sites, support emerged for a sister relationship between turtles and either crocodilians or archosaurs, as well as for a caecilian-salamander sister relationship within Lissamphibia, with Lissamphibia potentially paraphyletic.Organismic and Evolutionary Biolog
- …