Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss
Motivation: Gene family evolution is driven by evolutionary events such as speciation, gene duplication, horizontal gene transfer and gene loss, and inferring these events in the evolutionary history of a given gene family is a fundamental problem in comparative and evolutionary genomics with numerous important applications. Solving this problem requires the use of a reconciliation framework, where the input consists of a gene family phylogeny and the corresponding species phylogeny, and the goal is to reconcile the two by postulating speciation, gene duplication, horizontal gene transfer and gene loss events. This reconciliation problem is referred to as duplication-transfer-loss (DTL) reconciliation and has been extensively studied in the literature. Yet, even the fastest existing algorithms for DTL reconciliation are too slow for reconciling large gene families and for use in more sophisticated applications such as gene tree or species tree reconstruction.
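The reconciliation setting described above (a gene tree and a species tree as input, events postulated to explain their differences) can be illustrated with a minimal sketch. It is not the DTL algorithm of the paper: it uses the classic LCA mapping, which handles only speciation and duplication and ignores transfers. The nested-tuple tree encoding, the `species_geneid` leaf naming, and all function names are assumptions made for this illustration.

```python
# Minimal sketch of LCA-based reconciliation (duplications only, no transfers).
# Assumptions: trees are nested tuples with string leaves; gene leaves are
# named "<species>_<id>"; the species tree contains every species mentioned.

def species_clades(tree):
    """Return the leaf set of every node in the species tree."""
    out = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def lca(clades, species):
    """Smallest species-tree clade that contains all of `species`."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that the LCA mapping labels as duplications."""
    clades = species_clades(species_tree)
    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(clades, frozenset([node.split("_")[0]]))
        child_maps = [walk(child) for child in node]
        here = lca(clades, frozenset().union(*child_maps))
        # A node mapped to the same species-tree node as one of its
        # children is interpreted as a duplication, otherwise a speciation.
        if here in child_maps:
            duplications += 1
        return here

    walk(gene_tree)
    return duplications


species_tree = (("A", "B"), "C")
congruent = (("A_1", "B_1"), "C_1")
discordant = (("A_1", "B_1"), ("A_2", "C_1"))
print(count_duplications(congruent, species_tree))
print(count_duplications(discordant, species_tree))
```

The clade lookup here is a linear scan, chosen for readability rather than speed; it only shows what "reconciling" the two trees means in the simplest event model.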
The shape of human gene family phylogenies
BACKGROUND: The shape of phylogenetic trees has been used to make inferences about the evolutionary process by comparing the shapes of actual phylogenies with those expected under simple models of the speciation process. Previous studies have focused on speciation events, but gene duplication is another lineage splitting event, analogous to speciation, and gene loss or deletion is analogous to extinction. Measures of the shape of gene family phylogenies can thus be used to investigate the processes of gene duplication and loss. We make the first systematic attempt to use tree shape to study gene duplication using human gene phylogenies. RESULTS: We find that gene duplication has produced gene family trees significantly less balanced than expected from a simple model of the process, and less balanced than species phylogenies: the opposite of what might be expected under the 2R hypothesis. CONCLUSION: While other explanations are plausible, we suggest that the greater imbalance of gene family trees than species trees is due to the prevalence of tandem duplications over regional duplications during the evolution of the human genome.
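To make "tree balance" concrete, the sketch below computes the Colless index, one common imbalance statistic for rooted binary trees. The abstract does not say which statistic the authors used, so the choice of Colless, the nested-tuple tree encoding, and the function name are assumptions for illustration only.

```python
# Minimal sketch: Colless imbalance index of a rooted binary tree.
# Assumption: trees are nested 2-tuples with string leaves.

def colless(tree):
    """Sum over internal nodes of |leaves on the left - leaves on the right|."""
    total = 0

    def walk(node):
        nonlocal total
        if isinstance(node, str):
            return 1  # a leaf contributes one tip
        left, right = node
        n_left, n_right = walk(left), walk(right)
        total += abs(n_left - n_right)
        return n_left + n_right

    walk(tree)
    return total


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(colless(balanced))     # 0: perfectly balanced
print(colless(caterpillar))  # 3: the maximum for four leaves
```

Larger values indicate less balanced trees; comparing observed values with those expected under a null model of lineage splitting is the kind of test the abstract describes.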
Vertebrate phylogenomics and gene family evolution
This thesis is about two topics: the evolution of gene families by the birth-death process of gene duplication and gene loss, and phylogenetic inference. It is a central theme that these two processes are intimately associated - the phylogenies of gene families (of any gene) are shaped by the processes of gene duplication and gene loss, as much as by the processes of speciation and extinction occurring among the species the gene is evolving in. This has two results. First, we need to know, or assume, something about the processes of gene duplication and loss to correctly understand the pattern of speciation, or cladogenesis, in a group of organisms. Second, we need to know, or assume, something about this pattern if we are to fully appreciate the effect of gene duplication and loss on a gene family phylogeny. The main part of this thesis investigates the use of reconciled tree methods in unravelling species phylogeny and the evolution of gene families. Part of this investigation involves placing reconciled tree methods (and the use of these methods to infer species phylogeny, known as gene tree parsimony), in the context of some related methods: supertree methods and "simultaneous analysis" of combined data. Two empirical studies complete this part of the thesis - one attempting to infer the higher-level phylogeny of vertebrates using gene tree parsimony, and another focusing on a lower taxonomic level, on primate phylogeny. This chapter attempts an integrated study of gene duplication and species phylogeny, which uses information about gene duplication to help date evolutionary events. Despite the close relationship between gene duplication and speciation on phylogenies, it is possible to study gene duplication independently.
If we restrict ourselves to genes sampled from a single genome, gene family trees represent gene duplications and gene losses occurring during the history of a single species, so the complication of speciation and extinction is eliminated. By realising that the processes of gene duplication and loss in these trees are analogous to the processes of speciation and extinction in species phylogenies, we can harness a toolkit of methods developed for more traditional phylogenies to study these molecular processes. Two such methods are models of cladistic tree shape and birth-death models, which allow the first estimates of the rate of gene loss.
The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice-versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution. Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201
Efficient Exploration of the Space of Reconciled Gene Trees
Gene trees record the combination of gene level events, such as duplication,
transfer and loss, and species level events, such as speciation and extinction.
Gene tree-species tree reconciliation methods model these processes by drawing
gene trees into the species tree using a series of gene and species level
events. The reconstruction of gene trees based on sequence alone almost always
involves choosing between statistically equivalent or weakly distinguishable
relationships that could be much better resolved based on a putative species
tree. To exploit this potential for accurate reconstruction of gene trees the
space of reconciled gene trees must be explored according to a joint model of
sequence evolution and gene tree-species tree reconciliation.
Here we present amalgamated likelihood estimation (ALE), a probabilistic
approach to exhaustively explore all reconciled gene trees that can be
amalgamated as a combination of clades observed in a sample of trees. We
implement ALE in the context of a reconciliation model, which allows for the
duplication, transfer and loss of genes. We use ALE to efficiently approximate
the sum of the joint likelihood over amalgamations and to find the reconciled
gene tree that maximizes the joint likelihood.
We demonstrate using simulations that gene trees reconstructed using the
joint likelihood are substantially more accurate than those reconstructed using
sequence alone. Using realistic topologies, branch lengths and alignment sizes,
we demonstrate that ALE produces more accurate gene trees even if the model of
sequence evolution is greatly simplified. Finally, examining 1099 gene families
from 36 cyanobacterial genomes we find that joint likelihood-based inference
results in a striking reduction in apparent phylogenetic discord, with 24%, 59%
and 46% reductions in the mean numbers of duplications, transfers and
losses. Comment: Manuscript accepted pending revision in Systematic Biolog