
The inference of gene trees with species trees

Author: Bastien Boussau
Eric Tannier
Gergely J. Szöllősi
Montbonnot France
Vincent Daubin
Publication date: 04/11/2013

Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer, or incomplete lineage sorting. Some of them consider several types of events together, but none currently exists that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

PubMed Central

HAL

Repository of the Academy's Library

ELTE Digital Institutional Repository (EDIT)

Hal-Diderot

The inference of gene trees with species trees

Author: Boussau Bastien
Daubin Vincent
Szöllősi Gergely J
Tannier Eric
Publication date: 01/01/2015

This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. The models reviewed account for gene duplication and loss, transfer, or incomplete lineage sorting. Some of them consider several types of events together, but none currently exists that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.

Repository of the Academy's Library

The human phylome

Author: A Meyer
A Rokas
AC Berglund-Sonnhammer
AM Aguinaldo
C Roth
C Seoighe
C Vogel
CG Kurland
CM Zmasek
D Penny
DT Jones
ES Lander
EV Koonin
F Delsuc
F Ronquist
FD Ciccarelli
G Panopoulou
G Ricard
H Akaike
H Dopazo
H Philippe
Hernán Dopazo
I Humphery-Smith
J Adachi
J Nielsen
J Zhang
JA Bailey
JA Eisen
Jaime Huerta-Cepas
JC Chiu
JC Venter
JD McPherson
JE Blair
JO Andersson
Joaquín Dopazo
JW Thomas
K Misawa
L Arvestad
L Bromham
L Duret
L Li
M Hallet
M Kullberg
M Pruess
MA Huynen
MR Goldsmith
N Alvarez
NW Blackstone
O Gascuel
O Jeffroy
PJ Keeling
PS Dehal
RC Edgar
RL Tatusov
S Guindon
S Henikoff
S Ohno
S Whelan
SA Benner
SE Fisher
SL Salzberg
T Blomme
T Cavalier-Smith
T Dagan
T Gabaldón
T Hulsen
T Müller
T Ohta
T Sicheritz-Ponten
TF Smith
TK Gandhi
TM Keane
Toni Gabaldón
TR Buckley
U Bergthorsson
V van Noort
WJ Bruno
WJ Murphy
WM Fitch
Y Suzuki
YI Wolf
Publication venue: BioMed Central
Publication date: 01/01/2007

The human phylome, which includes the evolutionary relationships of all human proteins and their homologs among thirty-nine fully sequenced eukaryotes, is reconstructed.

CiteSeerX

Crossref

PubMed Central

Evolution through segmental duplications and losses: A Super-Reconciliation approach

Author: A Bergeron
A Deepak
A Tofigh
AA Abbasi
AV Aho
B Moret
B Vernot
C Chauve
C Semple
CM Zmasek
CW Stevens
David Sankoff
DEK Ferrier
E Tannier
G Bourque
G Brightwell
G Fertin
G Pruesse
G Sundstrom
GJ Szöllősi
I Holyer
J Garcia-Fernàndez
J Ma
J Paszek
J Sjöstrand
JD Thompson
JP Doyon
LX Zhang
M Constantinescu
M Goodman
M Hafeez
M Lafond
MP Ng
MS Bansal
N El-Mabrouk
O Akerborg
R Chaudhary
R Dondi
S Bérard
S Dreborg
S Kumar
TA Larsson
W Ajmal
Y Anselmetti
YC Wu
Publication venue: Springer Science and Business Media LLC
Publication date: 26/05/2020

The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists of inferring a history of segmental duplication and loss events (each involving a set of neighboring genes) leading from a single ancestral synteny to a set of present-day syntenies. In other words, we extend the traditional Duplication-Loss reconciliation problem from a single gene tree to a set of trees, accounting for segmental duplications and losses. The existence of a Super-Reconciliation depends on the consistency of the individual gene trees. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.
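The single-gene-tree Duplication-Loss reconciliation that Super-Reconciliation extends is classically computed with the LCA (lowest common ancestor) mapping. The following is a minimal sketch of that classical step only, not the authors' Super-Reconciliation algorithm. It assumes rooted binary trees written as nested Python tuples, gene-tree leaves labelled by the species they come from, and hypothetical function names (`leaves`, `clade_depths`, `reconcile_dl`).

```python
from typing import Dict, FrozenSet, Optional, Tuple, Union

# A rooted binary tree: either a leaf label or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(t: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below t."""
    if isinstance(t, str):
        return frozenset([t])
    return leaves(t[0]) | leaves(t[1])


def clade_depths(s: Tree, depth: int = 0,
                 out: Optional[Dict[FrozenSet[str], int]] = None) -> Dict[FrozenSet[str], int]:
    """Map every species-tree clade (as a leaf set) to its depth (root = 0)."""
    if out is None:
        out = {}
    out[leaves(s)] = depth
    if not isinstance(s, str):
        clade_depths(s[0], depth + 1, out)
        clade_depths(s[1], depth + 1, out)
    return out


def reconcile_dl(gene: Tree, species: Tree) -> Tuple[int, int]:
    """Return (duplications, losses) under LCA reconciliation.

    Hypothetical helper: gene-tree leaves must be species names.
    """
    depth = clade_depths(species)

    def lca(clade: FrozenSet[str]) -> FrozenSet[str]:
        # Species clades containing `clade` are nested, so the smallest is the LCA.
        return min((c for c in depth if clade <= c), key=len)

    counts = {"dup": 0, "loss": 0}

    def walk(g: Tree) -> FrozenSet[str]:
        if isinstance(g, str):
            return frozenset([g])  # a leaf maps to its own species
        left, right = walk(g[0]), walk(g[1])
        m = lca(left | right)
        is_dup = m in (left, right)  # a child maps to the same species node
        if is_dup:
            counts["dup"] += 1
        for child in (left, right):
            # Each species branch skipped between m and the child's mapping is
            # one loss; below a speciation node the first step down is expected.
            counts["loss"] += depth[child] - depth[m] - (0 if is_dup else 1)
        return m

    walk(gene)
    return counts["dup"], counts["loss"]


print(reconcile_dl((("A", "B"), ("A", "C")), (("A", "B"), "C")))  # (1, 2)
```

In the example, the gene tree ((A,B),(A,C)) on the species tree ((A,B),C) needs one duplication at the root, with one copy lost in C and the other lost in B. Super-Reconciliation generalizes this by requiring duplications and losses to act on whole syntenic segments across several such trees at once.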

Crossref

University of East Anglia digital repository

Deep Metazoan Phylogeny 2011 – new data, new challenges

Author: Krings Michael
Wörheide Gert
Publication date: 01/01/2011

Open Access LMU

eggNOG v4.0: nested orthology inference across 3686 organisms

Author: Bork Peer
Creevey Chris
Forslund Kristoffer
Gabaldón Toni
Huerta-Cepas Jaime
Jensen Lars J.
Kuhn Michael
Powell Sean
Rattei Thomas
Roth Alexander
Szklarczyk Damian
Trachana Kalliopi
von Mering Christian
Publication date: 02/08/2017

With the increasing availability of various ‘omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. Here we present the fourth version of the eggNOG database (available at http://eggnog.embl.de), which derives nonsupervised orthologous groups (NOGs) from complete genomes and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping pace with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements to the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk downloads.

RERO DOC Digital Library

Phylogenetic informativeness reconciles ray-finned fish molecular divergence times

Author: Alex Dornburg
Jeffrey P Townsend
Matt Friedman
Thomas J Near
Publication venue: Springer Nature
Publication date: 01/01/2014

BACKGROUND: Discordance among individual molecular age estimates, or between molecular age estimates and the fossil record, is observed in many clades across the Tree of Life. This discordance is attributed to a variety of factors, including calibration age uncertainty, calibration placement, nucleotide substitution rate heterogeneity, or the specified molecular clock model. However, the impact of changes in the phylogenetic informativeness of individual genes over time on phylogenetic inference is rarely analyzed. Using nuclear and mitochondrial sequence data for ray-finned fishes (Actinopterygii) as an example, we extend the utility of phylogenetic informativeness (PI) profiles to predict the time intervals in which nucleotide substitution saturation causes discordance among estimated molecular ages.

RESULTS: We demonstrate that even with identical calibration regimes and molecular clock methods, mitochondrial-based molecular age estimates are systematically older than those estimated from nuclear sequences. This discordance is most severe for highly nested nodes corresponding to more recent (i.e., Jurassic-Recent) divergences. By removing data deemed saturated, we reconcile the competing age estimates and highlight that the older mtDNA-based ages were driven by nucleotide saturation.

CONCLUSIONS: Homoplasious site patterns in a DNA sequence alignment can systematically bias molecular divergence time estimates. Our study demonstrates that PI profiles can provide a non-arbitrary criterion for data exclusion to mitigate the influence of homoplasy on time-calibrated branch length estimates. Analyses of actinopterygian molecular clocks demonstrate that scrutiny of the time scale on which sequence data are informative is a fundamental, but generally overlooked, step in molecular divergence time estimation.
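PI profiles are built by summing a per-site informativeness over the sites of a gene at each point in time. As a minimal sketch, assuming the closed form from Townsend (2007) for a site evolving at a known rate λ, informativeness at time T before present is 16λ²T·exp(−4λT), which peaks at T = 1/(4λ). The function names and the example rates are hypothetical, and this is not the study's own pipeline.

```python
import math
from typing import Iterable, List


def site_informativeness(rate: float, t: float) -> float:
    """Per-site phylogenetic informativeness at time t before present.

    Assumes the closed form 16 * rate**2 * t * exp(-4 * rate * t), where
    `rate` is the site's substitution rate per unit time.
    """
    return 16.0 * rate ** 2 * t * math.exp(-4.0 * rate * t)


def pi_profile(rates: Iterable[float], times: Iterable[float]) -> List[float]:
    """Sum per-site informativeness over all sites of a gene at each time point."""
    rates = list(rates)
    return [sum(site_informativeness(r, t) for r in rates) for t in times]


def peak_time(rate: float) -> float:
    """Time of maximal informativeness: the derivative vanishes at t = 1 / (4 * rate)."""
    return 1.0 / (4.0 * rate)


# Hypothetical rates: a fast (saturation-prone) site and a slower site.
fast, slow = 0.5, 0.05
print(peak_time(fast), peak_time(slow))
print(pi_profile([fast, slow], [0.5, 5.0, 10.0]))
```

The fast site peaks early and its informativeness decays toward zero at deep times (at T = 10 it is orders of magnitude below the slow site), which is the saturation behaviour that motivates trimming fast-evolving data when dating older nodes.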

Springer - Publisher Connector

PubMed Central

Confounding factors in HGT detection: Statistical error, coalescent effects, and multiple solutions

Author: INNAN Hideki
NAKHLEH Luay
RUTHS Derek
THAN Cuong
秀樹印南
Publication venue: Mary Ann Liebert Inc
Publication date: 15/06/2007

Prokaryotic organisms share genetic material across species boundaries by means of a process known as horizontal gene transfer (HGT). This process has great significance for understanding prokaryotic genome diversification and unraveling its complexities. Phylogeny-based detection of HGT is one of the most commonly used methods for this task, and is based on the fundamental fact that HGT may cause gene trees to disagree with one another, as well as with the species phylogeny. Using these methods, we can compare gene and species trees, and infer a set of HGT events to reconcile the differences among these trees. In this paper, we address three factors that confound the detection of the true HGT events, including the donors and recipients of horizontally transferred genes. First, we study experimentally the effects of error in the estimated gene trees (statistical error) on the accuracy of inferred HGT events. Our results indicate that statistical error leads to overestimation of the number of HGT events, and that HGT detection methods should be designed with unresolved gene trees in mind. Second, we demonstrate, both theoretically and empirically, that based on topological comparison alone, the number of HGT scenarios that reconcile a pair of species/gene trees may be exponential. This number may be reduced when branch lengths in both trees are estimated correctly. This set of results implies that in the absence of additional biological information, and/or a biological model of how HGT occurs, multiple HGT scenarios must be sought, and efficient strategies for enumerating such solutions must be developed. Third, we address the issue of lineage sorting, how it confounds HGT detection, and how to incorporate it with HGT into a single stochastic framework that distinguishes between the two events by extending population genetics theory.
This result is very important, particularly when analyzing closely related organisms, where coalescent effects cannot be ignored when reconciling gene trees. In addition to these three confounding factors, we consider the problem of enumerating all valid coalescent scenarios that constitute plausible species/gene tree reconciliations, and develop a polynomial-time dynamic programming algorithm for solving it. This result is significant for reducing the search space for heuristics that seek reconciliation scenarios. Finally, we show, empirically, that the locality of incongruence between a pair of trees has an impact on the numbers of HGT and coalescent reconciliation scenarios.
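One common way to score how much incomplete lineage sorting a gene tree implies is the minimize-deep-coalescence criterion of Maddison (1997), which counts extra gene lineages per species branch. The sketch below is a swapped-in illustration of that scoring, not the polynomial-time enumeration algorithm described in the abstract. It assumes rooted binary trees as nested Python tuples, exactly one sampled gene per species with every species present, and hypothetical function names.

```python
from typing import FrozenSet, List, Tuple, Union

# A rooted binary tree: either a leaf label or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(t: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below t."""
    if isinstance(t, str):
        return frozenset([t])
    return leaves(t[0]) | leaves(t[1])


def clades(t: Tree, out: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """List every clade of t as a leaf set, root first (preorder)."""
    out.append(leaves(t))
    if not isinstance(t, str):
        clades(t[0], out)
        clades(t[1], out)
    return out


def deep_coalescences(gene: Tree, species: Tree) -> int:
    """Extra lineages implied by embedding `gene` in `species` via LCA mapping."""
    sp = clades(species, [])
    root = sp[0]

    def lca(x: FrozenSet[str]) -> FrozenSet[str]:
        # Species clades containing x are nested; the smallest one is the LCA.
        return min((c for c in sp if x <= c), key=len)

    # (parent mapping, child mapping) for every gene-tree edge
    edges: List[Tuple[FrozenSet[str], FrozenSet[str]]] = []

    def walk(g: Tree) -> FrozenSet[str]:
        m = lca(leaves(g))
        if not isinstance(g, str):
            for child in g:
                edges.append((m, walk(child)))
        return m

    walk(gene)
    cost = 0
    for v in sp:
        if v == root:
            continue
        # Gene lineages leaving the top of species branch v: edges whose child
        # maps inside v while the parent maps strictly above it.
        k = sum(1 for parent, child in edges if child <= v and not parent <= v)
        cost += k - 1
    return cost


print(deep_coalescences((("A", "C"), "B"), (("A", "B"), "C")))  # 1
```

A gene tree that disagrees with a three-species tree needs one deep coalescence, while a congruent one needs none; in a combined HGT/ILS setting, such a discordance could alternatively be explained by a single transfer, which is the ambiguity the abstract describes.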

Graduate University for Advanced Studies [SOKENDAI] Institutional Repository

Quantitative methods for reconstructing protein-protein interaction histories

Author: Topping Ryan
Publication venue: Life Sciences, Imperial College London
Publication date: 01/07/2013

Protein-protein interactions (PPIs) are vital for the function of a cell, and the evolution of these interactions produces much of the phenotypic evolution of an organism. However, as the evolutionary process cannot be observed, methods are required to infer evolution from existing data. An understanding of the resulting evolutionary relationships between species can then provide information for PPI prediction and function assignment. This thesis further develops and applies the interaction tree method for modelling PPI evolution within and between protein families. In this approach, a phylogeny of the protein family or families of interest is used to explicitly construct a history of duplication and speciation events. Given a model relating sequence change in this phylogeny to the probability of a rewiring event occurring, this method can then infer probabilities of interaction between the ancestral proteins described in the phylogeny. It is shown that the method can be adapted to infer the evolution of PPIs within obligate protein complexes, using a large set of such complexes to validate this application. This approach is then applied to reconstruct the history of the proteasome complex, using X-ray crystallography structures of the complex as input, with validation to show its utility in predicting present-day complexes for which we have no structural data. The methodology is then adapted for application to transient PPIs. It is shown that the approach used in the previous chapter is inadequate here, and a new scoring system based on a likelihood score of interaction is described. The predictive ability of this score is shown in predicting known two-component systems in bacteria, and its use in an interaction tree setting is demonstrated through inference of the interaction history between the histidine kinase and response regulator proteins responsible for sporulation onset in a set of bacteria.
This thesis demonstrates that, with suitable modifications, the interaction tree approach is widely applicable to modelling PPI evolution and also, importantly, to predicting existing PPIs. This demonstrates the need to incorporate phylogenetic data into methods of predicting PPIs and gives some measure of the benefit in doing so.

Spiral - Imperial College Digital Repository