209 research outputs found

    The inference of gene trees with species trees

    Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

    Large-scale assignment of orthology: back to phylogenetics?

    Automated use of phylogenetic trees to deduce orthology relationships in proteins
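    The one-line summary above does not say how orthology is read off a tree. As a minimal illustration only, the sketch below applies the widely used species-overlap rule, which is an assumption here and not necessarily the method of the listed paper: an internal node whose two child clades share no species is treated as a speciation, and gene pairs separated by it are called orthologs; otherwise the node is treated as a duplication. The nested-tuple tree encoding, the `species|gene` leaf labels, and all function names are hypothetical.

```python
from itertools import product

# Hypothetical toy gene tree: internal nodes are 2-tuples, leaves are
# "species|gene" strings. This encoding is an assumption for illustration.
TOY_TREE = ((("human|A1", "mouse|A1"), ("human|A2", "mouse|A2")), "fish|A")


def species_of(leaf):
    """Extract the species name from a 'species|gene' leaf label."""
    return leaf.split("|")[0]


def leaves(node):
    """Return all leaf labels below a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def ortholog_pairs(node):
    """Collect gene pairs whose last common ancestor is a speciation node.

    Species-overlap rule: a node is a speciation when its two child clades
    share no species, and a duplication otherwise.
    """
    if isinstance(node, str):
        return set()
    left, right = node
    pairs = ortholog_pairs(left) | ortholog_pairs(right)
    left_leaves, right_leaves = leaves(left), leaves(right)
    left_species = {species_of(name) for name in left_leaves}
    right_species = {species_of(name) for name in right_leaves}
    if left_species.isdisjoint(right_species):  # speciation node
        pairs |= {frozenset(p) for p in product(left_leaves, right_leaves)}
    return pairs


PAIRS = ortholog_pairs(TOY_TREE)
```

    In this toy tree, the node joining the A1 and A2 clades is called a duplication because both clades contain human and mouse, so `human|A1` and `mouse|A2` are not reported as orthologs, while `human|A1` and `mouse|A1` are.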

    Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests

    Background: To understand the evolutionary role of Lateral Gene Transfer (LGT), accurate methods are needed to identify transferred genes and infer their timing of acquisition. Phylogenetic methods are particularly promising for this purpose, but the reconciliation of a gene tree with a reference (species) tree is computationally hard. In addition, the application of these methods to real data raises the problem of sorting out real and artifactual phylogenetic conflict.
    Results: We present Prunier, a new method for phylogenetic detection of LGT based on the search for a maximum statistical agreement forest (MSAF) between a gene tree and a reference tree. The program is flexible as it can use any definition of "agreement" among trees. We evaluate the performance of Prunier and two other programs (EEEP and RIATA-HGT) for their ability to detect transferred genes in realistic simulations where gene trees are reconstructed from sequences. Prunier proposes a single scenario that compares to the other methods in terms of sensitivity, but shows higher specificity. We show that LGT scenarios carry a strong signal about the position of the root of the species tree and could be used to identify the direction of evolutionary time on the species tree. We use Prunier on a biological dataset of 23 universal proteins and discuss their suitability for inferring the tree of life.
    Conclusions: The ability of Prunier to take into account branch support in the process of reconciliation allows a gain in complexity, in comparison to EEEP, and in accuracy in comparison to RIATA-HGT. Prunier's greedy algorithm proposes a single scenario of LGT for a gene family, but its quality always compares to the best solutions provided by the other algorithms. When the root position is uncertain in the species tree, Prunier is able to infer a scenario per root at a limited additional computational cost and can easily run on large datasets.
    Prunier is implemented in C++, using the Bio++ library and the phylogeny program Treefinder. It is available at: http://pbil.univ-lyon1.fr/software/prunier

    Algorithms, load balancing strategies, and dynamic kernels for large-scale phylogenetic tree inference under Maximum Likelihood

    Phylogenetics, the analysis of the evolutionary relationships between biological entities, plays an essential role in biological and medical research. Its applications range from answering fundamental questions, such as the origin of life, to solving practical problems, such as tracking pandemics in real time. Today, phylogenetic trees are typically computed from molecular data using likelihood-based methods. These methods search for the tree that maximizes a likelihood-based scoring function under a given stochastic model of sequence evolution. This thesis focuses on the inference of phylogenetic trees of species as well as of genes. Species evolve through speciation and extinction events. Genes evolve through events such as gene duplication, gene loss, and horizontal gene transfer. The two forms of evolution are interrelated, since genes belong to species and evolve within the genomes of those species. Models of gene evolution that account for this relationship between the evolutionary histories of species and genes can be used to improve the accuracy of phylogenetic tree searches. Classical methods of phylogenetic inference ignore these phenomena and rely exclusively on models of sequence evolution. In addition, current maximum likelihood methods are computationally expensive. This poses a major challenge, especially as advances in sequencing technology make more and more molecular data available and the volume of data grows drastically. To cope with this avalanche of data, biological research urgently needs tools that provide faster algorithms and efficient parallel implementations.
In this thesis I develop new maximum likelihood methods that explicitly model the joint evolutionary history of species and genes in order to infer more accurate phylogenetic trees. I also implement new heuristics and dedicated parallelization schemes to accelerate the inference process. My first project, ParGenes, is a parallel software pipeline for inferring gene trees from a set of gene-specific multiple sequence alignments. For each input alignment, ParGenes first determines the best-fit model of sequence evolution and then searches for the gene tree with the highest likelihood under this model. It does so using state-of-the-art methods that can be run in parallel and that employ a novel load balancing strategy. My second project, SpeciesRax, is a method for inferring a rooted species tree from a set of corresponding unrooted gene trees. It accounts for the evolution of a gene under gene duplication, gene loss, and horizontal gene transfer. SpeciesRax searches for the rooted species tree that maximizes the likelihood-based scoring function under this model. In addition, I introduce a new method for computing confidence values on the branches of the resulting species tree and a further method for estimating the branch lengths of the species tree. My third project, GeneRax, is a novel maximum likelihood method for inferring gene trees. GeneRax takes as input a rooted species tree and a set of gene-specific multiple sequence alignments and outputs one gene tree per input alignment. To this end, I introduce the so-called joint likelihood function, which combines a model of sequence evolution with a model of gene evolution.
Furthermore, GeneRax can estimate the sequence of gene duplications, gene losses, and horizontal gene transfers that occurred along the input species tree.
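
    The abstract names a joint likelihood that combines a model of sequence evolution with a model of gene evolution, but gives no formula. One natural way to write such a combination, offered here as a hedged sketch and not as the thesis's own notation, is as a product of two terms. All symbols are assumptions: $A$ is a gene's multiple sequence alignment, $G$ the gene tree, $S$ the rooted species tree, $M_{\text{seq}}$ the sequence-evolution model, and $M_{\text{gene}}$ the model of duplication, loss, and transfer.

```latex
\[
  L_{\text{joint}}(G) \;=\;
  \underbrace{P(A \mid G,\, M_{\text{seq}})}_{\text{sequence evolution}}
  \;\times\;
  \underbrace{P(G \mid S,\, M_{\text{gene}})}_{\text{gene evolution}}
\]
\[
  \log L_{\text{joint}}(G) \;=\;
  \log P(A \mid G,\, M_{\text{seq}}) \;+\; \log P(G \mid S,\, M_{\text{gene}})
\]
```

    Under this reading, a gene tree that fits the alignment slightly worse can still score higher overall if it requires far fewer duplications, losses, or transfers along the species tree.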

    The inference of gene trees with species trees.

    This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution

    Fungal phylogenomics. A global analysis of fungal genomes and their evolution

    Fungi are the eukaryotic group with the largest number of completely sequenced species and are therefore particularly well suited for comparative genomics analyses. A species tree is often an important part of a phylogenomic analysis. Concern about its reliability led us to design several methods for identifying nodes in the species tree that are poorly supported by a whole phylome. We determined that the species tree was mostly well supported, but some nodes showed large discrepancies with most genes. These results could partly be attributed to evolutionary events that result in topological changes in gene trees. Our analyses have shown that HGT plays an important role in fungal evolution. Gene duplications followed by differential loss are also often the cause of incongruence. The OXPHOS pathway, despite being formed by multi-protein complexes, has been affected by this process at levels similar to those in the rest of the genome.
    Fungi are the group of eukaryotic species with the largest number of completely sequenced genomes, which makes them an ideal group for applying phylogenomic techniques. The species tree is a key element of many phylogenomic analyses, and as such we need to know whether it is reliable. We designed several measures that use the information in a phylome to identify the points in the species tree that are not well supported. The discrepancies we found may be due to evolutionary events (horizontal transfer, duplications, ...). We have shown that horizontal transfer plays an important role in fungal evolution. We also studied the effects of duplications on the evolution of the oxidative phosphorylation metabolic pathway. We can conclude that the species tree is mostly robust, but that we need to be able to identify nodes subject to variation. Evolutionary events may be the cause of the discrepancies observed in the gene trees

    Genome size evolution in the Archaea

    What determines variation in genome size, gene content and genetic diversity at the broadest scales across the tree of life? Much of the existing work contrasts eukaryotes with prokaryotes, the latter represented mainly by Bacteria. But any general theory of genome evolution must also account for the Archaea, a diverse and ecologically important group of prokaryotes that represent one of the primary domains of cellular life. Here, we survey the extant diversity of Bacteria and Archaea, and ask whether the general principles of genome evolution deduced from the study of Bacteria and eukaryotes also apply to the archaeal domain. Although Bacteria and Archaea share a common prokaryotic genome architecture, the extant diversity of Bacteria appears to be much higher than that of Archaea. Compared with Archaea, Bacteria also show much greater genome-level specialisation to specific ecological niches, including parasitism and endosymbiosis. The reasons for these differences in long-term diversification rates are unclear, but might be related to fundamental differences in informational processing machineries and cell biological features that may favour archaeal diversification in harsher or more energy-limited environments. Finally, phylogenomic analyses suggest that the first Archaea were anaerobic autotrophs that evolved on the early Earth