63 research outputs found

    A General Framework for Gene Tree Correction Based on Duplication-Loss Reconciliation

    Due to the key role played by gene trees and species phylogenies in biological studies, it is essential to have as much confidence as possible in the available trees. As phylogenetic tools are error-prone, it is a common task to use a correction method to improve an initial tree. Various correction methods exist. In this paper we focus on those based on the Duplication-Loss reconciliation model. The polytomy resolution approach consists in contracting weakly supported branches and then refining the resulting non-binary tree in a way that minimizes a reconciliation distance with the given species tree. The supertree approach, on the other hand, takes as input a set of separate subtrees, obtained either for separate orthology groups or by removing the upper branches of an initial tree down to a certain level, and amalgamates them in an optimal way that preserves the topology of the initial trees. The two classes of problems have always been considered as two separate fields, based on apparently different models. In this paper we give a unifying view showing that these two classes of problems are in fact special cases of a more general problem that we call LabelGTC, whose input includes a 0-1 edge-labelled gene tree to be corrected. Considering a tree as a set of triplets, we also formulate the TripletGTC Problem, whose input includes a set of gene triplets that should be preserved in the corrected tree. These two general models make it possible to unify, understand and compare the principles of the duplication-loss reconciliation-based tree correction approaches. We show that LabelGTC is a special case of TripletGTC. We then develop appropriate algorithms to handle these two general correction problems.
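
As an illustration of what "considering a tree as a set of triplets" can mean, here is a minimal sketch, not taken from the paper. It assumes a tree is written as nested Python tuples with distinct string leaf labels, and it reads a rooted triplet ab|c as "a and b share an ancestor that is not an ancestor of c". The names `leaves` and `triplets` are hypothetical helpers chosen for this example.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def triplets(tree):
    """Return the rooted triplets of a tree as (frozenset({a, b}), c), read ab|c.

    Assumption: leaf labels are distinct. A triplet ab|c is recorded at the
    node where a and b sit in one child subtree and c sits in another.
    """
    if isinstance(tree, str):
        return set()
    found = set()
    child_leaves = [leaves(child) for child in tree]
    for i, child in enumerate(tree):
        found |= triplets(child)  # triplets fully inside this child
        outside = [x for j, ls in enumerate(child_leaves) if j != i for x in ls]
        for a, b in combinations(child_leaves[i], 2):
            for c in outside:
                found.add((frozenset((a, b)), c))
    return found


binary_tree = ((("a", "b"), "c"), "d")
star_tree = ("a", "b", "c")  # a polytomy resolves no triplet
print(len(triplets(binary_tree)), len(triplets(star_tree)))
```

Under this representation a fully resolved tree on four leaves yields one triplet per three-leaf subset, while an unresolved node contributes none, which is why a set of triplets can act as a constraint that a corrected tree must keep.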

    Reconstructing the History of Syntenies Through Super-Reconciliation

    Classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is clearly not suited for genes grouped in syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation model, which extends the traditional Duplication-Loss model to the reconciliation of a set of trees, accounting for segmental duplications and losses. From a complexity point of view, we show that the associated decision problem is NP-hard. We then give an exact exponential-time algorithm for this problem, assess its time efficiency on simulated datasets, and give a proof of concept on the opioid receptor genes.

    Evolution through segmental duplications and losses : A Super-Reconciliation approach

    The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists in inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem from a single gene tree to a set of trees, accounting for segmental duplications and losses. The existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the former exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.
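
For readers unfamiliar with the classical single-tree setting that this work extends, the following is a minimal sketch of duplication counting under the standard LCA mapping. It is not the Super-Reconciliation algorithm and is not taken from the paper. It assumes binary gene trees and species trees written as nested tuples, and a hypothetical `species_of` dictionary mapping each gene to its species; all function names are illustrative.

```python
def clusters(tree):
    """Return leaf sets of all species-tree nodes, children before parents."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result, own = [], frozenset()
    for child in tree:
        sub = clusters(child)
        result.extend(sub)
        own |= sub[-1]  # last entry is the child's own cluster
    result.append(own)
    return result


def lca(species_clusters, species_set):
    """LCA of a species set, taken as the smallest cluster containing it."""
    return min((c for c in species_clusters if species_set <= c), key=len)


def count_duplications(gene_tree, species_tree, species_of):
    """Count duplication nodes of a binary gene tree under the LCA mapping.

    An internal gene node is called a duplication when it maps to the same
    species-tree node as one of its children.
    """
    all_clusters = clusters(species_tree)
    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            return lca(all_clusters, frozenset([species_of[node]]))
        child_maps = [visit(child) for child in node]
        here = lca(all_clusters, frozenset().union(*child_maps))
        if here in child_maps:
            dups += 1
        return here

    visit(gene_tree)
    return dups


species_tree = (("A", "B"), "C")
species_of = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
gene_tree = (("a1", "b1"), ("a2", "c1"))
print(count_duplications(gene_tree, species_tree, species_of))
```

Each gene family is processed on its own here, which is exactly the independence assumption the abstract questions for genes that sit in the same syntenic block.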

    Genome-scale phylogenetic analysis finds extensive gene transfer among Fungi

    Although the role of lateral gene transfer is well recognized in the evolution of bacteria, it is generally assumed that it has had less influence among eukaryotes. To explore this hypothesis we compare the dynamics of genome evolution in two groups of organisms: Cyanobacteria and Fungi. Ancestral genomes are inferred in both clades using two types of methods. The first is Count, a gene tree-unaware method that models gene duplications, gains and losses to explain the observed numbers of genes present in a genome. The second is ALE, a more recent gene tree-aware method that reconciles gene trees with a species tree using a model of gene duplication, loss, and transfer. We compare their merits and their ability to quantify the role of transfers, and assess the impact of taxonomic sampling on their inferences. We present what we believe is compelling evidence that gene transfer plays a significant role in the evolution of Fungi.

    A roadmap for global synthesis of the plant tree of life

    Providing science and society with an integrated, up-to-date, high quality, open, reproducible and sustainable plant tree of life would be a huge service that is now coming within reach. However, synthesizing the growing body of DNA sequence data in the public domain and disseminating the trees to a diverse audience are often not straightforward due to numerous informatics barriers. While big synthetic plant phylogenies are being built, they remain static and become quickly outdated as new data are published and tree-building methods improve. Moreover, the body of existing phylogenetic evidence is hard to navigate and access for non-experts. We propose that our community of botanists, tree builders, and informaticians should converge on a modular framework for data integration and phylogenetic analysis, allowing easy collaboration, updating, data sourcing and flexible analyses. With support from major institutions, this pipeline should be re-run at regular intervals, storing trees and their metadata long-term. Providing the trees to a diverse global audience through user-friendly front ends and application development interfaces should also be a priority. Interactive interfaces could be used to solicit user feedback and thus improve data quality and to coordinate the generation of new data. We conclude by outlining a number of steps that we suggest the scientific community should take to achieve global phylogenetic synthesis

    Algorithmes de construction et correction d'arbres de gènes par la réconciliation [Algorithms for the construction and correction of gene trees through reconciliation]

    Genes encode the biological functions of all living organisms and are the basic molecular units of heredity. In order to explain the diversity of species that can be observed today, it is essential to understand how genes evolve. To do this, the past has to be recreated by inferring their phylogeny, i.e. a gene tree depicting the kinship relationships between the coding regions of living beings. Traditional phylogenetic inference methods have been developed primarily to construct species trees and are based solely on DNA sequences. Genes, however, are rich in information, and reconstruction methods that make use of their specific properties are only beginning to appear. In particular, the history of a gene family in terms of duplications and losses, obtained by the reconciliation of a gene tree with a species tree, may allow us to detect weaknesses in a tree and improve it. In this thesis, reconciliation is applied to the construction and correction of gene trees from three different angles: 1) We address the problem of resolving a non-binary gene tree. In particular, we present a linear-time algorithm that resolves a polytomy based on reconciliation. 2) We propose a new gene tree correction approach based on orthology and paralogy relations. Polynomial-time algorithms are presented for the following problems: correcting a gene tree so that it contains a given set of orthologous genes, and validating a set of partial orthology and paralogy relations. 3) We show how reconciliation can be used to "combine" multiple gene trees. Specifically, we study the problem of choosing a gene supertree based on its reconciliation cost.

    The inference of gene trees with species trees

    Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice-versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

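    Modèles et algorithmes pour la segmentation de séquences biologiques et la reconstruction de leurs histoires évolutives [Models and algorithms for the segmentation of biological sequences and the reconstruction of their evolutionary histories]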

    Computing is increasingly used to solve problems in a variety of fields. With the growth of biological data generated by high-throughput experimental techniques, bioinformatics steps in to exploit these masses of data and help advance knowledge in the biological sciences. Bioinformatics is an interdisciplinary field that aims to study and solve computational problems arising from the biological sciences. One of the enduring problems studied in bioinformatics is the reconstruction of the evolutionary history of genomes, which essentially implies that of genes. Genes carry genetic information and are the basic units of heredity. Today, a large number of diseases, such as cancers, have a genetic basis. A good understanding of gene evolution would make it possible to better understand the processes involved in these diseases and to treat them better. In addition, knowledge of gene evolution is useful for the prediction and annotation of new genes. Eukaryotic genes have been shown to undergo alternative splicing, which allows a gene to produce several different transcripts and thereby diversify functionally. This is the context of this doctoral thesis. Its objective is to define efficient and accurate models and algorithms for the segmentation of biological sequences and the reconstruction of their evolutionary histories while taking alternative splicing into account. In this thesis, I contributed to scientific knowledge by introducing and formalizing models of transcript and gene evolution. We proposed two algorithms for the segmentation of alternative transcripts. We also proposed a tool for simulating the evolution of biological sequences and a tool for visualizing coevolution. For each of the proposed models and algorithms, we developed applications to make our tools easy to use.