    Exact reconciliation of undated trees

    Reconciliation methods aim at recovering macroevolutionary events and at localizing them in the species history, by observing discrepancies between gene family trees and species trees. In this article we introduce an Integer Linear Programming (ILP) approach for the NP-hard problem of computing a most parsimonious time-consistent reconciliation of a gene tree with a species tree when dating information on speciations is not available. The ILP formulation, which builds upon the DTL model, returns a most parsimonious reconciliation ranging over all possible datings of the nodes of the species tree. By studying its performance on plausible simulated data we conclude that the ILP approach is significantly faster than a brute-force search through the space of all possible species tree datings. Although the ILP formulation is currently limited to small trees, we believe that it is an important proof of concept which opens the door to the possibility of developing an exact, parsimony-based approach to dating species trees. The software (ILPEACE) is freely available for download.

    Joint amalgamation of most parsimonious reconciled gene trees.

    Motivation: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Probabilistic methods are able to estimate all model parameters but are computationally expensive, whereas parsimony methods, which are generally computationally more efficient, require a prior estimate of parameters and of the statistical support. Results: Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony-based, species-tree-aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large-scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two-thirds reduction in the number of apparent transfer events.

    A General Framework for Gene Tree Correction Based on Duplication-Loss Reconciliation

    Due to the key role played by gene trees and species phylogenies in biological studies, it is essential to have as much confidence as possible in the available trees. As phylogenetic tools are error-prone, it is a common task to use a correction method for improving an initial tree. Various correction methods exist. In this paper we focus on those based on the Duplication-Loss reconciliation model. The polytomy resolution approach consists in contracting weakly supported branches and then refining the obtained non-binary tree so as to minimize a reconciliation distance with the given species tree. On the other hand, the supertree approach takes as input a set of separate subtrees, obtained either for separate orthology groups or by removing the upper branches of an initial tree to a certain level, and amalgamates them in an optimal way that preserves the topology of the initial trees. The two classes of problems have always been considered as two separate fields, based on apparently different models. In this paper we give a unifying view showing that these two classes of problems are in fact special cases of a more general problem that we call LabelGTC, whose input includes a 0-1 edge-labelled gene tree to be corrected. Considering a tree as a set of triplets, we also formulate the TripletGTC Problem, whose input includes a set of gene triplets that should be preserved in the corrected tree. These two general models make it possible to unify, understand and compare the principles of the duplication-loss reconciliation-based tree correction approaches. We show that LabelGTC is a special case of TripletGTC. We then develop appropriate algorithms to handle these two general correction problems.
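
    As an illustration of the Duplication-Loss reconciliation model this abstract refers to, the sketch below counts duplications under the standard LCA mapping: each gene tree node is mapped to the smallest species tree clade containing the species below it, and a node counts as a duplication when it shares its mapping with one of its children. This is not the LabelGTC or TripletGTC algorithm from the paper. The nested-tuple tree encoding, the convention that gene tree leaves are labelled directly with species names, and all function names are assumptions made for this example.

```python
def clade(tree):
    """Return the set of leaf labels below a node (trees are nested tuples of str)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(clade(child) for child in tree))


def species_clades(species_tree):
    """List the clade of every node of the species tree, leaves included."""
    clades = []

    def visit(node):
        clades.append(clade(node))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(species_tree)
    return clades


def lca(species_set, clades):
    """Smallest species-tree clade containing species_set (the LCA mapping)."""
    return min((c for c in clades if species_set <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene tree nodes that map to the same species clade as one of their children."""
    clades = species_clades(species_tree)
    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):  # leaf: assumed to be labelled by its species
            return lca(frozenset([node]), clades)
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps), clades)
        if here in child_maps:
            duplications += 1
        return here

    visit(gene_tree)
    return duplications


species = (("A", "B"), "C")
print(count_duplications((("A", "B"), "C"), species))
print(count_duplications((("A", "C"), ("B", "C")), species))
```

    A correction method in this family would compare candidate gene trees by a cost of this kind (usually duplications plus losses) and keep the cheapest one; loss counting is omitted here to keep the sketch short.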

    Gene tree correction guided by orthology

    Background: Reconciled gene trees yield orthology and paralogy relationships between genes. This information may however contradict other information on orthology and paralogy provided by other footprints of evolution, such as conserved synteny. Results: We explore a way to include external information on orthology in the process of gene tree construction. Given an initial gene tree and a set of orthology constraints on pairs of genes or on clades, we give polynomial-time algorithms for producing a modified gene tree satisfying the set of constraints, that is as close as possible to the original one according to the Robinson-Foulds distance. We assess the validity of the modifications we propose by computing the likelihood ratio between initial and modified trees according to sequence alignments on Ensembl trees, showing that often the two trees are statistically equivalent. Availability: Software and data available upon request to the corresponding author.

    TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees

    Gene tree correction aims to improve the accuracy of a gene tree by using computational techniques along with a reference tree (and in some cases available sequence data). It is an active area of research when dealing with gene tree heterogeneity due to duplication and loss (GDL). Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to incomplete lineage sorting (ILS, a common problem in eukaryotic phylogenetics) and horizontal gene transfer (HGT, a common problem in bacterial phylogenetics). We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-Optimal Tree Refinement and Completion Problem, which seeks a refinement and completion of an input tree t with respect to a given binary tree T so as to minimize the Robinson-Foulds (RF) distance. We present the results of an extensive simulation study evaluating TRACTION within gene tree correction pipelines on 68,000 estimated gene trees, using estimated species trees as reference trees. We explore accuracy under conditions with varying levels of gene tree heterogeneity due to ILS and HGT. We show that TRACTION matches or improves the accuracy of well-established methods from the GDL literature under conditions with HGT and ILS, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. TRACTION is available at https://github.com/pranjalv123/TRACTION-RF and the study datasets are available at https://doi.org/10.13012/B2IDB-1747658_V1
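
    The Robinson-Foulds (RF) distance that TRACTION minimizes can be illustrated with a short sketch: it is the number of non-trivial bipartitions of the leaf set found in one tree but not in the other. The code below only computes that distance for two trees on the same leaves; it does not implement TRACTION's refinement and completion step. The nested-tuple tree encoding and the function names are assumptions made for this example.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a tree given as nested tuples of str."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split always gets the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Skip trivial splits: empty side, single leaf, or all leaves but one.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree1, tree2):
    """Size of the symmetric difference of the two bipartition sets."""
    if leaf_set(tree1) != leaf_set(tree2):
        raise ValueError("RF distance here requires identical leaf sets")
    return len(bipartitions(tree1) ^ bipartitions(tree2))


t1 = (("a", "b"), ("c", ("d", "e")))
t2 = (("a", "c"), ("b", ("d", "e")))
print(rf_distance(t1, t2))
```

    Because bipartitions are taken on the unrooted tree, two different rootings of the same topology are at distance zero, which matches the usual definition of RF on unrooted trees.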

    Naturally Occurring Isoleucyl-tRNA Synthetase without tRNA-dependent Pre-transfer Editing

    Isoleucyl-tRNA synthetase (IleRS) is unusual among aminoacyl-tRNA synthetases in having a tRNA-dependent pre-transfer editing activity. Alongside the typical bacterial IleRS (such as Escherichia coli IleRS), some bacteria also have eukaryote-like enzymes that cluster with eukaryotic IleRSs and exhibit low sensitivity to the antibiotic mupirocin. Our phylogenetic analysis suggests that the ileS1 and ileS2 genes of contemporary bacteria are the descendants of genes that might have arisen by an ancient duplication event before the separation of bacteria and archaea. We present the analysis of evolutionary constraints of the synthetic and editing reactions in eukaryotic/eukaryote-like IleRSs, which share a common origin but diverged through adaptation to different cell environments. The enzyme from the yeast cytosol exhibits tRNA-dependent pre-transfer editing analogous to E. coli IleRS. This argues for the presence of this proofreading in the common ancestor of both IleRS types and an ancient origin of the synthetic site-based quality control step. Yet surprisingly, the eukaryote-like IleRS from Streptomyces griseus lacks this capacity; at the same time, its synthetic site displays a 10^3-fold drop in sensitivity to the antibiotic mupirocin relative to the yeast enzyme. The discovery that pre-transfer editing is optional in IleRSs lends support to the notion that the conserved post-transfer editing domain is the main checkpoint in these enzymes. We substantiated this by showing that under error-prone conditions S. griseus IleRS is able to rescue the growth of an E. coli lacking functional IleRS, providing the first evidence that tRNA-dependent pre-transfer editing in IleRS is not essential for cell viability.

    The link between orthology relations and gene trees: a correction perspective

    Évolution de l’architecture des génomes : modélisation et reconstruction phylogénétique

    Genomes evolve through processes that modify their content and organization at different scales, ranging from the substitution, insertion or deletion of a single nucleotide to the duplication, loss or transfer of a gene and to large-scale chromosomal rearrangements. Extant genomes are the result of a combination of many such processes, which makes it difficult to reconstruct the overall picture of genome evolution. As a result, most models and methods focus on one scale and use only one kind of data, such as gene orders or sequence alignments. Most phylogenetic reconstruction methods focus on the evolution of sequences. Recently, some of these methods have been extended to integrate gene family evolution. Chromosomal rearrangements have also been extensively studied, leading to the development of many models for the evolution of the architecture of genomes. These two ways of modelling genome evolution have had little interaction so far, mainly because of computational issues. In this thesis, I present a new model of evolution for the architecture of genomes that accounts for the evolution of gene families. With this model, one can reconstruct the evolutionary history of gene adjacencies and gene order accounting for events that modify the gene content of genomes (duplications and losses of genes) and for events that modify the architecture of genomes (chromosomal rearrangements). Integrating these two types of information in a single model yields more accurate evolutionary histories. Moreover, we show that reconstructing ancestral gene orders can provide feedback on the quality of gene trees, thus paving the way for an integrative model and reconstruction method.

    Genome evolution can be observed at several scales, each revealing different evolutionary processes. At the scale of DNA sequences, these are insertions, deletions and substitutions of nucleotides. At the level of the genes that make up genomes, they are duplications, losses and horizontal transfers of genes. At a larger scale, chromosomal rearrangements change the arrangement of genes on chromosomes. Reconstructing the evolutionary history of genomes therefore requires understanding and modelling all the processes at work, which remains out of reach. Instead, modelling efforts have explored two main directions. On the one hand, phylogenetic reconstruction methods have concentrated on sequence evolution, some of them integrating gene family evolution. On the other hand, chromosomal rearrangements have been very widely studied, giving rise to numerous models of the evolution of genome architecture. Until recently, these two modelling approaches rarely met. During my thesis, I developed a model of the evolution of genome architecture that takes into account the evolution of genes and sequences. This model makes possible a probabilistic reconstruction of the evolutionary history of adjacencies and of the gene order of ancestral genomes, accounting both for events that modify the gene content of genomes (gene duplications and losses) and for events that modify genome architecture (chromosomal rearrangements). Integrating phylogenetic information into gene order reconstruction yields more complete evolutionary histories. Conversely, reconstructing ancestral gene orders can also provide information complementary to the phylogeny and can be used as a criterion for assessing the quality of gene trees, paving the way for an integrative model and reconstruction.

    Predicting Functional Alterations Caused By Non-synonymous Variants in CHO Using Models Based on Phylogenetic Tree and Evolutionary Preservation

    The Chinese Hamster Ovary (CHO) cell is a major manufacturing platform for one of the most valuable classes of biopharmaceutical products: monoclonal antibodies. Being an immortal cell line adapted to different environments, CHO has accumulated a large number of mutations in its genome. Continuous effort has been invested in building computational models to predict CHO cell productivity. However, little attention has been paid to its proteins, which are surely affected to some extent by the accumulated mutations. In this project, we focused on the functional effects of non-synonymous variants found in the CHO genome. A tool was built to first identify these variants and then predict their potential functional effect by preservation, a concept derived from evolutionary conservation. First, the PANTHER subfamilies, which are defined on the basis of potential function change within gene trees, were extended by adding proteins from species not covered by PANTHER. Sequences within the same subfamily were then aligned, and Hidden Markov Models (HMMs) were built on these alignments. The HMMs were used to identify homologs among CHO proteins. Preservation was then calculated at every site of the alignments and used to predict the functional alterations caused by mutations at each site. Our tool was validated using data from the original PANTHER subfamilies, from PANTHER-PSEP, which also calculates site preservation, and from BLAST, a well-accepted homolog search algorithm. CHO protein sequences were then imported and analysed by our tool. For comparison, protein sequences from Chinese hamster were also analysed along with two published CHO cell lines: CHO-K1 and CHO-K1GS. The predictions for proteins from these three genomes were then compared by mapping onto Gene Ontology (GO). Some detailed case studies were also presented. Our tool showed good performance in validation; however, it failed to produce useful hypotheses that would motivate further bench experiments. The potential causes are discussed at the end.