12 research outputs found

    Resolving the Ortholog Conjecture: Orthologs Tend to Be Weakly, but Significantly, More Similar in Function than Paralogs

    The function of most proteins is not determined experimentally, but is extrapolated from homologs. According to the “ortholog conjecture”, or standard model of phylogenomics, protein function changes rapidly after duplication, leading to paralogs with different functions, while orthologs retain the ancestral function. We report here that a comparison of experimentally supported functional annotations among homologs from 13 genomes mostly supports this model. We show that to analyze GO annotation effectively, several confounding factors need to be controlled: authorship bias, variation of GO term frequency among species, variation of background similarity among species pairs, and propagated annotation bias. After controlling for these biases, we observe that orthologs have generally more similar functional annotations than paralogs. This is especially strong for sub-cellular localization. We observe only a weak decrease in functional similarity with increasing sequence divergence. These findings hold over a large diversity of species; notably, orthologs from model organisms such as E. coli, yeast or mouse have conserved function with human proteins.

    iPhy: an integrated phylogenetic workbench for supermatrix analyses

    Background: The increasing availability of molecular sequence data means that the accuracy of future phylogenetic studies is likely to be limited by systematic bias and taxon choice rather than by data. In order to take advantage of increasing datasets, user-friendly tools are required to facilitate phylogenetic analyses and to reduce duplication of dataset assembly efforts. Current phylogenetic pipelines are dependency-heavy and have significant technical barriers to use. Results: Here we present iPhy, a web application that lets non-technical users assemble, share and analyse DNA sequence datasets for multigene phylogenetic investigations. Built on a simple client-server architecture, iPhy eases the collection of gene sets for analysis, facilitates alignment and reliably generates phylogenetic analysis-ready data files. Phylogenetic trees generated in external programs can be imported and stored, and iPhy integrates with iTol to allow trees to be displayed with rich data annotation. The datasets collated in iPhy can be shared through the client interface. We show how systematic biases can be addressed by using explicit criteria when selecting sequences for analysis from a large dataset. A representative instance of iPhy can be accessed at iphy.bio.ed.ac.uk, but the toolkit can also be deployed on a local server for advanced users. Conclusions: iPhy provides an easy-to-use environment for the assembly, analysis and sharing of large phylogenetic datasets, while encouraging best practices in terms of phylogenetic analysis and taxon selection.

    Reconstruction of time-consistent species trees

    Background: The history of gene families, which are equivalent to event-labeled gene trees, can to some extent be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are “biologically feasible”, which is the case if one can find a species tree with which the gene tree can be reconciled in a time-consistent way. Results: In this contribution, we consider event-labeled gene trees that contain speciations, duplications as well as horizontal gene transfer (HGT), and we assume that the species tree is unknown. Although many problems become NP-hard as soon as HGT and time-consistency are involved, we show, in contrast, that the problem of finding a time-consistent species tree for a given event-labeled gene tree can be solved in polynomial time. We provide a cubic-time algorithm to decide whether a “time-consistent” species tree for a given event-labeled gene tree exists and, in the affirmative case, to construct the species tree within the same time complexity.

    Increasing species sampling in chelicerate genomic-scale datasets provides support for monophyly of Acari and Arachnida.

    Chelicerates are a diverse group of arthropods, represented by such forms as predatory spiders and scorpions, parasitic ticks, humic detritivores, and marine sea spiders (pycnogonids) and horseshoe crabs. Conflicting phylogenetic relationships have been proposed for chelicerates based on both morphological and molecular data, the latter usually not recovering arachnids as a clade and instead finding horseshoe crabs nested inside terrestrial Arachnida. Here, using genomic-scale datasets and analyses optimised for countering systematic error, we find strong support for monophyletic Acari (ticks and mites), which when considered as a single group represent the most biodiverse chelicerate lineage. In addition, our analysis recovers marine forms (sea spiders and horseshoe crabs) as the successive sister groups of a monophyletic lineage of terrestrial arachnids, suggesting a single colonisation of land within Chelicerata and the absence of wholly secondarily marine arachnid orders.

    Inferring Orthology and Paralogy.

    The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation. We discuss conceptual differences among the various orthology inference methods and databases and examine the difficult issue of verifying and benchmarking orthology predictions. Finally, we review typical applications of orthologous genes, groups, and reconciled trees and conclude with thoughts on future methodological developments.