    Benchmarking gene ontology function predictions using negative annotations.

    With the ever-increasing number and diversity of sequenced species, the challenge of characterizing genes with functional information is more important than ever. In most species, this characterization relies almost entirely on automated electronic methods. As such, it is critical to benchmark the various methods. The Critical Assessment of protein Function Annotation algorithms (CAFA) series of community experiments provides the most comprehensive benchmark, with a time-delayed analysis leveraging newly curated experimentally supported annotations. However, the definition of a false positive in CAFA has not fully accounted for the open world assumption (OWA), leading to a systematic underestimation of precision. The main reason for this limitation is the relative paucity of negative experimental annotations. This article introduces a new, OWA-compliant benchmark based on a balanced test set of positive and negative annotations. The negative annotations are derived from expert-curated annotations of protein families on phylogenetic trees. This approach results in a large increase in the average information content of negative annotations. The benchmark has been tested using the naïve and BLAST baseline methods, as well as two orthology-based methods. This new benchmark could complement existing ones in future CAFA experiments. All data, as well as the code used for analysis, are available from https://lab.dessimoz.org/20_not. Supplementary data are available at Bioinformatics online.
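To make the effect of the open world assumption concrete, the following minimal sketch contrasts two ways of counting false positives. It is an illustration only, not the paper's scoring code: the function names, the set-of-pairs data layout, and the placeholder term labels (`GO:A`, `GO:B`, `GO:C`) are all assumptions introduced here. Under the closed-world count, every prediction lacking a positive annotation is a false positive; under the OWA-compliant count, only predictions that contradict an explicit negative annotation are.

```python
def closed_world_precision(predicted, positives):
    """Precision when every prediction without a positive annotation
    is counted as a false positive (closed-world assumption)."""
    tp = len(predicted & positives)
    fp = len(predicted - positives)
    return tp / (tp + fp) if tp + fp else float("nan")


def owa_precision_recall(predicted, positives, negatives):
    """Precision and recall when only predictions that contradict an
    explicit negative annotation count as false positives.

    All arguments are sets of (protein, term) pairs. Predictions that
    are neither in `positives` nor in `negatives` are left unscored.
    """
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    fn = len(positives - predicted)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Hypothetical toy data; protein and term labels are placeholders.
predicted = {("P1", "GO:A"), ("P1", "GO:B"), ("P2", "GO:A"), ("P2", "GO:C")}
positives = {("P1", "GO:A"), ("P2", "GO:A"), ("P3", "GO:A")}
negatives = {("P1", "GO:B")}
```

In this toy example, the prediction `("P2", "GO:C")` has no annotation either way. The closed-world count treats it as wrong, while the OWA-compliant count leaves it out of the score, which is why the first precision is lower than the second.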

    A putative origin of the insect chemosensory receptor superfamily in the last common eukaryotic ancestor

    The insect chemosensory repertoires of Odorant Receptors (ORs) and Gustatory Receptors (GRs) together represent one of the largest families of ligand-gated ion channels. Previous analyses have identified homologous 'Gustatory Receptor-Like (GRL)' proteins across Animalia, but the evolutionary origin of this novel class of ion channels is unknown. We describe a survey of unicellular eukaryotic genomes for GRLs, identifying several candidates in fungi, protists and algae that contain many structural features characteristic of animal GRLs. The existence of these proteins in unicellular eukaryotes, together with ab initio protein structure predictions, provides evidence for homology between GRLs and a family of uncharacterized plant proteins containing the DUF3537 domain. Together, our analyses suggest an origin of this protein superfamily in the last common eukaryotic ancestor.

    Phylo.io: Interactive Viewing and Comparison of Large Phylogenetic Trees on the Web.

    Phylogenetic trees are pervasively used to depict evolutionary relationships. Increasingly, researchers need to visualize large trees and compare multiple large trees inferred for the same set of taxa (reflecting uncertainty in the tree inference or genuine discordance among the loci analyzed). Existing tree visualization tools are, however, not well suited to these tasks. In particular, side-by-side comparison of trees can prove challenging beyond a few dozen taxa. Here, we introduce Phylo.io, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and the ability to store and share visualizations. The tool can be freely accessed at http://phylo.io and can easily be embedded in other web servers. The code for the associated JavaScript library is available at https://github.com/DessimozLab/phylo-io under an MIT open source license.

    Assigning confidence scores to homoeologs using fuzzy logic.

    In polyploid genomes, homoeologs are a specific subtype of homologs, and can be thought of as orthologs between subgenomes. In Orthologous MAtrix (OMA), we infer homoeologs in three polyploid plant species: upland cotton (Gossypium hirsutum), rapeseed (Brassica napus), and bread wheat (Triticum aestivum). While we can typically recognize the features of a "good" homoeolog prediction (a consistent evolutionary distance, high synteny, and a one-to-one relationship), none of them is a hard-and-fast criterion. We devised a novel fuzzy logic-based method to assign confidence scores to each pair of predicted homoeologs. We inferred homoeolog pairs and used the new method to assign confidence scores, which ranged from 0 to 100. Most confidence scores were between 70 and 100, but the distribution varied between genomes. The new confidence scores show an improvement over our previous method and were manually evaluated using a subset from various confidence ranges.
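As a rough illustration of how fuzzy logic can turn several soft criteria into a single 0 to 100 score, here is a minimal sketch. It is not the method described in the abstract: the membership functions, thresholds, rules, and input names (`distance_dev`, `synteny`, `n_partners`) are all hypothetical choices made for this example. It uses `min` as fuzzy AND, `max` as fuzzy OR, and a weighted average of rule outputs (zero-order Sugeno-style defuzzification).

```python
def ramp_up(x, lo, hi):
    """Membership degree rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def confidence(distance_dev, synteny, n_partners):
    """Hypothetical fuzzy confidence score in [0, 100] for one pair.

    distance_dev: deviation of the pair's evolutionary distance from the
                  typical distance (assumed units; 0 means fully consistent)
    synteny:      fraction of neighbouring genes that also match, in [0, 1]
    n_partners:   number of candidate partners for the gene (1 = one-to-one)
    """
    # Fuzzification: degree to which each "good prediction" feature holds.
    # All thresholds below are illustrative assumptions.
    consistent = 1.0 - ramp_up(distance_dev, 0.0, 2.0)
    syntenic = ramp_up(synteny, 0.2, 0.8)
    one_to_one = 1.0 if n_partners == 1 else 0.5 if n_partners == 2 else 0.0

    weakest = min(consistent, syntenic, one_to_one)     # fuzzy AND
    strongest = max(consistent, syntenic, one_to_one)   # fuzzy OR

    # Rules as (firing strength, output score):
    rules = [
        (weakest, 100.0),                        # all features good -> high
        (min(strongest, 1.0 - weakest), 50.0),   # some but not all -> medium
        (1.0 - strongest, 0.0),                  # no feature good  -> low
    ]

    # Defuzzification: weighted average of the rule outputs.
    total = sum(strength for strength, _ in rules)
    return sum(strength * score for strength, score in rules) / total
```

With these assumed settings, a pair that satisfies all three features fully scores 100, a pair that satisfies none scores 0, and a pair with a partially consistent distance but full synteny and a one-to-one relationship, such as `confidence(1.0, 0.8, 1)`, lands in between. No single feature acts as a hard cutoff, which is the property the abstract motivates.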

    Membrane Proteins Are Dramatically Less Conserved than Water-Soluble Proteins across the Tree of Life.

    Membrane proteins are crucial in transport, signaling, bioenergetics, catalysis, and as drug targets. Here, we show that membrane proteins have dramatically fewer detectable orthologs than water-soluble proteins, fewer than half in most species analyzed. This sparse distribution could reflect rapid divergence or gene loss. We find that both mechanisms operate. First, membrane proteins evolve faster than water-soluble proteins, particularly in their exterior-facing portions. Second, we demonstrate that predicted ancestral membrane proteins are preferentially lost compared with water-soluble proteins in closely related species of archaea and bacteria. These patterns are consistent across the whole tree of life, and in each of the three domains of archaea, bacteria, and eukaryotes. Our findings point to a fundamental evolutionary principle: membrane proteins evolve faster due to stronger adaptive selection in changing environments, whereas cytosolic proteins are under more stringent purifying selection in the homeostatic interior of the cell. This effect should be strongest in prokaryotes, weaker in unicellular eukaryotes (with intracellular membranes), and weakest in multicellular eukaryotes (with extracellular homeostasis). We demonstrate that this is indeed the case. Similarly, we show that extracellular water-soluble proteins exhibit an even stronger pattern of low homology than membrane proteins. These striking differences in conservation of membrane proteins versus water-soluble proteins have important implications for evolution and medicine.

    Only a Single Taxonomically Restricted Gene Family in the Drosophila melanogaster Subgroup Can Be Identified with High Confidence

    Taxonomically restricted genes (TRGs) are genes that are present only in one clade. Protein-coding TRGs may evolve de novo from previously noncoding sequences: functional ncRNA, introns, or alternative reading frames of older protein-coding genes, or intergenic sequences. A major challenge in studying de novo genes is the need to avoid both false positives (nonfunctional open reading frames and/or functional genes that did not arise de novo) and false negatives. Here, we search conservatively for high-confidence TRGs as the most promising candidates for experimental studies, ensuring functionality through conservation across at least two species, and ensuring de novo status through examination of homologous noncoding sequences. Our pipeline also avoids ascertainment biases associated with preconceptions of how de novo genes are born. We identify one TRG family that evolved de novo in the Drosophila melanogaster subgroup. This TRG family contains single-copy genes in Drosophila simulans and Drosophila sechellia. It originated in an intron of a well-established gene, sharing that intron with another well-established gene upstream. These TRGs contain an intron that predates their open reading frame. These genes have not been previously reported as having originated de novo, and to our knowledge, they are the best Drosophila candidates identified so far for experimental studies aimed at elucidating the properties of de novo genes.

    Compensation of Dosage-Sensitive Genes on the Chicken Z Chromosome.

    In many diploid species, sex determination is linked to a pair of sex chromosomes that evolved from a pair of autosomes. In these organisms, the degeneration of the sex-limited Y or W chromosome causes a reduction in gene dose in the heterogametic sex for X- or Z-linked genes. Variations in gene dose are detrimental for large chromosomal regions when they span dosage-sensitive genes, and many organisms were thought to evolve complete mechanisms of dosage compensation to mitigate this. However, the recent realization that a wide variety of organisms lack complete mechanisms of sex chromosome dosage compensation has presented a perplexing question: How do organisms with incomplete dosage compensation avoid deleterious effects of gene dose differences between the sexes? Here we use expression data from the chicken (Gallus gallus) to show that ohnologs, duplicated genes known to be dosage-sensitive, are preferentially dosage-compensated on the chicken Z chromosome. Our results indicate that even in the absence of a complete and chromosome-wide dosage compensation mechanism, dosage-sensitive genes are effectively dosage-compensated on the Z chromosome.