12 research outputs found

    Homology-based inference sets the bar high for protein function prediction

    Background: Any method that predicts protein function de novo should do better than random. More challenging, it also ought to outperform simple homology-based inference. Methods: Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar or lower limit for future improvements. Results and conclusions: During the development of these methods, we faced two surprises. Firstly, our most successful implementation for the baseline ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Secondly, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods span from top to bottom performers at CAFA, but also the reasons for these differences were unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.

    fusion_proteins_to_functions.tsv.gz

    Dataset detailing the assignment of bacterial proteins to functions determined by Fusion. The file is tab-separated. Columns in the file:
        fusion_protein_id -> internal identifier
        ncbi_accession -> accession number of the protein as reported in the NCBI protein database
        seguid -> SegUID of the protein sequence
        protein_name -> name of the protein as extracted from the FASTA file (GenBank)
        sequence_type -> whether the protein originates from the bacterial genome (chromosome) or a plasmid
        fusion_function_id -> Fusion function identifier (functions start with the prefix 'P-', singletons with the prefix 'S-')
        organism_name -> name of the bacterial organism
        ncbi_taxid -> taxid of the organism as annotated in NCBI
        ncbi_species_taxid -> species taxid of the organism as annotated by NCBI
        assembly_accession -> assembly accession ID
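The column layout described above can be read with the Python standard library alone. The following is a minimal sketch, not the dataset authors' tooling: it assumes the columns appear in the listed order and that the file starts with a header row (both unverified), and the helper names `read_assignments`, `proteins_by_function`, and `is_singleton` are hypothetical.

```python
import csv
import gzip
from collections import defaultdict

# Column names as listed in the dataset description. Their order in the
# file and the presence of a header row are assumptions.
COLUMNS = [
    "fusion_protein_id", "ncbi_accession", "seguid", "protein_name",
    "sequence_type", "fusion_function_id", "organism_name",
    "ncbi_taxid", "ncbi_species_taxid", "assembly_accession",
]


def read_assignments(path, has_header=True):
    """Yield one dict per protein row from the gzipped, tab-separated file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quote characters in protein names untouched.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)  # skip the assumed header row
        for row in reader:
            yield dict(zip(COLUMNS, row))


def proteins_by_function(rows):
    """Group NCBI accessions by Fusion function identifier."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["fusion_function_id"]].append(row["ncbi_accession"])
    return dict(groups)


def is_singleton(function_id):
    """Singletons carry the 'S-' prefix; other functions use 'P-'."""
    return function_id.startswith("S-")
```

A typical call would be `proteins_by_function(read_assignments("fusion_proteins_to_functions.tsv.gz"))`; pass `has_header=False` if the file turns out to have no header row.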

    Functional & Structural Similarities

    This file contains structural and functional similarity predictions reported in (tbd).

    Data files for fusion-SNN git repository

    The taz.gz file contains the necessary data for https://bitbucket.org/bromberglab/fusion-snn/.

    Fusion based organism similarities (balanced organism set)

    This dataset contains Fusion function profile-based organism similarities reported in (tbd).

    Predicted Molecular Effects of Sequence Variants Link to System Level of Disease

    Developments in experimental and computational biology are advancing our understanding of how protein sequence variation impacts molecular protein function. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains barred. Here, we present new results emphasizing earlier work that suggested some links from molecular function to disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects upon molecular function often correctly predict human (OMIM) and animal (OMIA) Mendelian disease-causing variants. We also predicted effects of human disease-causing variants in the mouse model, i.e. we put OMIM SAVs into mouse orthologs. Overall, fewer variants were predicted with effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help to understand the macro system level of disease.

    Predictions of SAV effects upon function and disease across species.

    The numbers above bars give the number of SAVs in the set. A: Three methods (SNAP2 [16], SIFT [27], PolyPhen-2 [12]) predicted SAV effects upon molecular function (TrEffect/TrNeutral) and upon disease (OMIM). Exclusively for this panel SNAP2 was trained without using disease SAVs from OMIM [5] or HumVar [28]. The SNAP2 version trained exclusively on molecular function clearly captured aspects of OMIM-disease SAVs (leftmost bar OMIM higher than 2nd to the left TrEffect). TrNeutral was the SNAP2 training set of variants without effect. Comparing the bars for TrNeutral and OMIM for each method pointed to differential thresholds: PolyPhen-2 correctly predicted more effect in OMIM than SNAP2 but also incorrectly predicted more effect in the neutral data, i.e. simply predicted more effect variants. B: OMIM is repeated from A. SNAP2 captured disease signals in humans and animals at similar levels. OMIA contained disease SAVs from animals other than mouse and rat (mostly dog and cattle). C: SNAP2 predicted OMIM SAVs with less effect in mouse orthologs than in human. Left bar (OMIM with mouse ortholog): SNAP2 predictions for the subset of all 4,229 OMIM SAVs for which we found a mouse ortholog. Right bar (OMIM in mouse): SNAP2 predictions when putting the human SAV into the mouse sequence. D: Disease variants happen in non-random positions. Left bar (NotOMIM conserved): in each protein with an OMIM SAV, we predicted the effect of all SAVs with a level of sequence conservation ≥ that of the OMIM variant. Right bar (NotOMIM not conserved): predictions for SAVs in non-OMIM positions with conservation < that of the OMIM SAV. Obviously, OMIM SAVs were very well conserved.

    Quantifying structural relationships of metal-binding sites suggests origins of biological electron transfer

    © 2022 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY). Biological redox reactions drive planetary biogeochemical cycles. Using a novel, structure-guided sequence analysis of proteins, we explored the patterns of evolution of enzymes responsible for these reactions. Our analysis reveals that the folds that bind transition metal-containing ligands have similar structural geometry and amino acid sequences across the full diversity of proteins. Similarity across folds reflects the availability of key transition metals over geological time and strongly suggests that transition metal-ligand binding had a small number of common peptide origins. We observe that structures central to our similarity network come primarily from oxidoreductases, suggesting that ancestral peptides may have also facilitated electron transfer reactions. Last, our results reveal that the earliest biologically functional peptides were likely available before the assembly of fully functional protein domains over 3.8 billion years ago.