13 research outputs found

    An Expanded Evaluation of Protein Function Prediction Methods Shows an Improvement In Accuracy

    Get PDF
    Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent

    An expanded evaluation of protein function prediction methods shows an improvement in accuracy

    Get PDF
    Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent. Keywords: Protein function prediction, Disease gene prioritizationpublishedVersio

    The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

    Get PDF
    Background The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aureginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.Peer reviewe

    The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

    Get PDF
    BackgroundThe Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.ResultsHere, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aureginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory.ConclusionWe conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.</p

    Automatic Localization of Protein Function Labels to Specific Regions within the Sequence

    No full text
    Most of the protein function prediction methods used today operate at the protein-level, where functional labels are assigned to, and transferred between, full-length proteins based on overall similarities. However, majority of proteins are composed of multiple domains and function by specifically interacting with other proteins or molecules, thus many functional as- sociations should be limited to specific regions rather than the entire protein length. Existing domain-centric function prediction methods depend almost entirely on accurate domain fam- ily assignments to infer relationships between domains and functions, with regions that are unassigned to a known domain family left out of functional evaluation. This work describes the development of an explicit region-specific function prediction methodology that localizes protein labels to pre-partitioned regions by enforcing overarching constraints to propagate labels between regions while preserving their existing assignments at the protein-level. We facilitate this by developing a text processing pipeline to integrate feature annotations from disparate data sources into a single feature-rich representation that more ad- equately encapsulates known functional features within regions, even in the absence of domain family assignments. The results were validated at the region-level using binding site annota- tions extracted from protein-ligand structures and show that the framework can successfully localize and improve protein-level prediction for site-specific functions. Our work reinforces the notion that different functions have different units of operation and should be treated as such in function prediction pipelines. We extend this work by developing a deep learning approach to incorporate our dense and site-specific feature representation at the residue-level, without the need to first pre-partition sequences into discrete regions, using convolutional neural network models. We show that this approach can automatically localize function labels to regions and sites by applying visualization techniques used in image analysis to highlight functionally important residues and validating them across binding and catalytic functions associated with ligand-binding sites extracted from protein-ligand structures

    Analysis of the genome sequence of the flowering plant Arabidopsis thaliana

    No full text
    The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans - the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement

    Critical care admission following elective surgery was not associated with survival benefit: prospective analysis of data from 27 countries

    Get PDF
    This was an investigator initiated study funded by Nestle Health Sciences through an unrestricted research grant, and by a National Institute for Health Research (UK) Professorship held by RP. The study was sponsored by Queen Mary University of London

    The surgical safety checklist and patient outcomes after surgery: a prospective observational cohort study, systematic review and meta-analysis

    Get PDF
    © 2017 British Journal of Anaesthesia Background: The surgical safety checklist is widely used to improve the quality of perioperative care. However, clinicians continue to debate the clinical effectiveness of this tool. Methods: Prospective analysis of data from the International Surgical Outcomes Study (ISOS), an international observational study of elective in-patient surgery, accompanied by a systematic review and meta-analysis of published literature. The exposure was surgical safety checklist use. The primary outcome was in-hospital mortality and the secondary outcome was postoperative complications. In the ISOS cohort, a multivariable multi-level generalized linear model was used to test associations. To further contextualise these findings, we included the results from the ISOS cohort in a meta-analysis. Results are reported as odds ratios (OR) with 95% confidence intervals. Results: We included 44 814 patients from 497 hospitals in 27 countries in the ISOS analysis. There were 40 245 (89.8%) patients exposed to the checklist, whilst 7508 (16.8%) sustained ≥1 postoperative complications and 207 (0.5%) died before hospital discharge. Checklist exposure was associated with reduced mortality [odds ratio (OR) 0.49 (0.32–0.77); P\u3c0.01], but no difference in complication rates [OR 1.02 (0.88–1.19); P=0.75]. In a systematic review, we screened 3732 records and identified 11 eligible studies of 453 292 patients including the ISOS cohort. Checklist exposure was associated with both reduced postoperative mortality [OR 0.75 (0.62–0.92); P\u3c0.01; I2=87%] and reduced complication rates [OR 0.73 (0.61–0.88); P\u3c0.01; I2=89%). Conclusions: Patients exposed to a surgical safety checklist experience better postoperative outcomes, but this could simply reflect wider quality of care in hospitals where checklist use is routine
    corecore