27 research outputs found

    In-Depth Performance Evaluation of PFP and ESG Sequence-Based Function Prediction Methods in CAFA 2011 Experiment

    Background: Many automatic function prediction (AFP) methods have been developed to cope with the rapid growth in the number of gene sequences available from high-throughput sequencing experiments. To support the development of AFP methods, it is essential to have community-wide experiments for evaluating the performance of existing methods. Critical Assessment of Function Annotation (CAFA) is one such community experiment. The CAFA meeting was held as a Special Interest Group (SIG) meeting at the Intelligent Systems for Molecular Biology (ISMB) conference in 2011. Here, we perform a detailed analysis of two sequence-based function prediction methods developed in our lab, PFP and ESG, using the predictions submitted to CAFA. Results: We evaluate PFP and ESG using four different measures in comparison with BLAST, Prior, and GOtcha. In addition to the predictions submitted to CAFA, we further investigate the performance of a different scoring function for rank-ordering PFP predictions, as well as PFP/ESG predictions enriched with Priors, which simply add frequently occurring Gene Ontology terms to the predictions. The prediction accuracy of each method was also evaluated separately for different functional categories. Successful and unsuccessful predictions by PFP and ESG are also discussed in comparison with BLAST. Conclusion: The in-depth analysis discussed here will complement the overall assessment by the CAFA organizers. Since PFP and ESG are based on sequence database search results, our analyses are not only useful for PFP and ESG users but will also shed light on the relationship between the sequence similarity space and the functions that can be inferred from sequences.

    Emergence of Members of TRAF and DUB of Ubiquitin Proteasome System in the Regulation of Hypertrophic Cardiomyopathy

    The ubiquitin proteasome system (UPS) plays an essential role in many critical cellular processes, frequently by mediating the selective degradation of misfolded and damaged proteins and also through non-degradative functions that are especially important in many signaling pathways. Over the last three decades, accumulated evidence has indicated that UPS proteins are primary modulators of cell cycle progression, DNA replication and repair, transcription, immune responses, and apoptosis. More recent studies have demonstrated substantial complexity in UPS regulation in the heart. In addition, various UPS proteins, especially ubiquitin ligases and the proteasome, have been found to play a significant role in cardiac development and in the dynamic physiology of cardiac pathologies such as ischemia/reperfusion injury, hypertrophy, and heart failure. However, our understanding of how UPS dysfunction contributes to the development of cardiac pathophysiology, and the list of UPS proteins regulating these conditions, is still in its infancy. The recent emergence of roles for the TNF receptor-associated factor (TRAF) and deubiquitinating enzyme (DUB) superfamilies in hypertrophic cardiomyopathy has enhanced our knowledge. In this review, we mainly compile the TRAF superfamily of E3 ligases and a few DUBs, together with other well-documented E3 ligases specific to myocardial hypertrophy, such as MDM2, MuRF-1, Atrogin-I, and TRIM32. We also highlight their expression profiles following the physiological and pathological stimulation that leads to the onset of the hypertrophic phenotype in the heart; these profiles could serve as biomarkers and offer opportunities for the development of novel therapies.

    An expanded evaluation of protein function prediction methods shows an improvement in accuracy

    Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent. Keywords: Protein function prediction, Disease gene prioritization

    Global incidence, prevalence, years lived with disability (YLDs), disability-adjusted life-years (DALYs), and healthy life expectancy (HALE) for 371 diseases and injuries in 204 countries and territories and 811 subnational locations, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021

    Background: Detailed, comprehensive, and timely reporting on population health by underlying causes of disability and premature death is crucial to understanding and responding to complex patterns of disease and injury burden over time and across age groups, sexes, and locations. The availability of disease burden estimates can promote evidence-based interventions that enable public health researchers, policy makers, and other professionals to implement strategies that can mitigate diseases. It can also facilitate more rigorous monitoring of progress towards national and international health targets, such as the Sustainable Development Goals. For three decades, the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) has filled that need. A global network of collaborators contributed to the production of GBD 2021 by providing, reviewing, and analysing all available data. GBD estimates are updated routinely with additional data and refined analytical methods. GBD 2021 presents, for the first time, estimates of health loss due to the COVID-19 pandemic. Methods: The GBD 2021 disease and injury burden analysis estimated years lived with disability (YLDs), years of life lost (YLLs), disability-adjusted life-years (DALYs), and healthy life expectancy (HALE) for 371 diseases and injuries using 100 983 data sources. Data were extracted from vital registration systems, verbal autopsies, censuses, household surveys, disease-specific registries, health service contact data, and other sources. YLDs were calculated by multiplying cause-age-sex-location-year-specific prevalence of sequelae by their respective disability weights, for each disease and injury. YLLs were calculated by multiplying cause-age-sex-location-year-specific deaths by the standard life expectancy at the age that death occurred. DALYs were calculated by summing YLDs and YLLs. 
HALE estimates were produced using YLDs per capita and age-specific mortality rates by location, age, sex, year, and cause. 95% uncertainty intervals (UIs) were generated for all final estimates as the 2·5th and 97·5th percentiles values of 500 draws. Uncertainty was propagated at each step of the estimation process. Counts and age-standardised rates were calculated globally, for seven super-regions, 21 regions, 204 countries and territories (including 21 countries with subnational locations), and 811 subnational locations, from 1990 to 2021. Here we report data for 2010 to 2021 to highlight trends in disease burden over the past decade and through the first 2 years of the COVID-19 pandemic. Findings: Global DALYs increased from 2·63 billion (95% UI 2·44–2·85) in 2010 to 2·88 billion (2·64–3·15) in 2021 for all causes combined. Much of this increase in the number of DALYs was due to population growth and ageing, as indicated by a decrease in global age-standardised all-cause DALY rates of 14·2% (95% UI 10·7–17·3) between 2010 and 2019. Notably, however, this decrease in rates reversed during the first 2 years of the COVID-19 pandemic, with increases in global age-standardised all-cause DALY rates since 2019 of 4·1% (1·8–6·3) in 2020 and 7·2% (4·7–10·0) in 2021. In 2021, COVID-19 was the leading cause of DALYs globally (212·0 million [198·0–234·5] DALYs), followed by ischaemic heart disease (188·3 million [176·7–198·3]), neonatal disorders (186·3 million [162·3–214·9]), and stroke (160·4 million [148·0–171·7]). However, notable health gains were seen among other leading communicable, maternal, neonatal, and nutritional (CMNN) diseases. Globally between 2010 and 2021, the age-standardised DALY rates for HIV/AIDS decreased by 47·8% (43·3–51·7) and for diarrhoeal diseases decreased by 47·0% (39·9–52·9). Non-communicable diseases contributed 1·73 billion (95% UI 1·54–1·94) DALYs in 2021, with a decrease in age-standardised DALY rates since 2010 of 6·4% (95% UI 3·5–9·5). 
Between 2010 and 2021, among the 25 leading Level 3 causes, age-standardised DALY rates increased most substantially for anxiety disorders (16·7% [14·0–19·8]), depressive disorders (16·4% [11·9–21·3]), and diabetes (14·0% [10·0–17·4]). Age-standardised DALY rates due to injuries decreased globally by 24·0% (20·7–27·2) between 2010 and 2021, although improvements were not uniform across locations, ages, and sexes. Globally, HALE at birth improved slightly, from 61·3 years (58·6–63·6) in 2010 to 62·2 years (59·4–64·7) in 2021. However, despite this overall increase, HALE decreased by 2·2% (1·6–2·9) between 2019 and 2021. Interpretation: Putting the COVID-19 pandemic in the context of a mutually exclusive and collectively exhaustive list of causes of health loss is crucial to understanding its impact and ensuring that health funding and policy address needs at both local and global levels through cost-effective and evidence-based interventions. A global epidemiological transition remains underway. Our findings suggest that prioritising non-communicable disease prevention and treatment policies, as well as strengthening health systems, continues to be crucially important. The progress on reducing the burden of CMNN diseases must not stall; although global trends are improving, the burden of CMNN diseases remains unacceptably high. Evidence-based interventions will help save the lives of young children and mothers and improve the overall health and economic conditions of societies across the world. Governments and multilateral organisations should prioritise pandemic preparedness planning alongside efforts to reduce the burden of diseases and injuries that will strain resources in the coming decades. Funding: Bill & Melinda Gates Foundation
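The burden arithmetic described in the Methods above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the GBD pipeline: it treats a single cause-age-sex-location-year stratum with one sequela, the function names and all input numbers are hypothetical, and the 500 "draws" are a made-up sequence standing in for real model draws.

```python
import statistics


def ylds(prevalence: float, disability_weight: float) -> float:
    """Years lived with disability: prevalence of a sequela times its disability weight."""
    return prevalence * disability_weight


def ylls(deaths: float, standard_life_expectancy: float) -> float:
    """Years of life lost: deaths times standard life expectancy at the age of death."""
    return deaths * standard_life_expectancy


def dalys(yld: float, yll: float) -> float:
    """Disability-adjusted life-years: the sum of YLDs and YLLs."""
    return yld + yll


def uncertainty_interval(draws):
    """Return the 2.5th and 97.5th percentiles of a collection of draws."""
    # n=40 yields cut points at every 2.5%, so the first and last
    # cut points sit at 2.5% and 97.5% respectively.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical inputs for one stratum (illustrative values only).
example_yld = ylds(prevalence=1000, disability_weight=0.2)
example_yll = ylls(deaths=10, standard_life_expectancy=30.0)
example_daly = dalys(example_yld, example_yll)

# 500 made-up draws; a real analysis would use draws from the estimation model.
example_draws = [float(i) for i in range(500)]
lower, upper = uncertainty_interval(example_draws)
```

In the study itself, YLDs are summed over all sequelae of a cause and uncertainty is propagated at each estimation step; the sketch only shows the final arithmetic.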

    Protein function, diversity and functional interplay

    No full text
    Functional annotation of novel or unknown proteins is one of the central problems in post-genomics bioinformatics research. With the vast expansion of genomic and proteomic data and technologies over the last decade, development of automated function prediction (AFP) methods for large-scale identification of protein function has become imperative. In this research, we address two important divergences from the “one protein – one function” concept on which all existing AFP methods are developed: 1. One protein with multiple independent functions – Moonlighting Proteins: Moonlighting proteins perform more than one independent cellular function within one polypeptide chain. Recent biological experiments have been discovering such multi-functional proteins at a steady pace. Our work on moonlighting proteins can be divided into two logical parts: 1a. Development of a computational framework for comprehensive genome-scale characterization of moonlighting proteins based on functional and context-based information. Our work identifies characteristic features of moonlighting proteins both in cases where current databases have functional annotations of the diverse functions of such proteins and in cases where functional annotations do not exist. 1b. Development of automated prediction models of moonlighting proteins. We take two different approaches to model development: using functional and context-based features in a machine learning framework, and using text-based features learned through text-mining algorithms. 2. Group of proteins sharing a common function: Biological experiments regularly reveal sets of proteins involved in a disease, disorder, or cellular phenomenon without sufficient explanation of the functional mechanisms of these group activities. Intuitively, proteins interact in a cell physically, through gene expression, or through genetic interaction to perform a common function that often ends up causing a disease or disorder. 
To understand the functional nature of a set of proteins, it is often more important to understand the functions in which they are involved as a group than the detailed functional characteristics of the individual proteins. In this research, we develop a conditional random field (CRF)-based framework that predicts the function of “protein groups” based on the group neighborhood in their interaction network, and iteratively updates the function annotation of the unknown group members so that it reflects each protein’s group activity. For the protein function prediction research domain, it is vital to keep pace with existing AFP methods by improving prediction accuracy, updating the models, and making the methods available to the bioinformatics community. The final part of this research addresses three aspects of the AFP problem for two existing methods, PFP and ESG: improvement, database update, and web-server development. It also covers participation in CAFA (Critical Assessment of Function Annotation), a community-wide challenge for AFP methods, and benchmarking of their performance.

    Missing gene identification using functional coherence scores

    Reconstructing metabolic and signaling pathways is an effective way of interpreting a genome sequence. A challenge in pathway reconstruction is that genes in a pathway often cannot be easily found, reflecting the currently imperfect information about the target organism. In this work, we developed a new method for finding missing genes that integrates multiple features, including gene expression, phylogenetic profile, and function association scores. In particular, to consider the function association between candidate genes and the proteins neighboring the target missing gene in the network, we used the Co-occurrence Association Score (CAS) and the PubMed Association Score (PAS), which are designed to capture the functional coherence of proteins. We showed that adding CAS and PAS substantially improves the accuracy of identifying missing genes in the yeast enzyme-enzyme network compared with using only the conventional features, gene expression and phylogenetic profile. Finally, we also demonstrated that accuracy improves when indirect neighbors of the target enzyme position in the network are considered using a proper network-topology-based weighting scheme.

    Evaluation of function predictions by PFP, ESG, and PSI-BLAST for moonlighting proteins

    Background: Advancements in function prediction algorithms are enabling large-scale computational annotation of newly sequenced genomes. With the increase in the number of functionally well-characterized proteins, it has been observed that many proteins are involved in more than one function. These proteins, characterized as moonlighting proteins, show varied functional behavior depending on the cell type, localization in the cell, oligomerization, multiple binding sites, etc. The functional diversity shown by moonlighting proteins may have a significant impact on traditional sequence-based function prediction methods. Here we investigate how well the diverse functions of moonlighting proteins can be predicted by some existing function prediction methods. Results: We analyzed the performance of three major sequence-based function prediction methods, PSI-BLAST, the Protein Function Prediction (PFP) method, and the Extended Similarity Group (ESG) method, in predicting the diverse functions of moonlighting proteins. In predicting the distinct functions of a set of 19 experimentally identified moonlighting proteins, PFP showed the highest overall recall among the three methods. Although ESG showed the highest precision, its recall was lower than that of PSI-BLAST. Recall by PSI-BLAST greatly improved when BLOSUM45 was used instead of BLOSUM62. Conclusion: We analyzed the performance of PFP, ESG, and PSI-BLAST in predicting the functional diversity of moonlighting proteins. PFP shows better overall performance in predicting diverse moonlighting functions than PSI-BLAST and ESG. Recall by PSI-BLAST greatly improved when BLOSUM45 was used. This analysis indicates that considering weakly similar sequences in prediction enhances the performance of sequence-based AFP methods in predicting the functional diversity of moonlighting proteins. The current study will also motivate the development of novel computational frameworks for automatic identification of such proteins.
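The precision and recall comparison in the abstract above can be illustrated with a minimal sketch. It assumes exact-match scoring between predicted and annotated term sets; the abstract does not state the evaluation protocol, which may instead account for the Gene Ontology hierarchy. The function name and the term identifiers below are hypothetical placeholders, not real annotations.

```python
def precision_recall(predicted, annotated):
    """Exact-match precision and recall of predicted terms against annotated terms."""
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Placeholder term labels for one hypothetical moonlighting protein
# whose annotations cover two distinct functions.
annotated_terms = {"TERM_A", "TERM_B", "TERM_E"}

# A cautious predictor: few terms, all correct, but one function is missed.
cautious = precision_recall({"TERM_A"}, annotated_terms)

# A broad predictor: more terms, some wrong, but more functions recovered.
broad = precision_recall({"TERM_A", "TERM_B", "TERM_C", "TERM_D"}, annotated_terms)
```

The cautious predictor reaches higher precision and lower recall than the broad one, which is the kind of trade-off the abstract reports between ESG and PFP.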

    Delineating Crosstalk Mechanisms of the Ubiquitin Proteasome System That Regulate Apoptosis

    No full text
    Regulatory functions of the ubiquitin-proteasome system (UPS) are exercised mainly by ubiquitin ligases and deubiquitinating enzymes. Degradation of apoptotic proteins by the UPS is central to the maintenance of cell health, and deregulation of this process is associated with several diseases, including tumors, neurodegenerative disorders, diabetes, and inflammation. Interrogating protein turnover in cells may therefore offer a strategy for delineating disease-causing mechanistic perturbations and facilitate the identification of drug targets. In this review, we summarize current knowledge of the molecular interplay between the apoptosis and UPS pathways. We have compiled from the literature around 100 enzymes of the UPS machinery that ubiquitinate or deubiquitinate apoptotic proteins and regulate cell fate. We also provide detailed insight into how UPS proteins fine-tune the intrinsic, extrinsic, and p53-mediated apoptotic pathways to regulate cell survival or cell death. This review provides a comprehensive overview of the potential of UPS players as drug targets for cancer and other human disorders.