31 research outputs found

    DisProt: intrinsic protein disorder annotation in 2020

    The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments in DisProt (version 8), including a doubling of the number of protein entries, a new disorder ontology, improvements to the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows more information to be captured from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and thereby helping to illuminate the ‘dark’ proteome.

    Characterization of dry spells over the African domain simulated by the Canadian Regional Climate Model (MRCC5)

    The consequences of climate change for the frequency and intensity of precipitation will directly affect dry spells and, in turn, economic sectors such as agriculture. This study therefore evaluates the ability of the Canadian Regional Climate Model (MRCC5) to simulate the characteristics of dry spells for four precipitation thresholds: 0.5 mm, 1 mm, 2 mm and 3 mm. These characteristics include the number of dry days, the number of dry spells, and the maximum number of consecutive days without precipitation associated with a 5-year return period. Results are presented as annual and seasonal means. The performance error is assessed by comparing MRCC5 driven by ERA-Interim with GPCP analysis data for the present climate (1997-2008). The error due to boundary conditions, that is, the driving errors of MRCC5 when driven by CanESM2 and by ERA-Interim, and the added value of MRCC5 relative to CanESM2 are also analysed. The same characteristics are analysed in a changing-climate context for two future periods, 2041-2070 and 2071-2100, using MRCC5 driven by the general circulation model CanESM2 as well as CanESM2 itself under the RCP 4.5 scenario. The results suggest that MRCC5 driven by ERA-Interim tends to overestimate the annual mean number of dry days and the maximum number of consecutive days without precipitation associated with a 5-year return period over most regions of Africa, and tends to underestimate the number of dry spells. In general, the performance error is larger than the error due to boundary conditions for the various dry-spell characteristics.
    For equatorial regions, the changes projected by MRCC5 driven by CanESM2 for the dry-spell characteristics over the two future periods (2041-2070 and 2071-2100) suggest a significant increase in the number of dry days and in the maximum number of consecutive days without precipitation associated with a 5-year return period. A significant decrease in the number of dry spells is also projected.
    AUTHOR KEYWORDS: Regional Climate Model, Climate change, Dry days, Number of dry spells, Low-recurrence event, Africa
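
    As a rough illustration of the dry-spell characteristics named in the abstract above, the sketch below counts dry days, dry spells and the longest run of consecutive dry days for one daily precipitation series at each of the four thresholds. The function name `dry_spell_stats`, the convention that a day is dry when precipitation is strictly below the threshold, the one-day minimum spell length and the sample series are all assumptions made for illustration; the study's own definitions may differ, and the 5-year return-period analysis is not covered here.

```python
from itertools import groupby


def dry_spell_stats(daily_precip_mm, threshold_mm=1.0, min_spell_days=1):
    """Return (dry_days, spell_count, longest_spell) for one daily series.

    Assumption: a day is dry when its precipitation is strictly below
    threshold_mm, and a spell is any run of at least min_spell_days dry days.
    """
    is_dry = [p < threshold_mm for p in daily_precip_mm]
    # Length of every run of consecutive dry days.
    runs = [sum(1 for _ in group) for dry, group in groupby(is_dry) if dry]
    spells = [r for r in runs if r >= min_spell_days]
    return sum(is_dry), len(spells), max(runs, default=0)


# Hypothetical nine-day series in mm/day, for illustration only.
series = [0.0, 0.2, 5.0, 0.0, 0.0, 0.0, 2.5, 0.7, 0.0]

for threshold in (0.5, 1.0, 2.0, 3.0):
    print(threshold, dry_spell_stats(series, threshold))
```

    On this toy series, raising the threshold from 0.5 mm to 3 mm merges neighbouring spells: there are more dry days but fewer, longer spells, which is why the choice of threshold matters when the characteristics are compared.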

    Critical assessment of protein intrinsic disorder prediction

    Intrinsically disordered proteins, defying the traditional protein structure–function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 after bona fide structured regions are filtered out. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude.
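
    Fmax is commonly defined as the maximum F1 score obtained over all decision thresholds applied to a predictor's scores. The sketch below shows that definition on a handful of per-residue scores. The function name `f_max`, the toy scores and labels, and the choice to pool all residues into one list are assumptions for illustration; the abstract does not specify how CAID aggregates residues or proteins.

```python
def f_max(scores, labels):
    """Maximum F1 over all score thresholds.

    scores: predicted disorder propensities, one per residue.
    labels: 1 for disordered, 0 for ordered (assumed encoding).
    """
    best = 0.0
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
        if tp == 0:
            continue  # precision and recall are both zero at this threshold
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical scores and labels for five residues.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
print(round(f_max(scores, labels), 3))
```

    Because the threshold is chosen after the fact, Fmax describes the best achievable trade-off between precision and recall rather than the performance at a predictor's default cutoff.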

    APICURON: A database to credit and acknowledge the work of biocurators

    APICURON is an open and freely accessible resource that tracks and credits the work of biocurators across multiple participating knowledgebases. Biocuration is essential to extract knowledge from research data and make it available in a structured and standardized way to the scientific community. However, processing biological data, mainly from the literature, requires a huge effort that is difficult to attribute and quantify. APICURON collects biocuration events from third-party resources and aggregates this information, spotlighting biocurator contributions. APICURON promotes biocurator engagement by implementing gamification concepts such as badges, medals and leaderboards, and at the same time provides a monitoring service for registered resources and for biocurators themselves. APICURON adopts a data model that is flexible enough to represent and track the majority of biocuration activities. Biocurators are identified through their Open Researcher and Contributor ID. The definition of curation events, scoring systems and rules for assigning badges and medals are resource-specific and easily customizable. Registered resources can transfer curation activities on the fly through a secure and robust Application Programming Interface (API). Here, we show how simple and effective it is to connect a resource to APICURON, describing the DisProt database of intrinsically disordered proteins as a use case. We believe APICURON will provide biological knowledgebases with a service to recognize and credit the effort of their biocurators, monitor their activity and promote curator engagement. Database URL: https://apicuron.or

    FuzPred: a web server for the sequence-based prediction of the context-dependent binding modes of proteins

    Proteins form complex interactions in the cellular environment to carry out their functions. They exhibit a wide range of binding modes depending on the cellular conditions, which result in a variety of ordered or disordered assemblies. To help rationalise the binding behavior of proteins, the FuzPred server predicts their binding modes from sequence, without requiring the binding partners to be specified. The binding mode defines whether the bound state is formed through a disorder-to-order transition resulting in a well-defined conformation, or through a disorder-to-disorder transition where the binding partners remain conformationally heterogeneous. To account for the context-dependent nature of the binding modes, the FuzPred method also estimates the multiplicity of binding modes, that is, the likelihood of sampling multiple binding modes. Protein regions with a high multiplicity of binding modes may serve as regulatory sites or hot-spots for structural transitions in the assembly. To facilitate the interpretation of the predictions, protein regions with different interaction behaviors can be visualised on protein structures generated by AlphaFold. The FuzPred web server (https://fuzpred.bio.unipd.it) thus offers insights into the structural and dynamical changes of proteins upon interactions and contributes to the development of structure-function relationships under a variety of cellular conditions.

    Assessing predictors for new post translational modification sites: A case study on hydroxylation

    Post-translational modification (PTM) sites have become popular targets for predictor development. However, with the exception of phosphorylation and a handful of other examples, PTMs suffer from a limited number of available training examples and sparsity in protein sequences. Here, proline hydroxylation is taken as an example to compare different methods and evaluate their performance on new experimentally determined sites. To serve as a guide for effective experimental design, predictors require both high specificity and high sensitivity. However, self-reported performance is often not indicative of prediction quality, and detection of new sites is not guaranteed. We have benchmarked seven published hydroxylation site predictors on two newly constructed independent datasets. The self-reported performance is found to widely overestimate the real accuracy measured on independent datasets. No predictor performs better than random on new examples, indicating that the refined models do not generalize sufficiently to detect new sites. The number of false positives is high and precision is low, in particular for non-collagen proteins whose motifs are not conserved. As hydroxylation site predictors do not generalize to new data, caution is advised when using PTM predictors in the absence of independent evaluations, in particular for highly specific sites involved in signalling.
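
    The quantities this abstract relies on (sensitivity, specificity, precision, and a comparison against random guessing) can be sketched from a confusion matrix as below. The function name `confusion_metrics` and the toy predictions are hypothetical, and using site prevalence as the reference for a random predictor's expected precision is a simplifying assumption rather than the benchmark's actual procedure.

```python
def confusion_metrics(predicted, actual):
    """Sensitivity, specificity and precision for boolean site predictions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)

    def ratio(numerator, denominator):
        # Report 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        # Fraction of true sites: roughly the precision expected from
        # a predictor that guesses at random (assumed baseline).
        "prevalence": ratio(tp + fn, len(actual)),
    }


# Hypothetical predictions for eight candidate proline sites.
predicted = [True, True, True, False, False, False, False, False]
actual = [True, False, False, True, False, False, False, False]
print(confusion_metrics(predicted, actual))
```

    A precision close to the prevalence means the predictor's positive calls are little more informative than chance, which is the situation the abstract describes for new sites.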