14 research outputs found

    Prediction of Protein Binding Regions in Disordered Proteins

    Many disordered proteins function via binding to a structured partner and undergo a disorder-to-order transition. The coupled folding and binding can confer several functional advantages such as the precise control of binding specificity without increased affinity. Additionally, the inherent flexibility allows the binding site to adopt various conformations and to bind to multiple partners. These features explain the prevalence of such binding elements in signaling and regulatory processes. In this work, we report ANCHOR, a method for the prediction of disordered binding regions. ANCHOR relies on the pairwise energy estimation approach that is the basis of IUPred, a previous general disorder prediction method. In order to predict disordered binding regions, we seek to identify segments that are in disordered regions, cannot form enough favorable intrachain interactions to fold on their own, and are likely to gain stabilizing energy by interacting with a globular protein partner. The performance of ANCHOR was found to be largely independent of the amino acid composition and adopted secondary structure. Longer binding sites were generally predicted to be segmented, in agreement with available experimentally characterized examples. Scanning several hundred proteomes showed that the occurrence of disordered binding sites increased with the complexity of the organisms even compared to disordered regions in general. Furthermore, the length distribution of binding sites was different from disordered protein regions in general and was dominated by shorter segments. These results underline the importance of disordered proteins and protein segments in establishing new binding regions. Due to their specific biophysical properties, disordered binding sites generally carry a robust sequence signal, and this signal is efficiently captured by our method. Through its generality, ANCHOR opens new ways to study the essential functional sites of disordered proteins.

    InterPro in 2017: beyond protein family and domain annotations

    InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences

    An intrinsically disordered proteins community for ELIXIR.

    Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled "An intrinsically disordered protein user community proposal for ELIXIR" held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders

    Characterization of dry spells over the Africa domain simulated by the Canadian Regional Climate Model (MRCC5)

    The consequences of climate change for the frequency and intensity of precipitation will directly affect dry spells and, in turn, economic sectors such as agriculture. This study therefore evaluates the ability of the Canadian Regional Climate Model (MRCC5; CRCM5 in English) to simulate the characteristics of dry spells for four precipitation thresholds: 0.5 mm, 1 mm, 2 mm and 3 mm. These characteristics include the number of dry days, the number of dry spells, and the maximum number of consecutive days without precipitation associated with a 5-year return period. Results are presented as annual and seasonal means. The performance error is evaluated by comparing MRCC5 driven by ERA-Interim with GPCP analysis data for the present climate (1997-2008). The error due to boundary conditions, that is, the error from driving MRCC5 with either CanESM2 or ERA-Interim, and the added value of MRCC5 relative to CanESM2 are also analysed. The same characteristics are analysed in a changing-climate context for two future periods, 2041-2070 and 2071-2100, using MRCC5 driven by the CanESM2 general circulation model as well as CanESM2 itself, under the RCP 4.5 scenario. The results suggest that MRCC5 driven by ERA-Interim tends to overestimate the annual mean number of dry days and the maximum number of consecutive days without precipitation associated with a 5-year return period over most regions of Africa, and tends to underestimate the number of dry spells. In general, the performance error is larger than the error due to boundary conditions for the various dry-spell characteristics.
    For equatorial regions, the changes projected by MRCC5 driven by CanESM2 for the various dry-spell characteristics and for the two future periods (2041-2070 and 2071-2100) suggest a significant increase in the number of dry days and in the maximum number of consecutive days without precipitation associated with a 5-year return period. A significant decrease in the number of dry spells is also projected.
    AUTHOR KEYWORDS: Regional climate model, Climate change, Dry days, Number of dry spells, Low-recurrence event, Africa
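
The dry-spell characteristics named in this abstract can be illustrated with a minimal sketch. It assumes a plain list of daily precipitation totals in mm, treats a day as dry when its total is strictly below the threshold, and counts any run of one or more dry days as a spell; these conventions and the function name `dry_spell_stats` are illustrative assumptions, not the study's definitions. The 5-year return value of the longest spell would need a multi-year extreme-value analysis and is not computed here.

```python
from typing import Dict, List


def dry_spell_stats(daily_precip_mm: List[float], threshold_mm: float = 1.0) -> Dict[str, int]:
    """Count dry days, dry spells, and the longest dry spell in a daily series.

    Assumption: a day is dry when precipitation < threshold_mm, and a dry
    spell is any run of one or more consecutive dry days.
    """
    dry_days = 0
    spells = 0
    longest = 0
    current = 0  # length of the dry run we are currently inside
    for p in daily_precip_mm:
        if p < threshold_mm:
            dry_days += 1
            current += 1
            if current == 1:
                spells += 1  # a new dry spell starts on this day
            longest = max(longest, current)
        else:
            current = 0  # a wet day ends the current spell
    return {"dry_days": dry_days, "dry_spells": spells, "longest_spell": longest}
```

Calling the function once per threshold (0.5, 1, 2 and 3 mm) on the same series shows how the counts depend on the chosen definition of a dry day.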

    Pipeline for transferring annotations between proteins beyond globular domains

    Background: DisProt is the primary repository of Intrinsically Disordered Proteins (IDPs). This database is manually curated and its annotations have strong experimental support. Currently, DisProt contains a relatively small number of proteins, which highlights the importance of transferring annotations on verified disorder state and corresponding functions to homologous proteins in other species, providing those homologs with valuable information for understanding their biological roles. While the principles and practicalities of homology transfer are well established for globular proteins, they are largely lacking for disordered proteins. Methods: We used DisProt to evaluate the transferability of the annotation terms to orthologous proteins. For each protein, we looked for its orthologs, on the assumption that they have a similar function. Then, for each protein and its orthologs, we built multiple sequence alignments (MSAs). Disordered sequences are fast evolving and can be hard to align; we therefore implemented alignment quality control steps to ensure robust alignments before mapping the annotations. Results: We have designed a pipeline to obtain good-quality MSAs and to transfer annotations from any protein to its orthologs. Applying the pipeline to DisProt proteins, from the 1,731 entries with 5,623 annotations we can reach 97,555 orthologs and transfer a total of 301,190 terms by homology. We also provide a web server for consulting the results for DisProt proteins and executing the pipeline for any other protein. The server, Homology Transfer IDP (HoTIDP), is accessible at http://hotidp.leloir.org.ar.
    Affiliation: Martinez Perez, Elizabeth. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina. Fundación Instituto Leloir; Argentina
    Affiliation: Pajkos, Mátyás. Eötvös University; Argentina
    Affiliation: Tosatto, Silvio C. E.. Università di Padova; Italy
    Affiliation: Gibson, Toby James. European Molecular Biology Laboratory Heidelberg; Germany
    Affiliation: Dosztanyi, Zsuzsanna. Eötvös University; Argentina
    Affiliation: Marino, Cristina Ester. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina. Fundación Instituto Leloir; Argentina
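
The step of mapping an annotation through an alignment can be sketched as a coordinate conversion between two aligned rows. This is only an illustration of that idea, not the HoTIDP pipeline: the function name `transfer_region`, the 1-based inclusive coordinates and the '-' gap character are assumptions, and the alignment quality control described in the abstract is left out.

```python
from typing import Optional, Tuple


def transfer_region(aligned_src: str, aligned_tgt: str,
                    start: int, end: int) -> Optional[Tuple[int, int]]:
    """Map a 1-based inclusive region from the source to the target sequence.

    Both inputs are rows of the same alignment, with '-' marking gaps.
    Returns the first and last target positions aligned to the region,
    or None when no target residue aligns to it.
    """
    if len(aligned_src) != len(aligned_tgt):
        raise ValueError("aligned rows must have equal length")
    src_pos = 0  # residues seen so far in the ungapped source
    tgt_pos = 0  # residues seen so far in the ungapped target
    mapped = []
    for s, t in zip(aligned_src, aligned_tgt):
        if s != "-":
            src_pos += 1
        if t != "-":
            tgt_pos += 1
        # Keep only columns where both rows have a residue inside the region.
        if s != "-" and t != "-" and start <= src_pos <= end:
            mapped.append(tgt_pos)
    return (mapped[0], mapped[-1]) if mapped else None
```

Returning None for regions that fall entirely opposite gaps makes it explicit when an annotation has no counterpart in the ortholog, rather than silently producing an empty span.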

    D(2)P(2): database of disordered protein predictions

    We present the Database of Disordered Protein Prediction (D(2)P(2)), available at http://d2p2.pro (including website source code). A battery of disorder predictors and their variants, VL-XT, VSL2b, PrDOS, PV2, Espritz and IUPred, were run on all protein sequences from 1765 complete proteomes (to be updated as more genomes are completed). Integrated with these results are all of the predicted (mostly structured) SCOP domains using the SUPERFAMILY predictor. These disorder/structure annotations together enable comparison of the disorder predictors with each other and examination of the overlap between disordered predictions and SCOP domains on a large scale. D(2)P(2) will increase our understanding of the interplay between disorder and structure, the genomic distribution of disorder, and its evolutionary history. The parsed data are made available in a unified format for download as flat files or SQL tables either by genome, by predictor, or for the complete set. An interactive website provides a graphical view of each protein annotated with the SCOP domains and disordered regions from all predictors overlaid (or shown as a consensus). There are statistics and tools for browsing and comparing genomes and their disorder within the context of their position on the tree of life

    Integron-associated mobile gene cassettes code for folded proteins: the structure of Bal32a, a new member of the adaptable α+β barrel family

    The wide-ranging physiology and large genetic variability observed for prokaryotes is largely attributed, not to the prokaryotic genome itself, but rather to mechanisms of lateral gene transfer. Cassette PCR has been used to sample the integron/gene cassette metagenome from different natural environments without laboratory cultivation of the host organism, and without prior knowledge of any target protein sequence. Since over 90% of cassette genes are unrelated to any sequence in the current databases, it is not clear whether these genes code for folded functional proteins. We have selected a sample of eight cassette-encoded genes with no known homologs; five have been isolated as soluble protein products and shown by biophysical techniques to be folded. In solution, at least three of these proteins organise as stable oligomeric assemblies. The tertiary structure of one of these, Bal32a derived from a contaminated soil site, has been solved by X-ray crystallography to 1.8 Å resolution. From the three-dimensional structure, Bal32a is found to be a member of the highly adaptable α+β barrel family of transport proteins and enzymes. In Bal32a, the barrel cavity is unusually deep and inaccessible to solvent. Polar side-chains in its interior are reminiscent of catalytic sites of limonene-1,2-epoxide hydrolase and nogalonic acid methyl ester cyclase. These studies demonstrate the viability of direct sampling of mobile DNA as a route for the discovery of novel proteins

    Disentangling the complexity of low complexity proteins

    There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs
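
One common statistical measure of sequence complexity is the Shannon entropy of the residue composition in a sliding window. The sketch below is offered as an example of such a measure, not as the method the review recommends; the window length and entropy cutoff are arbitrary illustrative values, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, cutoff: float = 2.2) -> List[int]:
    """Return 0-based start indices of windows with entropy at or below cutoff.

    The default window and cutoff are assumptions chosen for illustration.
    """
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) <= cutoff]
```

A homopolymer window has entropy 0, while a window of four distinct residues has entropy 2 bits. As the abstract argues, a single statistic of this kind cannot by itself separate disordered low-complexity regions from repeats that form ordered structures.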