43 research outputs found

    MobiDB 2.0: an improved database of intrinsically disordered and mobile proteins

    MobiDB (http://mobidb.bio.unipd.it/) is a database of intrinsically disordered and mobile proteins. Intrinsically disordered regions are key for the function of numerous proteins. Here we provide a new version of MobiDB, a centralized source aimed at providing the most complete picture on different flavors of disorder in protein structures covering all UniProt sequences (currently over 80 million). The database features three levels of annotation: manually curated, indirect and predicted. Manually curated data is extracted from the DisProt database. Indirect data is inferred from PDB structures that are considered an indication of intrinsic disorder. The 10 predictors currently included (three ESpritz flavors, two IUPred flavors, two DisEMBL flavors, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein in the absence of more reliable data. The new version also features a consensus annotation and classification for long disordered regions. In order to complement the disorder annotations, MobiDB features additional annotations from external sources. Annotations from the UniProt database include post-translational modifications and linear motifs. Pfam annotations are displayed in graphical form and are link-enabled, allowing the user to visit the corresponding Pfam page for further information. Experimental protein-protein interactions from STRING are also classified for disorder content

    D2P2: database of disordered protein predictions

    We present the Database of Disordered Protein Prediction (D2P2), available at http://d2p2.pro (including website source code). A battery of disorder predictors and their variants, VL-XT, VSL2b, PrDOS, PV2, Espritz and IUPred, were run on all protein sequences from 1765 complete proteomes (to be updated as more genomes are completed). Integrated with these results are all of the predicted (mostly structured) SCOP domains using the SUPERFAMILY predictor. These disorder/structure annotations together enable comparison of the disorder predictors with each other and examination of the overlap between disordered predictions and SCOP domains on a large scale. D2P2 will increase our understanding of the interplay between disorder and structure, the genomic distribution of disorder, and its evolutionary history. The parsed data are made available in a unified format for download as flat files or SQL tables either by genome, by predictor, or for the complete set. An interactive website provides a graphical view of each protein annotated with the SCOP domains and disordered regions from all predictors overlaid (or shown as a consensus). There are statistics and tools for browsing and comparing genomes and their disorder within the context of their position on the tree of life. © The Author(s) 2012. Published by Oxford University Press

    InterPro in 2017-beyond protein family and domain annotations

    InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences

    Best practices for the manual curation of Intrinsically Disordered Proteins in DisProt

    The DisProt database is a significant resource containing manually curated data on experimentally validated intrinsically disordered proteins (IDPs) and regions (IDRs) from the literature. Developed in 2005, its primary goal was to collect structural and functional information on proteins that lack a fixed three-dimensional (3D) structure. Today, DisProt has evolved into a major repository that not only collects experimental data but also contributes significantly to our understanding of the roles of IDPs/IDRs in various biological processes, such as autophagy or the life cycle mechanisms in viruses, or their involvement in diseases (such as cancer and neurodevelopmental disorders). DisProt offers detailed information on the structural states of IDPs/IDRs, including state transitions, interactions, and their functions, all provided as curated annotations. One of the central activities of DisProt is the meticulous curation of experimental data from the literature. For this reason, to ensure that every expert and volunteer curator possesses the requisite knowledge for data evaluation, collection, and integration, training courses and curation materials are available. However, biocuration guidelines concur on the importance of developing robust guidelines that not only provide critical information about data consistency but also ensure data acquisition. This guideline aims to provide both biocurators and external users with best practices for manually curating IDPs and IDRs in DisProt. It describes every step of the literature curation process and provides use cases of IDP curation within DisProt. Database URL: https://disprot.org

    Computational characterization of tandem repeat and non-globular proteins

    The first protein structure to be determined was hemoglobin, a globe-like, water-soluble protein with enzymatic activity. Since then, protein science has been biased towards this type, termed globular. However, over the last decades accumulating experimental evidence has suggested the functional importance of their counterpart, non-globular proteins (NGPs). The definition includes tandem repetitions, intrinsically disordered regions, aggregating domains and transmembrane domains. NGP recognition and classification is essential to shed light on the so-called “dark proteome”, i.e. the large fraction that we know almost nothing about. I contributed to this goal through the development of new resources dedicated to NGPs. My main focus is tandem repeat proteins (TRPs). TRPs are characterized by a repeated sequence which folds into a modular architecture, where modules are called “units”. The unit represents not only the structural but also the evolutionary module and forms the basis of TRP classification. TRPs are widespread in all types of organisms, where they carry out fundamental functions. The sequences of TRP units diverge quickly while maintaining their fold, hampering detection by traditional methods for sequence analysis. Conversely, the challenges of structure-based repeat detection lie in the multidimensional nature of the data. Specialized methods have been developed for TRP identification; however, few of them annotate single repeat units. RepeatsDB is a database of TRP structures annotated with the position of repeat units and insertions. I contributed to the new version of the RepeatsDB database, which was populated taking advantage of ReUPred, a predictor of tandem repeat units. The quality of RepeatsDB data is guaranteed by manual validation, a time-consuming task which requires community annotation efforts. To facilitate this process I developed RepeatsDB-lite, a web server for the prediction and refinement of tandem repeats in protein structure.
Analysing RepeatsDB data, I compared the sequence- and structure-based classification of TRPs. Moreover, I provided insights on the role of TRPs in the human proteome by characterizing them in terms of function, protein-protein interaction networks and impact on diseases. As a case study, I characterized Collagen V, a repeat protein associated with Ehlers-Danlos syndrome, identifying genotype-phenotype correlations in relation to its interaction network model. Another category of NGPs is intrinsically disordered proteins (IDPs), devoid of order in their native state. Intrinsic disorder was shown to be prevalent in the human proteome, to play important signaling and regulatory roles and to be frequently involved in disease. I contributed to MobiDB, a database of protein disorder and mobility annotations that describes several aspects of NGP structure and mechanism of function. MobiDB provides consensus predictions and functional annotations for all known protein sequences. A common feature of TRPs, IDPs and other NGPs is that they are characterized by low-complexity regions, where the distribution of amino acids deviates from the common amino acid usage. The functional importance of low-complexity regions is strictly related to their non-globular arrangement. I contributed to the field with a critical review focusing on the definition of sequence features of low-complexity regions and their relationship to structural features. Finally, I exploited the knowledge acquired on NGPs in the previous studies to design one of the first sequence-based methods for the prediction of protein solubility, SODA. SODA uses the aggregation propensity, intrinsic disorder, hydrophobicity and secondary structure preferences from a sequence to evaluate solubility changes introduced by a mutation. The main envisaged applications of SODA are in protein engineering and in the study of the impact of protein mutations on disease onset

    An intrinsically disordered proteins community for ELIXIR.

    Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled "An intrinsically disordered protein user community proposal for ELIXIR" held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders

    Bioinformatics in Italy: BITS 2012, the ninth annual meeting of the Italian Society of Bioinformatics

    The BITS2012 meeting, held in Catania on May 2-4, 2012, brought together almost 100 Italian researchers working in the field of bioinformatics, as well as students in the same or related disciplines. About 90 original research works were presented either as oral communications or as posters, representing a landscape of current Italian research in bioinformatics. This preface provides a brief overview of the meeting and introduces the manuscripts that were accepted for publication in this supplement, after a strict and careful peer review by an international board of referees

    Computational Investigations of Backbone Dynamics in Intrinsically Disordered Proteins

    Intrinsically disordered proteins (IDPs), due to their dynamic nature, play important roles in molecular recognition, signalling, regulation, or binding of nucleic acids. IDPs have been extensively studied computationally in terms of binary disorder/order classification. This approach has proven to be fruitful and enabled researchers to estimate the amount of disorder in prokaryotic and eukaryotic genomes. Other computational methods, such as molecular dynamics or other simulation techniques, require a starting structure. However, there are no approaches permitting insight into the behaviour of disordered ensembles from sequence alone. Such a method would facilitate the study of proteins of unknown structure, help to obtain a better classification of the disordered regions, and aid the design of disorder-to-order transitions. In this work, I develop FRAGFOLD-IDP, a method to address this issue. Using a fragment-based structure prediction approach, FRAGFOLD, I generate the ensembles of IDPs and show that the features extracted from them correspond well with the backbone dynamics of NMR ensembles deposited in the PDB. FRAGFOLD-IDP predictions significantly improve over a naïve approach and help to get a better insight into the dynamics of the disordered ensembles. The results also show it is not necessary to predict the correct fold of the protein to reliably assign per-residue fluctuations to the sequence in question. This suggests that disorder is a local property and does not depend on the protein fold. Next, I validate FRAGFOLD-IDP on the disorder classification task and show that the method performs comparably to machine learning-based approaches designed specifically for this task. I also found that FRAGFOLD-IDP produces results on par with DynaMine, a machine learning approach to predict the NMR order parameters, and that the results of both methods are not correlated.
Thus, I constructed a consensus neural network predictor, which takes the results of FRAGFOLD-IDP, DynaMine and physicochemical features to predict per-residue fluctuations, improving upon both input methods