43 research outputs found

    Accurate single-sequence prediction of solvent accessible surface area using local and global features

    Get PDF
    We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both highly more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases indicating generalizability and possible usability for de-novo protein structure prediction. The source code and a Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org

    Comparing NMR and X-ray protein structure: Lindemann-like parameters and NMR disorder

    Get PDF
    Disordered protein chains and segments are fast becoming a major pathway for our understanding of biological function, especially in more evolved species. However, the standard definition of disordered residues: the inability to constrain them in X-ray derived structures, is not easily applied to NMR derived structures. We carry out a statistical comparison between proteins whose structure was resolved using NMR and using X-ray protocols. We start by establishing a connection between these two protocols for obtaining protein structure. We find a close statistical correspondence between NMR and X-ray structures if fluctuations inherent to the NMR protocol are taken into account. Intuitively this tends to lend support to the validity of both NMR and X-ray protocols in deriving biomolecular models that correspond to in vivo conditions. We then establish Lindemann-like parameters for NMR derived structures and examine what order/disorder cutoffs for these parameters are most consistent with X-ray data and how consistent are they. Finally, we find critical value of for the best correspondence between X-ray and NMR derived order/disorder assignment, judged by maximizing the Matthews correlation, and a critical value if a balance between false positive and false negative prediction is sought. We examine a few non-conforming cases, and examine the origin of the structure derived in X-ray. This study could help in assigning meaningful disorder from NMR experiments

    Entropy, Fluctuations, and Disordered Proteins

    Get PDF
    Entropy should directly reflect the extent of disorder in proteins. By clustering structurally related proteins and studying the multiple-sequence-alignment of the sequences of these clusters, we were able to link between sequence, structure, and disorder information. We introduced several parameters as measures of fluctuations at a given MSA site and used these as representative of the sequence and structure entropy at that site. In general, we found a tendency for negative correlations between disorder and structure, and significant positive correlations between disorder and the fluctuations in the system. We also found evidence for residue-type conservation for those residues proximate to potentially disordered sites. Mutation at the disorder site itself appear to be allowed. In addition, we found positive correlation for disorder and accessible surface area, validating that disordered residues occur in exposed regions of proteins. Finally, we also found that fluctuations in the dihedral angles at the original mutated residue and disorder are positively correlated while dihedral angle fluctuations in spatially proximal residues are negatively correlated with disorder. Our results seem to indicate permissible variability in the disordered site, but greater rigidity in the parts of the protein with which the disordered site interacts. This is another indication that disordered residues are involved in protein function

    Many-to-one binding by intrinsically disordered protein regions

    Get PDF
    Disordered binding regions (DBRs), which are embedded within intrinsically disordered proteins or regions (IDPs or IDRs), enable IDPs or IDRs to mediate multiple protein-protein interactions. DBR-protein complexes were collected from the Protein Data Bank for which two or more DBRs having different amino acid sequences bind to the same (100% sequence identical) globular protein partner, a type of interaction herein called many-to-one binding. Two distinct binding profiles were identified: independent and overlapping. For the overlapping binding profiles, the distinct DBRs interact by means of almost identical binding sites (herein called “similar”), or the binding sites contain both common and divergent interaction residues (herein called “intersecting”). Further analysis of the sequence and structural differences among these three groups indicate how IDP flexibility allows different segments to adjust to similar, intersecting, and independent binding pockets

    DescribePROT: database of amino acid-level protein structure and function predictions

    Get PDF
    We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accesses via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can be also downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included by DescriPROT are useful for a broad range of studies, including: investigations of protein function, applied projects focusing on therapeutics and diseases, and in the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/
    corecore