5 research outputs found

    Computational Analysis and Prediction of Intrinsic Disorder and Intrinsic Disorder Functions in Proteins

    By Akila Imesha Katuwawala. A dissertation submitted in partial fulfillment of the requirements for the degree of Engineering, Doctor of Philosophy with a concentration in Computer Science at Virginia Commonwealth University. Virginia Commonwealth University, 2021. Director: Lukasz Kurgan, Professor, Department of Computer Science.
    Proteins, as a fundamental class of biomolecules, have been studied from various perspectives over the past two centuries. The traditional notion is that proteins require fixed and stable three-dimensional structures to carry out biological functions. However, there is mounting evidence regarding a “special” class of proteins, named intrinsically disordered proteins, which do not have fixed three-dimensional structures yet perform a number of important biological functions. Computational approaches have been vital to the study of these intrinsically disordered proteins over the past few decades. Prediction of intrinsic disorder and of the functions of intrinsic disorder from protein sequences is one such approach that has recently gained attention, particularly with the advent of modern machine learning techniques. This dissertation follows two basic themes: prediction of intrinsic disorder and prediction of intrinsic disorder functions.
    The work on the prediction of intrinsic disorder covers a novel approach to evaluating the predictive performance of current computational disorder predictors. This approach evaluates the predictors at the level of individual proteins, in contrast to traditional studies that evaluate them over large protein datasets. We address several interesting aspects concerning the differences between protein-level and dataset-level predictive quality, and the complementarity and predictive performance of the current predictors. Based on the findings from this assessment, we have conceptualized, developed, tested and deployed an innovative platform called DISOselect that recommends the most suitable computational disorder predictors for a given protein, with the underlying goal of maximizing predictive performance. DISOselect advises whether a given disorder predictor would provide an accurate prediction for a protein of the user’s interest, and recommends the most suitable disorder predictor together with an estimate of its expected predictive quality.
    The second theme, prediction of intrinsic disorder functions, includes a first-of-its-kind evaluation of the current computational disorder predictors on two functional sub-classes of intrinsically disordered proteins. This study introduces several novel evaluation strategies to assess the predictive performance of disorder prediction methods and focuses on disorder functions associated with interactions with partner molecules. The results of this analysis motivated us to conceptualize, design, test and deploy a new and accurate machine learning-based predictor of disordered lipid-binding residues, DisoLipPred. We empirically show that the strong predictive performance of DisoLipPred stems from several innovative design features and that its predictions complement results produced by current disorder predictors, disorder function predictors and predictors of transmembrane regions. We deploy DisoLipPred as a convenient webserver and discuss its predictions on the yeast proteome.

    DescribePROT: database of amino acid-level protein structure and function predictions

    We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and by the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors, and can also be downloaded in structured formats at the protein, proteome and whole-database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/

    IDPology of the Living Cell: Intrinsic Disorder in the Subcellular Compartments of the Human Cell

    Intrinsic disorder can be found in all proteomes of all kingdoms of life and in viruses, being particularly prevalent in the eukaryotes. We conduct a comprehensive analysis of the intrinsic disorder in human proteins while mapping them into 24 compartments of the human cell. In agreement with previous studies, we show that human proteins are significantly enriched in disorder relative to a generic protein set that represents the protein universe. In fact, the fraction of proteins with long disordered regions and the average protein-level disorder content in the human proteome are about 3 times higher than in the protein universe. Furthermore, levels of intrinsic disorder in the majority of human subcellular compartments significantly exceed the average disorder content in the protein universe. Relative to the overall amount of disorder in the human proteome, proteins localized in the nucleus and cytoskeleton have significantly increased amounts of disorder, measured by both high disorder content and the presence of multiple long intrinsically disordered regions. We empirically demonstrate that, on average, human proteins are assigned to 2.3 subcellular compartments, with proteins localized to few subcellular compartments being more disordered than proteins localized to many compartments. Functionally, the disordered proteins localized in the most disorder-enriched subcellular compartments are primarily responsible for interactions with nucleic acids and protein partners. This is the first time that disorder has been comprehensively mapped into the human cell. Our observations add a missing piece to the puzzle of functional disorder and its organization inside the cell.

    Intrinsic Disorder in Human RNA-Binding Proteins

    Although RNA-binding proteins (RBPs) are known to be enriched in intrinsic disorder, no previous analysis has focused on RBPs interacting with specific RNA types. We fill this gap with a comprehensive analysis of the putative disorder in RBPs binding to six common RNA types: messenger RNA (mRNA), transfer RNA (tRNA), small nuclear RNA (snRNA), non-coding RNA (ncRNA), ribosomal RNA (rRNA), and internal ribosome RNA (irRNA). We also analyze the amount of putative intrinsic disorder in the RNA-binding domains (RBDs) and non-RNA-binding-domain regions (non-RBD regions). Consistent with previous studies, we show that, in comparison with the human proteome, RBPs are significantly enriched in disorder. However, closer examination finds significant enrichment in predicted disorder for the mRNA-, rRNA- and snRNA-binding proteins, while the proteins that interact with ncRNA and irRNA are not enriched in disorder, and the tRNA-binding proteins are significantly depleted in disorder. We show a consistent pattern of significant disorder enrichment in the non-RBD regions coupled with low levels of disorder in RBDs, which suggests that disorder is relatively rarely utilized in the RNA-binding regions. Our analysis of the non-RBD regions suggests that disorder harbors posttranslational modification sites and is involved in the putative interactions with DNA. Importantly, we utilize experimental data from DisProt and independent data from Pfam to validate the above observations that rely on the disorder predictions. This study provides new insights into the distribution of disorder across proteins that bind different RNA types and the functional role of disorder in the regions where it is enriched.

    CAID prediction portal: a comprehensive service for predicting intrinsic disorder and binding regions in proteins

    Intrinsic disorder (ID) in proteins is well-established in structural biology, with increasing evidence for its involvement in essential biological processes. As measuring dynamic ID behavior experimentally on a large scale remains difficult, scores of published ID predictors have tried to fill this gap. Unfortunately, their heterogeneity makes it difficult to compare performance, confounding biologists wanting to make an informed choice. To address this issue, the Critical Assessment of protein Intrinsic Disorder (CAID) benchmarks predictors for ID and binding regions as a community blind-test in a standardized computing environment. Here we present the CAID Prediction Portal, a web server executing all CAID methods on user-defined sequences. The server generates standardized output and facilitates comparison between methods, producing a consensus prediction highlighting high-confidence ID regions. The website contains extensive documentation explaining the meaning of different CAID statistics and providing a brief description of all methods. Predictor output is visualized in an interactive feature viewer and made available for download in a single table, with the option to recover previous sessions via a private dashboard. The CAID Prediction Portal is a valuable resource for researchers interested in studying ID in proteins. The server is available at the URL: https://caid.idpcentral.org