
    CSpritz: accurate prediction of protein disorder segments with annotation for homology, secondary structure and linear motifs

    CSpritz is a web server for the prediction of intrinsic protein disorder. It combines the earlier Spritz method with two novel orthogonal systems developed by our group (Punch and ESpritz). Punch is based on sequence and structural templates trained with support vector machines. ESpritz is an efficient single-sequence method based on bidirectional recursive neural networks. Spritz was extended to filter predictions based on structural homologues. After extensive testing, the individual predictions are combined by averaging their probabilities. The CSpritz website can run single or multiple predictions for either short or long disorder. The server provides a global output page for downloading the results and viewing summary statistics of all predictions at once. Each individual protein is linked to a page that displays its amino acid sequence and disorder prediction, along with statistics for that protein. As a novel feature, CSpritz reports structural homologues as well as secondary structure and short functional linear motifs in each disordered segment. Benchmarking was performed on the very recent CASP9 data, on which CSpritz would have ranked consistently well, with an Sw measure of 49.27 and an AUC of 0.828. The server, together with help and methods pages including examples, is freely available at URL: http://protein.bio.unipd.it/cspritz/
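The combination step described above (averaging the probabilities of Spritz, Punch and ESpritz) can be illustrated with a minimal sketch. The 0.5 threshold, the segment-extraction rule and all function names here are hypothetical; the Sw function assumes that the CASP weighted score, with class weights inversely proportional to class size, reduces to sensitivity + specificity - 1, and that the reported 49.27 is this value multiplied by 100.

```python
from typing import List, Sequence, Tuple


def average_probabilities(per_method: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue disorder probabilities produced by several predictors."""
    if not per_method:
        raise ValueError("need at least one predictor")
    n = len(per_method[0])
    if any(len(probs) != n for probs in per_method):
        raise ValueError("all predictors must score the same residues")
    return [sum(probs[i] for probs in per_method) / len(per_method) for i in range(n)]


def disordered_segments(probs: Sequence[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) runs whose probability is >= threshold (assumed rule)."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a disordered run begins
        elif p < threshold and start is not None:
            segments.append((start, i - 1))  # the run ended at the previous residue
            start = None
    if start is not None:
        segments.append((start, len(probs) - 1))  # the run reaches the C-terminus
    return segments


def sw_score(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Sw as sensitivity + specificity - 1 (assumed equivalent to the CASP weighted score)."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation differ in length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fp = sum(1 for p, o in pairs if p and not o)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both disordered and ordered residues")
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


# Usage with three hypothetical predictors scoring a 5-residue chain.
avg = average_probabilities([
    [0.9, 0.8, 0.2, 0.1, 0.7],
    [0.7, 0.6, 0.4, 0.3, 0.9],
    [0.8, 0.7, 0.3, 0.2, 0.2],
])
print(disordered_segments(avg))  # [(0, 1), (4, 4)]
```

Averaging is the simplest way to merge probabilistic outputs; a weighted average would be a natural variant, but the abstract states only that probabilities are averaged.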

    Disordered Patterns in Clustered Protein Data Bank and in Eukaryotic and Bacterial Proteomes

    We have constructed a clustered Protein Data Bank, grouping chains into clusters at different levels of sequence identity (http://bioinfo.protres.ru/st_pdb/). Using simple selection rules, we have compiled the largest database of disordered patterns (141 patterns) from the clustered PDB in which the identity between chains within a cluster is greater than or equal to 75% (version of 28 June 2010). The results of these analyses should further our understanding of the physicochemical and structural determinants of intrinsically disordered regions that serve as molecular recognition elements. We have analyzed the occurrence of the selected patterns in 97 eukaryotic and 26 bacterial proteomes. The disordered patterns appear more often in eukaryotic than in bacterial proteomes. We have calculated the matrix of correlation coefficients between the numbers of proteins in which each of the 141 library patterns appears at least once, across 9 kingdoms of eukaryota and 5 phyla of bacteria. As a rule, the correlation coefficients are higher within a kingdom than between kingdoms. The patterns that occur frequently in proteomes have low complexity (PPPPP, GGGGG, EEEED, HHHH, KKKKK, SSTSS, QQQQQP), and the types of patterns vary across proteomes: http://bioinfo.protres.ru/fp/search_new_pattern.html
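The correlation analysis above can be sketched as follows: for each taxonomic group, count the proteins in which each library pattern appears at least once, then correlate these count profiles between groups. This is a minimal illustration under assumptions: patterns are matched as exact substrings, Pearson correlation is used, and the group names and toy sequences are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def proteins_with_pattern(proteome: Sequence[str], pattern: str) -> int:
    """Number of proteins in which the pattern occurs at least once (exact substring)."""
    return sum(1 for seq in proteome if pattern in seq)


def pattern_profile(proteome: Sequence[str], patterns: Sequence[str]) -> List[int]:
    """One count per library pattern for a single proteome or taxonomic group."""
    return [proteins_with_pattern(proteome, p) for p in patterns]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(groups: Dict[str, Sequence[str]],
                       patterns: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlations between the pattern profiles of all groups."""
    profiles = {name: pattern_profile(seqs, patterns) for name, seqs in groups.items()}
    return {(a, b): pearson(profiles[a], profiles[b]) for a in profiles for b in profiles}


# Usage with a hypothetical three-pattern library and two toy groups.
patterns = ["PPPPP", "GGGGG", "KKKKK"]
groups = {
    "toy_kingdom": ["MPPPPPA", "GGGGGK", "KKKKKPPPPP"],
    "toy_phylum": ["PPPPPG", "PPPPP", "APPPPP", "PPPPPK",
                   "GGGGGA", "GGGGG", "KKKKK", "KKKKKA"],
}
matrix = correlation_matrix(groups, patterns)
```

Counting proteins rather than raw occurrences keeps one long low-complexity repeat from dominating a profile, which matches the "appears at least once" criterion in the abstract.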

    Recognition of the CCT5 di-glu degron by CRL4(DCAF12) is incompatible with TRiC assembly

    The molecular processes within all living organisms rely on the correct functioning of proteins, most of which must assemble into multimeric complexes of defined architecture and composition. In eukaryotes, assembly quality control (AQC) E3 ubiquitin ligases target incomplete or incorrectly assembled protein complexes for degradation to ensure protein complex functionality and proteostasis. The CUL4-RBX1-DDB1-DCAF12 (CRL4(DCAF12)) E3 ubiquitin ligase induces the proteasomal degradation of proteins with a C-terminal double glutamate (di-Glu) motif. Putative CRL4(DCAF12) substrates include CCT5, a subunit of the eukaryotic TRiC chaperonin. TRiC is responsible for the folding of around 10% of the human proteome. Its functionality relies on the correct arrangement of its eight subunits, but how TRiC assembly is ensured has not yet been investigated. Furthermore, how DCAF12 recognizes its substrates is unknown. Here, the cryo-EM structure of the CCT5-bound DDB1-DCAF12 complex at 2.8 Å resolution is presented. DCAF12 serves as a canonical WD40 DCAF substrate receptor and uses a positively charged pocket at the center of its β-propeller to bind the C-terminus of CCT5. DCAF12 specifically reads out the CCT5 di-Glu side chains and contacts the other visible degron amino acids through weaker van der Waals interactions, explaining the flexibility in substrate recognition. The CCT5 C-terminus is inaccessible in an assembled TRiC complex, and functional assays demonstrate that CRL4(DCAF12) binds and ubiquitinates monomeric CCT5, but not TRiC. The presented results suggest a previously unanticipated AQC role for the CRL4(DCAF12) E3 ligase towards TRiC, and likely other complexes.

    Mechanistic studies of GID/CTLH E3 ubiquitin ligases

    Integrative structural, biochemical, and cell biology data revealed mechanistic principles of substrate targeting by the evolutionarily conserved GID/CTLH E3 ubiquitin ligases. The pliable substrate receptors adapt their conformations to recognize various N-terminal peptide/degron sequences, whereas the chelator-like supramolecular assembly of a catalytically active E3 ligase core configures multi-pronged targeting of oligomeric substrates. These principles provide a conceptual framework for understanding the emerging complexity of the GID E3 ligase family.

    Machine Learning based Protein Sequence to (un)Structure Mapping and Interaction Prediction

    Proteins are the fundamental macromolecules within a cell that carry out most biological functions. Because biological data are growing fast and their analysis involves extensive complexity, the computational study of protein structure and function using machine learning and data analytics is essential for advancing life-science research and discovering meaningful insights. Mapping a protein’s primary sequence need not be limited to its structure: we extend it to the disordered component, known as intrinsically disordered proteins or regions (IDPs/IDRs), and hence to the associated dynamics, which help explain complex interactions within a cell that are otherwise obscured. The objective of this dissertation is to develop effective machine-learning-based tools to predict disordered proteins, their properties and dynamics, and their interaction paradigms by systematically mining and analyzing large-scale biological data. In this dissertation, we propose a robust framework to predict disordered proteins given only sequence information, using an optimized SVM with an RBF kernel. Through this analysis, we highlight the structure-like behavior of IDPs in disease-associated complexes. Further, we develop a fast and effective predictor of the accessible surface area (ASA) of protein residues, a useful structural property that defines a protein’s exposure to partners, using regularized regression with a third-degree polynomial kernel function and a genetic algorithm. As a key outcome of this research, we then introduce a novel method to extract the position-specific energy (PSEE) of protein residues by modeling pairwise thermodynamic interactions and the hydrophobic effect. PSEE is found to be an effective feature for identifying both the enthalpy gain of a protein’s folded state and the neutral state of unstructured proteins.
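"Regularized regression with a third-degree polynomial kernel" can be read as kernel ridge regression, which is the assumption behind this minimal sketch. It uses the kernel k(x, y) = (x · y + 1)^3, solves (K + λI)α = y, and predicts f(x) = Σ α_i k(x_i, x). The class name, the regularization value λ and the toy one-dimensional data are hypothetical; the dissertation's actual features and its genetic-algorithm tuning step are not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector, degree: int = 3, c: float = 1.0) -> float:
    """Polynomial kernel k(x, y) = (x . y + c) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


class KernelRidge:
    """Hypothetical kernel ridge regressor with a polynomial kernel."""

    def __init__(self, lam: float = 1e-3, degree: int = 3):
        self.lam = lam
        self.degree = degree

    def fit(self, xs: Sequence[Vector], ys: Sequence[float]) -> "KernelRidge":
        self.xs = [list(x) for x in xs]
        # Gram matrix plus lambda on the diagonal (the ridge penalty).
        gram = [[poly_kernel(xi, xj, self.degree) + (self.lam if i == j else 0.0)
                 for j, xj in enumerate(self.xs)] for i, xi in enumerate(self.xs)]
        self.alpha = solve(gram, list(ys))
        return self

    def predict(self, x: Vector) -> float:
        return sum(a * poly_kernel(xi, x, self.degree) for a, xi in zip(self.alpha, self.xs))


# Usage on toy data generated from y = x^3 - x.
xs = [[i / 5] for i in range(-5, 6)]
ys = [x[0] ** 3 - x[0] for x in xs]
model = KernelRidge(lam=1e-3).fit(xs, ys)
```

The λ term keeps the Gram matrix invertible even though a cubic kernel on one-dimensional inputs has rank at most four; in practice λ and the kernel parameters would be tuned, which is presumably where the genetic algorithm enters.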
Moreover, we study transient peptide-protein interactions that involve the induced folding of short peptides, through disorder-to-order conformational changes, upon binding an appropriate partner. Using the stacked generalization ensemble technique, we develop a suite of predictors that identify, from protein sequence, the residue patterns of peptide-recognition domains that recognize and bind peptide motifs and phosphopeptides carrying post-translational modifications (PTMs) of amino acids, which are implicated in critical human diseases. The biologically relevant case studies demonstrate the possibility of discovering new knowledge with the developed tools.
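Stacked generalization, named above, trains a meta-learner on the out-of-fold predictions of several base learners. The minimal sketch below assumes two simple base learners (a nearest-centroid score and a k-nearest-neighbour vote) and a logistic-regression meta-learner; these choices, the fold scheme and the toy one-dimensional data are hypothetical and are not the dissertation's actual models or features.

```python
import math
from typing import Callable, List, Sequence

Vec = Sequence[float]
Model = Callable[[Vec], float]
Learner = Callable[[List[Vec], List[int]], Model]


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dist2(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_learner(xs: List[Vec], ys: List[int]) -> Model:
    """Base learner 1: how much closer x is to the positive than to the negative centroid."""
    def mean(rows: List[Vec]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, y in zip(xs, ys) if y == 1])
    neg = mean([x for x, y in zip(xs, ys) if y == 0])
    return lambda x: dist2(x, neg) - dist2(x, pos)


def knn_learner(xs: List[Vec], ys: List[int], k: int = 3) -> Model:
    """Base learner 2: fraction of positives among the k nearest training points."""
    def score(x: Vec) -> float:
        nearest = sorted(range(len(xs)), key=lambda i: dist2(xs[i], x))[:k]
        return sum(ys[i] for i in nearest) / len(nearest)
    return score


def out_of_fold(learners: List[Learner], xs: List[Vec], ys: List[int],
                n_folds: int = 3) -> List[List[float]]:
    """Level-1 features: each sample is scored only by base models not trained on it."""
    meta = [[0.0] * len(learners) for _ in xs]
    for f in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != f]
        held_out = [i for i in range(len(xs)) if i % n_folds == f]
        for j, learn in enumerate(learners):
            model = learn([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                meta[i][j] = model(xs[i])
    return meta


def fit_logistic(zs: List[List[float]], ys: List[int],
                 lr: float = 0.01, epochs: int = 2000) -> List[float]:
    """Meta-learner: logistic regression by batch gradient descent (last weight = bias)."""
    w = [0.0] * (len(zs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, y in zip(zs, ys):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + w[-1]) - y
            for j, zj in enumerate(z):
                grad[j] += err * zj
            grad[-1] += err
        w = [wj - lr * g / len(zs) for wj, g in zip(w, grad)]
    return w


def stack(learners: List[Learner], xs: List[Vec], ys: List[int], n_folds: int = 3) -> Model:
    """Meta-learner fitted on out-of-fold scores; base models refitted on all data."""
    w = fit_logistic(out_of_fold(learners, xs, ys, n_folds), ys)
    bases = [learn(xs, ys) for learn in learners]
    return lambda x: sigmoid(sum(wj * m(x) for wj, m in zip(w, bases)) + w[-1])


# Usage on hypothetical 1-D data: positives near +2, negatives near -2.
xs = [[-2.0], [2.0], [-1.5], [1.5], [-2.5], [2.5], [-1.0], [1.0], [-3.0], [3.0], [-1.8], [1.8]]
ys = [0, 1] * 6
model = stack([centroid_learner, knn_learner], xs, ys)
```

Fitting the meta-learner on out-of-fold scores, rather than on scores from base models that saw the same samples, is what keeps stacking from simply rewarding the base learner that overfits most.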

    The dynamics of post-translational modifications and protein-protein interactions in essential signaling networks for stress response and survival

    Organism stress causes perturbations to cellular homeostasis, which is required for proper growth and survival. Defense against stress depends on signaling networks regulated by post-translational modifications (PTMs) and protein-protein interactions (PPIs) that induce changes in protein expression and activity. Proteolysis is an irreversible PTM that helps maintain proteostasis, but it can also result in the release of bioactive signaling peptides. Reversible PTMs, such as protein phosphorylation, modulate protein function and serve as molecular switches that relay rapid signals through the cell. Physical interactions between proteins can dictate modifications and alter protein structure. Structural changes in turn influence subsequent interactions with adjacent proteins and cellular signaling. The aim of this dissertation is to highlight method development and research characterizing PTMs and PPIs that are essential to stress signaling. First, workflows for quantitative peptidomics and in vitro enzyme assays were optimized to identify potential thimet oligopeptidase substrates and confirm direct peptide substrates in the model plant species Arabidopsis thaliana (Chapter 2). Then, the early stages of the immune response in locally infected tissue of Arabidopsis were characterized to understand thimet oligopeptidase-mediated proteostasis (Chapter 3). Changes in protein phosphorylation following antimicrobial treatment of the bacterium Enterococcus faecalis were investigated using quantitative phosphoproteomics (Chapter 4). This revealed the dependence of phosphorylation on a kinase known to be involved in E. faecalis antimicrobial resistance. Chapter 5 covers standard operating procedures for LC-MS/MS methods that can be employed to discover novel PPIs, including crosslinking- and immunoprecipitation-mass spectrometry.
Then, the global protein interactome and novel protein structures of the model alga Chlamydomonas reinhardtii were elucidated via crosslinking-mass spectrometry (Chapter 6). Together, the work described in this dissertation highlights the utility of LC-MS/MS-based methods for analyzing PTMs and PPIs in a discovery-based, high-throughput manner. The insights gained from these experiments can guide mechanistic studies that provide a better understanding of how changes in protein structure, function, or activity affect stress response signaling networks.
Doctor of Philosophy

    Library of disordered patterns in 3D protein structures

    Intrinsically disordered regions serve as molecular recognition elements, which play an important role in the control of many cellular processes and signaling pathways. It is useful to be able to predict the positions of disordered regions in protein chains. The statistical analysis of disordered residues was performed on 34,464 unique protein chains taken from the PDB. In this dataset, 4.95% of residues are disordered (i.e., invisible in X-ray structures). The statistics were obtained separately for the N- and C-termini and for the central part of the protein chain. The frequencies of occurrence of disordered residues of the 20 types at the termini of protein chains were shown to differ from those in the middle part of the chain. Our systematic analysis of disordered regions in the PDB revealed 109 disordered patterns of different lengths. Each of them has disordered occurrences in at least five protein chains with less than 20% identity. The vast majority of all occurrences of each disordered pattern are disordered. This allows the library of disordered patterns to be used to predict whether a residue of a given protein is ordered or disordered. We analyzed the occurrence o
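Two steps from the abstract above can be sketched under stated assumptions: using a pattern library to flag residues as disordered, and tallying amino-acid frequencies among disordered residues separately for the N-terminus, the middle and the C-terminus. In this minimal sketch, patterns are matched as exact substrings, every residue covered by a match is flagged, and the terminus length of 10 residues is a hypothetical parameter; the authors' actual matching and region definitions are not given here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def predict_disorder(sequence: str, library: Iterable[str]) -> List[bool]:
    """Flag every residue covered by at least one (possibly overlapping) pattern occurrence."""
    flags = [False] * len(sequence)
    for pattern in library:
        start = sequence.find(pattern)
        while start != -1:
            for i in range(start, start + len(pattern)):
                flags[i] = True
            start = sequence.find(pattern, start + 1)  # allow overlapping matches
    return flags


def disordered_composition(chains: Iterable[Tuple[str, Sequence[bool]]],
                           terminus_len: int = 10) -> Dict[str, Dict[str, float]]:
    """Amino-acid frequencies among disordered residues in the N-terminal, middle and C-terminal regions."""
    counts = {"N": Counter(), "middle": Counter(), "C": Counter()}
    for seq, mask in chains:
        if len(seq) != len(mask):
            raise ValueError("sequence and disorder mask differ in length")
        for i, (aa, disordered) in enumerate(zip(seq, mask)):
            if not disordered:
                continue
            if i < terminus_len:  # N-terminus takes precedence in very short chains
                region = "N"
            elif i >= len(seq) - terminus_len:
                region = "C"
            else:
                region = "middle"
            counts[region][aa] += 1
    return {
        region: {aa: n / sum(c.values()) for aa, n in c.items()}
        for region, c in counts.items()
    }


# Usage with two patterns from the listed low-complexity examples.
flags = predict_disorder("MSSTSSAKKKKKG", ["SSTSS", "KKKKK"])
comp = disordered_composition([("MKAAAAAAKE", [True, True] + [False] * 7 + [True])],
                              terminus_len=2)
```

The observed disorder mask would come from residues missing in X-ray structures, as described in the abstract; here it is supplied by hand.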