345 research outputs found
Influence of Sequence Changes and Environment on Intrinsically Disordered Proteins
Many large-scale studies on intrinsically disordered proteins are implicitly based on the structural models deposited in the Protein Data Bank. Yet, the static nature of deposited models supplies little insight into variation of protein structure and function under diverse cellular and environmental conditions. While the computational predictability of disordered regions provides practical evidence that disorder is an intrinsic property of proteins, the robustness of disordered regions to changes in sequence or environmental conditions has not been systematically studied. We analyzed intrinsically disordered regions in the same or similar proteins crystallized independently and studied their sensitivity to changes in protein sequence and parameters of crystallographic experiments. The observed changes in the existence, position, and length of disordered regions indicate that their appearance in X-ray structures dramatically depends on changes in amino acid sequence and peculiarities of the crystallographic experiment. Our study also raises general questions regarding protein evolution and the regulation of protein structure, dynamics, and function via variations in cellular and environmental conditions
Predicting mostly disordered proteins by using structure-unknown protein data
BACKGROUND: Predicting intrinsically disordered proteins is important in structural biology because they are thought to carry out various cellular functions even though they have no stable three-dimensional structure. We know the structures of far more ordered proteins than disordered proteins. The structural distribution of proteins in nature can therefore be inferred to differ from that of proteins whose structures have been determined experimentally. We know many more protein sequences than we do protein structures, and many of the known sequences can be expected to be those of disordered proteins. Thus it would be efficient to use the information of structure-unknown proteins in order to avoid training data sparseness. We propose a novel method for predicting which proteins are mostly disordered by using spectral graph transducer and training with a huge amount of structure-unknown sequences as well as structure-known sequences. RESULTS: When the proposed method was evaluated on data that included 82 disordered proteins and 526 ordered proteins, its sensitivity was 0.723 and its specificity was 0.977. It resulted in a Matthews correlation coefficient 0.202 points higher than that obtained using FoldIndex, 0.221 points higher than that obtained using the method based on plotting hydrophobicity against the number of contacts and 0.07 points higher than that obtained using support vector machines (SVMs). To examine robustness against training data sparseness, we investigated the correlation between two results obtained when the method was trained on different datasets and tested on the same dataset. The correlation coefficient for the proposed method is 0.14 higher than that for the method using SVMs. When the proposed SGT-based method was compared with four per-residue predictors (VL3, GlobPlot, DISOPRED2 and IUPred (long)), its sensitivity was 0.834 for disordered proteins, which is 0.052–0.523 higher than that of the per-residue predictors, and its specificity was 0.991 for ordered proteins, which is 0.036–0.153 higher than that of the per-residue predictors. The proposed method was also evaluated on data that included 417 partially disordered proteins. It predicted the frequency of disordered proteins to be 1.95% for the proteins with 5%–10% disordered sequences, 1.46% for the proteins with 10%–20% disordered sequences and 16.57% for proteins with 20%–40% disordered sequences. CONCLUSION: The proposed method, which utilizes the information of structure-unknown data, predicts disordered proteins more accurately than other methods and is less affected by training data sparseness
The N-terminal intrinsically disordered domain of mgm101p is localized to the mitochondrial nucleoid.
The mitochondrial genome maintenance gene, MGM101, is essential for yeasts that depend on mitochondrial DNA replication. Previously, in Saccharomyces cerevisiae, it has been found that the carboxy-terminal two-thirds of Mgm101p has a functional core. Furthermore, there is a high level of amino acid sequence conservation in this region from widely diverse species. By contrast, the amino-terminal region, that is also essential for function, does not have recognizable conservation. Using a bioinformatic approach we find that the functional core from yeast and a corresponding region of Mgm101p from the coral Acropora millepora have an ordered structure, while the N-terminal domains of sequences from yeast and coral are predicted to be disordered. To examine whether ordered and disordered domains of Mgm101p have specific or general functions we made chimeric proteins from yeast and coral by swapping the two regions. We find, by an in vivo assay in S.cerevisiae, that the ordered domain of A.millepora can functionally replace the yeast core region but the disordered domain of the coral protein cannot substitute for its yeast counterpart. Mgm101p is found in the mitochondrial nucleoid along with enzymes and proteins involved in mtDNA replication. By attaching green fluorescent protein to the N-terminal disordered domain of yeast Mgm101p we find that GFP is still directed to the mitochondrial nucleoid where full-length Mgm101p-GFP is targeted
Differences in the Number of Intrinsically Disordered Regions between Yeast Duplicated Proteins, and Their Relationship with Functional Divergence
BACKGROUND: Intrinsically disordered regions are enriched in short interaction motifs that play a critical role in many protein-protein interactions. Since new short interaction motifs may easily evolve, they have the potential to rapidly change protein interactions and cellular signaling. In this work we examined the dynamics of gain and loss of intrinsically disordered regions in duplicated proteins to inspect if changes after genome duplication can create functional divergence. For this purpose we used Saccharomyces cerevisiae and the outgroup species Lachancea kluyveri. PRINCIPAL FINDINGS: We find that genes duplicated as part of a genome duplication (ohnologs) are significantly more intrinsically disordered than singletons (p<2.2(e)-16, Wilcoxon), reflecting a preference for retaining intrinsically disordered proteins in duplicate. In addition, there have been marked changes in the extent of intrinsic disorder following duplication. A large number of duplicated genes have more intrinsic disorder than their L. kluyveri ortholog (29% for duplicates versus 25% for singletons) and an even greater number have less intrinsic disorder than the L. kluyveri ortholog (37% for duplicates versus 25% for singletons). Finally, we show that the number of physical interactions is significantly greater in the more intrinsically disordered ohnolog of a pair (p = 0.003, Wilcoxon). CONCLUSION: This work shows that intrinsic disorder gain and loss in a protein is a mechanism by which a genome can also diverge and innovate. The higher number of interactors for proteins that have gained intrinsic disorder compared with their duplicates may reflect the acquisition of new interaction partners or new functional roles
Native aggregation as a cause of origin of temporary cellular structures needed for all forms of cellular activity, signaling and transformations
According to the hypothesis explored in this paper, native aggregation is genetically controlled (programmed) reversible aggregation that occurs when interacting proteins form new temporary structures through highly specific interactions. It is assumed that Anfinsen's dogma may be extended to protein aggregation: composition and amino acid sequence determine not only the secondary and tertiary structure of single protein, but also the structure of protein aggregates (associates). Cell function is considered as a transition between two states (two states model), the resting state and state of activity (this applies to the cell as a whole and to its individual structures). In the resting state, the key proteins are found in the following inactive forms: natively unfolded and globular. When the cell is activated, secondary structures appear in natively unfolded proteins (including unfolded regions in other proteins), and globular proteins begin to melt and their secondary structures become available for interaction with the secondary structures of other proteins. These temporary secondary structures provide a means for highly specific interactions between proteins. As a result, native aggregation creates temporary structures necessary for cell activity
Free Cysteine Modulates the Conformation of Human C/EBP Homologous Protein
The C/EBP Homologous Protein (CHOP) is a nuclear protein that is integral to the unfolded protein response culminating from endoplasmic reticulum stress. Previously, CHOP was shown to comprise extensive disordered regions and to self-associate in solution. In the current study, the intrinsically disordered nature of this protein was characterized further by comprehensive in silico analyses. Using circular dichroism, differential scanning calorimetry and nuclear magnetic resonance, we investigated the global conformation and secondary structure of CHOP and demonstrated, for the first time, that conformational changes in this protein can be induced by the free amino acid l-cysteine. Addition of l-cysteine caused a significant dose-dependent decrease in the protein helicity – dropping from 69.1% to 23.8% in the presence of 1 mM of l-cysteine – and a sequential transition to a more disordered state, unlike that caused by thermal denaturation. Furthermore, the presence of small amounts of free amino acid (80 µM, an 8∶1 cysteine∶CHOP ratio) during CHOP thermal denaturation altered the molecular mechanism of its melting process, leading to a complex, multi-step transition. On the other hand, high levels (4 mM) of free l-cysteine seemed to cause a complete loss of rigid cooperatively melting structure. These results suggested a potential regulatory function of l-cysteine which may lead to changes in global conformation of CHOP in response to the cellular redox state and/or endoplasmic reticulum stress
Bayesian statistical modelling of human protein interaction network incorporating protein disorder information
<p>Abstract</p> <p>Background</p> <p>We present a statistical method of analysis of biological networks based on the exponential random graph model, namely p2-model, as opposed to previous descriptive approaches. The model is capable to capture generic and structural properties of a network as emergent from local interdependencies and uses a limited number of parameters. Here, we consider one global parameter capturing the density of edges in the network, and local parameters representing each node's contribution to the formation of edges in the network. The modelling suggests a novel definition of important nodes in the network, namely <it>social</it>, as revealed based on the local <it>sociality </it>parameters of the model. Moreover, the sociality parameters help to reveal organizational principles of the network. An inherent advantage of our approach is the possibility of hypotheses testing: <it>a priori </it>knowledge about biological properties of the nodes can be incorporated into the statistical model to investigate its influence on the structure of the network.</p> <p>Results</p> <p>We applied the statistical modelling to the human protein interaction network obtained with Y2H experiments. Bayesian approach for the estimation of the parameters was employed. We deduced <it>social </it>proteins, essential for the formation of the network, while incorporating into the model information on protein disorder. <it>Intrinsically disordered </it>are proteins which lack a well-defined three-dimensional structure under physiological conditions. We predicted the fold group (ordered or disordered) of proteins in the network from their primary sequences. The network analysis indicated that protein disorder has a positive effect on the connectivity of proteins in the network, but do not fully explains the interactivity.</p> <p>Conclusions</p> <p>The approach opens a perspective to study effects of biological properties of individual entities on the structure of biological networks.</p
Modeling Disordered Regions in Proteins Using Rosetta
Protein structure prediction methods such as Rosetta search for the lowest energy conformation of the polypeptide chain. However, the experimentally observed native state is at a minimum of the free energy, rather than the energy. The neglect of the missing configurational entropy contribution to the free energy can be partially justified by the assumption that the entropies of alternative folded states, while very much less than unfolded states, are not too different from one another, and hence can be to a first approximation neglected when searching for the lowest free energy state. The shortcomings of current structure prediction methods may be due in part to the breakdown of this assumption. Particularly problematic are proteins with significant disordered regions which do not populate single low energy conformations even in the native state. We describe two approaches within the Rosetta structure modeling methodology for treating such regions. The first does not require advance knowledge of the regions likely to be disordered; instead these are identified by minimizing a simple free energy function used previously to model protein folding landscapes and transition states. In this model, residues can be either completely ordered or completely disordered; they are considered disordered if the gain in entropy outweighs the loss of favorable energetic interactions with the rest of the protein chain. The second approach requires identification in advance of the disordered regions either from sequence alone using for example the DISOPRED server or from experimental data such as NMR chemical shifts. During Rosetta structure prediction calculations the disordered regions make only unfavorable repulsive contributions to the total energy. We find that the second approach has greater practical utility and illustrate this with examples from de novo structure prediction, NMR structure calculation, and comparative modeling
- …