    Wiggle—Predicting Functionally Flexible Regions from Primary Sequence

    The Wiggle series are support vector machine–based predictors that identify regions of functional flexibility using only protein sequence information. Functionally flexible regions are defined as regions that can adopt different conformational states and are assumed to be necessary for bioactivity. Many advances have been made in understanding the relationship between protein sequence and structure. This work contributes to those efforts by making strides to understand the relationship between protein sequence and flexibility. A coarse-grained protein dynamic modeling approach was used to generate the dataset required for support vector machine training. We define our regions of interest based on the participation of residues in correlated large-scale fluctuations. Even with this structure-based approach to computationally define regions of functional flexibility, predictors successfully extract sequence-flexibility relationships that have been experimentally confirmed to be functionally important. Thus, a sequence-based tool to identify flexible regions important for protein function has been created. The ability to identify functional flexibility using a sequence based approach complements structure-based definitions and will be especially useful for the large majority of proteins with unknown structures. The methodology offers promise to identify structural genomics targets amenable to crystallization and the possibility to engineer more flexible or rigid regions within proteins to modify their bioactivity

    Prediction of protein motions from amino acid sequence and its application to protein-protein interaction

    BACKGROUND: Structural flexibility is an important characteristic of proteins because it is often associated with their function. The movement of a polypeptide segment in a protein can be broken down into two types of motions: internal and external ones. The former is deformation of the segment itself, but the latter involves only rotational and translational motions as a rigid body. Normal Mode Analysis (NMA) can derive these two motions, but its application remains limited because it necessitates the gathering of complete structural information. RESULTS: In this work, we present a novel method for predicting two kinds of protein motions in ordered structures. The prediction uses only information from the amino acid sequence. We prepared a dataset of the internal and external motions of segments in many proteins by application of NMA. Subsequently, we analyzed the relation between thermal motion assessed from X-ray crystallographic B-factor and internal/external motions calculated by NMA. Results show that attributes of amino acids related to the internal motion have different features from those related to the B-factors, although those related to the external motion are correlated strongly with the B-factors. Next, we developed a method to predict internal and external motions from amino acid sequences based on the Random Forest algorithm. The proposed method uses information associated with adjacent amino acid residues and secondary structures predicted from the amino acid sequence. The proposed method exhibited moderate correlation between predicted internal and external motions and those calculated by NMA. It has the highest prediction accuracy compared to a naïve model and three published predictors. CONCLUSIONS: Finally, we applied the proposed method predicting the internal motion to a set of 20 proteins that undergo large conformational change upon protein-protein interaction. Results show significant overlaps between the predicted high internal motion regions and the observed conformational change regions.

    Natively Unstructured Loops Differ from Other Loops

    Natively unstructured or disordered protein regions may increase the functional complexity of an organism; they are particularly abundant in eukaryotes and often evade structure determination. Many computational methods predict unstructured regions by training on outliers in otherwise well-ordered structures. Here, we introduce an approach that uses a neural network in a very different and novel way. We hypothesize that very long contiguous segments with nonregular secondary structure (NORS regions) differ significantly from regular, well-structured loops, and that a method detecting such features could predict natively unstructured regions. Training our new method, NORSnet, on predicted information rather than on experimental data yielded three major advantages: it removed the overlap between testing and training, it systematically covered entire proteomes, and it explicitly focused on one particular aspect of unstructured regions with a simple structural interpretation, namely that they are loops. Our hypothesis was correct: well-structured and unstructured loops differ so substantially that NORSnet succeeded in their distinction. Benchmarks on previously used and new experimental data of unstructured regions revealed that NORSnet performed very well. Although it was not the best single prediction method, NORSnet was sufficiently accurate to flag unstructured regions in proteins that were previously not annotated. In one application, NORSnet revealed previously undetected unstructured regions in putative targets for structural genomics and may thereby contribute to increasing structural coverage of large eukaryotic families. NORSnet found unstructured regions more often in domain boundaries than expected at random. In another application, we estimated that 50%–70% of all worm proteins observed to have more than seven protein–protein interaction partners have unstructured regions. 
The comparative analysis between NORSnet and DISOPRED2 suggested that long unstructured loops are a major part of unstructured regions in molecular networks

    Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs

    BACKGROUND: Traditionally, it is believed that the native structure of a protein corresponds to a global minimum of its free energy. However, with the growing number of known tertiary (3D) protein structures, researchers have discovered that some proteins can alter their structures in response to a change in their surroundings or with the help of other proteins or ligands. Such structural shifts play a crucial role with respect to the protein function. To this end, we propose a machine learning method for the prediction of the flexible/rigid regions of proteins (referred to as FlexRP); the method is based on a novel sequence representation and feature selection. Knowledge of the flexible/rigid regions may provide insights into the protein folding process and the 3D structure prediction. RESULTS: The flexible/rigid regions were defined based on a dataset, which includes protein sequences that have multiple experimental structures, and which was previously used to study the structural conservation of proteins. Sequences drawn from this dataset were represented based on feature sets that were proposed in prior research, such as PSI-BLAST profiles, composition vector and binary sequence encoding, and a newly proposed representation based on frequencies of k-spaced amino acid pairs. These representations were processed by feature selection to reduce the dimensionality. Several machine learning methods for the prediction of flexible/rigid regions and two recently proposed methods for the prediction of conformational changes and unstructured regions were compared with the proposed method. The FlexRP method, which applies Logistic Regression and collocation-based representation with 95 features, obtained 79.5% accuracy. The two runner-up methods, which apply the same sequence representation and Support Vector Machines (SVM) and Naïve Bayes classifiers, obtained 79.2% and 78.4% accuracy, respectively. 
The remaining considered methods are characterized by accuracies below 70%. Finally, the Naïve Bayes method is shown to provide the highest sensitivity for the prediction of flexible regions, while FlexRP and SVM give the highest sensitivity for rigid regions. CONCLUSION: A new sequence representation that uses k-spaced amino acid pairs is shown to be the most efficient in the prediction of the flexible/rigid regions of protein sequences. The proposed FlexRP method provides the highest prediction accuracy of about 80%. The experimental tests show that the FlexRP and SVM methods achieved high overall accuracy and the highest sensitivity for rigid regions, while the best quality of the predictions for flexible regions is achieved by the Naïve Bayes method
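
    The k-spaced amino acid pair representation named in this abstract can be illustrated roughly as follows. This is a minimal sketch, not the FlexRP implementation: it assumes that a k-spaced pair is two residues separated by exactly k intervening positions, and that each feature is a pair count normalized by the total number of pairs. The function name `k_spaced_pair_frequencies` and the `AMINO_ACIDS` constant are illustrative choices; the abstract does not state which k values were used or which 95 features survived feature selection.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_spaced_pair_frequencies(sequence, k):
    """Return normalized frequencies of residue pairs with k residues between them.

    k = 0 gives adjacent pairs (dipeptides); k = 1 skips one residue, and so on.
    The result always has 400 entries, one per ordered pair of standard residues.
    Pairs containing a non-standard letter count toward the total but are not
    reported, which is an assumption of this sketch.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    gap = k + 1  # distance between the two positions of a pair
    counts = Counter(
        (sequence[i], sequence[i + gap])
        for i in range(len(sequence) - gap)
    )
    total = sum(counts.values())
    return {
        a + b: (counts[(a, b)] / total if total else 0.0)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


if __name__ == "__main__":
    # Concatenate features for several spacings into one vector for a toy peptide.
    peptide = "MKVLAAGIVGLK"
    features = []
    for k in range(3):
        freqs = k_spaced_pair_frequencies(peptide, k)
        features.extend(freqs[pair] for pair in sorted(freqs))
    print(len(features))  # 3 spacings x 400 pairs
```

    A vector built this way grows by 400 dimensions per spacing, which is consistent with the abstract's mention of feature selection to reduce dimensionality before classification.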

    Identifying allosteric fluctuation transitions between different protein conformational states as applied to Cyclin Dependent Kinase 2

    BACKGROUND: The mechanisms underlying protein function and associated conformational change are dominated by a series of local entropy fluctuations affecting the global structure yet are mediated by only a few key residues. Transitional Dynamic Analysis (TDA) is a new method to detect these changes in local protein flexibility between different conformations arising from, for example, ligand binding. Additionally, Positional Impact Vertex for Entropy Transfer (PIVET) uses TDA to identify important residue contact changes that have a large impact on global fluctuation. We demonstrate the utility of these methods for Cyclin-dependent kinase 2 (CDK2), a system with crystal structures of this protein in multiple functionally relevant conformations and experimental data revealing the importance of local fluctuation changes for protein function. RESULTS: TDA and PIVET successfully identified select residues that are responsible for conformation-specific regional fluctuation in the activation cycle of Cyclin Dependent Kinase 2 (CDK2). The detected local changes in protein flexibility have been experimentally confirmed to be essential for the regulation and function of the kinase. The methodologies also highlighted possible errors in previous molecular dynamics simulations that need to be resolved in order to understand this key player in cell cycle regulation. Finally, the use of entropy compensation as a possible allosteric mechanism for protein function is reported for CDK2. CONCLUSION: The methodologies embodied in TDA and PIVET provide a quick approach to identify local fluctuation changes important for protein function and the residue contacts that contribute to these changes. Further, these approaches can be used to check for possible errors in protein dynamics simulations and have the potential to facilitate a better understanding of the contribution of entropy to protein allostery and function.

    H4K16 acetylation during embryonic stem cell differentiation

    Eukaryotic DNA is organised into the more compact nucleosome by wrapping 147bp of DNA around a histone octamer core. The N-terminal tails of the histones protrude through the DNA and can be modified by a variety of enzymes. Acetylation of Histone 4 Lysine 16 (H4K16ac) is an important modification associated with an increase in transcription, and in flies is an important component of the dosage compensation system. It is also unique amongst histone modifications in that it has been directly associated with chromatin decompaction. H4K16ac has been linked to development through its Histone Acetyltransferase, MOF. Deletion of MOF in mice leads to mass chromatin defects, and embryonic lethality prior to the blastocyst stage. I set out to understand the role of H4K16ac in differentiating Embryonic Stem cells (ES cells) and chromatin compaction in vivo. I generated a ChIP-seq profile for H4K16ac in undifferentiated ES cells, and after 3 days of retinoic acid (RA) differentiation. This revealed an association of H4K16ac with the promoters of transcribed genes in pluripotent ES cells, followed by loss of H4K16ac on ES cell specific genes and gain of the modification on differentiation specific genes. There were some silent genes in ES cells, however, which were acetylated on their promoters. Through this study I also found that H4K16ac and MOF mark active enhancers in ES cells, along with H3K4me1 and H3K27Ac and p300. H4K16ac did not mark a known regulatory region in limb cells, and it is possible that it marks active enhancers only of ES cells. Furthermore, I looked at the compaction state of large regions (>100kb) which lost H4K16ac upon differentiation by FISH, to determine if loss of H4K16ac could predict compaction. The regions selected showed no change in compaction state between UD and D3 cells, meaning that loss of H4K16ac does not directly lead to chromatin compaction in vivo. However, loss of H4K16ac may be necessary for any subsequent compaction, or the change in compaction may take place at nucleosomal level. Finally, I attempted both to overexpress and reduce the level of MOF in ES cells. I was unable to manipulate the level of MOF in this cell type in either direction; expression of endogenous MOF was silenced after very little time, and stable MOF shRNA cell lines showed no reduction in levels of MOF. Therefore, potentially, dosage of MOF/H4K16ac in this cell type is critical. This study may help to understand the significance of H4K16ac in ES cell differentiation and chromatin compaction.

    Epigenetic Regulation of Lymphocyte Development and Transformation

    Cell identity and function rely on intricately controlled programs of gene regulation, alterations of which underlie many diseases, including cancer. Epigenetic analyses of normal and diseased cells have started to elucidate different facets of epigenetic mechanisms for gene regulation. These include changes in nucleosome density, histone modifications, factor binding and chromosomal architecture. All of these aspects contribute to the activities of regulatory elements conferring promoter, enhancer and insulator functions and the cis-regulatory circuits formed by these elements. Despite this progress, an urgent need remains to profile these features and to study how they cooperatively function in normal and pathogenic settings. Here, using the mouse T cell receptor beta locus as a model, we first quantified 13 distinct features, including transcription, chromatin environment, spatial proximity, and predicted qualities of recombination signal sequences (RSS), to assess their relative contributions in shaping recombination frequencies of Vβ gene segments. We found that the most predictive parameters are chromatin modifications associated with transcription, but recombination efficiencies are largely independent of spatial proximity. These findings enabled us to build a novel computational model predicting Vβ usage that uses a minimum set of five features. Expanding on these results, we applied chromatin profiling and computational algorithms to other mouse antigen receptor loci, to classify and identify novel regulatory elements. We defined 38 chromatin states that reflect distinct regulatory potentials. One of these states corresponded to known enhancers and also identified new enhancer candidates in immunoglobulin loci. Indeed, all four candidate elements exhibited enhancer activity in B cells when subjected to functional assays, validating that our chromatin profiling and computational analyses successfully identified enhancers in antigen receptor loci. 
Finally, we translated these approaches to human B cell lymphoma to predict pathogenic cis-regulatory circuits composed of dysregulated enhancers and target genes. We then selected and functionally dissected a pathogenic cis-regulatory circuit for the mitosis-associated kinase, NEK6, which is overexpressed in human B cell lymphoma. We found that only a subset of predicted enhancers is required to maintain elevated NEK6 expression in transformed B cells. Surprisingly, a B cell-specific super-enhancer is completely dispensable to maintain NEK6 expression and chromatin architecture within its chromosomal neighborhood. Moreover, we showed that a cluster of binding sites for the CTCF architectural factor serves as a chromatin boundary, blocking the functional impact of a NEK6 regulatory hub on neighboring genes. These results emphasize the necessity to test predicted cis-regulatory circuits, especially the roles of enhancers and super-enhancers, when prioritizing elements as targets for epigenetic-based therapies. Our findings collectively pave the way for future investigations into the roles of cis-regulatory and architectural elements in regulating gene expression programs during normal development or pathogenesis

    A study of intrinsic disorder and its role in functional proteomics

    Thesis (Ph.D.) - Indiana University, Informatics, 2009. The last decade has witnessed the emergence of an alternate view on how protein function arises. This view attributes the functionality of many proteins to the presence of an ensemble of flexible regions popularly known as `intrinsically disordered' or `unstructured'. Several proteomic studies have corroborated the existence of either wholly disordered proteins or proteins that contain regions of disorder in them. The purpose of this dissertation was to investigate the consistency of such regions across experiments, their mechanism of facilitating function via disorder-to-order transitions, their presence and significance in pathogenic versus non-pathogenic organisms and their promise of applicability towards the computational prediction of peptides involved in the most common class of post-translational modifications, phosphorylation. Besides these, a new algorithm exploiting the strong correlation between phosphorylation and intrinsic disorder has also been proposed to improve the detection of phosphorylated peptides via high-throughput methods such as tandem mass-spectrometry (LC-MS/MS). Results presented in this study guide us in understanding the robustness of unstructured regions in proteins to sequence changes and environment, their role in facilitating molecular recognition as well as improving currently available methods for identification of post-translationally modified peptides. The findings and conclusions of this dissertation have the potential to impact ongoing structural genomics initiatives by suggesting alternative methods for determining structure for targets containing regions of disorder. Additional ramifications of results from this work include directing attention towards the possible use of regions of intrinsic disorder by pathogenic organisms for host cell invasion. We believe that unlike the traditional reductionist approach in a scientific method, this study gathers strength and utility by investigating the role of intrinsic disorder on more than one front in order to provide a novel perspective to the understanding of complex interactions within biological systems. Concluding arguments presented in this study pique one's curiosity regarding the evolution of disordered regions and proteins in general. On a technological side, the findings from this study unequivocally support the viable use of informatics methods in gaining new insights about a relatively young class of proteins known as intrinsically disordered proteins and its applicability to improve our present knowledge of cellular physiology.

    Design Principles Of Mammalian Transcriptional Regulation

    Transcriptional regulation occurs via changes to different biochemical steps of transcription, but it remains unclear which steps are subject to change upon biological perturbation. Single cell studies have revealed that transcription occurs in discontinuous bursts, suggesting that features of such bursts like burst fraction (what fraction of time a gene spends transcribing RNA) and burst intensity could be points of transcriptional regulation. Both how such features might be regulated and the prevalence of such modes of regulation are unclear. I first used a synthetic transcription factor to increase enhancer-promoter contact at the β-globin locus. Increasing promoter-enhancer contact specifically modulated the burst fraction of β-globin in both immortalized mouse and primary human erythroid cells. This finding raised the question of how generally important the phenomenon of burst fraction regulation might be, compared to other modes of regulation. For example, biochemical studies have suggested that stimuli predominantly affect the rate of RNA polymerase II (Pol II) binding and the rate of Pol II release from promoter-proximal pausing, but the prevalence of these modes of regulation compared to changes in bursting had not been examined. I combined Pol II ChIP-seq and single cell transcriptional measurements to reveal that an independently regulated burst initiation step is required before polymerase binding can occur, and that the change in burst fraction produced by increased enhancer-promoter contact was caused by an increased burst initiation rate. Using a number of global and targeted transcriptional regulatory perturbations, I showed that biological perturbations regulated both burst initiation and polymerase pause release rates, but seemed not to regulate polymerase binding rate. Our results suggest that transcriptional regulation primarily acts by changing the rates of burst initiation and polymerase pause release.