
    Structural Analysis and Stochastic Modelling Suggest a Mechanism for Calmodulin Trapping by CaMKII

    Activation of CaMKII by calmodulin and the subsequent maintenance of constitutive activity through autophosphorylation at threonine residue 286 (Thr286) are thought to play a major role in synaptic plasticity. One of the effects of autophosphorylation at Thr286 is to increase the apparent affinity of CaMKII for calmodulin, a phenomenon known as “calmodulin trapping”. It has previously been suggested that two binding sites for calmodulin exist on CaMKII, with high and low affinities, respectively. We built structural models of calmodulin bound to both of these sites. Molecular dynamics simulation showed that while binding of calmodulin to the supposed low-affinity binding site on CaMKII is compatible with closing (and hence, inactivation) of the kinase, and could even favour it, binding to the high-affinity site is not. Stochastic simulations of a biochemical model showed that the existence of two such binding sites, one of them accessible only in the active, open conformation, would be sufficient to explain calmodulin trapping by CaMKII. We can explain the effect of CaMKII autophosphorylation at Thr286 on calmodulin trapping: It stabilises the active state and therefore makes the high-affinity binding site accessible. Crucially, a model with only one binding site where calmodulin binding and CaMKII inactivation are strictly mutually exclusive cannot reproduce calmodulin trapping. One of the predictions of our study is that calmodulin binding in itself is not sufficient for CaMKII activation, although high-affinity binding of calmodulin is
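To illustrate what a stochastic simulation of a biochemical model involves, here is a minimal sketch of the Gillespie direct method for a single reversible binding reaction. It is not the authors' two-site model: the function name, the species counts and the rate constants `k_on` and `k_off` are all hypothetical values chosen for illustration.

```python
import random


def gillespie_binding(n_kinase, n_cam, n_complex, k_on, k_off, t_end, seed=0):
    """Gillespie direct method for kinase + CaM <-> complex.

    All names and rate constants are illustrative assumptions,
    not parameters taken from the abstract.
    Returns a list of (time, free kinase, free CaM, complex) tuples.
    """
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, n_kinase, n_cam, n_complex)]
    while True:
        # Propensities of the two reactions in the current state.
        a_bind = k_on * n_kinase * n_cam
        a_unbind = k_off * n_complex
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() < a_bind / a_total:
            n_kinase, n_cam, n_complex = n_kinase - 1, n_cam - 1, n_complex + 1
        else:
            n_kinase, n_cam, n_complex = n_kinase + 1, n_cam + 1, n_complex - 1
        trajectory.append((t, n_kinase, n_cam, n_complex))
    return trajectory


if __name__ == "__main__":
    traj = gillespie_binding(50, 50, 0, k_on=0.01, k_off=0.1, t_end=10.0, seed=1)
    print("events:", len(traj) - 1, "final state:", traj[-1])
```

A model with two binding sites and an open/closed kinase state, as described in the abstract, would add further species and reactions to the same loop.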

    DNA ANALYSIS USING GRAMMATICAL INFERENCE

    An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance in computational biology. The method proposed here uses positive-sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed that searches for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm. Testing also shows that the languages inferred for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independently of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and significantly improve the specificity of the model
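The abstract does not name its basis algorithm. As an assumption for illustration only, the sketch below builds a prefix tree acceptor, a common starting point for inferring a regular language from positive samples alone. The function names and the toy sequences are hypothetical.

```python
def build_pta(samples):
    """Build a prefix tree acceptor (a trie-shaped DFA) from positive samples.

    State 0 is the start state; transitions maps state -> {symbol: next state}.
    """
    transitions = {0: {}}
    accepting = set()
    for sample in samples:
        state = 0
        for symbol in sample:
            nxt = transitions[state].get(symbol)
            if nxt is None:
                # Allocate a fresh state for an unseen prefix.
                nxt = len(transitions)
                transitions[state][symbol] = nxt
                transitions[nxt] = {}
            state = nxt
        accepting.add(state)
    return transitions, accepting


def accepts(pta, sequence):
    """Return True if the acceptor recognises the sequence."""
    transitions, accepting = pta
    state = 0
    for symbol in sequence:
        state = transitions[state].get(symbol)
        if state is None:
            return False
    return state in accepting


if __name__ == "__main__":
    pta = build_pta(["ATG", "ATGA", "ATC"])  # toy positive samples
    print(accepts(pta, "ATG"), accepts(pta, "ATT"))
```

A prefix tree acceptor recognises exactly its training samples. Inference algorithms generalise from it, typically by merging states, and the choice of input subset affects which generalisation results.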

    Probabilistic grammatical model of protein language and its application to helix-helix contact site classification

    BACKGROUND: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium- and long-range residue-residue interactions. Capturing such interactions requires at least the expressive power of context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. RESULTS: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to the classification of transmembrane helix-helix pair configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence-based descriptors of four classes of transmembrane helix-helix contact site configurations. The best classifier reached an AUC-ROC of 0.70. The analysis of grammar parse trees revealed that the grammar can represent structural features of helix-helix contact sites. CONCLUSIONS: We demonstrated that our probabilistic context-free framework for analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this is achieved without necessarily requiring the modeling of long-range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable. Thus they could provide biologically meaningful information for molecular biologists

    XRate: a fast prototyping, training and annotation tool for phylo-grammars

    BACKGROUND: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists. RESULTS: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures. CONCLUSION: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools

    Introduction to protein folding for physicists

    The prediction of the three-dimensional native structure of proteins from the knowledge of their amino acid sequence, known as the protein folding problem, is one of the most important yet unsolved issues of modern science. Since the conformational behaviour of flexible molecules is nothing more than a complex physical problem, more and more physicists are moving into the study of protein systems, bringing with them powerful mathematical and computational tools, as well as the sharp intuition and deep images inherent to the physics discipline. This work attempts to facilitate the first steps of such a transition. To this end, we provide an exhaustive account of the reasons underlying the protein folding problem's enormous relevance and summarize the present-day status of the methods aimed at solving it. We also provide an introduction to the particular structure of these biological heteropolymers, and we physically define the problem, stating the assumptions behind this (commonly implicit) definition. Finally, we review the 'special flavor' of statistical mechanics that is typically used to study the astronomically large phase spaces of macromolecules. Throughout the work, much material that is found scattered in the literature has been put together to improve comprehension and to serve as a handy reference. Comment: 53 pages, 18 figures; the figures are at a low resolution due to arXiv restrictions; for high-res figures, go to http://www.pabloechenique.co

    A Balanced Secondary Structure Predictor

    Secondary structure (SS) refers to the local spatial organization of the polypeptide backbone atoms of a protein. Accurate prediction of SS is a vital clue for resolving the 3D structure of a protein. SS has three components: helix (H), beta (E) and coil (C). Most SS predictors are imbalanced: their accuracy in predicting helix and coil is high, but significantly lower for beta. The objective of this thesis is to develop a balanced SS predictor that achieves good accuracy in all three SS components. We propose a novel approach to this problem that combines a genetic algorithm (GA) with a support vector machine. We prepared two test datasets (CB471 and N295) to compare the performance of our predictor with SPINE X. The overall accuracy of our predictor was 76.4% and 77.2% on the CB471 and N295 datasets, respectively, while SPINE X gave 76.5% overall accuracy on both test datasets
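The distinction between overall accuracy and per-class balance can be made concrete with a small sketch. It assumes true and predicted structures are given as equal-length strings over H, E and C. The function name and the toy strings are hypothetical and do not come from the CB471 or N295 datasets.

```python
from collections import Counter


def overall_and_per_class_accuracy(true_ss, pred_ss, classes="HEC"):
    """Return (overall accuracy, {class: fraction of that class predicted correctly})."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("sequences must be non-empty and of equal length")
    totals = Counter(true_ss)
    # Count correctly predicted residues, keyed by their true class.
    correct = Counter(t for t, p in zip(true_ss, pred_ss) if t == p)
    overall = sum(correct.values()) / len(true_ss)
    # Skip classes that do not occur in the true labels.
    per_class = {c: correct[c] / totals[c] for c in classes if totals[c]}
    return overall, per_class


if __name__ == "__main__":
    true_ss = "HHHHEEEECC"  # toy labels, not real data
    pred_ss = "HHHHCCEECC"
    print(overall_and_per_class_accuracy(true_ss, pred_ss))
```

In this toy case the overall accuracy is 0.8 even though only half of the beta residues are predicted correctly, which is the kind of imbalance the abstract describes.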
