Structural Analysis and Stochastic Modelling Suggest a Mechanism for Calmodulin Trapping by CaMKII
Activation of CaMKII by calmodulin and the subsequent maintenance of constitutive activity through autophosphorylation at threonine residue 286 (Thr286) are thought to play a major role in synaptic plasticity. One of the effects of autophosphorylation at Thr286 is to increase the apparent affinity of CaMKII for calmodulin, a phenomenon known as "calmodulin trapping". It has previously been suggested that two binding sites for calmodulin exist on CaMKII, with high and low affinities, respectively. We built structural models of calmodulin bound to both of these sites. Molecular dynamics simulation showed that while binding of calmodulin to the supposed low-affinity binding site on CaMKII is compatible with closing (and hence, inactivation) of the kinase, and could even favour it, binding to the high-affinity site is not. Stochastic simulations of a biochemical model showed that the existence of two such binding sites, one of them accessible only in the active, open conformation, would be sufficient to explain calmodulin trapping by CaMKII. We can explain the effect of CaMKII autophosphorylation at Thr286 on calmodulin trapping: It stabilises the active state and therefore makes the high-affinity binding site accessible. Crucially, a model with only one binding site where calmodulin binding and CaMKII inactivation are strictly mutually exclusive cannot reproduce calmodulin trapping. One of the predictions of our study is that calmodulin binding in itself is not sufficient for CaMKII activation, although high-affinity binding of calmodulin is
DNA ANALYSIS USING GRAMMATICAL INFERENCE
An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.
An algorithm is proposed that searches for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm.
Testing shows that the inferred languages for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with an average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology.
To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly
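The abstract above does not say which inference algorithm it builds on. As a rough illustration of what positive-sample inference of a regular language can look like, the sketch below builds a prefix tree acceptor, a common starting point for such methods. It is an assumption chosen for illustration, not the author's algorithm. The function names and the toy sequences are hypothetical.

```python
def build_prefix_tree_acceptor(samples):
    """Build a prefix tree acceptor (PTA) from positive samples only.

    State 0 is the start state. `transitions` maps (state, symbol) to the
    next state; `accepting` holds the states reached at the end of a sample.
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for seq in samples:
        state = 0
        for symbol in seq:
            key = (state, symbol)
            if key not in transitions:
                # Unseen prefix: allocate a fresh state for it.
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting


def accepts(pta, seq):
    """Return True if the PTA accepts `seq` exactly."""
    transitions, accepting = pta
    state = 0
    for symbol in seq:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


# Toy positive samples (hypothetical, not taken from the thesis).
toy_samples = ["ATGGCC", "ATGGCA", "ATGAAA"]
pta = build_prefix_tree_acceptor(toy_samples)
```

A PTA accepts exactly the training samples and nothing else. Practical inference methods usually generalize from it, for example by merging states, and a subset-selection step like the one described above would decide which samples go into `toy_samples` in the first place.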
Probabilistic grammatical model of protein language and its application to helix-helix contact site classification
BACKGROUND: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not convey directly information on medium- and long-range residue-residue interactions. This requires an expressive power of at least context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. RESULTS: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to classification of transmembrane helix-helix pairs configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence based descriptors of four classes of transmembrane helix-helix contact site configurations. The highest performance of the classifiers reached AUCROC of 0.70. The analysis of grammar parse trees revealed the ability of representing structural features of helix-helix contact sites. CONCLUSIONS: We demonstrated that our probabilistic context-free framework for analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this is achieved without necessarily requiring modeling long range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable. Thus they could provide biologically meaningful information for molecular biologists
XRate: a fast prototyping, training and annotation tool for phylo-grammars
BACKGROUND: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists. RESULTS: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures. CONCLUSION: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools
Introduction to protein folding for physicists
The prediction of the three-dimensional native structure of proteins from the
knowledge of their amino acid sequence, known as the protein folding problem,
is one of the most important yet unsolved issues of modern science. Since the
conformational behaviour of flexible molecules is nothing more than a complex
physical problem, increasingly more physicists are moving into the study of
protein systems, bringing with them powerful mathematical and computational
tools, as well as the sharp intuition and deep images inherent to the physics
discipline. This work attempts to facilitate the first steps of such a
transition. In order to achieve this goal, we provide an exhaustive account of
the reasons underlying the protein folding problem's enormous relevance and
summarize the present-day status of the methods aimed at solving it. We also
provide an introduction to the particular structure of these biological
heteropolymers, and we physically define the problem stating the assumptions
behind this (commonly implicit) definition. Finally, we review the 'special
flavor' of statistical mechanics that is typically used to study the
astronomically large phase spaces of macromolecules. Throughout the whole work,
much material that is found scattered in the literature has been put together
here to improve comprehension and to serve as a handy reference.
Comment: 53 pages, 18 figures, the figures are at a low resolution due to
arXiv restrictions, for high-res figures, go to http://www.pabloechenique.co
A Balanced Secondary Structure Predictor
Secondary structure (SS) refers to the local spatial organization of the polypeptide backbone atoms of a protein. Accurate prediction of SS is a vital clue for resolving the 3D structure of a protein. SS has three different components: helix (H), beta (E) and coil (C). Most SS predictors are imbalanced: their accuracy in predicting helix and coil is high, but significantly lower for beta. The objective of this thesis is to develop a balanced SS predictor which achieves good accuracies in all three SS components. We proposed a novel approach to solve this problem by combining a genetic algorithm (GA) with a support vector machine. We prepared two test datasets (CB471 and N295) to compare the performance of our predictors with SPINE X. Overall accuracy of our predictor was 76.4% and 77.2% on the CB471 and N295 datasets, respectively, while SPINE X gave 76.5% overall accuracy on both test datasets
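The imbalance described above is easiest to see when accuracy is reported per class rather than only overall. The sketch below is one way to compute both figures. It is an illustration under stated assumptions, not the thesis's evaluation code. The function names and the toy strings are hypothetical.

```python
def overall_accuracy(true_ss, pred_ss):
    """Fraction of residues whose predicted state matches the true state."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have the same length")
    matches = sum(t == p for t, p in zip(true_ss, pred_ss))
    return matches / len(true_ss)


def per_class_accuracy(true_ss, pred_ss, classes="HEC"):
    """Accuracy computed separately over residues of each true class.

    Returns None for a class that does not occur in `true_ss`.
    """
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have the same length")
    totals = {c: 0 for c in classes}
    correct = {c: 0 for c in classes}
    for t, p in zip(true_ss, pred_ss):
        if t in totals:
            totals[t] += 1
            if t == p:
                correct[t] += 1
    return {c: (correct[c] / totals[c] if totals[c] else None) for c in classes}


# Toy example (hypothetical): every beta residue is mispredicted as coil.
toy_true = "HHHHEECCCC"
toy_pred = "HHHHCCCCCC"
toy_overall = overall_accuracy(toy_true, toy_pred)
toy_by_class = per_class_accuracy(toy_true, toy_pred)
```

In this toy case the overall accuracy is 0.8 even though no beta residue is predicted correctly, which is the kind of gap a balanced predictor aims to close.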