Optimization and evaluation of a coarse-grained model of protein motion using X-ray crystal data
Simple coarse-grained models, such as the Gaussian Network Model, have been
shown to capture some of the features of equilibrium protein dynamics. We
extend this model by using atomic contacts to define residue interactions and
introducing more than one interaction parameter between residues. We use
B-factors from 98 ultra-high resolution X-ray crystal structures to optimize
the interaction parameters. The average correlation between GNM fluctuation
predictions and the B-factors is 0.64 for the data set, consistent with a
previous large-scale study. By separating residue interactions into covalent
and noncovalent, we achieve an average correlation of 0.74, and addition of
ligands and cofactors further improves the correlation to 0.75. However,
further separating the noncovalent interactions into nonpolar, polar, and mixed
yields no significant improvement. The addition of simple chemical information
results in better prediction quality without increasing the size of the
coarse-grained model.
Comment: 18 pages, 4 figures, 1 supplemental file (cnm_si.tex)
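As a rough illustration of the approach described in this abstract, the sketch below builds a Gaussian Network Model with two interaction parameters and derives per-residue fluctuations that could be correlated with B-factors. It makes several assumptions that are not in the abstract: contacts are defined by a distance cutoff between one node per residue (the paper uses atomic contacts), sequence neighbours in a single chain stand in for covalent interactions, and the names `gnm_fluctuations`, `cutoff`, `k_cov` and `k_noncov` as well as their default values are hypothetical.

```python
import numpy as np


def gnm_fluctuations(coords, cutoff=7.3, k_cov=10.0, k_noncov=1.0):
    """Relative mean-square fluctuations from a two-parameter GNM.

    coords: (n_residues, 3) array of node positions, one per residue.
    All parameter values are illustrative assumptions.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Off-diagonal entries: -k for every pair in contact (noncovalent).
    gamma = np.where(dist <= cutoff, -k_noncov, 0.0)

    # Assumption: consecutive residues of a single chain are covalently bonded.
    idx = np.arange(n - 1)
    gamma[idx, idx + 1] = -k_cov
    gamma[idx + 1, idx] = -k_cov

    # Diagonal entries make every row sum to zero (Kirchhoff matrix).
    np.fill_diagonal(gamma, 0.0)
    np.fill_diagonal(gamma, -gamma.sum(axis=1))

    # Pseudo-inverse via eigendecomposition, discarding the zero mode(s).
    eigvals, eigvecs = np.linalg.eigh(gamma)
    keep = eigvals > 1e-8
    return (eigvecs[:, keep] ** 2 / eigvals[keep]).sum(axis=1)


def pearson(x, y):
    """Pearson correlation, e.g. between predictions and B-factors."""
    return float(np.corrcoef(x, y)[0, 1])


# Toy input: a straight 10-residue chain with 3.8 Angstrom spacing.
chain = np.column_stack([np.arange(10) * 3.8, np.zeros(10), np.zeros(10)])
msf = gnm_fluctuations(chain)
```

With real data, `pearson(msf, b_factors)` would give the kind of per-structure correlation the abstract reports; optimizing `k_cov` and `k_noncov` against many structures is beyond this sketch.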
Predicting biomolecular function from 3D dynamics : sequence-sensitive coarse-grained elastic network model coupled to machine learning
The dynamics of biomolecules are intimately tied to their functions but experimentally elusive,
making their computational study attractive. When modelling the effects of thousands of mutations,
time-stepping methods such as classical or enhanced sampling molecular dynamics are too costly for
most applications. On the other hand, normal mode analysis of coarse-grained elastic network models
(ENMs) provides fast analytical dynamics spanning all timescales. However, the vast majority of ENMs
consider backbone geometry alone, making them a poor choice to study point mutations which do not
affect the equilibrium structure. The Elastic Network Contact Model (ENCoM) is the first
sequence-sensitive ENM, enabling its use for the efficient exploration of full conformational spaces
from sequence variants. The present work introduces the ENCoM-DynaSig-ML computational pipeline, in
which the ENCoM conformational spaces are reduced to Dynamical Signatures and coupled to simple
machine learning algorithms. ENCoM-DynaSig-ML predicts the function of sequence variants with
significant accuracy, is complementary to all existing methods, and can generate new hypotheses
about which dynamical features are important for the studied biomolecule's function. Examples given
are the maturation efficiency of microRNA variants, the activation potential of mu-opioid receptor
ligands and the effect of point mutations on VIM-2 lactamase's enzymatic efficiency. This novel
application of normal mode analysis is very fast, taking a few seconds CPU time per variant, and is
generalizable to any biomolecule for which experimental mutagenesis data exist.
Normal mode computations and applications
Proteins are fundamental functional units in cells. They form stable yet somewhat flexible 3D structures and often function by interacting with other molecules. Their functional behavior is determined by their 3D structures as well as their flexibility. In this thesis, I focus on protein dynamics and its role in protein function.
One of the most powerful computational methods for studying protein dynamics is normal mode analysis (NMA). Its low-frequency modes, which capture the intrinsic dynamics of proteins, are of particular interest in most protein dynamics studies. Although NMA provides analytical solutions to a protein's collective motions, it is inconvenient to use because it requires energy minimization, and it becomes prohibitive for large systems because of its memory consumption and long computation time. Additionally, it is unclear what the frequencies of normal modes mean and whether those meanings can be validated against experimental results.
The majority of this thesis addresses the above issues. I have addressed the following sequence of questions and developed several simplified NMA models as answers: (1) what is the role of inter-residue forces; (2) how to remove the energy minimization requirement in NMA while keeping most of its accuracy; (3) how to efficiently build a coarse-grained model from the all-atom model while keeping atomic accuracy. Additionally, using the newly developed models and traditional NMA, I have examined the meaning of normal modes across the whole frequency range and have found a connection with experimental results.
The last part of this thesis addresses, as an application, how normal modes can depict the sequence of breathing motions of myoglobin and reveal the transition pathway that dynamically opens ligand migration channels. The results are in excellent agreement with molecular dynamics simulation results and experimentally determined reaction rate constants.
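To make the idea of minimization-free, coarse-grained normal modes concrete, here is a minimal sketch of the standard anisotropic network model (ANM). It is not one of the simplified NMA models developed in the thesis; the function names, the uniform spring constant `k` and the `cutoff` value are assumptions chosen for illustration.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, k=1.0):
    """Hessian of a uniform-spring elastic network (3N x 3N)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # nodes too far apart: no spring
            # Off-diagonal super-element for the spring between i and j.
            block = -k * np.outer(d, d) / r2
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements accumulate minus the off-diagonal ones.
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def normal_modes(hess, tol=1e-8):
    """Eigenvalues and eigenvectors, without the rigid-body zero modes."""
    eigvals, eigvecs = np.linalg.eigh(hess)
    nonzero = eigvals > tol
    return eigvals[nonzero], eigvecs[:, nonzero]


# Toy input: 8 random nodes, all connected thanks to a large cutoff.
rng = np.random.default_rng(0)
nodes = rng.uniform(0.0, 10.0, size=(8, 3))
hessian = anm_hessian(nodes, cutoff=100.0)
eigenvalues, modes = normal_modes(hessian)
```

Because the input structure is taken as the energy minimum by construction, no minimization step is needed. The eigenvalues come back in ascending order, so the first columns of `modes` are the low-frequency collective motions.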
Insights from Coarse-Grained Gō Models for Protein Folding and Dynamics
Exploring the landscape of large-scale conformational changes such as protein folding at atomistic detail poses a considerable computational challenge. Coarse-grained representations of the peptide chain have therefore been developed and over the last decade have proved extremely valuable. These include topology-based Gō models, which constitute a smooth and funnel-like approximation to the folding landscape. We review the many variations of the Gō model that have been employed to yield insight into folding mechanisms. Their success has been interpreted as a consequence of the dominant role of the native topology in folding. The role of local contact density in determining protein dynamics is also discussed and is used to explain the ability of Gō-like models to capture sequence effects in folding and elucidate conformational transitions.
The NRGTEN Python package: an extensible toolkit for coarse-grained normal mode analysis of proteins, nucleic acids, small molecules and their complexes
Summary: Coarse-grained normal mode analysis (NMA) is a fast computational
technique to study the dynamics of biomolecules. Here we present the
Najmanovich Research Group Toolkit for Elastic Networks (NRGTEN). NRGTEN is a
Python toolkit that implements four different NMA models in addition to popular
and novel metrics to benchmark and measure properties from these models.
Furthermore, the toolkit is available as a public Python package and is easily
extensible for the development or implementation of additional NMA models. The
inclusion of the ENCoM model (Elastic Network Contact Model) developed in our
group within NRGTEN is noteworthy, owing to its account for the specific
chemical nature of atomic interactions. This makes possible some unique
predictions of the effect of mutations, such as on stability (via
vibrational entropy differences), on the transition probability between
different conformational states or on the flexibility profile of the whole
macromolecule/complex (to study allostery and signalling). In addition, all NMA
models can be used to generate conformational ensembles from a starting
structure to aid in protein-protein, protein-ligand or other docking studies
among other applications. NRGTEN is freely available via a public Python package
which can be easily installed on any modern machine and includes a detailed
user guide hosted online. Availability and implementation:
https://github.com/gregorpatof/nrgten_package/
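The stability prediction mentioned above rests on a vibrational entropy difference between two sets of normal modes. The sketch below shows one common way to estimate it, using the classical harmonic-oscillator entropy. It does not use the NRGTEN API and may differ from what the package implements; the function name `delta_s_vib` and its arguments are hypothetical, and both eigenvalue arrays are assumed to contain only the nonzero modes of models with the same number of nodes.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def delta_s_vib(eigvals_wt, eigvals_mut, k_b=K_B):
    """Classical harmonic estimate of S_vib(mutant) - S_vib(wild type).

    Each mode contributes k_B * ln(omega_wt / omega_mut); since the
    frequency scales as the square root of the eigenvalue, this equals
    0.5 * k_B * ln(lambda_wt / lambda_mut).
    """
    wt = np.asarray(eigvals_wt, dtype=float)
    mut = np.asarray(eigvals_mut, dtype=float)
    if wt.shape != mut.shape:
        raise ValueError("both models must have the same number of modes")
    if np.any(wt <= 0) or np.any(mut <= 0):
        raise ValueError("pass only the nonzero (positive) eigenvalues")
    return 0.5 * k_b * float(np.sum(np.log(wt / mut)))


# Toy input: a mutant whose two modes are four times stiffer.
ds = delta_s_vib([1.0, 1.0], [4.0, 4.0])
```

A negative value means the mutant is more rigid than the wild type, that is, it has lost vibrational entropy.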
Accurate and efficient description of protein vibrational dynamics: comparing molecular dynamics and Gaussian models
Current all-atom, potential-based molecular dynamics (MD) allows the
identification of a protein's functional motions on a wide range of
time scales, up to a few tens of ns. However, functional large-scale motions of
proteins may occur on a time scale currently not accessible by all-atom,
potential-based molecular dynamics. To avoid the massive computational effort
required by this approach, several simplified schemes have been introduced. One
of the most satisfactory is the Gaussian Network approach based on the energy
expansion in terms of the deviation of the protein backbone from its native
configuration. Here we consider an extension of this model which captures in a
more realistic way the distribution of native interactions due to the
introduction of effective sidechain centroids. Since their location is entirely
determined by the protein backbone, the model is amenable to the same exact and
computationally efficient treatment as previous simpler models. The ability of
the model to describe the correlated motion of protein residues in
thermodynamic equilibrium is established through a series of successful
comparisons with an extensive (14 ns) MD simulation based on the AMBER
potential of HIV-1 protease in complex with a peptide substrate. Thus, the
model presented here emerges as a powerful tool to provide preliminary, fast
yet accurate characterizations of proteins' near-native motion.
Comment: 14 pages, 7 figures
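Comparing the correlated motions predicted by a network model with those seen in an MD trajectory usually comes down to comparing normalized cross-correlation matrices. The sketch below shows that step only; it assumes the trajectory frames have already been superposed on a reference structure, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np


def covariance_from_trajectory(traj):
    """Residue covariance <dr_i . dr_j> from superposed frames.

    traj: array of shape (n_frames, n_residues, 3).
    """
    traj = np.asarray(traj, dtype=float)
    disp = traj - traj.mean(axis=0)  # displacement from the mean structure
    return np.einsum("fik,fjk->ij", disp, disp) / len(traj)


def cross_correlation(cov):
    """Normalize a covariance matrix so that the diagonal equals 1."""
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Toy trajectory: residues 0 and 1 move together, residue 2 moves opposite.
rng = np.random.default_rng(1)
step = rng.normal(size=(50, 1, 3))
toy_traj = np.concatenate([step, step, -step], axis=1)
corr = cross_correlation(covariance_from_trajectory(toy_traj))
```

For a Gaussian network model, the same `cross_correlation` function can be applied to the pseudo-inverse of the contact matrix, which gives a matrix directly comparable to the MD-derived one.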
Sequence composition and environment effects on residue fluctuations in protein structures
The spectrum and scale of fluctuations in protein structures affect a range
of cellular phenomena, including the stability of protein structures or their
fragments, allosteric transitions and energy transfer. This study presents a
statistical-thermodynamic analysis of the relationship between the sequence
composition and the distribution of residue fluctuations in protein-protein
complexes. A one-node-per-residue elastic network model accounting for the
nonhomogeneous protein mass distribution and the inter-atomic interactions
through a renormalized inter-residue potential is developed. Two factors, the
protein mass distribution and the residue environment, were found to determine
the scale of residue fluctuations. Surface residues undergo larger fluctuations
than core residues, showing agreement with experimental observations. Ranking
residues over the normalized scale of fluctuations yields a distinct
classification of amino acids into three groups. The structural instability in
proteins possibly relates to the high content of the highly fluctuating
residues and a deficiency of the weakly fluctuating residues in irregular
secondary structure elements (loops), chameleon sequences and disordered
proteins. Strong correlation between residue fluctuations and the sequence
composition of protein loops supports this hypothesis. Comparing fluctuations
of binding site residues (interface residues) with other surface residues shows
that, on average, the interface is more rigid than the rest of the protein
surface and Gly, Ala, Ser, Cys, Leu and Trp have a propensity to form more
stable docking patches on the interface. The findings have broad implications
for understanding mechanisms of protein association and stability of protein
structures.
Comment: 8 pages, 4 figures