Transferable neural networks for enhanced sampling of protein dynamics
Variational auto-encoder frameworks have demonstrated success in reducing
complex nonlinear dynamics in molecular simulation to a single non-linear
embedding. In this work, we illustrate how this non-linear latent embedding can
be used as a collective variable for enhanced sampling, and present a simple
modification that allows us to rapidly perform sampling in multiple related
systems. We first demonstrate our method is able to describe the effects of
force field changes in capped alanine dipeptide after learning a model using
AMBER99. We further provide a simple extension to variational dynamics encoders
that allows the model to be trained in a more efficient manner on larger
systems by encoding the outputs of a linear transformation using time-structure
based independent component analysis (tICA). Using this technique, we show how
such a model trained for one protein, the WW domain, can efficiently be
transferred to perform enhanced sampling on a related mutant protein, the GTT
mutation. This method shows promise for its ability to rapidly sample related
systems using a single transferable collective variable and is generally
applicable to sets of related simulations, enabling us to probe the effects of
variation in increasingly large systems of biophysical interest.
Comment: 20 pages, 10 figures
A correspondence between solution-state dynamics of an individual protein and the sequence and conformational diversity of its family.
Conformational ensembles are increasingly recognized as a useful representation to describe fundamental relationships between protein structure, dynamics and function. Here we present an ensemble of ubiquitin in solution that is created by sampling conformational space without experimental information using "Backrub" motions inspired by alternative conformations observed in sub-Angstrom resolution crystal structures. Backrub-generated structures are then selected to produce an ensemble that optimizes agreement with nuclear magnetic resonance (NMR) Residual Dipolar Couplings (RDCs). Using this ensemble, we probe two proposed relationships between properties of protein ensembles: (i) a link between native-state dynamics and the conformational heterogeneity observed in crystal structures, and (ii) a relation between dynamics of an individual protein and the conformational variability explored by its natural family. We show that the Backrub motional mechanism can simultaneously explore protein native-state dynamics measured by RDCs, encompass the conformational variability present in ubiquitin complex structures and facilitate sampling of conformational and sequence variability matching those occurring in the ubiquitin protein family. Our results thus support an overall relation between protein dynamics and conformational changes enabling sequence changes in evolution. More practically, the presented method can be applied to improve protein design predictions by accounting for intrinsic native-state dynamics.
Protein folding tames chaos
Protein folding produces characteristic and functional three-dimensional
structures from unfolded polypeptides or disordered coils. The emergence of
extraordinary complexity in the protein folding process poses astonishing
challenges to theoretical modeling and computer simulations. The present work
introduces molecular nonlinear dynamics (MND), or molecular chaotic dynamics,
as a theoretical framework for describing and analyzing protein folding. We
unveil the existence of intrinsically low dimensional manifolds (ILDMs) in the
chaotic dynamics of folded proteins. Additionally, we reveal that the
transition from disordered to ordered conformations in protein folding
increases the transverse stability of the ILDM. Stated differently, protein
folding reduces the chaoticity of the nonlinear dynamical system, and a folded
protein has the best ability to tame chaos. Additionally, we bring to light the
connection between the ILDM stability and the thermodynamic stability, which
enables us to quantify the disorderliness and relative energies of folded,
misfolded and unfolded protein states. Finally, we exploit chaos for protein
flexibility analysis and develop a robust chaotic algorithm for the prediction
of Debye-Waller factors, or temperature factors, of protein structures.
Multiscale virtual particle based elastic network model (MVP-ENM) for biomolecular normal mode analysis
In this paper, a multiscale virtual particle based elastic network model
(MVP-ENM) is proposed for biomolecular normal mode analysis. The multiscale
virtual particle model is proposed for the discretization of biomolecular
density data in different scales. Essentially, the model works as the
coarse-graining of the biomolecular structure, so that a delicate balance
between biomolecular geometric representation and computational cost can be
achieved. To form "connections" between these multiscale virtual particles, a
new harmonic potential function, which considers the influence from both mass
distributions and distance relations, is adopted between any two virtual
particles. Unlike the previous ENMs that use a constant spring constant, a
particle-dependent spring parameter is used in MVP-ENM. Two independent models,
i.e., multiscale virtual particle based Gaussian network model (MVP-GNM) and
multiscale virtual particle based anisotropic network model (MVP-ANM), are
proposed. Even with a rather coarse grid and a low resolution, the MVP-GNM is
able to predict the Debye-Waller factors (B-factors) with good
accuracy. Similar properties have also been observed in MVP-ANM. More
importantly, in B-factor predictions, the mismatch between the predicted
results and experimental ones is predominantly from higher fluctuation regions.
Further, it is found that MVP-ANM can deliver very consistent low-frequency
eigenmodes in various scales. This demonstrates the great potential of MVP-ANM
in the deformation analysis of low resolution data. With the multiscale
rigidity function, the MVP-ENM can be applied to biomolecular data represented
in density distribution and atomic coordinates. Further, the great advantage of
the MVP-ENM model in computational cost has been demonstrated by using two
poliovirus structures. Finally, the paper ends with a conclusion.
Comment: 15 figures; 25 pages
Evolution of sparsity and modularity in a model of protein allostery
The sequence of a protein is not only constrained by its physical and
biochemical properties under current selection, but also by features of its
past evolutionary history. Understanding the extent and the form that these
evolutionary constraints may take is important to interpret the information in
protein sequences. To study this problem, we introduce a simple but physical
model of protein evolution where selection targets allostery, the functional
coupling of distal sites on protein surfaces. This model shows how the
geometrical organization of couplings between amino acids within a protein
structure can depend crucially on its evolutionary history. In particular, two
scenarios are found to generate a spatial concentration of functional
constraints: high mutation rates and fluctuating selective pressures. This
second scenario offers a plausible explanation for the high tolerance of
natural proteins to mutations and for the spatial organization of their least
tolerant amino acids, as revealed by sequence analyses and mutagenesis
experiments. It also implies a capacity to adapt to new selective pressures that
is consistent with observations. In addition, the model illustrates how several
independent functional modules may emerge within the same protein structure,
depending on the nature of past environmental fluctuations. Our model thus
relates the evolutionary history and evolutionary potential of proteins to the
geometry of their functional constraints, with implications for decoding and
engineering protein sequences.
Network estimation in State Space Model with L1-regularization constraint
Biological networks have arisen as an attractive paradigm of genomic science
ever since the introduction of large scale genomic technologies which carried
the promise of elucidating the relationship in functional genomics. Microarray
technologies coupled with appropriate mathematical or statistical models have
made it possible to identify dynamic regulatory networks or to measure the time
course of the expression levels of many genes simultaneously. However, one of the
few limitations lies in the high-dimensional nature of such data, coupled with
the fact that these gene expression data are known to include some hidden
process. In that regard, we are concerned with deriving a method for inferring
a sparse dynamic network in a high-dimensional data setting. We assume that the
observations are noisy measurements of gene expression in the form of mRNAs,
whose dynamics can be described by some unknown or hidden process. We build an
input-dependent linear state space model from these hidden states and
demonstrate how an incorporated regularization constraint in an
Expectation-Maximization (EM) algorithm can be used to reverse engineer
transcriptional networks from gene expression profiling data. This corresponds
to estimating the model interaction parameters. The proposed method is
illustrated on time-course microarray data obtained from a well-established
T-cell data set. At the optimum tuning parameters we found genes TRAF5, JUND, CDK4,
CASP4, CD69, and C3X1 to have a higher number of inward-directed connections and
FYB, CCNA2, AKT1 and CASP8 to be genes with a higher number of outward-directed
connections. We recommend these genes as objects for further investigation.
Caspase 4 is also found to activate the expression of JunD, which in turn
represses the cell cycle regulator CDC2.
Comment: arXiv admin note: substantial text overlap with arXiv:1308.359
Buried and accessible surface area control intrinsic protein flexibility
Proteins experience a wide variety of conformational dynamics that can be
crucial for facilitating their diverse functions. How is the intrinsic
flexibility required for these motions encoded in their three-dimensional
structures? Here, the overall flexibility of a protein is demonstrated to be
tightly coupled to the total amount of surface area buried within its fold. A
simple proxy for this, the relative solvent accessible surface area (Arel),
therefore shows excellent agreement with independent measures of global protein
flexibility derived from various experimental and computational methods.
Application of Arel on a large scale demonstrates its utility by revealing
unique sequence and structural properties associated with intrinsic
flexibility. In particular, flexibility as measured by Arel shows little
correspondence with intrinsic disorder, but instead tends to be associated with
multiple domains and increased α-helical structure. Furthermore, the
apparent flexibility of monomeric proteins is found to be useful for
identifying quaternary structure errors in published crystal structures. There
is also a strong tendency for the crystal structures of more flexible proteins
to be solved to lower resolutions. Finally, local solvent accessibility is
shown to be a primary determinant of local residue flexibility. Overall this
work provides both fundamental mechanistic insight into the origin of protein
flexibility and a simple, practical method for predicting flexibility from
protein structures.
Comment: 36 pages, 11 figures, author's manuscript, accepted for publication
in Journal of Molecular Biology