A motion planning approach to protein folding
Protein folding is considered to be one of the grand challenge problems in biology. Protein folding refers to how a protein's amino acid sequence, under certain physiological conditions, folds into a stable close-packed three-dimensional structure known as the native state. There are two major problems in protein folding. One, usually called protein structure prediction, is to predict the structure of the protein's native state given only the amino acid sequence. Another important and strongly related problem, often called protein folding, is to study how the amino acid sequence dynamically transitions from an unstructured state to the native state. In this dissertation, we concentrate on the second problem. There are several approaches that have been applied to the protein folding problem, including molecular dynamics, Monte Carlo methods, statistical mechanical models, and lattice models. However, most of these approaches suffer from either overly-detailed simulations, requiring impractical computation times, or overly-simplified models, resulting in unrealistic solutions.
In this work, we present a novel motion-planning-based framework for studying protein folding. We describe how it can be used to approximately map a protein's energy landscape, and then discuss how to find approximate folding pathways and kinetics on this approximate energy landscape. In particular, our technique can produce potential energy landscapes, free energy landscapes, and many folding pathways all from a single roadmap. The roadmap can be computed in a few hours on a desktop PC using a coarse potential energy function. In addition, our motion-planning-based approach is the first simulation method that enables the study of protein folding kinetics at a level of detail that is appropriate (i.e., not too detailed or too coarse) for capturing possible 2-state and 3-state folding kinetics that may coexist in one protein. Indeed, the unique ability of our method to produce large sets of unrelated folding pathways may potentially provide crucial insight into some aspects of folding kinetics that are not available to other theoretical techniques.
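The roadmap idea described above can be illustrated with a small sketch. It is not the dissertation's implementation: the five-node graph, the energies, the Metropolis-style edge probability, and the helper names (`edge_weight`, `build_roadmap`, `most_probable_path`) are all assumptions made for illustration. Each edge is weighted by the negative log of a transition probability, so a shortest-path search returns the most probable pathway under that assumed weighting.

```python
import heapq
import math

def edge_weight(e_from, e_to, kT=1.0):
    """Cost of a move: -log of an assumed Metropolis-style transition probability."""
    delta = e_to - e_from
    prob = 1.0 if delta <= 0 else math.exp(-delta / kT)
    return -math.log(prob)  # zero for downhill moves, delta/kT for uphill moves

def build_roadmap(energies, neighbors, kT=1.0):
    """Directed weighted graph: node -> list of (neighbor, weight)."""
    graph = {node: [] for node in energies}
    for a, b in neighbors:
        graph[a].append((b, edge_weight(energies[a], energies[b], kT)))
        graph[b].append((a, edge_weight(energies[b], energies[a], kT)))
    return graph

def most_probable_path(graph, start, goal):
    """Dijkstra search; minimizing summed -log(p) maximizes the path probability."""
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, weight in graph[node]:
            new_cost = cost + weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if goal not in best:
        return None, math.inf
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], best[goal]

# Hypothetical toy landscape: conformations with made-up energies.
energies = {"unfolded": 5.0, "A": 6.0, "B": 9.0, "C": 2.0, "native": 0.0}
neighbors = [("unfolded", "A"), ("unfolded", "B"), ("A", "C"),
             ("B", "native"), ("C", "native")]

roadmap = build_roadmap(energies, neighbors)
path, cost = most_probable_path(roadmap, "unfolded", "native")
print(path, cost)
```

In this toy landscape the route through A and C crosses a lower barrier than the route through B, so the search prefers it. A real roadmap would sample conformations in a high-dimensional space and use a physical energy function.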
Techniques for modeling and analyzing RNA and protein folding energy landscapes
RNA and protein molecules undergo a dynamic folding process that is important to their function. Computational methods are critical for studying this folding process because it is difficult to observe experimentally. In this work, we introduce new computational techniques to study RNA and protein energy landscapes, including a method to approximate an RNA energy landscape with a coarse graph (map) and new tools for analyzing graph-based approximations of RNA and protein energy landscapes. These analysis techniques can be used to study RNA and protein folding kinetics such as population kinetics, folding rates, and the folding of particular subsequences. In particular, a map-based Master Equation (MME) method can be used to analyze the population kinetics of the maps, while another map analysis tool, map-based Monte Carlo (MMC) simulation, can extract stochastic folding pathways from the map.
To validate the results, I compare our methods with other computational methods and with experimental studies of RNA and protein. I first compare our MMC and MME methods for RNA with other computational methods working on the complete energy landscape and show that the approximate map captures the major features of a much larger (e.g., by orders of magnitude) complete energy landscape. Moreover, I show that the methods scale well to large molecules, e.g., RNA with 200+ nucleotides. Then, I correlate the computational results with experimental findings. I present comparisons with two experimental cases to show how I can predict kinetics-based functional rates of ColE1 RNAII and MS2 phage RNA and their mutants using our MME and MMC tools, respectively. I also show that the MME and MMC tools can be applied to map-based approximations of protein energy landscapes and present kinetics analysis results for several proteins.
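The two map analyses named in this abstract can be sketched on a toy map. This is a minimal illustration under stated assumptions, not the authors' MME or MMC code: the three-state chain, the energies, the Metropolis-style rates, the forward-Euler integrator, and all function names are hypothetical. The master equation evolves state populations as dp_i/dt = sum_j (k_ji p_j - k_ij p_i), while the Monte Carlo walk draws one stochastic pathway over the same map.

```python
import math
import random

def metropolis_rates(energies, edges, kT=1.0):
    """rates[i][j] = assumed transition rate from state i to neighboring state j."""
    rates = {i: {} for i in range(len(energies))}
    for i, j in edges:
        rates[i][j] = math.exp(-max(0.0, energies[j] - energies[i]) / kT)
        rates[j][i] = math.exp(-max(0.0, energies[i] - energies[j]) / kT)
    return rates

def master_equation(rates, p0, dt=0.01, steps=5000):
    """Forward-Euler integration of the population kinetics on the map."""
    p = list(p0)
    for _ in range(steps):
        dp = [0.0] * len(p)
        for i, outgoing in rates.items():
            for j, k in outgoing.items():
                flow = k * p[i] * dt  # population moving from i to j in this step
                dp[i] -= flow
                dp[j] += flow
        p = [a + b for a, b in zip(p, dp)]
    return p

def monte_carlo_pathway(rates, start, goal, rng, max_steps=10000):
    """Random walk on the map; the next state is drawn in proportion to its rate."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        targets = list(rates[path[-1]])
        weights = [rates[path[-1]][t] for t in targets]
        path.append(rng.choices(targets, weights=weights)[0])
    return path

# Hypothetical three-state map: state 0 is unfolded, state 2 is native.
energies = [2.0, 1.0, 0.0]
edges = [(0, 1), (1, 2)]
rates = metropolis_rates(energies, edges)

populations = master_equation(rates, [1.0, 0.0, 0.0])
pathway = monte_carlo_pathway(rates, 0, 2, random.Random(0))
print(populations, pathway)
```

Because the assumed rates satisfy detailed balance, the long-time populations approach the Boltzmann distribution over the three states. For large maps, the Euler loop would normally be replaced by an eigendecomposition of the rate matrix.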
Major Subject: Computer Science
The difficulty of folding self-folding origami
Why is it difficult to refold a previously folded sheet of paper? We show
that even crease patterns with only one designed folding motion inevitably
contain an exponential number of `distractor' folding branches accessible from
a bifurcation at the flat state. Consequently, refolding a sheet requires
finding the ground state in a glassy energy landscape with an exponential
number of other attractors of higher energy, much like in models of protein
folding (Levinthal's paradox) and other NP-hard satisfiability (SAT) problems.
As in these problems, we find that refolding a sheet requires actuation at
multiple carefully chosen creases. We show that seeding successful folding in
this way can be understood in terms of sub-patterns that fold when cut out
(`folding islands'). Besides providing guidelines for the placement of active
hinges in origami applications, our results point to fundamental limits on the
programmability of energy landscapes in sheets.
Comment: 8 pages, 5 figures
Path Similarity Analysis: a Method for Quantifying Macromolecular Pathways
Diverse classes of proteins function through large-scale conformational
changes; sophisticated enhanced sampling methods have been proposed to generate
these macromolecular transition paths. As such paths are curves in a
high-dimensional space, they have been difficult to compare quantitatively, a
prerequisite to, for instance, assess the quality of different sampling
algorithms. The Path Similarity Analysis (PSA) approach alleviates these
difficulties by utilizing the full information in 3N-dimensional trajectories
in configuration space. PSA employs the Hausdorff or Fréchet path
metrics, adopted from computational geometry, enabling us to quantify path
(dis)similarity, while the new concept of a Hausdorff-pair map permits the
extraction of atomic-scale determinants responsible for path differences.
Combined with clustering techniques, PSA facilitates the comparison of many
paths, including collections of transition ensembles. We use the closed-to-open
transition of the enzyme adenylate kinase (AdK), a commonly used testbed for
the assessment of enhanced sampling algorithms, to examine multiple microsecond
equilibrium molecular dynamics (MD) transitions of AdK in its substrate-free
form alongside transition ensembles from the MD-based dynamic importance
sampling (DIMS-MD) and targeted MD (TMD) methods, and a geometrical targeting
algorithm (FRODA). A Hausdorff pairs analysis of these ensembles revealed, for
instance, that differences in DIMS-MD and FRODA paths were mediated by a set of
conserved salt bridges whose charge-charge interactions are fully modeled in
DIMS-MD but not in FRODA. We also demonstrate how existing trajectory analysis
methods relying on pre-defined collective variables, such as native contacts or
geometric quantities, can be used synergistically with PSA, as well as the
application of PSA to more complex systems such as membrane transporter
proteins.
Comment: 9 figures, 3 tables in the main manuscript; supplementary information
includes 7 texts (S1 Text - S7 Text) and 11 figures (S1 Fig - S11 Fig) (also
available from journal site)
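The two path metrics named in this abstract can be sketched for short discrete paths. This is an illustrative sketch, not the PSA implementation: it uses plain Euclidean distance between toy 2D points as the point metric (an assumption; a structural distance between conformations would take its place for real trajectories), and the function names are hypothetical. The discrete Fréchet distance is computed with the standard dynamic-programming recurrence.

```python
import math

def hausdorff(path_p, path_q):
    """Symmetric Hausdorff distance: largest nearest-neighbor distance between paths."""
    def directed(a, b):
        return max(min(math.dist(x, y) for y in b) for x in a)
    return max(directed(path_p, path_q), directed(path_q, path_p))

def discrete_frechet(path_p, path_q):
    """Discrete Fréchet distance; unlike Hausdorff, it respects point ordering."""
    n, m = len(path_p), len(path_q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(path_p[i], path_q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]

# Toy paths: q visits the same two points as p but backtracks once.
p = [(0.0, 0.0), (2.0, 0.0)]
q = [(0.0, 0.0), (2.0, 0.0), (0.0, 0.0), (2.0, 0.0)]

print(hausdorff(p, q))         # 0.0: every point of one path lies on the other
print(discrete_frechet(p, q))  # 2.0: the backtracking cannot be matched in order
```

The example shows why the choice of metric matters: the Hausdorff distance ignores the order in which points are visited, while the Fréchet distance penalizes a path that backtracks.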
REinforcement learning based Adaptive samPling: REAPing Rewards by Exploring Protein Conformational Landscapes
One of the key limitations of Molecular Dynamics simulations is the
computational intractability of sampling protein conformational landscapes
associated with either large system size or long timescales. To overcome this
bottleneck, we present the REinforcement learning based Adaptive samPling
(REAP) algorithm that aims to efficiently sample conformational space by
learning the relative importance of each reaction coordinate as it samples the
landscape. To achieve this, the algorithm uses concepts from the field of
reinforcement learning, a subset of machine learning, which rewards sampling
along important degrees of freedom and disregards others that do not facilitate
exploration or exploitation. We demonstrate the effectiveness of REAP by
comparing the sampling to long continuous MD simulations and least-counts
adaptive sampling on two model landscapes (L-shaped and circular), and
realistic systems such as alanine dipeptide and Src kinase. In all four
systems, the REAP algorithm consistently demonstrates its ability to explore
conformational space faster than the other two methods when comparing the
expected values of the landscape discovered for a given amount of time. The key
advantage of REAP is on-the-fly estimation of the importance of collective
variables, which makes it particularly useful for systems with limited
structural information.
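The selection step behind such adaptive sampling can be sketched as follows. This is a simplified illustration, not the published REAP algorithm: the reward form (a weighted, standardized distance of each state from the mean of the sampled data), the fixed weights, the toy cluster centers, and all function names are assumptions, and the step that learns the weights is omitted entirely. Least-counts selection supplies the candidate states, and the assumed reward then ranks them.

```python
import statistics

def least_count_states(counts, n):
    """Return the n least-visited state ids (ties broken by id)."""
    return sorted(counts, key=lambda s: (counts[s], s))[:n]

def reward(coords, means, stdevs, weights):
    """Assumed reward: weighted, standardized distance from the sampled mean."""
    return sum(w * abs(x - m) / s
               for x, m, s, w in zip(coords, means, stdevs, weights))

def pick_restart_states(centers, counts, weights, n_candidates, n_restarts):
    """Rank least-count candidates by reward and keep the best ones."""
    dims = range(len(weights))
    columns = [[c[d] for c in centers.values()] for d in dims]
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 when a coordinate has zero spread, to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    candidates = least_count_states(counts, n_candidates)
    ranked = sorted(candidates,
                    key=lambda s: reward(centers[s], means, stdevs, weights),
                    reverse=True)
    return ranked[:n_restarts]

# Hypothetical cluster centers in two collective variables, with visit counts.
centers = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (4.0, 0.1), 3: (0.0, 0.2)}
counts = {0: 50, 1: 40, 2: 3, 3: 2}

favor_first = pick_restart_states(centers, counts, (0.9, 0.1), 2, 1)
favor_second = pick_restart_states(centers, counts, (0.1, 0.9), 2, 1)
print(favor_first, favor_second)
```

Changing the weights changes which rarely visited state is chosen for the next round of simulations, which is the role the abstract assigns to the learned importance of each collective variable.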