Statistically optimal analysis of samples from multiple equilibrium states
We present a new estimator for computing free energy differences and
thermodynamic expectations as well as their uncertainties from samples obtained
from multiple equilibrium states via either simulation or experiment. The
estimator, which we term the multistate Bennett acceptance ratio (MBAR)
estimator because it reduces to the Bennett acceptance ratio when only two
states are considered, has significant advantages over multiple histogram
reweighting methods for combining data from multiple states. It does not
require the sampled energy range to be discretized to produce histograms,
eliminating bias due to energy binning and significantly reducing the time
complexity of computing a solution to the estimating equations in many cases.
Additionally, an estimate of the statistical uncertainty is provided for all
estimated quantities. In the large sample limit, MBAR is unbiased and has the
lowest variance of any known estimator for making use of equilibrium data
collected from multiple states. We illustrate this method by producing a highly
precise estimate of the potential of mean force for a DNA hairpin system,
combining data from multiple optical tweezer measurements under constant force
bias.
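The MBAR estimating equations can be solved by simple self-consistent iteration (a reference implementation exists in the pymbar package). Below is a minimal numpy sketch, not the authors' code, applied to a toy two-state harmonic problem where the exact free energy difference is ln 2:

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable ln sum exp along an axis."""
    amax = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(amax + np.log(np.sum(np.exp(a - amax), axis=axis,
                                           keepdims=True)), axis=axis)

def mbar_free_energies(u_kn, N_k, n_iter=1000, tol=1e-10):
    """Self-consistent MBAR iteration:
        f_i = -ln sum_n exp(-u_i(x_n)) / sum_k N_k exp(f_k - u_k(x_n)),
    where n runs over all pooled samples from all states."""
    K, N = u_kn.shape
    f = np.zeros(K)
    log_N = np.log(N_k)
    for _ in range(n_iter):
        # log denominator per sample: ln sum_k N_k exp(f_k - u_kn)
        log_denom = logsumexp((log_N + f)[:, None] - u_kn, axis=0)
        f_new = -logsumexp(-u_kn - log_denom[None, :], axis=1)
        f_new -= f_new[0]  # pin f_0 = 0; only differences are meaningful
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f

# toy problem: two harmonic states (Gaussians with sigma = 1.0 and 0.5),
# so the exact free energy difference is -ln(0.5) = ln 2
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1.0, 5000), rng.normal(0, 0.5, 5000)])
u_kn = np.vstack([x**2 / 2.0, x**2 / (2 * 0.25)])
f = mbar_free_energies(u_kn, np.array([5000, 5000]))
```

For two states, this fixed point reduces to the Bennett acceptance ratio, as noted in the abstract.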
Spectral rate theory for projected two-state kinetics
Classical rate theories often fail in cases where the observable(s) or order
parameter(s) used are poor reaction coordinates or the observed signal is
deteriorated by noise, such that no clear separation between reactants and
products is possible. Here, we present a general spectral two-state rate theory
for ergodic dynamical systems in thermal equilibrium that explicitly takes into
account how the system is observed. The theory allows the systematic estimation
errors made by standard rate theories to be understood and quantified. We also
elucidate the connection of spectral rate theory with the popular Markov state
modeling (MSM) approach for molecular simulation studies. An optimal rate
estimator is formulated that gives robust and unbiased results even for poor
reaction coordinates and can be applied to both computer simulations and
single-molecule experiments. No definition of a dividing surface is required.
Another result of the theory is a model-free definition of the reaction
coordinate quality (RCQ). The RCQ can be bounded from below by the directly
computable observation quality (OQ), thus providing a measure allowing the RCQ
to be optimized by tuning the experimental setup. Additionally, the respective
partial probability distributions can be obtained for the reactant and product
states along the observed order parameter, even when these strongly overlap.
The effects of both filtering (averaging) and uncorrelated noise are also
examined. The approach is demonstrated on numerical examples and experimental
single-molecule force probe data of the p5ab RNA hairpin and the apo-myoglobin
protein at low pH, here focusing on the case of two-state kinetics.
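The central observation, that ratios of time-lagged correlations are robust to uncorrelated observation noise while naive correlation-based estimates are not, can be illustrated on a hidden two-state process. This is a toy sketch of the spectral idea, not the paper's full estimator:

```python
import numpy as np

def autocov(y, tau):
    """Stationary autocovariance C(tau) of a time series."""
    y0 = y - y.mean()
    return np.mean(y0 * y0) if tau == 0 else np.mean(y0[:-tau] * y0[tau:])

# hidden two-state chain with symmetric flip probability p per step;
# the slow eigenvalue of its transition matrix is lam = 1 - 2p = 0.9
rng = np.random.default_rng(1)
p, n = 0.05, 200_000
s = np.cumprod(np.where(rng.random(n) < p, -1, 1))  # states in {-1, +1}
y = s + rng.normal(0.0, 1.0, n)  # observable corrupted by uncorrelated noise

# naive estimate C(1)/C(0) is biased because noise inflates C(0);
# the ratio C(2)/C(1) cancels the noise and recovers the spectral eigenvalue
lam_naive = autocov(y, 1) / autocov(y, 0)
lam_spectral = autocov(y, 2) / autocov(y, 1)
```

The recovered eigenvalue converts to a relaxation rate via k = -ln(lam)/dt, with dt the sampling interval.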
Time step rescaling recovers continuous-time dynamical properties for discrete-time Langevin integration of nonequilibrium systems
When simulating molecular systems using deterministic equations of motion
(e.g., Newtonian dynamics), such equations are generally numerically integrated
according to a well-developed set of algorithms that share commonly agreed-upon
desirable properties. However, for stochastic equations of motion (e.g.,
Langevin dynamics), there is still broad disagreement over which integration
algorithms are most appropriate. While multiple desiderata have been proposed
throughout the literature, consensus on which criteria are important is absent,
and no published integration scheme satisfies all desiderata simultaneously.
Additional nontrivial complications stem from simulating systems driven out of
equilibrium using existing stochastic integration schemes in conjunction with
recently-developed nonequilibrium fluctuation theorems. Here, we examine a
family of discrete time integration schemes for Langevin dynamics, assessing
how each member satisfies a variety of desiderata that have been enumerated in
prior efforts to construct suitable Langevin integrators. We show that the
incorporation of a novel time step rescaling in the deterministic updates of
position and velocity can correct a number of dynamical defects in these
integrators. Finally, we identify a particular splitting that has essentially
universally appropriate properties for the simulation of Langevin dynamics for
molecular systems in equilibrium, nonequilibrium, and path sampling contexts.
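As an illustration of the splitting approach, here is one well-studied member of this family (the "BAOAB" splitting), not the specific time-step-rescaled scheme the paper identifies, applied to a 1D harmonic oscillator:

```python
import numpy as np

def baoab(force, x, v, dt, gamma, kT, mass, n_steps, rng):
    """BAOAB splitting for Langevin dynamics: half kick (B), half drift (A),
    exact Ornstein-Uhlenbeck velocity update (O), half drift, half kick."""
    a = np.exp(-gamma * dt)
    b = np.sqrt(kT * (1.0 - a * a) / mass)
    xs = np.empty(n_steps)
    for i in range(n_steps):
        v += 0.5 * dt * force(x) / mass   # B: half kick
        x += 0.5 * dt * v                 # A: half drift
        v = a * v + b * rng.normal()      # O: thermostat
        x += 0.5 * dt * v                 # A: half drift
        v += 0.5 * dt * force(x) / mass   # B: half kick
        xs[i] = x
    return xs

# harmonic oscillator with k = m = kT = 1: equilibrium <x^2> = kT/k = 1
rng = np.random.default_rng(2)
xs = baoab(lambda x: -x, 0.0, 0.0, dt=0.2, gamma=1.0, kT=1.0, mass=1.0,
           n_steps=50_000, rng=rng)
var = np.var(xs[5_000:])  # discard equilibration
```

Different orderings of the B, A, and O substeps give different discretization errors in configurational averages, which is exactly the kind of defect the paper's rescaling corrects.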
Splitting probabilities as a test of reaction coordinate choice in single-molecule experiments
To explain the observed dynamics in equilibrium single-molecule measurements
of biomolecules, the experimental observable is often chosen as a putative
reaction coordinate along which kinetic behavior is presumed to be governed by
diffusive dynamics. Here, we invoke the splitting probability as a test of the
suitability of such a proposed reaction coordinate. Comparison of the observed
splitting probability with that computed from the kinetic model provides a
simple test to reject poor reaction coordinates. We demonstrate this test for a
force spectroscopy measurement of a DNA hairpin.
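For 1D diffusion along a coordinate x with a position-independent diffusion coefficient, the model splitting probability has a closed form, p_B(x) = ∫_a^x exp(βF) dx' / ∫_a^b exp(βF) dx', which can be computed from the measured free energy profile and compared against the empirically observed splitting probability. A minimal sketch:

```python
import numpy as np

def splitting_probability(x_grid, free_energy, beta=1.0):
    """Model splitting probability p_B(x): probability of reaching
    x_grid[-1] before x_grid[0], for 1D diffusion with constant D.
    Computed by trapezoidal integration of exp(beta * F)."""
    w = np.exp(beta * (free_energy - free_energy.max()))  # stabilized weights
    seg = 0.5 * (w[1:] + w[:-1]) * np.diff(x_grid)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    return cum / cum[-1]

x_grid = np.linspace(-1.0, 1.0, 201)
# flat landscape: splitting probability is exactly linear
p_flat = splitting_probability(x_grid, np.zeros_like(x_grid))
# symmetric double well with a barrier at x = 0: sigmoidal, p_B(0) = 1/2
F = 5.0 * (x_grid**2 - 1.0)**2
p_well = splitting_probability(x_grid, F)
```

A systematic mismatch between this model curve and the splitting probability counted directly from trajectories is the proposed signature of a poor reaction coordinate.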
Markov state models of biomolecular conformational dynamics
It has recently become practical to construct Markov state models (MSMs) that reproduce the long-time statistical conformational dynamics of biomolecules using data from molecular dynamics simulations. MSMs can predict both stationary and kinetic quantities on long timescales (e.g. milliseconds) using a set of atomistic molecular dynamics simulations that are individually much shorter, thus addressing the well-known sampling problem in molecular dynamics simulation. In addition to providing predictive quantitative models, MSMs greatly facilitate both the extraction of insight into biomolecular mechanism (such as folding and functional dynamics) and quantitative comparison with single-molecule and ensemble kinetics experiments. A variety of methodological advances and software packages now bring the construction of these models closer to routine practice. Here, we review recent progress in this field, considering theoretical and methodological advances, new software tools, and recent applications of these approaches in several domains of biochemistry and biophysics, commenting on remaining challenges.
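The basic MSM estimation pipeline — discretize the trajectory into states, count transitions at a lag time, row-normalize, and extract implied timescales from the eigenvalues — can be sketched in a few lines. This is a toy maximum-likelihood estimator without the reversibility constraints that production tools typically enforce:

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag=1):
    """Count transitions at the given lag time and row-normalize to obtain
    a maximum-likelihood (non-reversible) transition matrix."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1.0
    return C / C.sum(axis=1, keepdims=True)

def implied_timescales(T, lag=1):
    """Relaxation timescales t_i = -lag / ln |lambda_i| from the non-unit
    eigenvalues of the transition matrix."""
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(evals[1:])

# two-state toy system with known kinetics (slow eigenvalue 0.85,
# so the true implied timescale is -1/ln(0.85) ~ 6.15 steps)
T_true = np.array([[0.95, 0.05],
                   [0.10, 0.90]])
rng = np.random.default_rng(3)
n = 100_000
dtraj = np.empty(n, dtype=int)
dtraj[0] = 0
u = rng.random(n)
for t in range(1, n):
    i = dtraj[t - 1]
    dtraj[t] = 1 - i if u[t] < T_true[i, 1 - i] else i

T_hat = estimate_msm(dtraj, 2)
ts = implied_timescales(T_hat)
```

In practice the lag time is chosen by checking that the implied timescales have converged, which is one of the validation steps the review discusses.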
Probability distributions of molecular observables computed from Markov models. II: Uncertainties in observables and their time-evolution
Discrete-state Markov (or master equation) models provide a useful simplified representation for characterizing the long-time statistical evolution of biomolecules in a manner that allows direct comparison with experiments as well as the elucidation of mechanistic pathways for an inherently stochastic process. A vital part of meaningful comparison with experiment is the characterization of the statistical uncertainty in the predicted experimental measurement, which may take the form of an equilibrium measurement of some spectroscopic signal, the time-evolution of this signal following a perturbation, or the observation of some statistic (such as the correlation function) of the equilibrium dynamics of a single molecule. Without meaningful error bars (which arise from both approximation and statistical error), there is no way to determine whether the deviations between model and experiment are statistically meaningful. Previous work has demonstrated that a Bayesian method that enforces microscopic reversibility can be used to characterize the statistical component of correlated uncertainties in state-to-state transition probabilities (and functions thereof) for a model inferred from molecular simulation data. Here, we extend this approach to include the uncertainty in observables that are functions of molecular conformation (such as surrogate spectroscopic signals) characterizing each state, permitting the full statistical uncertainty in computed spectroscopic experiments to be assessed. We test the approach in a simple model system to demonstrate that the computed uncertainties provide a useful indicator of statistical variation, and then apply it to the computation of the fluorescence autocorrelation function measured for a dye-labeled peptide previously studied by both experiment and simulation.
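A simplified version of this uncertainty propagation — ignoring the microscopic-reversibility constraint that the paper enforces — samples each transition-matrix row from its Dirichlet posterior given the observed counts and propagates every sample to an equilibrium observable, yielding a credible interval:

```python
import numpy as np

def stationary(T):
    """Stationary distribution of a transition matrix (left eigenvector
    of the eigenvalue closest to 1, normalized)."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()

def posterior_observable(C, a, n_samples=2000, prior=1.0, rng=None):
    """Sample transition matrices row-wise from Dirichlet posteriors given
    count matrix C, and propagate each sample to the equilibrium expectation
    sum_i pi_i a_i. NOTE: this ignores the reversibility constraint used in
    the paper, so it is only an illustrative approximation."""
    if rng is None:
        rng = np.random.default_rng()
    out = np.empty(n_samples)
    for s in range(n_samples):
        T = np.vstack([rng.dirichlet(row + prior) for row in C])
        out[s] = stationary(T) @ a
    return out

# symmetric two-state counts, so the equilibrium expectation of the
# per-state signal a = (0, 1) should be centered on 0.5
C = np.array([[900.0, 100.0],
              [100.0, 900.0]])
a = np.array([0.0, 1.0])
samples = posterior_observable(C, a, rng=np.random.default_rng(4))
lo, hi = np.percentile(samples, [2.5, 97.5])
```

The spread of `samples` is the statistical error bar on the predicted measurement; the paper additionally samples the per-state observable values themselves.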
Maximum Margin Clustering for State Decomposition of Metastable Systems
When studying a metastable dynamical system, a prime concern is how to
decompose the phase space into a set of metastable states. Unfortunately, the
metastable state decomposition based on simulation or experimental data is
still a challenge. The most popular and simplest approach is geometric
clustering, which builds on classical clustering techniques. However, this
approach presupposes that (1) the data come from simulations or experiments in
global equilibrium and (2) the coordinate system is appropriately chosen.
Recently, the kinetic clustering
approach based on phase space discretization and transition probability
estimation has drawn much attention due to its applicability to more general
cases, but the choice of discretization policy is a difficult task. In this
paper, a new decomposition method designated as maximum margin metastable
clustering is proposed, which converts the problem of metastable state
decomposition to a semi-supervised learning problem so that the large margin
technique can be utilized to search for the optimal decomposition without phase
space discretization. Moreover, several simulation examples are given to
illustrate the effectiveness of the proposed method.
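The alternating structure of such large-margin clustering can be illustrated with a toy 2D example: assign labels from the current hyperplane, then refit a max-margin separator. Here hinge-loss subgradient descent stands in for a proper SVM solver, and the balance and kinetic constraints a production method needs are omitted:

```python
import numpy as np

def fit_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=300):
    """Soft-margin linear SVM via full-batch subgradient descent on the
    L2-regularized hinge loss."""
    w, b, n = np.zeros(X.shape[1]), 0.0, len(y)
    for _ in range(n_iter):
        margin = y * (X @ w + b)
        active = margin < 1.0  # points violating the margin
        gw = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        gb = -y[active].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def max_margin_cluster(X, n_rounds=10):
    """Alternate between labeling points by the current hyperplane and
    refitting a max-margin separator; initialized along the leading
    principal component to avoid a trivial start."""
    Xc = X - X.mean(axis=0)
    w = np.linalg.eigh(Xc.T @ Xc)[1][:, -1]  # leading PC direction
    y = np.where(Xc @ w >= 0, 1, -1)
    for _ in range(n_rounds):
        w, b = fit_linear_svm(X, y)
        y = np.where(X @ w + b >= 0, 1, -1)
    return y

# two well-separated 2D blobs standing in for two metastable states
rng = np.random.default_rng(5)
X = np.vstack([rng.normal([-2.0, 0.0], 0.5, (100, 2)),
               rng.normal([+2.0, 0.0], 0.5, (100, 2))])
labels = max_margin_cluster(X)
truth = np.repeat([-1, 1], 100)
purity = max(np.mean(labels == truth), np.mean(labels == -truth))
```

The label sign is arbitrary, so cluster quality is judged up to a global flip, as in the purity score above.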
Towards Automated Benchmarking of Atomistic Forcefields: Neat Liquid Densities and Static Dielectric Constants from the ThermoML Data Archive
Atomistic molecular simulations are a powerful way to make quantitative
predictions, but the accuracy of these predictions depends entirely on the
quality of the forcefield employed. While experimental measurements of
fundamental physical properties offer a straightforward approach for evaluating
forcefield quality, the bulk of this information has been tied up in formats
that are not machine-readable. Compiling benchmark datasets of physical
properties from non-machine-readable sources requires substantial human effort
and is prone to accumulation of human errors, hindering the development of
reproducible benchmarks of forcefield accuracy. Here, we examine the
feasibility of benchmarking atomistic forcefields against the NIST ThermoML
data archive of physicochemical measurements, which aggregates thousands of
experimental measurements in a portable, machine-readable, self-annotating
format. As a proof of concept, we present a detailed benchmark of the
generalized Amber small molecule forcefield (GAFF) using the AM1-BCC charge
model against measurements (specifically bulk liquid densities and static
dielectric constants at ambient pressure) automatically extracted from the
archive, and discuss the extent of available data. The results of this
benchmark highlight a general problem with fixed-charge forcefields in the
representation of low dielectric environments such as those seen in binding
cavities or biological membranes.
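One of the two benchmarked properties, the static dielectric constant, is obtained from equilibrium fluctuations of the total box dipole moment. A minimal sketch of the standard fluctuation formula (Gaussian units with conducting boundary conditions assumed; unit conventions are the usual pitfall when comparing to simulation packages):

```python
import numpy as np

def static_dielectric(M, volume, kT):
    """Static dielectric constant from total-dipole fluctuations
    (Gaussian units, conducting boundary conditions):
        eps = 1 + 4*pi*(<M.M> - <M>.<M>) / (3*V*kT)
    M: array of shape (n_frames, 3), total box dipole per frame."""
    mean_M = M.mean(axis=0)
    fluct = (M * M).sum(axis=1).mean() - mean_M @ mean_M
    return 1.0 + 4.0 * np.pi * fluct / (3.0 * volume * kT)

# sanity checks: a frozen dipole gives eps = 1 exactly; isotropic Gaussian
# fluctuations with unit per-component variance in a box of V = 4*pi at
# kT = 1 give fluct = 3, hence eps = 2
eps_frozen = static_dielectric(np.tile([1.0, 2.0, 3.0], (100, 1)), 10.0, 1.0)
rng = np.random.default_rng(6)
eps_gauss = static_dielectric(rng.normal(0, 1, (100_000, 3)), 4 * np.pi, 1.0)
```

Because the dielectric constant depends on slowly converging collective fluctuations, it is a demanding benchmark target, which is part of why the paper highlights it.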
End-to-End Differentiable Molecular Mechanics Force Field Construction
Molecular mechanics (MM) potentials have long been a workhorse of
computational chemistry. Leveraging accuracy and speed, these functional forms
find use in a wide variety of applications from rapid virtual screening to
detailed free energy calculations. Traditionally, MM potentials have relied on
human-curated, inflexible, and poorly extensible discrete chemical perception
rules (atom types) for applying parameters to molecules or biopolymers, making
them difficult to optimize to fit quantum chemical or physical property data.
Here, we propose an alternative approach that uses graph nets to perceive
chemical environments, producing continuous atom embeddings from which valence
and nonbonded parameters can be predicted using a feed-forward neural network.
Since all stages are built using smooth functions, the entire process of
chemical perception and parameter assignment is differentiable end-to-end with
respect to model parameters, allowing new force fields to be easily
constructed, extended, and applied to arbitrary molecules. We show that this
approach has the capacity to reproduce legacy atom types and can be fit to MM
and QM energies and forces, among other targets.
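The pipeline can be caricatured in plain numpy: shared-weight message passing produces permutation-equivariant continuous atom embeddings, and a feed-forward readout maps each embedding to a per-atom parameter. Random weights stand in for trained ones, and the parameter predicted here (a per-atom "charge") is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding width (illustrative choice)

# random weights standing in for trained model parameters
W_self, W_nbr = rng.normal(size=(D, D)), rng.normal(size=(D, D))
W1, W2 = rng.normal(size=(D, 16)), rng.normal(size=(16, 1))

def embed(features, adjacency, n_rounds=3):
    """Message passing: each round mixes an atom's embedding with the sum
    of its neighbors' embeddings, followed by a tanh nonlinearity."""
    h = features
    for _ in range(n_rounds):
        h = np.tanh(h @ W_self + adjacency @ h @ W_nbr)
    return h

def predict_charge(features, adjacency):
    """Feed-forward readout from continuous embeddings to one per-atom
    parameter; every stage is smooth, hence differentiable end to end."""
    h = embed(features, adjacency)
    return (np.maximum(h @ W1, 0.0) @ W2).ravel()

# small toy molecular graph: 5 atoms, symmetric bond adjacency
n = 5
features = rng.normal(size=(n, D))
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4)]:
    A[i, j] = A[j, i] = 1.0
charges = predict_charge(features, A)

# relabeling the atoms permutes the outputs identically: the perception
# step is permutation-equivariant, unlike hand-written atom-type rules
perm = rng.permutation(n)
P = np.eye(n)[perm]
charges_perm = predict_charge(P @ features, P @ A @ P.T)
```

Because every stage is a smooth function of the weights, gradients of any fitting target (QM energies, forces, physical properties) flow back through both parameter assignment and chemical perception.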