Optimizing weighted ensemble sampling of steady states
We propose parameter optimization techniques for weighted ensemble sampling
of Markov chains in the steady-state regime. Weighted ensemble consists of
replicas of a Markov chain, each carrying a weight, that are periodically
resampled according to their weights within each of a number of bins that
partition state space. We derive, from first principles, strategies for
optimizing the choices of weighted ensemble parameters, in particular the
choice of bins and the number of replicas to maintain in each bin. In a simple
numerical example, we compare our new strategies with more traditional ones and
with direct Monte Carlo.
Comment: 28 pages, 5 figures
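The within-bin resampling step described above can be sketched in Python. This is a minimal illustration, not the paper's algorithm. It assumes multinomial resampling (one of several possible schemes), a caller-supplied function `bin_of` that maps a state to a bin index, and a dictionary `target_per_bin` giving the number of replicas to keep in each occupied bin. All of these names are hypothetical. Within each bin, parents are drawn with probability proportional to their weights, and the children split the bin's total weight equally, so the weight in every bin is preserved exactly.

```python
import numpy as np


def we_resample(positions, weights, bin_of, target_per_bin, rng):
    """One weighted ensemble resampling step (illustrative sketch).

    positions      : 1-D array of replica states
    weights        : 1-D array of replica weights
    bin_of         : hypothetical function mapping a state to a bin index
    target_per_bin : hypothetical dict, bin index -> replicas to keep (default 1)
    """
    bins = np.array([bin_of(x) for x in positions])
    new_pos, new_w = [], []
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)  # replicas currently in bin b
        bin_weight = weights[idx].sum()
        n = target_per_bin.get(int(b), 1)
        # Multinomial resampling: draw n parents with probability proportional to weight
        parents = rng.choice(idx, size=n, p=weights[idx] / bin_weight)
        new_pos.extend(positions[parents])
        # Children split the bin's total weight equally, so it is preserved
        new_w.extend([bin_weight / n] * n)
    return np.array(new_pos), np.array(new_w)


rng = np.random.default_rng(0)
pos = np.array([0.1, 0.2, 0.7, 0.9, 0.95])
w = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
# Two bins split at 0.5; keep 3 replicas in bin 0 and 2 in bin 1
new_pos, new_w = we_resample(pos, w, lambda x: int(x >= 0.5), {0: 3, 1: 2}, rng)
```

The bins (`bin_of`) and the per-bin replica counts (`target_per_bin`) are exactly the parameters whose choice the abstract addresses. The sketch leaves both to the caller.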
Galerkin Approximation of Dynamical Quantities using Trajectory Data
Understanding chemical mechanisms requires estimating dynamical statistics
such as expected hitting times, reaction rates, and committors. Here, we
present a general framework for calculating these dynamical quantities by
approximating boundary value problems using dynamical operators with a Galerkin
expansion. A specific choice of basis set in the expansion corresponds to
estimation of dynamical quantities using a Markov state model. More generally,
the boundary conditions impose restrictions on the choice of basis sets. We
demonstrate how an alternative basis can be constructed using ideas from
diffusion maps. In our numerical experiments, this basis gives results whose
accuracy is comparable to or better than that of Markov state models. Additionally, we show
that delay embedding can reduce the information lost when projecting the
system's dynamics for model construction; this improves estimates of dynamical
statistics considerably over the standard practice of increasing the lag time.
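For the simplest basis choice, the Galerkin equations for a committor can be written down directly. The sketch below is a minimal illustration under simplifying assumptions chosen for the example, not the paper's implementation. It uses a lag of one step (longer lags require stopping trajectories when they enter A or B), indicator functions of the individual states outside A and B as the basis (the Markov-state-model-like choice), and a toy symmetric random walk on six states whose exact committor is x/5. The function name `galerkin_committor` is hypothetical. The guess function g = 1_B carries the boundary conditions and every basis function vanishes on A and B, so only the coefficients on the remaining states are solved for.

```python
import numpy as np


def galerkin_committor(x0, x1, n_states, A, B):
    """Galerkin estimate of q(x) = P(reach B before A | start at x).

    x0, x1 : integer arrays; (x0[t], x1[t]) are states one lag apart
    A, B   : sets of states where q = 0 and q = 1, respectively
    """
    domain = [s for s in range(n_states) if s not in A and s not in B]
    # Indicator basis on the domain: phi[s, j] = 1 if s is the j-th domain state
    phi = np.zeros((n_states, len(domain)))
    for j, s in enumerate(domain):
        phi[s, j] = 1.0
    # Guess function g = indicator of B satisfies the boundary conditions
    g = np.isin(np.arange(n_states), list(B)).astype(float)
    # With q = g + phi @ c, impose sum_t phi_i(x0_t) * (q(x1_t) - q(x0_t)) = 0
    lhs = phi[x0].T @ (phi[x1] - phi[x0])
    rhs = -phi[x0].T @ (g[x1] - g[x0])
    coeffs = np.linalg.solve(lhs, rhs)
    return g + phi @ coeffs


# Toy data: one-step pairs of a symmetric random walk on states 0..5
rng = np.random.default_rng(1)
n = 6
x0 = rng.integers(1, n - 1, size=50_000)     # start in the interior 1..4
x1 = x0 + rng.choice([-1, 1], size=x0.size)  # step left or right
q = galerkin_committor(x0, x1, n, A={0}, B={n - 1})
print(np.round(q, 2))  # should be close to the exact committor x / 5
```

Swapping the indicator columns of `phi` for smoother functions (for example, ones built from diffusion maps) changes only the basis. The boundary handling through g and the structure of the linear solve stay the same.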
A splitting method to reduce MCMC variance
We explore whether splitting and killing methods can improve the accuracy of
Markov chain Monte Carlo (MCMC) estimates of rare event probabilities, and we
make three contributions. First, we prove that "weighted ensemble" is the only
splitting and killing method that provides asymptotically consistent estimates
when combined with MCMC. Second, we prove a lower bound on the asymptotic
variance of weighted ensemble's estimates. Third, we give a constructive proof
and numerical examples to show that weighted ensemble can approach this optimal
variance bound, in many cases reducing the variance of MCMC estimates by
multiple orders of magnitude.
Comment: 30 pages, 9 figures
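The sketch below combines weighted ensemble with a toy Metropolis chain and estimates a small stationary probability by a weighted time average. It illustrates the idea under stated assumptions and is neither the paper's construction nor its optimized parameters. The chain lives on states 0 to 9 with target π(x) proportional to exp(-βx). Each state is its own bin, and every occupied bin is resampled to a fixed number of replicas. The names `metropolis_step` and `we_rare_probability` are hypothetical.

```python
import numpy as np


def metropolis_step(x, beta, n_states, rng):
    """One Metropolis step per replica for pi(x) proportional to exp(-beta * x)."""
    proposal = x + rng.choice([-1, 1], size=x.size)   # symmetric +/-1 proposal
    inside = (proposal >= 0) & (proposal < n_states)  # reject moves off the grid
    accept_prob = np.exp(-beta * (proposal - x))      # exp(-(V(y) - V(x)))
    accept = inside & (rng.random(x.size) < accept_prob)
    return np.where(accept, proposal, x)


def we_rare_probability(n_states, beta, per_bin, n_steps, burn_in, rng):
    """Weighted ensemble estimate of pi(n_states - 1) as a weighted time average."""
    x = np.zeros(per_bin, dtype=int)     # all replicas start in state 0
    w = np.full(per_bin, 1.0 / per_bin)  # total weight is 1
    total = 0.0
    for step in range(n_steps):
        # Resample: each occupied bin (one bin per state) keeps per_bin replicas
        new_x, new_w = [], []
        for b in np.unique(x):
            idx = np.flatnonzero(x == b)
            bin_weight = w[idx].sum()
            parents = rng.choice(idx, size=per_bin, p=w[idx] / bin_weight)
            new_x.append(x[parents])
            new_w.append(np.full(per_bin, bin_weight / per_bin))
        x, w = np.concatenate(new_x), np.concatenate(new_w)
        # Evolve every replica with the underlying MCMC kernel
        x = metropolis_step(x, beta, n_states, rng)
        if step >= burn_in:
            total += w[x == n_states - 1].sum()  # weight in the rare state
    return total / (n_steps - burn_in)


rng = np.random.default_rng(2)
n_states, beta = 10, 1.0
estimate = we_rare_probability(n_states, beta, per_bin=10, n_steps=3000,
                               burn_in=100, rng=rng)
exact = np.exp(-beta * (n_states - 1)) / np.exp(-beta * np.arange(n_states)).sum()
print(estimate, exact)
```

Because every occupied bin is kept populated, some replicas sit in the rare state at almost every step, each carrying a small weight. The uniform allocation of `per_bin` replicas per bin is a simple default. It makes no attempt at the near-optimal choices the abstract describes.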
An ergodic theorem for weighted ensemble
We prove an ergodic theorem for weighted ensemble, an interacting particle
method for sampling distributions associated with a generic Markov chain.
Because the interactions arise from resampling, weighted ensemble can be viewed
as a sequential Monte Carlo method. In weighted ensemble, the resampling is
based on dividing the particles among a collection of bins, and then copying or
killing to enforce a prescribed number of particles in each bin. We show that
the ergodic theorem is sensitive to the resampling mechanism: indeed, it fails
for a large class of related sequential Monte Carlo methods, due to an
accumulating resampling variance. We compare weighted ensemble with one of
these methods, and with direct Monte Carlo, in numerical examples.
Comment: 53 pages, 7 figures
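Informally, an ergodic theorem of this kind concerns the weighted time average of an observable f. The notation here is assumed for illustration and is not taken from the paper. Let ξ_t^i and ω_t^i be the state and weight of particle i after step t, with the N weights summing to one, and let μ be the stationary distribution of the underlying Markov chain:

```latex
\theta_T \;=\; \frac{1}{T}\sum_{t=0}^{T-1}\sum_{i=1}^{N} \omega_t^i\, f\bigl(\xi_t^i\bigr),
\qquad
\lim_{T\to\infty} \theta_T \;=\; \int f \, d\mu .
```

The precise hypotheses on the chain, the bins, and the resampling, along with the sense in which the limit holds, are given in the paper. This is the limit that fails for the related sequential Monte Carlo methods whose resampling variance accumulates.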
Long-timescale predictions from short-trajectory data: A benchmark analysis of the trp-cage miniprotein
Elucidating physical mechanisms with statistical confidence from molecular
dynamics simulations can be challenging owing to the many degrees of freedom
that contribute to collective motions. To address this issue, we recently
introduced a dynamical Galerkin approximation (DGA) [Thiede et al. J. Chem.
Phys. 150, 244111 (2019)], in which chemical kinetic statistics that satisfy
equations involving dynamical operators are represented by a basis expansion. Here, we
reformulate this approach, clarifying (and reducing) the dependence on the
choice of lag time. We present a new projection of the reactive current onto
collective variables and provide improved estimators for rates and committors.
We also present simple procedures for constructing suitable smoothly varying
basis functions from arbitrary molecular features. To evaluate estimators and
basis sets numerically, we generate and carefully validate a dataset of short
trajectories for the unfolding and folding of the trp-cage miniprotein, a
well-studied system. Our analysis demonstrates a comprehensive strategy for
characterizing reaction pathways quantitatively.
Comment: 61 pages, 17 figures