5 research outputs found

    Optimizing weighted ensemble sampling of steady states

    We propose parameter optimization techniques for weighted ensemble sampling of Markov chains in the steady-state regime. Weighted ensemble consists of replicas of a Markov chain, each carrying a weight, that are periodically resampled according to their weights within each of a number of bins that partition state space. We derive, from first principles, strategies for optimizing the choices of weighted ensemble parameters, in particular the choice of bins and the number of replicas to maintain in each bin. In a simple numerical example, we compare our new strategies with more traditional ones and with direct Monte Carlo. Comment: 28 pages, 5 figures

    Galerkin Approximation of Dynamical Quantities using Trajectory Data

    Understanding chemical mechanisms requires estimating dynamical statistics such as expected hitting times, reaction rates, and committors. Here, we present a general framework for calculating these dynamical quantities by approximating boundary value problems using dynamical operators with a Galerkin expansion. A specific choice of basis set in the expansion corresponds to estimation of dynamical quantities using a Markov state model. More generally, the boundary conditions impose restrictions on the choice of basis sets. We demonstrate how an alternative basis can be constructed using ideas from diffusion maps. In our numerical experiments, this basis gives results of accuracy comparable to or better than that of Markov state models. Additionally, we show that delay embedding can reduce the information lost when projecting the system's dynamics for model construction; this improves estimates of dynamical statistics considerably over the standard practice of increasing the lag time.
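
    As an illustration of the Markov state model special case mentioned in this abstract, the committor can be obtained from a transition matrix by solving a small linear system. The sketch below is not taken from the paper: the function name `msm_committor`, the argument layout, and the toy random-walk matrix are assumptions made for this example, and the transition matrix is assumed to be already estimated and row-stochastic.

```python
import numpy as np

def msm_committor(T, A, B):
    """Committor q_i = P(reach B before A | start in state i) for a
    row-stochastic transition matrix T (hypothetical helper for illustration).

    Boundary conditions: q = 0 on A, q = 1 on B.
    On the remaining states D, q solves (I - T_DD) q_D = T_DB 1.
    """
    T = np.asarray(T, dtype=float)
    A = np.asarray(A)
    B = np.asarray(B)
    n = T.shape[0]
    q = np.zeros(n)
    q[B] = 1.0
    # States that are in neither A nor B
    D = np.setdiff1d(np.arange(n), np.concatenate([A, B]))
    lhs = np.eye(len(D)) - T[np.ix_(D, D)]
    rhs = T[np.ix_(D, B)].sum(axis=1)
    q[D] = np.linalg.solve(lhs, rhs)
    return q

# Toy example (assumed): symmetric nearest-neighbor random walk on 5 states.
T = np.zeros((5, 5))
T[0, 0] = T[4, 4] = 1.0
for i in (1, 2, 3):
    T[i, i - 1] = T[i, i + 1] = 0.5

q = msm_committor(T, A=[0], B=[4])
```

    For this toy walk the committor is linear in the state index, which makes the result easy to check by hand.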

    A splitting method to reduce MCMC variance

    We explore whether splitting and killing methods can improve the accuracy of Markov chain Monte Carlo (MCMC) estimates of rare event probabilities, and we make three contributions. First, we prove that "weighted ensemble" is the only splitting and killing method that provides asymptotically consistent estimates when combined with MCMC. Second, we prove a lower bound on the asymptotic variance of weighted ensemble's estimates. Third, we give a constructive proof and numerical examples to show that weighted ensemble can approach this optimal variance bound, in many cases reducing the variance of MCMC estimates by multiple orders of magnitude. Comment: 30 pages, 9 figures

    An ergodic theorem for weighted ensemble

    We prove an ergodic theorem for weighted ensemble, an interacting particle method for sampling distributions associated with a generic Markov chain. Because the interactions arise from resampling, weighted ensemble can be viewed as a sequential Monte Carlo method. In weighted ensemble, the resampling is based on dividing the particles among a collection of bins, and then copying or killing to enforce a prescribed number of particles in each bin. We show that the ergodic theorem is sensitive to the resampling mechanism: indeed it fails for a large class of related sequential Monte Carlo methods, due to an accumulating resampling variance. We compare weighted ensemble with one of these methods, and with direct Monte Carlo, in numerical examples. Comment: 53 pages, 7 figures
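
    The bin-wise copying and killing described in this abstract can be sketched as a single resampling step. This is a minimal illustration, not the authors' implementation: it assumes multinomial resampling within each bin, one-dimensional states, and a dictionary `targets` giving the prescribed particle count per bin. All names here are hypothetical.

```python
import numpy as np

def resample_bin(weights, n_target, rng):
    """Pick n_target parents in one bin with probability proportional to
    weight; every child receives an equal share of the bin's total weight."""
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    parents = rng.choice(len(weights), size=n_target, p=weights / total)
    return parents, np.full(n_target, total / n_target)

def weighted_ensemble_resample(states, weights, bin_ids, targets, rng):
    """One resampling step: within each occupied bin, copy or kill particles
    so that exactly targets[bin] particles remain, preserving bin weight."""
    new_states, new_weights = [], []
    for b in np.unique(bin_ids):
        idx = np.flatnonzero(bin_ids == b)
        parents, w = resample_bin(weights[idx], targets[int(b)], rng)
        new_states.extend(states[idx[parents]])
        new_weights.extend(w)
    return np.array(new_states), np.array(new_weights)

# Toy example (assumed): five particles in two bins.
rng = np.random.default_rng(0)
states = np.array([0.1, 0.2, 0.3, 1.5, 1.7])
weights = np.array([0.1, 0.2, 0.3, 0.3, 0.1])
bin_ids = np.array([0, 0, 0, 1, 1])
targets = {0: 2, 1: 4}

new_states, new_weights = weighted_ensemble_resample(
    states, weights, bin_ids, targets, rng
)
```

    Because each bin's total weight is divided evenly among its children, the resampling changes the number of particles per bin but leaves the weight in each bin, and hence the total weight, unchanged.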

    Long-timescale predictions from short-trajectory data: A benchmark analysis of the trp-cage miniprotein

    Elucidating physical mechanisms with statistical confidence from molecular dynamics simulations can be challenging owing to the many degrees of freedom that contribute to collective motions. To address this issue, we recently introduced a dynamical Galerkin approximation (DGA) [Thiede et al., J. Chem. Phys. 150, 244111 (2019)], in which chemical kinetic statistics that satisfy equations of dynamical operators are represented by a basis expansion. Here, we reformulate this approach, clarifying (and reducing) the dependence on the choice of lag time. We present a new projection of the reactive current onto collective variables and provide improved estimators for rates and committors. We also present simple procedures for constructing suitable smoothly varying basis functions from arbitrary molecular features. To evaluate estimators and basis sets numerically, we generate and carefully validate a dataset of short trajectories for the unfolding and folding of the trp-cage miniprotein, a well-studied system. Our analysis demonstrates a comprehensive strategy for characterizing reaction pathways quantitatively. Comment: 61 pages, 17 figures