75 research outputs found
Variational approach for learning Markov processes from time series data
Inference, prediction and control of complex dynamical systems from time
series is important in many areas, including financial markets, power grid
management, climate and weather modeling, or molecular dynamics. The analysis
of such highly nonlinear dynamical systems is facilitated by the fact that we
can often find a (generally nonlinear) transformation of the system coordinates
to features in which the dynamics can be excellently approximated by a linear
Markovian model. Moreover, the many system variables often change
collectively on large time- and length-scales, facilitating a low-dimensional
analysis in feature space. In this paper, we introduce a variational approach
for Markov processes (VAMP) that allows us to find optimal feature mappings and
optimal Markovian models of the dynamics from given time series data. The key
insight is that the best linear model can be obtained from the top singular
components of the Koopman operator. This leads to the definition of a family of
score functions, called VAMP-r, which can be calculated from data and can be
employed to optimize a Markovian model. In addition, based on the relationship
between the variational scores and approximation errors of Koopman operators,
we propose a new VAMP-E score, which can be applied to cross-validation for
hyper-parameter optimization and model selection in VAMP. VAMP is valid for
both reversible and nonreversible processes and for stationary and
non-stationary processes or realizations.
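As a rough sketch of the idea (not the paper's implementation), the VAMP-2 score of a featurized time series can be estimated from three empirical covariance matrices: the score is the sum of squared singular values of the whitened time-lagged cross-covariance, which approximates the top singular components of the Koopman operator. The function name below is illustrative.

```python
import numpy as np

def vamp2_score(X, lag):
    """Estimate the VAMP-2 score of a featurized time series X (T x d)
    at a given lag time. Higher scores indicate feature spaces in which
    the dynamics are better approximated by a linear Markovian model."""
    X0, Xt = X[:-lag], X[lag:]
    X0 = X0 - X0.mean(axis=0)
    Xt = Xt - Xt.mean(axis=0)
    n = len(X0)
    C00 = X0.T @ X0 / n   # instantaneous covariance
    C0t = X0.T @ Xt / n   # time-lagged cross-covariance
    Ctt = Xt.T @ Xt / n
    # Whiten: singular values of C00^{-1/2} C0t Ctt^{-1/2} approximate
    # those of the Koopman operator in this feature space.
    L0 = np.linalg.inv(np.linalg.cholesky(C00))
    Lt = np.linalg.inv(np.linalg.cholesky(Ctt))
    K = L0 @ C0t @ Lt.T
    s = np.linalg.svd(K, compute_uv=False)
    return np.sum(s ** 2)
```

For a one-dimensional AR(1) process with coefficient rho, the score at lag 1 is approximately rho squared, so strongly autocorrelated (slow) features score higher, as the variational principle requires.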
Ergodicity of the zigzag process
The zigzag process is a Piecewise Deterministic Markov Process which can be
used in an MCMC framework to sample from a given target distribution. We prove
the convergence of this process to its target under very weak assumptions, and
establish a central limit theorem for empirical averages under stronger
assumptions on the decay of the target measure. We use the classical
"Meyn-Tweedie" approach. The main difficulty turns out to be the proof that the
process can indeed reach all the points in the space, even if we consider the
minimal switching rates.
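To make the process concrete, here is a minimal sketch (not the paper's construction) of the one-dimensional zigzag process targeting a standard Gaussian. With potential U(x) = x^2/2, the canonical switching rate lambda(x, theta) = max(0, theta * x) grows linearly along each straight segment, so event times can be sampled exactly by inverting the integrated rate; the function name is illustrative.

```python
import numpy as np

def zigzag_gaussian(n_events, seed=0):
    """Simulate the 1D zigzag process with target N(0, 1).
    The position moves at constant velocity theta in {-1, +1}; at random
    event times, drawn from the inhomogeneous rate max(0, theta * x),
    the velocity flips deterministically."""
    rng = np.random.default_rng(seed)
    x, theta = 0.0, 1.0
    t = 0.0
    times, positions = [0.0], [x]
    for _ in range(n_events):
        a = theta * x
        e = rng.exponential()
        # Solve integral_0^tau max(0, a + s) ds = e for the event time.
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x += theta * tau
        t += tau
        theta = -theta  # deterministic flip at each event
        times.append(t)
        positions.append(x)
    return np.array(times), np.array(positions)
```

Because the trajectory is piecewise linear, ergodic (time-weighted) averages along the path, rather than averages over the event positions, converge to moments of the target.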
Markov State Models: To Optimize or Not to Optimize
Markov state models (MSM) are a popular statistical method for analyzing the conformational dynamics of proteins including protein folding. As with all statistical and machine learning (ML) models, choices must be made about the modeling pipeline that cannot be directly learned from the data. These choices, or hyperparameters, are often evaluated by expert judgment or, in the case of MSMs, by maximizing variational scores such as the VAMP-2 score. Modern ML and statistical pipelines often use automatic hyperparameter selection techniques ranging from the simple, choosing the best score from a random selection of hyperparameters, to the complex, optimization via, e.g., Bayesian optimization. In this work, we ask whether it is possible to automatically select MSM models this way by estimating and analyzing over 16,000,000 observations from over 280,000 estimated MSMs. We find that differences in hyperparameters can change the physical interpretation of the optimization objective, making automatic selection difficult. In addition, we find that enforcing conditions of equilibrium in the VAMP scores can result in inconsistent model selection. However, other parameters that specify the VAMP-2 score (lag time and number of relaxation processes scored) have only a negligible influence on model selection. We suggest that model observables and variational scores should be only a guide to model selection and that a full investigation of the MSM properties should be undertaken when selecting hyperparameters.
Variational Selection of Features for Molecular Kinetics
The modeling of atomistic biomolecular simulations using kinetic models such as Markov state models (MSMs) has had many notable algorithmic advances in recent years. The variational principle has opened the door for a nearly fully automated toolkit for selecting models that predict the long-time kinetics from molecular dynamics simulations. However, one yet-unoptimized step of the pipeline involves choosing the features, or collective variables, from which the model should be constructed. In order to build intuitive models, these collective variables are often sought to be interpretable and familiar features, such as torsional angles or contact distances in a protein structure. However, previous approaches for evaluating the chosen features rely on constructing a full MSM, which in turn requires additional hyperparameters to be chosen, and hence leads to a computationally expensive framework. Here, we present a method to optimize the feature choice directly, without requiring the construction of the final kinetic model. We demonstrate our rigorous preprocessing algorithm on a canonical set of twelve fast-folding protein simulations, and show that our procedure leads to more efficient model selection.
Statistical analysis of tipping pathways in agent-based models
Agent-based models are a natural choice for modeling complex social systems. In such models, simple stochastic interaction rules for a large population of individuals on the microscopic scale can lead to emergent dynamics on the macroscopic scale, for instance a sudden shift of majority opinion or behavior. Here we introduce a methodology for studying noise-induced tipping between relevant subsets of the agent state space representing characteristic configurations. Due to the large number of interacting individuals, agent-based models are high-dimensional, though usually a lower-dimensional structure of the emerging collective behaviour exists. We therefore apply Diffusion Maps, a non-linear dimension reduction technique, to reveal the intrinsic low-dimensional structure. We characterize the tipping behaviour by means of Transition Path Theory, which helps gain a statistical understanding of the tipping paths, such as their distribution, flux and rate. By systematically studying two agent-based models that exhibit a multitude of tipping pathways and cascading effects, we illustrate the practicability of our approach.
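A central quantity in Transition Path Theory is the forward committor: the probability, starting from a given state, of reaching the tipped configuration B before returning to the initial configuration A. As a minimal sketch (assuming the reduced dynamics have already been coarse-grained into a finite Markov chain, which is not the paper's full pipeline), the committor solves a small linear system:

```python
import numpy as np

def committor(P, A, B):
    """Forward committor q_i = probability of reaching set B before set A,
    starting from state i, for a Markov chain with transition matrix P.
    Solves (I - P_CC) q_C = P_CB 1 on the transition region C, with
    boundary conditions q = 0 on A and q = 1 on B."""
    n = P.shape[0]
    A, B = set(A), set(B)
    C = [i for i in range(n) if i not in A and i not in B]
    q = np.zeros(n)
    q[list(B)] = 1.0
    if C:
        M = np.eye(len(C)) - P[np.ix_(C, C)]
        b = P[np.ix_(C, list(B))].sum(axis=1)
        q[C] = np.linalg.solve(M, b)
    return q
```

From the committor and the stationary distribution, TPT then yields the probability flux and rate of the tipping pathways; for a symmetric random walk on a line, the committor interpolates linearly between the two end sets, which is a convenient sanity check.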
Maximum Margin Clustering for State Decomposition of Metastable Systems
When studying a metastable dynamical system, a prime concern is how to
decompose the phase space into a set of metastable states. Unfortunately,
metastable state decomposition based on simulation or experimental data
remains a challenge. The most popular and simplest approach is geometric
clustering, which builds on classical clustering techniques.
However, the prerequisites of this approach are: (1) data are obtained from
simulations or experiments which are in global equilibrium and (2) the
coordinate system is appropriately selected. Recently, the kinetic clustering
approach based on phase space discretization and transition probability
estimation has drawn much attention due to its applicability to more general
cases, but the choice of discretization policy is a difficult task. In this
paper, a new decomposition method designated as maximum margin metastable
clustering is proposed, which converts the problem of metastable state
decomposition to a semi-supervised learning problem so that the large margin
technique can be utilized to search for the optimal decomposition without phase
space discretization. Moreover, several simulation examples are given to
illustrate the effectiveness of the proposed method.
Generation and validation
Markov state models of molecular kinetics (MSMs), in which the long-time
statistical dynamics of a molecule is approximated by a Markov chain on a
discrete partition of configuration space, have seen widespread use in recent
years. This approach has many appealing characteristics compared to
straightforward molecular dynamics simulation and analysis, including the
potential to mitigate the sampling problem by extracting long-time kinetic
information from short trajectories and the ability to straightforwardly
calculate expectation values and statistical uncertainties of various
stationary and dynamical molecular observables. In this paper, we summarize
the current state of the art in generation and validation of MSMs and give
some important new results. We describe an upper bound for the approximation
error made by modeling molecular dynamics with an MSM and we show that this
error can be made arbitrarily small with surprisingly little effort. In
contrast to previous practice, it becomes clear that the best MSM is not
obtained by the most metastable discretization, but the MSM can be much
improved if non-metastable states are introduced near the transition states.
Moreover, we show that it is not necessary to resolve all slow processes by
the state space partitioning, but individual dynamical processes of interest
can be resolved separately. We also present an efficient estimator for
reversible transition matrices and a robust test to validate that an MSM
reproduces the kinetics of the molecular dynamics data.
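A minimal sketch of the two core steps, estimation and validation, is shown below. The count-symmetrization used here is a simple way to enforce detailed balance, not the paper's maximum-likelihood reversible estimator (which is iterative), and the Chapman-Kolmogorov check compares the model propagated over k steps against a model re-estimated directly at the longer lag; function names are illustrative.

```python
import numpy as np

def transition_matrix(dtraj, n_states, lag, reversible=True):
    """Estimate an MSM transition matrix from a discrete trajectory
    (a sequence of state indices) at the given lag time."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1.0
    if reversible:
        # Symmetrizing the counts enforces detailed balance (a simple
        # approximation to the maximum-likelihood reversible estimator).
        C = 0.5 * (C + C.T)
    return C / C.sum(axis=1, keepdims=True)

def ck_test(dtraj, n_states, lag, k):
    """Chapman-Kolmogorov check: the model estimated at lag tau,
    propagated k steps, should reproduce the model estimated at k*tau.
    Returns the maximum elementwise deviation."""
    P1 = transition_matrix(dtraj, n_states, lag)
    Pk = transition_matrix(dtraj, n_states, k * lag)
    return np.max(np.abs(np.linalg.matrix_power(P1, k) - Pk))
```

For data that are genuinely Markovian on the chosen partition, the Chapman-Kolmogorov deviation shrinks with trajectory length; a persistent deviation signals that the discretization or lag time is too coarse.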