Simultaneous Coherent Structure Coloring facilitates interpretable clustering of scientific data by amplifying dissimilarity
The clustering of data into physically meaningful subsets often requires
assumptions regarding the number, size, or shape of the subgroups. Here, we
present a new method, simultaneous coherent structure coloring (sCSC), which
accomplishes the task of unsupervised clustering without a priori guidance
regarding the underlying structure of the data. sCSC performs a sequence of
binary splittings on the dataset such that the most dissimilar data points are
required to be in separate clusters. To achieve this, we obtain a set of
orthogonal coordinates along which dissimilarity in the dataset is maximized
from a generalized eigenvalue problem based on the pairwise dissimilarity
between the data points to be clustered. This sequence of bifurcations produces
a binary tree representation of the system, from which the number of clusters
in the data and their interrelationships naturally emerge. To illustrate the
effectiveness of the method in the absence of a priori assumptions, we apply it
to three exemplary problems in fluid dynamics. Then, we illustrate its capacity
for interpretability using a high-dimensional protein folding simulation
dataset. While we restrict our examples to dynamical physical systems in this
work, we anticipate straightforward translation to other fields where existing
analysis tools require ad hoc assumptions on the data structure, lack the
interpretability of the present method, or in which the underlying processes
are less accessible, such as genomics and neuroscience.
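The single binary splitting described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Euclidean pairwise dissimilarity, assumes the generalized eigenvalue problem takes the graph-Laplacian form L x = λ D x, and assumes the split is made on the sign of the eigenvector with the largest eigenvalue. The function name `binary_split` is hypothetical.

```python
import numpy as np

def binary_split(points):
    """Split points into two groups along the direction of maximal dissimilarity.

    Assumptions (not taken from the abstract): Euclidean dissimilarity,
    generalized eigenproblem L x = lam D x, sign-based split.
    Requires at least two distinct points so that every degree is positive.
    """
    # Pairwise dissimilarity matrix A (zero on the diagonal).
    diff = points[:, None, :] - points[None, :, :]
    A = np.linalg.norm(diff, axis=-1)
    d = A.sum(axis=1)                 # degree of each data point
    L = np.diag(d) - A                # graph Laplacian L = D - A
    # Solve L x = lam D x through the equivalent symmetric problem
    # D^{-1/2} L D^{-1/2} y = lam y, with x = D^{-1/2} y.
    s = 1.0 / np.sqrt(d)
    M = s[:, None] * L * s[None, :]
    _, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    x = s * vecs[:, -1]               # eigenvector of the largest eigenvalue
    return x > 0                      # boolean label per point

# Two well-separated groups of three points each (illustrative data).
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
labels = binary_split(points)
```

Applying the same function recursively to each resulting subset would produce a binary tree of the kind the abstract describes; a stopping rule is needed for that and is not specified here.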
Bayesian selection for coarse-grained models of liquid water
The need for accurate and computationally efficient representations of
water in atomistic simulations that can span biologically relevant timescales
has given rise to coarse-grained (CG) modeling. Despite numerous
advances, CG water models rely mostly on a priori specified assumptions. How
these assumptions affect the model accuracy, efficiency, and in particular
transferability, has not been systematically investigated. Here we propose a
data-driven comparison and selection of CG water models through a
hierarchical Bayesian framework. We examine CG water models that differ in
their level of coarse-graining, structure, and number of interaction sites. We
find that the importance of electrostatic interactions for the physical system
under consideration is a dominant criterion for the model selection. Multi-site
models are favored, unless the effects of water in electrostatic screening are
not relevant, in which case the single site model is preferred due to its
computational savings. The charge distribution is found to play an important
role in the multi-site model's accuracy while the flexibility of the
bonds/angles may only slightly improve the models. Furthermore, we find
significant variations in the computational cost of these models. We present a
data-informed rationale for the selection of CG water models and provide
guidance for future water model designs.
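The basic step of Bayesian model selection, ranking candidate models by posterior probability, can be sketched as below. This is a generic illustration and does not reproduce the hierarchical framework of the abstract: the function name `posterior_model_probs` is hypothetical, and the log-evidence values are placeholders chosen only to exercise the code, not results from the paper.

```python
import numpy as np

def posterior_model_probs(log_evidence, log_prior=None):
    """Posterior probability of each model given log evidences log p(D | M_k).

    Uses P(M_k | D) proportional to p(D | M_k) P(M_k). A uniform prior over
    models is assumed when log_prior is not supplied.
    """
    log_evidence = np.asarray(log_evidence, dtype=float)
    if log_prior is None:
        log_prior = np.zeros_like(log_evidence)   # uniform prior (assumption)
    log_post = log_evidence + np.asarray(log_prior, dtype=float)
    log_post = log_post - log_post.max()          # avoid underflow in exp
    weights = np.exp(log_post)
    return weights / weights.sum()

# Placeholder log evidences for two hypothetical candidate models.
probs = posterior_model_probs([-1000.0, -1002.0])
```

Subtracting the maximum before exponentiating keeps the computation stable when log evidences are large in magnitude, which is typical for models fitted to many data points.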
Deep learning of the dynamics of complex systems with its applications to biochemical molecules
Recent advancements in deep learning have revolutionized method development in several scientific fields and beyond. One central application is the extraction of equilibrium structures and long-timescale kinetics from molecular dynamics simulations, i.e., the well-known sampling problem. Previous state-of-the-art methods employed a multi-step handcrafted data processing pipeline resulting in Markov state models (MSM), which can be understood as an approximation of the underlying Koopman operator. However, this approach demands choosing a set of features characterizing the molecular structure, methods and their parameters for dimension reduction to collective variables and clustering, and estimation strategies for MSMs throughout the processing pipeline. As this requires specific expertise, the approach is ultimately inaccessible to a broader community.
In this thesis we apply deep learning techniques to approximate the Koopman operator in an end-to-end learning framework by employing the variational approach for Markov processes (VAMP). Thereby, the framework bypasses the multi-step process and automates the pipeline while yielding a model similar to a coarse-grained MSM. We further transfer advanced techniques from the MSM field to the deep learning framework, making it possible to (i) include experimental evidence into the model estimation, (ii) enforce reversibility, and (iii) perform coarse-graining. At this stage, post-analysis tools from MSMs can be borrowed to estimate rates of relevant rare events. Finally, we extend this approach to decompose a system into its (almost) independent subsystems and simultaneously estimate dynamical models for each of them, making it much more data efficient and enabling applications to larger proteins.
Although our results solely focus on protein dynamics, the application to climate, weather, and ocean currents data is an intriguing possibility with potential to yield new insights and improve predictive power in these fields.
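The linear-algebra core behind a Koopman approximation and its VAMP score can be sketched without any neural network. In the sketch below, one-hot state indicators of a simulated two-state Markov chain stand in for the learned network outputs; this substitution, the transition matrix `T`, and the function name `koopman_and_vamp2` are assumptions for illustration and are not taken from the thesis.

```python
import numpy as np

def koopman_and_vamp2(chi, lag):
    """Estimate a Koopman matrix and VAMP-2 score from feature trajectories.

    chi: array of shape (n_frames, n_features); lag: lag time in frames.
    Features are not mean-centred, so the score includes the trivial
    singular value of 1 when the constant function lies in their span.
    """
    x0, xt = chi[:-lag], chi[lag:]
    n = len(x0)
    c00 = x0.T @ x0 / n               # instantaneous covariance
    c0t = x0.T @ xt / n               # time-lagged covariance
    ctt = xt.T @ xt / n               # covariance of the shifted data
    koopman = np.linalg.solve(c00, c0t)   # K = C00^{-1} C0t

    def inv_sqrt(c):
        w, v = np.linalg.eigh(c)
        return v @ np.diag(w ** -0.5) @ v.T

    half_weighted = inv_sqrt(c00) @ c0t @ inv_sqrt(ctt)
    vamp2 = np.sum(half_weighted ** 2)    # squared Frobenius norm
    return koopman, vamp2

# Simulate a two-state chain with an assumed transition matrix T.
rng = np.random.default_rng(0)
T = np.array([[0.9, 0.1], [0.2, 0.8]])
states = np.zeros(5000, dtype=int)
for t in range(1, len(states)):
    states[t] = rng.choice(2, p=T[states[t - 1]])
chi = np.eye(2)[states]                   # one-hot stand-in for network output
K, score = koopman_and_vamp2(chi, lag=1)
```

With one-hot features the estimated matrix reduces to the row-normalized transition count matrix, which is why it resembles a coarse-grained MSM. In an end-to-end setting the features would instead be trained to maximize the score.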
Entropy production and coarse-graining in Markov processes
We study the large time fluctuations of entropy production in Markov
processes. In particular, we consider the effect of a coarse-graining procedure
which decimates fast states with respect to a given time threshold. Our
results provide strong evidence that entropy production is not directly
affected by this decimation, provided that it does not entirely remove loops
carrying a net probability current. After the study of some examples of random
walks on simple graphs, we apply our analysis to a network model for the
kinesin cycle, which is an important biomolecular motor. A tentative general
theory of these facts, based on Schnakenberg's network theory, is proposed.
Comment: 18 pages, 13 figures, submitted for publication
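The role of loops carrying a net probability current can be illustrated with the mean stationary entropy production rate of a continuous-time Markov chain, written in the standard Schnakenberg form. The sketch below covers only this mean rate, not the large-time fluctuations or the decimation procedure studied in the paper. The function names and the three-state ring rates are illustrative assumptions.

```python
import numpy as np

def stationary(W):
    """Stationary distribution for rates W[i, j] (i -> j), zero diagonal."""
    n = W.shape[0]
    Q = W - np.diag(W.sum(axis=1))        # generator; rows sum to zero
    # Solve p Q = 0 together with sum(p) = 1 in the least-squares sense.
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

def entropy_production_rate(W):
    """Sum over edges of (net current) * (affinity) in the stationary state."""
    p = stationary(W)
    n = W.shape[0]
    sigma = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if W[i, j] > 0 and W[j, i] > 0:
                fwd = p[i] * W[i, j]
                bwd = p[j] * W[j, i]
                sigma += (fwd - bwd) * np.log(fwd / bwd)
    return sigma

# Three-state ring: rate 2 in one direction, rate 1 in the other.
W_driven = np.array([[0.0, 2.0, 1.0],
                     [1.0, 0.0, 2.0],
                     [2.0, 1.0, 0.0]])
# Symmetric rates satisfy detailed balance, so no net current flows.
W_balanced = np.array([[0.0, 1.0, 1.0],
                       [1.0, 0.0, 1.0],
                       [1.0, 1.0, 0.0]])
sigma_driven = entropy_production_rate(W_driven)
sigma_balanced = entropy_production_rate(W_balanced)
```

For the driven ring the stationary distribution is uniform and the rate evaluates analytically to (2 - 1) ln(2 / 1) = ln 2, while the balanced ring gives zero, which matches the intuition that entropy production is tied to loops with a net current.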