
    Simultaneous Coherent Structure Coloring facilitates interpretable clustering of scientific data by amplifying dissimilarity

    The clustering of data into physically meaningful subsets often requires assumptions regarding the number, size, or shape of the subgroups. Here, we present a new method, simultaneous coherent structure coloring (sCSC), which accomplishes the task of unsupervised clustering without a priori guidance regarding the underlying structure of the data. sCSC performs a sequence of binary splittings on the dataset such that the most dissimilar data points are required to be in separate clusters. To achieve this, we obtain a set of orthogonal coordinates along which dissimilarity in the dataset is maximized by solving a generalized eigenvalue problem based on the pairwise dissimilarity between the data points to be clustered. This sequence of bifurcations produces a binary tree representation of the system, from which the number of clusters in the data and their interrelationships naturally emerge. To illustrate the effectiveness of the method in the absence of a priori assumptions, we apply it to three exemplary problems in fluid dynamics. Then, we illustrate its capacity for interpretability using a high-dimensional protein folding simulation dataset. While we restrict our examples to dynamical physical systems in this work, we anticipate straightforward translation to other fields where existing analysis tools require ad hoc assumptions on the data structure, lack the interpretability of the present method, or in which the underlying processes are less accessible, such as genomics and neuroscience.
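
    The core of the approach is a generalized eigenvalue problem built from the pairwise dissimilarity matrix. Below is a minimal sketch of one such binary split, assuming a precomputed symmetric dissimilarity matrix; the function name, median threshold, and toy data are illustrative choices, not taken from the paper's implementation.

```python
# Sketch of a single coherent-structure-coloring style binary split
# (assumes a symmetric, nonnegative pairwise dissimilarity matrix).
import numpy as np
from scipy.linalg import eigh

def csc_split(dissimilarity):
    """Split indices into two groups using the leading generalized eigenvector."""
    A = np.asarray(dissimilarity, dtype=float)
    D = np.diag(A.sum(axis=1))   # degree matrix of the dissimilarity graph
    L = D - A                    # graph Laplacian weighted by dissimilarities
    # Solve L z = lambda D z; the eigenvector with the largest eigenvalue
    # assigns the most dissimilar points the most different "colors".
    vals, vecs = eigh(L, D)
    coloring = vecs[:, -1]
    # Median split is a simple illustrative threshold, not the paper's rule.
    return coloring >= np.median(coloring)

# Toy usage: two well-separated 1-D groups should end up in different clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.1, 20), rng.normal(5.0, 0.1, 20)])
diss = np.abs(x[:, None] - x[None, :])
print(csc_split(diss).astype(int))
```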

    Bayesian selection for coarse-grained models of liquid water

    The need for accurate and computationally efficient representations of water in atomistic simulations that can span biologically relevant timescales has given rise to coarse-grained (CG) modeling. Despite numerous advances, CG water models rely mostly on a priori specified assumptions. How these assumptions affect the model accuracy, efficiency, and in particular transferability has not been systematically investigated. Here we propose a data-driven comparison and selection of CG water models through a hierarchical Bayesian framework. We examine CG water models that differ in their level of coarse-graining, structure, and number of interaction sites. We find that the importance of electrostatic interactions for the physical system under consideration is a dominant criterion for the model selection. Multi-site models are favored, unless the effects of water in electrostatic screening are not relevant, in which case the single-site model is preferred due to its computational savings. The charge distribution is found to play an important role in the multi-site models' accuracy, while the flexibility of the bonds/angles may only slightly improve the models. Furthermore, we find significant variations in the computational cost of these models. We present a data-informed rationale for the selection of CG water models and provide guidance for future water model designs.
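
    The selection criterion underlying such a framework is the comparison of marginal likelihoods (model evidence). The sketch below illustrates that criterion on a deliberately simple one-parameter toy problem; the synthetic data, candidate models, and uniform prior are invented for the demonstration and are not the paper's hierarchical setup for water models.

```python
# Toy Bayesian model selection: compare two candidate models by their evidence.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=0.5, size=50)   # synthetic "observations"

def log_likelihood(data, mu, sigma):
    return np.sum(-0.5 * ((data - mu) / sigma) ** 2
                  - np.log(sigma * np.sqrt(2 * np.pi)))

def log_evidence(data, sigma, mu_grid):
    """Marginalize the mean over a uniform prior on mu_grid (simple Riemann sum)."""
    logp = np.array([log_likelihood(data, mu, sigma) for mu in mu_grid])
    dmu = mu_grid[1] - mu_grid[0]
    prior = 1.0 / (mu_grid[-1] - mu_grid[0])     # uniform prior density
    m = logp.max()                               # log-sum-exp for stability
    return m + np.log(np.sum(np.exp(logp - m)) * prior * dmu)

mu_grid = np.linspace(-5.0, 5.0, 2001)
# Two competing "models" that differ only in a fixed noise level.
log_z = {"sigma=0.5": log_evidence(data, 0.5, mu_grid),
         "sigma=2.0": log_evidence(data, 2.0, mu_grid)}
lz = np.array(list(log_z.values()))
post = np.exp(lz - lz.max())
post /= post.sum()   # posterior model probabilities under equal prior weights
for name, p in zip(log_z, post):
    print(f"{name}: posterior probability {p:.3f}")
```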

    Deep learning of the dynamics of complex systems with its applications to biochemical molecules

    Recent advancements in deep learning have revolutionized method development in several scientific fields and beyond. One central application is the extraction of equilibrium structures and long-timescale kinetics from molecular dynamics simulations, i.e. the well-known sampling problem. Previous state-of-the-art methods employed a multi-step, handcrafted data processing pipeline resulting in Markov state models (MSMs), which can be understood as an approximation of the underlying Koopman operator. However, this approach demands choosing a set of features characterizing the molecular structure, methods and their parameters for dimension reduction to collective variables and clustering, and estimation strategies for MSMs throughout the processing pipeline. As this requires specific expertise, the approach is ultimately inaccessible to a broader community. In this thesis we apply deep learning techniques to approximate the Koopman operator in an end-to-end learning framework by employing the variational approach for Markov processes (VAMP). Thereby, the framework bypasses the multi-step process and automates the pipeline while yielding a model similar to a coarse-grained MSM. We further transfer advanced techniques from the MSM field to the deep learning framework, making it possible to (i) include experimental evidence in the model estimation, (ii) enforce reversibility, and (iii) perform coarse-graining. At this stage, post-analysis tools from MSMs can be borrowed to estimate rates of relevant rare events. Finally, we extend this approach to decompose a system into its (almost) independent subsystems and simultaneously estimate dynamical models for each of them, making it much more data efficient and enabling applications to larger proteins. Although our results focus solely on protein dynamics, the application to climate, weather, and ocean current data is an intriguing possibility with the potential to yield new insights and improve predictive power in these fields.
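
    To make the end-to-end idea concrete, here is a minimal sketch of a VAMPnet-style setup: a single network applied to instantaneous and time-lagged configurations, trained by maximizing a VAMP-2 score. The network size, lag handling, and random placeholder data are assumptions for illustration only, not the thesis implementation.

```python
# Sketch of VAMP-2 training of a soft state-assignment network (PyTorch).
import torch

def vamp2_score(chi_t, chi_tau, eps=1e-6):
    """VAMP-2 score: squared Frobenius norm of the whitened cross-covariance."""
    chi_t = chi_t - chi_t.mean(0, keepdim=True)
    chi_tau = chi_tau - chi_tau.mean(0, keepdim=True)
    n, k = chi_t.shape
    c00 = chi_t.T @ chi_t / n + eps * torch.eye(k)
    c11 = chi_tau.T @ chi_tau / n + eps * torch.eye(k)
    c01 = chi_t.T @ chi_tau / n

    def inv_sqrt(c):
        vals, vecs = torch.linalg.eigh(c)
        return vecs @ torch.diag(vals.clamp_min(eps).rsqrt()) @ vecs.T

    koopman = inv_sqrt(c00) @ c01 @ inv_sqrt(c11)
    return (koopman ** 2).sum()

# One network maps a configuration to soft assignments over 4 coarse states.
net = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.ELU(),
    torch.nn.Linear(32, 4), torch.nn.Softmax(dim=1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x_t = torch.randn(256, 10)     # placeholder features at time t
x_tau = torch.randn(256, 10)   # placeholder features at time t + lag
for _ in range(100):
    loss = -vamp2_score(net(x_t), net(x_tau))   # maximize the VAMP-2 score
    opt.zero_grad()
    loss.backward()
    opt.step()
```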

    Entropy production and coarse-graining in Markov processes

    We study the large-time fluctuations of entropy production in Markov processes. In particular, we consider the effect of a coarse-graining procedure which decimates "fast states" with respect to a given time threshold. Our results provide strong evidence that entropy production is not directly affected by this decimation, provided that it does not entirely remove loops carrying a net probability current. After the study of some examples of random walks on simple graphs, we apply our analysis to a network model for the kinesin cycle, which is an important biomolecular motor. A tentative general theory of these facts, based on Schnakenberg's network theory, is proposed. Comment: 18 pages, 13 figures, submitted for publication.
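
    For a concrete sense of the quantity in question, the sketch below computes the steady-state entropy production rate of a small discrete-time Markov chain using the standard Schnakenberg-type edge-flux formula; the three-state transition matrix is an arbitrary example with a net loop current, not the kinesin network studied in the paper.

```python
# Steady-state entropy production rate of a discrete-time Markov chain:
# sigma = sum_{i != j} pi_i P_ij * log(pi_i P_ij / (pi_j P_ji))
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.2, 0.1, 0.7],
              [0.6, 0.3, 0.1]])   # rows sum to 1; asymmetry drives a net current

# Stationary distribution: left eigenvector of P with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()

sigma = 0.0
for i in range(P.shape[0]):
    for j in range(P.shape[0]):
        if i != j and P[i, j] > 0 and P[j, i] > 0:
            flux = pi[i] * P[i, j]        # forward probability flux on edge (i, j)
            rev = pi[j] * P[j, i]         # reverse flux on the same edge
            sigma += flux * np.log(flux / rev)
print(f"entropy production rate: {sigma:.4f}")
```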
