
    Transferable neural networks for enhanced sampling of protein dynamics

    Variational auto-encoder frameworks have demonstrated success in reducing complex nonlinear dynamics in molecular simulation to a single non-linear embedding. In this work, we illustrate how this non-linear latent embedding can be used as a collective variable for enhanced sampling, and present a simple modification that allows us to rapidly perform sampling in multiple related systems. We first demonstrate our method is able to describe the effects of force field changes in capped alanine dipeptide after learning a model using AMBER99. We further provide a simple extension to variational dynamics encoders that allows the model to be trained in a more efficient manner on larger systems by encoding the outputs of a linear transformation using time-structure based independent component analysis (tICA). Using this technique, we show how such a model trained for one protein, the WW domain, can efficiently be transferred to perform enhanced sampling on a related mutant protein, the GTT mutation. This method shows promise for its ability to rapidly sample related systems using a single transferable collective variable and is generally applicable to sets of related simulations, enabling us to probe the effects of variation in increasingly large systems of biophysical interest. Comment: 20 pages, 10 figures
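
The tICA step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tica`, the synthetic two-feature trajectory, and the lag of 5 steps are all assumptions made for illustration. The sketch solves the generalized eigenproblem C(lag) v = lambda C(0) v by whitening with C(0), which is one common way to compute tICA components.

```python
import numpy as np


def tica(X, lag, n_components=1):
    """Minimal tICA sketch: solve C(lag) v = lambda * C(0) v by whitening."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                  # remove the mean of each feature
    A, B = X[:-lag], X[lag:]                # time-lagged pairs of frames
    n = A.shape[0]
    C0 = (A.T @ A + B.T @ B) / (2.0 * n)    # symmetrized instantaneous covariance
    Ct = (A.T @ B + B.T @ A) / (2.0 * n)    # symmetrized time-lagged covariance
    s, U = np.linalg.eigh(C0)
    keep = s > 1e-10 * s.max()              # drop numerically degenerate directions
    W = U[:, keep] / np.sqrt(s[keep])       # whitening transform for C0
    evals, evecs = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(evals)[::-1][:n_components]
    return evals[order], W @ evecs[:, order]


# Synthetic data (an assumption): one slow AR(1) signal mixed with fast white noise.
rng = np.random.default_rng(0)
n_frames = 20000
noise = rng.standard_normal(n_frames)
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.95 * slow[t - 1] + noise[t]
fast = 3.0 * rng.standard_normal(n_frames)
X = np.column_stack([slow + fast, slow - fast])

evals, V = tica(X, lag=5, n_components=2)
projection = (X - X.mean(axis=0)) @ V[:, 0]   # slowest tICA coordinate
corr = abs(np.corrcoef(projection, slow)[0, 1])
```

In this toy setting the leading component should track the slow signal, which is the property that makes tICA outputs a compact input for a downstream encoder.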

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references
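
The Laplacian eigenspace ingredient named in this abstract can be sketched on a toy graph. This shows only a generic spectral embedding with the unnormalized graph Laplacian; the mixture-modeling step of the paper is not reproduced, and the function name `laplacian_embedding` and the two-clique test graph are assumptions made for illustration.

```python
import numpy as np


def laplacian_embedding(adjacency, n_components=1):
    """Eigen-decompose L = D - A and return the low-frequency eigenvectors."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian
    evals, evecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    # Skip the constant eigenvector (eigenvalue 0 for a connected graph).
    return evals, evecs[:, 1:1 + n_components]


# Toy graph (an assumption): two 4-node cliques joined by a single edge.
n_nodes = 8
A = np.zeros((n_nodes, n_nodes))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0                     # the bridge between the cliques

evals, embedding = laplacian_embedding(A)
# The sign of the second eigenvector (the Fiedler vector) gives a hard
# two-way split; a mixture model would replace this with soft memberships.
labels = (embedding[:, 0] > 0).astype(int)
```

Which clique receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.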

    Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models

    The interpretation of complex high-dimensional data typically requires the use of dimensionality reduction techniques to extract explanatory low-dimensional representations. However, in many real-world problems these representations may not be sufficient to aid interpretation on their own, and it would be desirable to interpret the model in terms of the original features themselves. Our goal is to characterise how feature-level variation depends on latent low-dimensional representations, external covariates, and non-linear interactions between the two. In this paper, we propose to achieve this through a structured kernel decomposition in a hybrid Gaussian Process model which we call the Covariate Gaussian Process Latent Variable Model (c-GPLVM). We demonstrate the utility of our model on simulated examples and applications in disease progression modelling from high-dimensional gene expression data in the presence of additional phenotypes. In each setting we show how the c-GPLVM can extract low-dimensional structures from high-dimensional data sets whilst allowing a breakdown of feature-level variability that is not present in other commonly used dimensionality reduction approaches.

    Protein folding tames chaos

    Protein folding produces characteristic and functional three-dimensional structures from unfolded polypeptides or disordered coils. The emergence of extraordinary complexity in the protein folding process poses astonishing challenges to theoretical modeling and computer simulations. The present work introduces molecular nonlinear dynamics (MND), or molecular chaotic dynamics, as a theoretical framework for describing and analyzing protein folding. We unveil the existence of intrinsically low dimensional manifolds (ILDMs) in the chaotic dynamics of folded proteins. Additionally, we reveal that the transition from disordered to ordered conformations in protein folding increases the transverse stability of the ILDM. Stated differently, protein folding reduces the chaoticity of the nonlinear dynamical system, and a folded protein has the best ability to tame chaos. Additionally, we bring to light the connection between the ILDM stability and the thermodynamic stability, which enables us to quantify the disorderliness and relative energies of folded, misfolded and unfolded protein states. Finally, we exploit chaos for protein flexibility analysis and develop a robust chaotic algorithm for the prediction of Debye-Waller factors, or temperature factors, of protein structures.
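
For readers unfamiliar with the Debye-Waller (temperature) factors mentioned in this abstract, the sketch below shows the standard isotropic definition, B = (8π²/3)⟨|Δr|²⟩, applied to a toy set of atomic positions. It is not the chaotic prediction algorithm described in the paper; the function name `b_factor` and the two-frame example are assumptions made for illustration.

```python
import math


def b_factor(positions):
    """Isotropic B-factor from a list of (x, y, z) positions of one atom.

    Uses B = (8 * pi^2 / 3) * <|r - <r>|^2>, with the average over frames.
    """
    n = len(positions)
    # Mean position of the atom over all frames.
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    # Mean-square fluctuation about the mean position.
    msf = sum(
        sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions
    ) / n
    return 8.0 * math.pi ** 2 / 3.0 * msf


# Toy example (an assumption): an atom displaced by +/- 0.5 angstrom along x.
frames = [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)]
b_value = b_factor(frames)   # mean-square fluctuation is 0.25 square angstroms
```

A larger B-factor indicates a more mobile atom, which is why predicted B-factors are a common benchmark for flexibility analysis.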

    Statistical modeling of RNA structure profiling experiments enables parsimonious reconstruction of structure landscapes.

    RNA plays key regulatory roles in diverse cellular processes, where its functionality often derives from folding into and converting between structures. Many RNAs further rely on co-existence of alternative structures, which govern their response to cellular signals. However, characterizing heterogeneous landscapes is difficult, both experimentally and computationally. Recently, structure profiling experiments have emerged as powerful and affordable structure characterization methods, which improve computational structure prediction. To date, efforts have centered on predicting one optimal structure, with much less progress made on multiple-structure prediction. Here, we report a probabilistic modeling approach that predicts a parsimonious set of co-existing structures and estimates their abundances from structure profiling data. We demonstrate robust landscape reconstruction and quantitative insights into structural dynamics by analyzing numerous data sets. This work establishes a framework for data-directed characterization of structure landscapes to aid experimentalists in performing structure-function studies.