2,108 research outputs found

    Kinetic distance and kinetic maps from molecular dynamics simulation

    Characterizing macromolecular kinetics from molecular dynamics (MD) simulations requires a distance metric that can distinguish slowly interconverting states. Here we build upon diffusion map theory and define a kinetic distance for irreducible Markov processes that quantifies how slowly molecular conformations interconvert. The kinetic distance can be computed given a model that approximates the eigenvalues and eigenvectors (reaction coordinates) of the MD Markov operator. Here we employ time-lagged independent component analysis (TICA). The TICA components can be scaled to provide a kinetic map in which the Euclidean distance corresponds to the kinetic distance. As a result, the question of how many TICA dimensions should be kept in a dimensionality reduction approach becomes obsolete, and one fewer parameter needs to be specified in the kinetic model construction. We demonstrate the approach using TICA and Markov state model (MSM) analyses for illustrative models, protein conformation dynamics in bovine pancreatic trypsin inhibitor, and protein-inhibitor association in trypsin and benzamidine.
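The scaling idea in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a small reversible transition matrix (made up for illustration) whose eigenpairs are obtained by exact diagonalization instead of TICA, and it assumes the kinetic map is formed by multiplying each eigenvector by its eigenvalue. The function name `kinetic_map` is hypothetical.

```python
import numpy as np

def kinetic_map(T, pi):
    """Eigenvalue-scaled eigenvector coordinates of a reversible chain.

    T  : row-stochastic transition matrix, reversible with respect to pi
    pi : stationary distribution of T
    Returns the non-stationary eigenvalues and the scaled coordinates.
    """
    d = np.sqrt(pi)
    # For a reversible chain, D^{1/2} T D^{-1/2} is symmetric.
    S = d[:, None] * T / d[None, :]
    S = 0.5 * (S + S.T)  # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]  # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    # Right eigenvectors of T, normalized so that sum_i pi_i psi_i^2 = 1.
    psi = evecs / d[:, None]
    # Drop the stationary pair (eigenvalue 1) and scale by the eigenvalues.
    return evals[1:], psi[:, 1:] * evals[1:]

# Toy chain (an assumption): states 0 and 1 exchange quickly, state 2 slowly.
T = np.array([[0.50, 0.49, 0.01],
              [0.49, 0.50, 0.01],
              [0.01, 0.01, 0.98]])
pi = np.full(3, 1.0 / 3.0)  # T is symmetric, so pi is uniform

evals, coords = kinetic_map(T, pi)
d_fast = np.linalg.norm(coords[0] - coords[1])  # fast-exchanging pair
d_slow = np.linalg.norm(coords[0] - coords[2])  # slow-exchanging pair
print(evals, d_fast, d_slow)
```

In this toy chain the Euclidean distance in the scaled coordinates is small between the two rapidly exchanging states and large between either of them and the slowly reached state, which is the behavior the abstract attributes to the kinetic map.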

    Analysis of heat kernel highlights the strongly modular and heat-preserving structure of proteins

    In this paper, we study the structure and dynamical properties of protein contact networks with respect to other biological networks, together with simulated archetypal models acting as probes. We consider both classical topological descriptors, such as the modularity and statistics of the shortest paths, and different interpretations in terms of diffusion provided by the discrete heat kernel, which is derived from the normalized graph Laplacians. A principal component analysis shows high discrimination among the network types, whether using the topological or the heat-kernel-based vector characterizations. Furthermore, a canonical correlation analysis demonstrates the strong agreement between these two characterizations, thus providing an important justification, in terms of interpretability, for the heat kernel. Finally, and most importantly, the focused analysis of the heat kernel yields insights into the fact that proteins have to satisfy specific structural design constraints that the other considered networks do not need to obey. Notably, the heat trace decay of an ensemble of varying-size proteins indicates subdiffusion, a peculiar property of proteins.
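To make the heat-kernel quantities in the abstract above concrete, here is a minimal sketch under stated assumptions: the Laplacian is taken as the symmetric normalized form L = I - D^{-1/2} A D^{-1/2}, the heat kernel as exp(-tL), and the heat trace as its trace. The paper's exact conventions may differ, and the small graph (two triangles joined by one edge) is invented for illustration.

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian; assumes no isolated nodes."""
    inv_sqrt_deg = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - inv_sqrt_deg[:, None] * A * inv_sqrt_deg[None, :]

def heat_kernel(A, t):
    """Heat kernel exp(-t L) built from the eigendecomposition of L."""
    evals, evecs = np.linalg.eigh(normalized_laplacian(A))
    return (evecs * np.exp(-t * evals)) @ evecs.T

def heat_trace(A, t):
    """Trace of the heat kernel: sum over eigenvalues of exp(-t * lambda)."""
    evals = np.linalg.eigvalsh(normalized_laplacian(A))
    return float(np.exp(-t * evals).sum())

# Toy modular graph (an assumption): two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

for t in (0.0, 1.0, 5.0):
    print(t, heat_trace(A, t))
```

The heat trace starts at the number of nodes for t = 0 and decays toward the number of connected components; how fast it decays across graphs of different sizes is the kind of signal the abstract refers to.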

    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive powers for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test, respectively, the scoring power and the virtual screening power of the proposed topological approaches. It is demonstrated that the present approaches outperform modern machine learning based methods in protein-ligand binding affinity predictions and ligand-decoy discrimination.

    Deep learning of the dynamics of complex systems with its applications to biochemical molecules

    Recent advancements in deep learning have revolutionized method development in several scientific fields and beyond. One central application is the extraction of equilibrium structures and long-timescale kinetics from molecular dynamics simulations, i.e. the well-known sampling problem. Previous state-of-the-art methods employed a multi-step handcrafted data processing pipeline resulting in Markov state models (MSMs), which can be understood as an approximation of the underlying Koopman operator. However, this approach demands choosing a set of features characterizing the molecular structure, methods and their parameters for dimension reduction to collective variables and clustering, and estimation strategies for MSMs throughout the processing pipeline. As this requires specific expertise, the approach is ultimately inaccessible to a broader community. In this thesis we apply deep learning techniques to approximate the Koopman operator in an end-to-end learning framework by employing the variational approach for Markov processes (VAMP). Thereby, the framework bypasses the multi-step process and automates the pipeline while yielding a model similar to a coarse-grained MSM. We further transfer advanced techniques from the MSM field to the deep learning framework, making it possible to (i) include experimental evidence into the model estimation, (ii) enforce reversibility, and (iii) perform coarse-graining. At this stage, post-analysis tools from MSMs can be borrowed to estimate rates of relevant rare events. Finally, we extend this approach to decompose a system into its (almost) independent subsystems and simultaneously estimate dynamical models for each of them, making it much more data efficient and enabling applications to larger proteins.
    Although our results focus solely on protein dynamics, the application to climate, weather, and ocean current data is an intriguing possibility with the potential to yield new insights and improve predictive power in these fields.
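The VAMP objective mentioned in the abstract above can be sketched on fixed features. This is a hedged illustration, not the thesis code: it assumes the VAMP-2 score is the squared Frobenius norm of the whitened time-lagged covariance matrix computed on mean-free features (some conventions add 1 for the constant function), and it uses random data instead of a neural network and real trajectories. The names `inv_sqrt` and `vamp2_score` are hypothetical.

```python
import numpy as np

def inv_sqrt(C, eps=1e-10):
    """Inverse matrix square root of a symmetric PSD matrix.

    Eigen-directions with eigenvalue <= eps are discarded for stability.
    """
    w, V = np.linalg.eigh(C)
    keep = w > eps
    return (V[:, keep] / np.sqrt(w[keep])) @ V[:, keep].T

def vamp2_score(X, Y):
    """VAMP-2 score of features X (time t) and Y (time t + lag).

    X, Y : arrays of shape (n_samples, n_features)
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    C00 = X.T @ X / n  # instantaneous covariance
    C01 = X.T @ Y / n  # time-lagged covariance
    C11 = Y.T @ Y / n  # covariance of the lagged features
    K = inv_sqrt(C00) @ C01 @ inv_sqrt(C11)
    return float(np.sum(K ** 2))  # sum of squared singular values of K

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
score_identical = vamp2_score(X, X)  # perfectly predictable features
score_independent = vamp2_score(X, rng.normal(size=(5000, 3)))  # no memory
print(score_identical, score_independent)
```

Features that are perfectly predictable across the lag reach the maximum score (the number of features), while statistically independent pairs score near zero. In an end-to-end setting of the kind the abstract describes, a score like this would serve as the training objective for the feature map.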

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provably optimal recovery using the algorithm is shown analytically for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references

    Quantum Monte Carlo Calculations for Carbon Nanotubes

    We show how lattice Quantum Monte Carlo can be applied to the electronic properties of carbon nanotubes in the presence of strong electron-electron correlations. We employ the path-integral formalism and use methods developed within the lattice QCD community for our numerical work. Our lattice Hamiltonian is closely related to the hexagonal Hubbard model augmented by a long-range electron-electron interaction. We apply our method to the single-quasiparticle spectrum of the (3,3) armchair nanotube configuration, and consider the effects of strong electron-electron correlations. Our approach is equally applicable to other nanotubes, as well as to other carbon nanostructures. We benchmark our Monte Carlo calculations against the two- and four-site Hubbard models, where a direct numerical solution is feasible. Comment: 54 pages, 16 figures, published in Physical Review
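As an illustration of why a direct numerical solution is feasible for the two-site benchmark, the sketch below diagonalizes the standard two-site Hubbard model with two electrons in the S_z = 0 sector. It is not taken from the paper: the basis ordering, sign convention, and parameter values are assumptions, and the paper's own Hamiltonian (including any energy shifts or long-range terms) may be defined differently.

```python
import numpy as np

def two_site_hubbard_sz0(t, U):
    """Two-site Hubbard Hamiltonian, two electrons, S_z = 0 sector.

    Basis (an assumed ordering): |up,down>, |down,up>, |updown,0>, |0,updown>.
    t is the hopping amplitude and U the on-site repulsion.
    """
    return np.array([[0.0, 0.0,  -t,  -t],
                     [0.0, 0.0,   t,   t],
                     [ -t,   t,   U, 0.0],
                     [ -t,   t, 0.0,   U]])

t, U = 1.0, 4.0  # illustrative values, not from the paper
energies = np.linalg.eigvalsh(two_site_hubbard_sz0(t, U))

# Closed-form ground-state energy of this 4x4 problem, for comparison.
e0_closed_form = U / 2 - np.sqrt((U / 2) ** 2 + 4 * t ** 2)
print(energies, e0_closed_form)
```

The lowest eigenvalue matches the closed-form expression, and the remaining levels are 0, U, and U/2 + sqrt((U/2)^2 + 4t^2). Exact reference values of this kind are what a stochastic method can be checked against.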