Kinetic distance and kinetic maps from molecular dynamics simulation
Characterizing macromolecular kinetics from molecular dynamics (MD)
simulations requires a distance metric that can distinguish
slowly interconverting states. Here we build upon diffusion map theory and
define a kinetic distance for irreducible Markov processes that quantifies how
slowly molecular conformations interconvert. The kinetic distance can be
computed given a model that approximates the eigenvalues and eigenvectors
(reaction coordinates) of the MD Markov operator. Here we employ the
time-lagged independent component analysis (TICA). The TICA components can be
scaled to provide a kinetic map in which the Euclidean distance corresponds to
the kinetic distance. As a result, the question of how many TICA dimensions
should be kept in a dimensionality reduction approach becomes obsolete, and one
fewer parameter needs to be specified in the kinetic model construction. We
demonstrate the approach using TICA and Markov state model (MSM) analyses for
illustrative models, protein conformation dynamics in bovine pancreatic trypsin
inhibitor and protein-inhibitor association in trypsin and benzamidine.
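A minimal sketch of the kinetic-map scaling, assuming the diffusion-map convention in which the kinetic distance is D(x, y)^2 = sum_i lambda_i^2 (psi_i(x) - psi_i(y))^2 over the non-stationary TICA components psi_i with eigenvalues lambda_i. The function names and the two-frame example are hypothetical; the TICA estimation itself is not shown.

```python
import math
from typing import List, Sequence


def kinetic_map(components: Sequence[Sequence[float]],
                eigenvalues: Sequence[float]) -> List[List[float]]:
    """Scale TICA coordinates so Euclidean distance equals kinetic distance.

    components[n][i]: value of the i-th normalized TICA component psi_i for frame n
    (the constant/stationary component is assumed to be excluded already).
    eigenvalues[i]: matching TICA eigenvalue lambda_i at the chosen lag time.
    """
    k = len(eigenvalues)
    if any(len(row) != k for row in components):
        raise ValueError("each frame needs exactly one value per eigenvalue")
    return [[lam * psi for lam, psi in zip(eigenvalues, row)] for row in components]


def kinetic_distance(x: Sequence[float], y: Sequence[float],
                     eigenvalues: Sequence[float]) -> float:
    """D(x, y)**2 = sum_i lambda_i**2 * (psi_i(x) - psi_i(y))**2."""
    return math.sqrt(sum((lam * (a - b)) ** 2
                         for lam, a, b in zip(eigenvalues, x, y)))


# Two frames, two TICA components with eigenvalues 0.9 (slow) and 0.1 (fast).
frames = [[1.0, 2.0], [0.0, -1.0]]
eigs = [0.9, 0.1]
mapped = kinetic_map(frames, eigs)
# Euclidean distance in the kinetic map reproduces the kinetic distance.
assert math.isclose(math.dist(mapped[0], mapped[1]),
                    kinetic_distance(frames[0], frames[1], eigs))
```

Because each component is weighted by its eigenvalue, fast components (small lambda_i) contribute little to distances, which is why the number of retained dimensions matters much less after scaling.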
Analysis of heat kernel highlights the strongly modular and heat-preserving structure of proteins
In this paper, we study the structure and dynamical properties of protein
contact networks with respect to other biological networks, together with
simulated archetypal models acting as probes. We consider both classical
topological descriptors, such as the modularity and statistics of the shortest
paths, and different interpretations in terms of diffusion provided by the
discrete heat kernel, which is derived from the normalized graph Laplacians.
A principal component analysis shows high discrimination among the network
types, whether considering the topological or the heat-kernel-based vector
characterizations. Furthermore, a canonical correlation analysis demonstrates
strong agreement between those two characterizations, thus providing an
important justification in terms of interpretability for the heat kernel.
Finally, and most importantly, the focused analysis of the heat kernel provides
insight into the fact that proteins have to satisfy specific
structural design constraints that the other considered networks do not need to
obey. Notably, the heat trace decay of an ensemble of varying-size proteins
indicates subdiffusion, a peculiar property of proteins.
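The heat trace referred to above is Tr exp(-t L) for the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}. The sketch below computes it for a small graph with a truncated Taylor series, an assumption that is only reasonable for small graphs and moderate t (roughly t of a few units); it does not reproduce the paper's fitting of decay exponents across proteins.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_laplacian(adj: Matrix) -> Matrix:
    """L = I - D^{-1/2} A D^{-1/2} for an undirected graph without isolated nodes."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    if any(d == 0 for d in deg):
        raise ValueError("isolated nodes are not supported in this sketch")
    return [[(1.0 if i == j else 0.0) - adj[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def heat_trace(adj: Matrix, t: float, terms: int = 60) -> float:
    """Tr exp(-t L) via the Taylor series sum_k (-t L)^k / k!."""
    lap = normalized_laplacian(adj)
    n = len(adj)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    trace = float(n)  # k = 0 term: Tr I = n
    for k in range(1, terms):
        # term_k = term_{k-1} @ (-t L) / k
        term = [[-t * v / k for v in row] for row in matmul(term, lap)]
        trace += sum(term[i][i] for i in range(n))
    return trace


single_edge = [[0, 1], [1, 0]]                     # eigenvalues of L: 0 and 2
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]       # eigenvalues of L: 0, 1.5, 1.5
```

Since Tr exp(-t L) = sum_i exp(-t lambda_i), the trace starts at the number of nodes and decays toward the number of connected components; how fast it decays reflects how quickly heat spreads over the network.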
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test, respectively, the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform modern machine-learning-based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
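As a hedged illustration of the simplest ingredient of these approaches, the sketch below computes the 0-dimensional persistence barcode of a Vietoris-Rips filtration with union-find, plus a hypothetical element-specific split in the spirit of multicomponent persistent homology. It does not compute higher-dimensional homology, electrostatic persistence, or Wasserstein distances, and the helper names are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Bar = Tuple[float, float]


def h0_barcode(points: Sequence[Point]) -> List[Bar]:
    """0-dimensional persistence barcode of a Vietoris-Rips filtration.

    Every point is born at radius 0; when two components merge at edge length r,
    one of them dies at r (Kruskal-style union-find). One bar never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    bars: List[Bar] = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, r))
    if n:
        bars.append((0.0, math.inf))
    return bars


def element_specific_barcodes(atoms: Sequence[Tuple[str, Point]],
                              elements: Sequence[str]) -> Dict[str, List[Bar]]:
    """Hypothetical helper: one barcode per element, using only that element's atoms."""
    return {el: h0_barcode([xyz for e, xyz in atoms if e == el]) for el in elements}
```

Restricting the point cloud to chosen atom types before building the filtration is what keeps chemical identity in the otherwise purely geometric barcode.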
Deep learning of the dynamics of complex systems with its applications to biochemical molecules
Recent advancements in deep learning have revolutionized method development in several scientific fields and beyond. One central application is the extraction of equilibrium structures and long-timescale kinetics from molecular dynamics simulations, i.e. the well-known sampling problem. Previous state-of-the-art methods employed a multi-step handcrafted data processing pipeline resulting in Markov state models (MSM), which can be understood as an approximation of the underlying Koopman operator. However, this approach demands choosing a set of features characterizing the molecular structure, methods and their parameters for dimension reduction to collective variables and clustering, and estimation strategies for MSMs throughout the processing pipeline. As this requires specific expertise, the approach is ultimately inaccessible to a broader community.
In this thesis we apply deep learning techniques to approximate the Koopman operator in an end-to-end learning framework by employing the variational approach for Markov processes (VAMP). Thereby, the framework bypasses the multi-step process and automates the pipeline while yielding a model similar to a coarse-grained MSM. We further transfer advanced techniques from the MSM field to the deep learning framework, making it possible to (i) include experimental evidence into the model estimation, (ii) enforce reversibility, and (iii) perform coarse-graining. At this stage, post-analysis tools from MSMs can be borrowed to estimate rates of relevant rare events. Finally, we extend this approach to decompose a system into its (almost) independent subsystems and simultaneously estimate dynamical models for each of them, making it much more data efficient and enabling applications to larger proteins.
Although our results focus solely on protein dynamics, the application to climate, weather, and ocean-current data is an intriguing possibility with the potential to yield new insights and improve predictive power in these fields.
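The VAMP objective mentioned above scores how well features capture slow dynamics. A minimal sketch, assuming a single mean-free scalar feature: the half-weighted Koopman matrix C00^{-1/2} C01 C11^{-1/2} is then a scalar, so the VAMP-2 score reduces to C01^2 / (C00 C11). The constant singular function (which would add 1) is excluded, the function name is hypothetical, and no neural network is trained here; a VAMPnet would instead optimize this score over learned features.

```python
import random
from typing import Sequence


def vamp2_scalar(x: Sequence[float], lag: int) -> float:
    """VAMP-2 score of one scalar feature at the given lag time.

    Instantaneous and time-lagged windows are mean-centered separately;
    a constant feature (zero variance) raises ZeroDivisionError.
    """
    if not 0 < lag < len(x) - 1:
        raise ValueError("lag must be positive and shorter than the series")
    x0, x1 = x[:-lag], x[lag:]
    m0, m1 = sum(x0) / len(x0), sum(x1) / len(x1)
    c00 = sum((a - m0) ** 2 for a in x0) / len(x0)          # instantaneous covariance
    c11 = sum((b - m1) ** 2 for b in x1) / len(x1)          # time-lagged covariance
    c01 = sum((a - m0) * (b - m1) for a, b in zip(x0, x1)) / len(x0)  # cross-covariance
    return c01 ** 2 / (c00 * c11)


rng = random.Random(0)
slow = [1.0, -1.0] * 50                                   # fully predictable one step ahead
noise = [rng.gauss(0.0, 1.0) for _ in range(10_000)]      # no memory at lag 1
slow_score = vamp2_scalar(slow, lag=1)
noise_score = vamp2_scalar(noise, lag=1)
```

A feature that preserves information across the lag time scores near 1, while an uncorrelated one scores near 0, which is the property the variational approach exploits.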
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
Comment: 13 figures, 35 references
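The Laplacian mixture model itself is not reproduced here. As a hedged illustration of the PageRank connection the abstract mentions, the sketch below swaps in a simpler technique: seeded (personalized) PageRank vectors, one per hypothetical seed node, normalized into fuzzy memberships that sum to 1 at every node. The graph, seeds, and helper names are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[int]]


def build_adjacency(edges: Sequence[Tuple[int, int]]) -> Graph:
    """Undirected adjacency lists from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def personalized_pagerank(adj: Graph, seed: int, alpha: float = 0.85,
                          iters: int = 200) -> Dict[int, float]:
    """Power iteration for p = (1 - alpha) * e_seed + alpha * p D^{-1} A."""
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] = 1.0 - alpha                      # restart mass returns to the seed
        for u, nbrs in adj.items():
            share = alpha * p[u] / len(nbrs)         # random-walk step along D^{-1} A
            for v in nbrs:
                nxt[v] += share
        p = nxt
    return p


def soft_memberships(adj: Graph, seeds: Sequence[int]) -> Dict[int, List[float]]:
    """Normalize per-seed PageRank scores into membership probabilities per node."""
    scores = [personalized_pagerank(adj, s) for s in seeds]
    out: Dict[int, List[float]] = {}
    for v in adj:
        col = [sc[v] for sc in scores]
        total = sum(col)
        out[v] = [c / total for c in col]
    return out


# Two triangles joined by the bridge edge (2, 3); one hypothetical seed per triangle.
adj = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])
memb = soft_memberships(adj, seeds=[0, 5])
```

Nodes inside a triangle lean strongly toward their own seed, while the bridge nodes receive more mixed memberships, the kind of overlapping assignment a fuzzy decomposition is meant to expose.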
Quantum Monte Carlo Calculations for Carbon Nanotubes
We show how lattice Quantum Monte Carlo can be applied to the electronic
properties of carbon nanotubes in the presence of strong electron-electron
correlations. We employ the path-integral formalism and use methods developed
within the lattice QCD community for our numerical work. Our lattice
Hamiltonian is closely related to the hexagonal Hubbard model augmented by a
long-range electron-electron interaction. We apply our method to the
single-quasiparticle spectrum of the (3,3) armchair nanotube configuration, and
consider the effects of strong electron-electron correlations. Our approach is
equally applicable to other nanotubes, as well as to other carbon
nanostructures. We benchmark our Monte Carlo calculations against the two- and
four-site Hubbard models, where a direct numerical solution is feasible.
Comment: 54 pages, 16 figures, published in Physical Review
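The two-site benchmark mentioned above has a closed-form answer. A minimal sketch, assuming the plain Hubbard Hamiltonian H = -t sum_sigma (c1s^dag c2s + h.c.) + U sum_i n_i_up n_i_down at half filling, without the long-range interaction used in the paper: in the two-electron S_z = 0 sector, the covalent singlet and the symmetric doubly occupied state form the 2x2 block [[0, -2t], [-2t, U]], whose lower eigenvalue is the ground-state energy. The function name is hypothetical and this is a reference value, not the authors' code.

```python
import math


def two_site_hubbard_ground_energy(t: float, U: float) -> float:
    """Exact ground-state energy of the two-site Hubbard model at half filling.

    Lower eigenvalue of [[0, -2t], [-2t, U]]:
        E0 = U/2 - sqrt((U/2)**2 + 4 t**2).
    The triplet states (energy 0) and the antisymmetric ionic state (energy U)
    lie above E0 for any t != 0.
    """
    return U / 2.0 - math.sqrt((U / 2.0) ** 2 + 4.0 * t * t)
```

Two limits serve as sanity checks: at U = 0 both electrons occupy the bonding orbital (E0 = -2t), and for U much larger than t the energy approaches the superexchange value -4t^2/U.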