
8 research outputs found

Autoencoders for dimensionality reduction in molecular dynamics: collective variable dimension, biasing and transition states

Author: Belkacemi Zineb
Bianciotto Marc
Gkeka Paraskevi
Lelievre Tony
Minoux Herve
Stoltz Gabriel
Publication date: 05/06/2023

The heat shock protein 90 (Hsp90) is a molecular chaperone that controls the folding and activation of client proteins using the free energy of ATP hydrolysis. The Hsp90 active site is in its N-terminal domain (NTD). Our goal is to characterize the dynamics of the NTD using an autoencoder-learned collective variable (CV) in conjunction with adaptive biasing force (ABF) Langevin dynamics. Using dihedral analysis, we cluster all available experimental Hsp90 NTD structures into distinct native states. We then perform unbiased molecular dynamics (MD) simulations to construct a dataset that represents each state and use this dataset to train an autoencoder. Two autoencoder architectures are considered, with one and two hidden layers respectively, and bottlenecks of dimension k ranging from 1 to 10. We demonstrate that adding an extra hidden layer does not significantly improve performance, while it leads to complicated CVs that increase the computational cost of biased MD calculations. In addition, a 2D bottleneck can provide enough information about the different states, while the optimal bottleneck dimension is five. For the 2D bottleneck, the two-dimensional CV is used directly in biased MD simulations. For the 5D bottleneck, we analyze the latent CV space and identify the pair of CV coordinates that best separates the states of Hsp90. Interestingly, selecting a 2D CV out of the 5D CV space leads to better results than directly learning a 2D CV, and makes it possible to observe transitions between native states when running free energy biased dynamics.
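The abstract does not say how the best-separating pair of latent coordinates is chosen. As an illustration only, the sketch below scans all coordinate pairs of a 5D latent space and ranks them with a hypothetical score: the smallest distance between state centroids divided by the mean within-state spread. The score, the function names `separation_score` and `best_cv_pair`, and the synthetic data are all assumptions and are not taken from the paper.

```python
from itertools import combinations

import numpy as np


def separation_score(z, labels):
    """Hypothetical score (an assumption, not the paper's criterion):
    smallest distance between state centroids divided by the mean
    within-state spread. Larger means better separated states."""
    states = np.unique(labels)
    centroids = np.array([z[labels == s].mean(axis=0) for s in states])
    spreads = [np.linalg.norm(z[labels == s].std(axis=0)) for s in states]
    gaps = [np.linalg.norm(centroids[i] - centroids[j])
            for i, j in combinations(range(len(states)), 2)]
    return min(gaps) / np.mean(spreads)


def best_cv_pair(latent, labels):
    """Return the pair of latent coordinates with the highest score."""
    n_dims = latent.shape[1]
    return max(combinations(range(n_dims), 2),
               key=lambda pair: separation_score(latent[:, list(pair)], labels))


# Synthetic stand-in for encoded MD frames: three states in a 5D latent
# space, where only coordinates 1 and 3 carry state information.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 200)
latent = rng.normal(scale=0.5, size=(600, 5))
latent[labels == 1, 1] += 5.0
latent[labels == 2, 3] += 5.0

pair = best_cv_pair(latent, labels)
print("selected latent coordinates:", pair)
```

In practice the `latent` array would hold the bottleneck values of the trained autoencoder and `labels` the state assignment of each frame; any other separation criterion could be swapped in for the score.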

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Ecole des Ponts ParisTech

Méthodes d'apprentissage en simulation moléculaire

Author: Belkacemi Zineb
Publication date: 06/07/2022

With the continually improving computational capacity of computers, machine learning methods have provided novel solutions to problems in a variety of fields. In particular, machine learning has been extensively used in the last decade in computational biochemistry and drug discovery at virtually all stages, such as defining new molecules, determining important sites in targeted proteins, designing adequate force fields based on experimental results, or improving the efficiency of sampling molecular conformations of a given system. This thesis focuses on the last of these tasks: using machine learning methods for enhanced sampling in molecular dynamics. Molecular dynamics (MD) simulations have proven to be a very useful complement to experiments. Despite their wide use to capture fast-occurring phenomena, there are still many cases where the time scales accessible to MD simulations are far smaller than those needed to observe important conformational changes of the system, due to the presence of high energy barriers. Free energy biasing methods have proven to be powerful tools to accelerate the observation of such changes by modifying the sampling measure. However, most of these methods rely on prior knowledge of low-dimensional slow degrees of freedom, i.e. collective variables. Alternatively, such low-dimensional mappings can be identified using machine learning and dimensionality reduction algorithms. In addition to accelerating sampling, the learned collective variables can also provide valuable insight into the studied system, namely by facilitating the visualization of its different states and of its free energy landscape.
In this work, important notions and definitions of molecular dynamics are first presented before reviewing state-of-the-art machine learning algorithms that were devised or applied in recent years for automatic collective variable discovery and enhanced sampling. Then, the method developed during this thesis, coined "free energy biasing and iterative learning with autoencoders" (FEBILAE), is introduced. This method uses an iterative scheme to alternately generate new simulations and learn collective variables from these simulations using autoencoders. Finally, we present the application of machine learning methods to a real system of interest. Here, autoencoders are used to learn collective variables to perform biased simulations of the heat shock protein 90 (HSP90) chaperone.

Theses.fr

Méthodes d'apprentissage en simulation moléculaire

Author: Belkacemi Zineb
Publication venue: HAL CCSD
Publication date: 06/07/2022

With the continually improving computational capacity of computers, machine learning methods have provided novel solutions to problems in a variety of fields. In particular, machine learning has been extensively used in the last decade in computational biochemistry and drug discovery at virtually all stages, such as defining new molecules, determining important sites in targeted proteins, designing adequate force fields based on experimental results, or improving the efficiency of sampling molecular conformations of a given system. This thesis focuses on the last of these tasks: using machine learning methods for enhanced sampling in molecular dynamics. Molecular dynamics (MD) simulations have proven to be a very useful complement to experiments. Despite their wide use to capture fast-occurring phenomena, there are still many cases where the time scales accessible to MD simulations are far smaller than those needed to observe important conformational changes of the system, due to the presence of high energy barriers. Free energy biasing methods have proven to be powerful tools to accelerate the observation of such changes by modifying the sampling measure. However, most of these methods rely on prior knowledge of low-dimensional slow degrees of freedom, i.e. collective variables. Alternatively, such low-dimensional mappings can be identified using machine learning and dimensionality reduction algorithms. In addition to accelerating sampling, the learned collective variables can also provide valuable insight into the studied system, namely by facilitating the visualization of its different states and of its free energy landscape.
In this work, important notions and definitions of molecular dynamics are first presented before reviewing state-of-the-art machine learning algorithms that were devised or applied in recent years for automatic collective variable discovery and enhanced sampling. Then, the method developed during this thesis, coined "free energy biasing and iterative learning with autoencoders" (FEBILAE), is introduced. This method uses an iterative scheme to alternately generate new simulations and learn collective variables from these simulations using autoencoders. Finally, we present the application of machine learning methods to a real system of interest. Here, autoencoders are used to learn collective variables to perform biased simulations of the heat shock protein 90 (HSP90) chaperone.

Thèses en Ligne

Méthodes d'apprentissage en simulation moléculaire

Author: Belkacemi Zineb
Publication venue: HAL CCSD
Publication date: 06/07/2022

thèses en ligne de ParisTech

Méthodes d'apprentissage en simulation moléculaire

Author: Belkacemi Zineb
Publication venue: HAL CCSD
Publication date: 06/07/2022

Thèses en Ligne

thèses en ligne de ParisTech

Theses.fr

HAL-Ecole des Ponts ParisTech

Chasing Collective Variables using Autoencoders and biased trajectories

Author: Belkacemi Zineb
Gkeka Paraskevi
Lelièvre Tony
Stoltz Gabriel
Publication venue: 'American Chemical Society (ACS)'
Publication date: 26/04/2021

In recent decades, free energy biasing methods have proven to be powerful tools to accelerate the simulation of important conformational changes of molecules by modifying the sampling measure. However, most of these methods rely on prior knowledge of low-dimensional slow degrees of freedom, i.e. collective variables (CVs). Alternatively, such CVs can be identified using machine learning (ML) and dimensionality reduction algorithms. In this context, approaches where the CVs are learned iteratively using adaptive biasing have been proposed: at each iteration, the learned CV is used to perform free energy adaptive biasing to generate new data and learn a new CV. This implies that a different measure is sampled at each iteration, so the new training data follow a different distribution. Given that a machine learning model always depends on the distribution of its training data, iterative methods are not guaranteed to converge to a given CV. This can be remedied by a reweighting procedure that always falls back to learning with respect to the same unbiased Boltzmann-Gibbs measure, regardless of the biased measure used in the adaptive sampling. In this paper, we introduce a new iterative method involving CV learning with autoencoders: Free Energy Biasing and Iterative Learning with AutoEncoders (FEBILAE). Our method includes the reweighting scheme to ensure that the learning model optimizes the same loss at every iteration and achieves CV convergence. Using a small 2-dimensional toy system and the alanine dipeptide system as examples, we present results of our algorithm using the extended adaptive biasing force as the free energy adaptive biasing method.
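The reweighting idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the biased simulation samples a measure proportional to exp(-beta (V - F∘ξ)), where F is the current free energy estimate along the CV ξ, so that each sample receives a weight proportional to exp(-beta F(ξ(x_i))) when averaging with respect to the unbiased Boltzmann-Gibbs measure. The function names and the toy numbers are hypothetical.

```python
import numpy as np


def unbiasing_weights(free_energy, beta):
    """Normalized weights proportional to exp(-beta * F(xi(x_i))).

    Assumption: the samples were drawn under a free energy bias, i.e.
    from a measure proportional to exp(-beta * (V - F(xi(x)))).
    """
    log_w = -beta * np.asarray(free_energy, dtype=float)
    log_w -= log_w.max()  # shift by the maximum to avoid overflow
    w = np.exp(log_w)
    return w / w.sum()


def weighted_reconstruction_loss(x, x_rec, weights):
    """Weighted squared reconstruction error of an autoencoder."""
    sq_err = np.sum((np.asarray(x) - np.asarray(x_rec)) ** 2, axis=1)
    return float(np.dot(weights, sq_err))


# Toy data: free energy estimate F(xi(x_i)) at three sampled points,
# and a dummy "reconstruction" that maps every point to the origin.
free_energy = np.array([0.0, 0.0, 2.0])
weights = unbiasing_weights(free_energy, beta=1.0)

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x_rec = np.zeros_like(x)
loss = weighted_reconstruction_loss(x, x_rec, weights)
print("weights:", weights, "loss:", loss)
```

Points that the bias pushed into high free energy regions get small weights, so the weighted loss targets the same unbiased measure at every iteration, whatever bias was used to generate the data.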

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Ecole des Ponts ParisTech

Machine learning force fields and coarse-grained variables in molecular dynamics: application to materials and biological systems

Author: Aaron R. Dinner
Alexandre Tkatchenko
Amir Barati Farimani
Ana Silveira
Andrew L. Ferguson
Borg I.
Chatterjee A.
Christine Peter
Collet P.
Doerr S.
Fabio Pietrucci
Gabriel Stoltz
Hervé Minoux
Jasinski A.
Jean-Bernard Maillet
John D. Chodera
Jolliffe I.
Jung H.
Loève M.
Michele Ceriotti
N. Feinberg E.
Nadler B.
Paraskevi Gkeka
Pérez-Villa A.
Rafal Wiewiora
Tony Lelièvre
Zineb Belkacemi
Zofia Trstanova
Zwanzig R.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 15/04/2020

This work came out of a CECAM discussion meeting. Machine learning encompasses a set of tools and algorithms that are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning promises to extract valuable information from the enormous amounts of data generated by simulations of complex systems. We provide here a review of our current understanding of the goals, benefits, and limitations of machine learning techniques for computational studies of atomistic systems, focusing on the construction of empirical force fields from ab initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.

Infoscience - École polytechnique fédérale de Lausanne

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

PubMed Central

HAL-CEA

HAL-Ecole des Ponts ParisTech

Machine Learning Force Fields and Coarse-Grained Variables in Molecular Dynamics: Application to Materials and Biological Systems

Author: Aaron R. Dinner
Alexandre Tkatchenko
Amir Barati Farimani
Ana Silveira
Andrew L. Ferguson
Borg I.
Chatterjee A.
Christine Peter
Collet P.
Doerr S.
Fabio Pietrucci
Gabriel Stoltz
Hervé Minoux
Jasinski A.
Jean-Bernard Maillet
John D. Chodera
Jolliffe I.
Jung H.
Loève M.
Michele Ceriotti
N. Feinberg E.
Nadler B.
Paraskevi Gkeka
Pérez-Villa A.
Rafal Wiewiora
Tony Lelièvre
Zineb Belkacemi
Zofia Trstanova
Zwanzig R.
Publication venue: 'American Chemical Society (ACS)'

Crossref