Autoencoders for dimensionality reduction in molecular dynamics: collective variable dimension, biasing and transition states
The heat shock protein 90 (Hsp90) is a molecular chaperone that controls the
folding and activation of client proteins using the free energy of ATP
hydrolysis. The Hsp90 active site is in its N-terminal domain (NTD). Our goal
is to characterize the dynamics of NTD using an autoencoder-learned collective
variable (CV) in conjunction with adaptive biasing force (ABF) Langevin
dynamics. Using dihedral analysis, we cluster all available experimental Hsp90
NTD structures into distinct native states. We then perform unbiased molecular
dynamics (MD) simulations to construct a dataset that represents each state and
use this dataset to train an autoencoder. Two autoencoder architectures are
considered, with one and two hidden layers respectively, and bottlenecks of
dimension ranging from 1 to 10. We demonstrate that the addition of an
extra hidden layer does not significantly improve the performance, while it
leads to complicated CVs that increase the computational cost of biased MD
calculations. In addition, a 2D bottleneck can provide enough information about
the different states, while the optimal bottleneck dimension is five. For the
2D bottleneck, the two-dimensional CV is directly used in biased MD
simulations. For the 5D bottleneck, we perform an analysis of the latent CV
space and identify the pair of CV coordinates that best separates the states of
Hsp90. Interestingly, selecting a 2D CV out of the 5D CV space leads to better
results than directly learning a 2D CV, and allows transitions between native
states to be observed when running free energy biased dynamics.
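
The selection of a 2D CV out of the 5D latent space can be illustrated with a minimal sketch. The abstract does not say which criterion is used to decide which pair of coordinates "best separates" the states, so the Fisher-like score below (between-state scatter divided by within-state scatter) is an assumption, and the names `separation_score` and `best_latent_pair` are hypothetical.

```python
from itertools import combinations

import numpy as np


def separation_score(z, labels):
    """Fisher-like score: between-state scatter over within-state scatter.

    This criterion is an assumption made for illustration; it is not
    taken from the paper.
    """
    overall_mean = z.mean(axis=0)
    between = 0.0
    within = 0.0
    for state in np.unique(labels):
        z_state = z[labels == state]
        centroid = z_state.mean(axis=0)
        between += len(z_state) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((z_state - centroid) ** 2)
    return between / within


def best_latent_pair(latent, labels):
    """Score every pair of latent coordinates and return the best one."""
    n_dim = latent.shape[1]
    scores = {
        pair: separation_score(latent[:, list(pair)], labels)
        for pair in combinations(range(n_dim), 2)
    }
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic stand-in for encoded MD frames: three states that differ
# only along latent coordinates 1 and 3 of a 5D bottleneck.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 200)
latent = rng.normal(size=(600, 5))
latent[labels == 1, 1] += 10.0
latent[labels == 2, 3] += 10.0

best_pair, all_scores = best_latent_pair(latent, labels)
print("best pair of latent coordinates:", best_pair)
```

In practice `latent` would be the encoder output for the unbiased MD frames and `labels` the native-state assignment obtained from the dihedral clustering; both are replaced here by synthetic data.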
Méthodes d'apprentissage en simulation moléculaire
With the continually improving computational capacity of computers, machine learning methods have provided novel solutions to problems in a variety of fields. In particular, machine learning has been used extensively over the last decade in computational biochemistry and at virtually all stages of drug discovery, such as defining new molecules, determining important sites in targeted proteins, designing adequate force fields based on experimental results, or improving the efficiency of sampling molecular conformations of a given system. This thesis focuses on the last of these tasks: using machine learning methods for enhanced sampling in molecular dynamics. Molecular dynamics (MD) simulations have proven to be a very useful complement to experiments. Despite their wide use to capture fast-occurring phenomena, there are still many cases where the time scales accessible to MD simulations are far smaller than the time scales needed to observe important conformational changes of the system, due to the presence of high energy barriers. Free energy biasing methods have proven to be powerful tools to accelerate the observation of such changes by modifying the sampling measure. However, most of these methods rely on prior knowledge of low-dimensional slow degrees of freedom, i.e. collective variables. Alternatively, such low-dimensional mappings can be identified using machine learning and dimensionality reduction algorithms. In addition to being used to accelerate sampling, the learned collective variables can also help acquire valuable insight into the studied system, namely by facilitating the visualization of the different states of the system, as well as its free energy landscape.
In this work, important notions and definitions of molecular dynamics are first presented before reviewing state-of-the-art machine learning algorithms that were devised or applied in recent years for automatic collective variable discovery and enhanced sampling. Then, the method developed during this thesis, coined "free energy biasing and machine learning with autoencoders" (FEBILAE), is introduced. This method uses an iterative scheme to alternately generate new simulations and learn collective variables from these simulations using autoencoders. Finally, we present the application of machine learning methods to a real system of interest. Here, autoencoders are used to learn collective variables in order to perform biased simulations of the heat shock protein 90 (HSP90) chaperone.
Chasing Collective Variables using Autoencoders and biased trajectories
In recent decades, free energy biasing methods have proven to be powerful tools to accelerate the simulation of important conformational changes of molecules by modifying the sampling measure. However, most of these methods rely on prior knowledge of low-dimensional slow degrees of freedom, i.e. collective variables (CVs). Alternatively, such CVs can be identified using machine learning (ML) and dimensionality reduction algorithms. In this context, approaches where the CVs are learned iteratively using adaptive biasing have been proposed: at each iteration, the learned CV is used to perform free energy adaptive biasing to generate new data and learn a new CV. This implies that a different measure is sampled at each iteration, so the new training data follow a different distribution. Since a machine learning model always depends on the distribution of its training data, iterative methods are not guaranteed to converge to a given CV. This can be remedied by a reweighting procedure that always falls back to learning with respect to the same unbiased Boltzmann-Gibbs measure, regardless of the biased measure used in the adaptive sampling. In this paper, we introduce a new iterative method involving CV learning with autoencoders: Free Energy Biasing and Iterative Learning with AutoEncoders (FEBILAE). Our method includes the reweighting scheme to ensure that the learning model optimizes the same loss at every iteration, and achieves CV convergence. Using a small two-dimensional toy system and the alanine dipeptide system as examples, we present results of our algorithm using the extended adaptive biasing force as the free energy adaptive biasing method.
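
The reweighting idea can be sketched in a few lines. The sketch assumes an idealized setting that the abstract does not spell out: the biased run samples a measure proportional to exp(-beta (V(x) - F(xi(x)))), where F is the free energy estimate along the current CV xi, so that weights proportional to exp(-beta F(xi(x))) recover averages with respect to the Boltzmann-Gibbs measure. The weights appropriate for the extended adaptive biasing force method used in the paper may take a different form. The function names are hypothetical.

```python
import numpy as np


def unbiasing_weights(free_energy_at_cv, beta):
    """Normalized weights w_i proportional to exp(-beta * F(xi(x_i))).

    Assumes the samples x_i were drawn from the biased measure
    proportional to exp(-beta * (V - F o xi)); this is an idealization.
    """
    log_w = -beta * np.asarray(free_energy_at_cv, dtype=float)
    log_w -= log_w.max()  # shift for numerical stability before exponentiating
    weights = np.exp(log_w)
    return weights / weights.sum()


def weighted_reconstruction_loss(x, x_reconstructed, weights):
    """Weighted mean squared reconstruction error of an autoencoder.

    With the weights above, this estimates the loss under the unbiased
    measure, whatever bias was used to generate the samples.
    """
    per_sample = np.sum((x - x_reconstructed) ** 2, axis=1)
    return float(np.sum(weights * per_sample))


# Two samples: the second sits where the estimated free energy is
# higher by ln(3) / beta, so it receives a three times smaller weight.
beta = 1.0
free_energy = np.array([0.0, np.log(3.0)])
weights = unbiasing_weights(free_energy, beta)

x = np.array([[0.0, 0.0], [1.0, 1.0]])
x_rec = np.zeros_like(x)  # stand-in for the autoencoder output
loss = weighted_reconstruction_loss(x, x_rec, weights)
print(weights, loss)
```

Because the weighted loss targets the same measure at every iteration, successive autoencoders are trained on a consistent objective even though each iteration samples a different biased distribution.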
Machine learning force fields and coarse-grained variables in molecular dynamics: application to materials and biological systems
This work came out of a CECAM discussion meeting. Machine learning encompasses a set of tools and algorithms which are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning promises to extract valuable information from the enormous amounts of data generated by simulations of complex systems. We provide here a review of our current understanding of the goals, benefits, and limitations of machine learning techniques for computational studies of atomistic systems, focusing on the construction of empirical force fields from ab initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.