8 research outputs found

    Autoencoders for dimensionality reduction in molecular dynamics: collective variable dimension, biasing and transition states

    Full text link
    The heat shock protein 90 (Hsp90) is a molecular chaperone that controls the folding and activation of client proteins using the free energy of ATP hydrolysis. The Hsp90 active site is in its N-terminal domain (NTD). Our goal is to characterize the dynamics of the NTD using an autoencoder-learned collective variable (CV) in conjunction with adaptive biasing force (ABF) Langevin dynamics. Using dihedral analysis, we cluster all available experimental Hsp90 NTD structures into distinct native states. We then perform unbiased molecular dynamics (MD) simulations to construct a dataset that represents each state, and use this dataset to train an autoencoder. Two autoencoder architectures are considered, with one and two hidden layers respectively, and bottlenecks of dimension k ranging from 1 to 10. We demonstrate that adding an extra hidden layer does not significantly improve performance, while it leads to complicated CVs that increase the computational cost of biased MD calculations. In addition, a 2D bottleneck can provide enough information about the different states, while the optimal bottleneck dimension is five. For the 2D bottleneck, the two-dimensional CV is used directly in biased MD simulations. For the 5D bottleneck, we analyze the latent CV space and identify the pair of CV coordinates that best separates the states of Hsp90. Interestingly, selecting a 2D CV out of the 5D CV space leads to better results than directly learning a 2D CV, and allows transitions between native states to be observed when running free-energy-biased dynamics.
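The step of picking the pair of latent coordinates that best separates the states can be illustrated with a minimal sketch. The abstract does not say which separation criterion was used, so the score below (smallest distance between state centroids divided by the mean within-state spread) is an assumption, as are the function names `separation_score` and `best_pair` and the synthetic 5D latent data.

```python
import itertools
import numpy as np

def separation_score(z, labels):
    """Smallest centroid-to-centroid distance divided by the mean within-state spread.

    This criterion is an illustrative assumption, not the one used in the paper.
    """
    states = np.unique(labels)
    centroids = np.array([z[labels == s].mean(axis=0) for s in states])
    spread = np.mean([z[labels == s].std(axis=0).mean() for s in states])
    dists = [np.linalg.norm(centroids[i] - centroids[j])
             for i, j in itertools.combinations(range(len(states)), 2)]
    return min(dists) / spread

def best_pair(latent, labels):
    """Score every pair of latent coordinates and return the best pair and all scores."""
    n_dims = latent.shape[1]
    scores = {pair: separation_score(latent[:, list(pair)], labels)
              for pair in itertools.combinations(range(n_dims), 2)}
    return max(scores, key=scores.get), scores

# Synthetic stand-in for a 5D bottleneck: three states, 200 frames each.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 200)
latent = rng.normal(size=(600, 5))
# In this toy data only coordinates 1 and 3 carry state information.
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
latent[:, [1, 3]] += centers[labels]

pair, scores = best_pair(latent, labels)
print("best pair of latent coordinates:", pair)
```

With real data, `latent` would be the encoder output on the MD frames and `labels` the state assignment from the clustering step; other criteria (for example a clustering quality index) could be swapped in for `separation_score`.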

    Méthodes d'apprentissage en simulation moléculaire

    No full text
    With the continually improving computational capacity of computers, machine learning methods have provided novel solutions to problems in a variety of fields. In particular, machine learning has been used extensively over the last decade in computational biochemistry and at virtually all stages of drug discovery and development, such as defining new molecules, determining important sites in targeted proteins, designing adequate force fields based on experimental results, or improving the efficiency of sampling the molecular conformations of a given system. This thesis focuses on the last of these tasks: using machine learning methods for enhanced sampling in molecular dynamics. Molecular dynamics (MD) simulations have proven to be a very useful complement to laboratory experiments. Despite their wide use to capture fast phenomena, there are still many cases where the time scales accessible to MD simulations are far smaller than the time scales needed to observe important conformational changes of the system, due to the presence of high energy barriers. Free energy biasing methods have proven to be powerful tools to accelerate the observation of such changes by modifying the sampling measure. However, most of these methods rely on prior knowledge of collective variables, i.e. low-dimensional degrees of freedom representing the slow directions of the molecular system. Alternatively, such low-dimensional mappings can be identified using machine learning and dimensionality reduction algorithms. In addition to being used to accelerate sampling, the learned collective variables can also help acquire valuable insight into the studied system, namely by facilitating the visualization of its different states, as well as its free energy landscape. In this work, important notions and definitions of molecular dynamics are first presented before reviewing state-of-the-art machine learning algorithms that were devised or applied in recent years for automatic collective variable discovery and enhanced sampling. Then, the method developed during this thesis, coined "free energy biasing and machine learning with autoencoders" (FEBILAE), is introduced. This method uses an iterative scheme to alternately generate new simulations and learn collective variables from these simulations using autoencoders. Finally, we present the application of machine learning methods to a real system of interest. Here, autoencoders are used to learn collective variables of the heat shock protein 90 (HSP90) chaperone, with the aim of performing biased simulations of this system.

    Chasing Collective Variables using Autoencoders and biased trajectories

    No full text
    In recent decades, free energy biasing methods have proven to be powerful tools to accelerate the simulation of important conformational changes of molecules by modifying the sampling measure. However, most of these methods rely on prior knowledge of low-dimensional slow degrees of freedom, i.e. collective variables (CVs). Alternatively, such CVs can be identified using machine learning (ML) and dimensionality reduction algorithms. In this context, approaches where the CVs are learned iteratively using adaptive biasing have been proposed: at each iteration, the learned CV is used to perform free energy adaptive biasing to generate new data and learn a new CV. This implies that a different measure is sampled at each iteration, so the new training data follow a different distribution. Because a machine learning model always depends on the distribution of its training data, iterative methods are not guaranteed to converge to a given CV. This can be remedied by a reweighting procedure that always falls back to learning with respect to the same unbiased Boltzmann-Gibbs measure, regardless of the biased measure used in the adaptive sampling. In this paper, we introduce a new iterative method involving CV learning with autoencoders: Free Energy Biasing and Iterative Learning with AutoEncoders (FEBILAE). Our method includes the reweighting scheme to ensure that the learning model optimizes the same loss at every iteration, and achieves CV convergence. Using a small two-dimensional toy system and the alanine dipeptide system as examples, we present results of our algorithm using the extended adaptive biasing force as the free energy adaptive biasing method.
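The reweighting idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the biased samples follow a density proportional to exp(-β(V(x) - F(ξ(x)))), so that weights proportional to exp(-β F(ξ(x))) recover averages under the Boltzmann-Gibbs measure. The exact weights for the extended adaptive biasing force are not given in the abstract, and the names `unbiasing_weights` and `weighted_reconstruction_loss` are hypothetical.

```python
import numpy as np

def unbiasing_weights(free_energy, beta):
    """Normalized weights proportional to exp(-beta * F(xi(x_i))).

    Assumes the biased density is proportional to exp(-beta * (V - F o xi)).
    """
    log_w = -beta * np.asarray(free_energy, dtype=float)
    log_w -= log_w.max()  # shift for numerical stability before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def weighted_reconstruction_loss(x, x_rec, weights):
    """Squared reconstruction error averaged with the unbiasing weights."""
    per_sample = np.sum((x - x_rec) ** 2, axis=1)
    return float(np.sum(weights * per_sample))

# Toy check in 1D with xi(x) = x and V(x) = x^2 / 2, so F(xi) = xi^2 / 2.
# A perfectly flattened biased measure is uniform, mimicked here by a grid.
beta = 1.0
x = np.linspace(-6.0, 6.0, 2001)
free_energy = 0.5 * x ** 2
w = unbiasing_weights(free_energy, beta)

# Under the unbiased measure (a standard normal), E[x^2] = 1 / beta.
second_moment = float(np.sum(w * x ** 2))

# The same weights applied to an autoencoder-style reconstruction loss.
x2d = x[:, None]
loss_perfect = weighted_reconstruction_loss(x2d, x2d, w)
loss_constant = weighted_reconstruction_loss(x2d, np.zeros_like(x2d), w)
print(second_moment, loss_perfect, loss_constant)
```

Because the weights always target the same unbiased measure, the loss minimized at each iteration is the same functional even though the sampling distribution changes with the bias.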

    Machine learning force fields and coarse-grained variables in molecular dynamics: application to materials and biological systems

    No full text
    This work came out of a CECAM discussion meeting. Machine learning encompasses a set of tools and algorithms which are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning promises to extract valuable information from the enormous amounts of data generated by the simulation of complex systems. We provide here a review of our current understanding of the goals, benefits, and limitations of machine learning techniques for computational studies of atomistic systems, focusing on the construction of empirical force fields from ab initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.