
17 research outputs found

Leveraging Multitask Learning to Improve the Transferability of Machine Learned Force Fields

Author: Farhad Ramezanghorbani
James Stevenson
Karl Leswing
Leif Jacobson
Steven Dajnowicz
Publication date: 27/09/2023

Transferable neural network potentials have shown great promise as an avenue to increase the accuracy and applicability of existing atomistic force fields for organic molecules and inorganic materials. Because the training sets used to develop transferable potentials are very large, typically millions of examples, they are restricted to relatively inexpensive levels of ab initio theory, such as density functional theory in a double- or triple-zeta quality basis set, which are subject to significant errors. It has previously been demonstrated, using transfer learning, that a model trained on a large dataset of such inexpensive calculations can be retrained to reproduce energies at a higher level of theory using a much smaller dataset. Here, we show more generally that one can use hard parameter sharing to train to multiple levels of theory simultaneously. We demonstrate that simultaneously training to two levels of theory is an alternative to freezing layers in a neural network and retraining. Further, we show that training to multiple levels of theory can improve the overall performance of all predictions, and that knowledge about a chemical domain present in only one of the datasets can be transferred to all predicted levels of theory. This methodology is one way in which multiple, incompatible datasets can be combined to train a transferable model, increasing the accuracy and domain of applicability of machine learning force fields.

ChemRxiv
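The hard parameter sharing described in the multitask-learning abstract above can be pictured as one shared network body feeding a separate output head per level of theory. Below is a minimal sketch of a matching training loss, assuming each sample carries labels for some but not all levels (missing labels written as `None`) and assuming a simple weighted sum of per-head mean squared errors. The function `multitask_mse` and the weights are hypothetical, not the authors' implementation.

```python
from typing import Optional, Sequence


def multitask_mse(
    preds: Sequence[Sequence[float]],
    labels: Sequence[Sequence[Optional[float]]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of per-head MSEs over the samples labeled for that head.

    preds[i][h]: prediction of head h (one level of theory) for sample i
    labels[i][h]: reference energy, or None if sample i lacks that level
    """
    total = 0.0
    for h, w in enumerate(weights):
        sq_errs = [
            (p[h] - y[h]) ** 2
            for p, y in zip(preds, labels)
            if y[h] is not None  # unlabeled samples do not train this head
        ]
        if sq_errs:  # a head with no labels in this batch contributes nothing
            total += w * sum(sq_errs) / len(sq_errs)
    return total


# Two samples, two levels of theory; sample 0 has no high-level label.
preds = [[1.0, 1.2], [2.0, 2.5]]
labels = [[1.5, None], [2.0, 3.0]]
print(multitask_mse(preds, labels, weights=[1.0, 1.0]))  # 0.375
```

In this picture, both heads backpropagate into the same shared body, so features learned from a chemical domain labeled at only one level of theory are also available to the other head.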

Efficient Exploration of Chemical Space with Docking and Deep-Learning

Author: Brian Shoichet
Karl Leswing
Kun Yao
Matthew P. Repasky
Robert Abel
Steven Jerome
Ying Yang
Publication date: 04/03/2021

With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening and assay has grown explosively in recent years, with several libraries exceeding one billion compounds. Today’s screening libraries are larger and more diverse, enabling the discovery of more potent hit compounds and unlocking new areas of chemical space, represented by new core scaffolds. Applying physics-based in silico screening methods exhaustively, where every molecule in the library must be enumerated and evaluated independently, is increasingly cost-prohibitive. Here, we introduce a protocol for machine learning-enhanced molecular docking based on active learning that dramatically increases throughput over traditional docking. We use a novel selection protocol that balances two objectives, (1) identifying the best-scoring compounds and (2) exploring a large region of chemical space, and that outperforms a purely greedy approach. Together with automated redocking of the top compounds, this method captures nearly all the high-scoring scaffolds in the library found by exhaustive docking. We apply this protocol to our recent virtual screening campaigns against the D4 and AmpC targets, which produced dozens of highly potent, novel inhibitors, and to a blinded test against the MT1 target. Our protocol recovers more than 80% of the experimentally confirmed hits with a 14-fold reduction in compute cost, and more than 90% of the hit scaffolds in the top 5% of model predictions, preserving the diversity of the experimentally confirmed hit compounds.

ChemRxiv
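The docking abstract above does not spell out its selection rule, so the following is only a simple stand-in for the trade-off it describes: fill part of each batch greedily with the best predicted docking scores (assumed lower-is-better) and the rest with the compounds whose predictions are most uncertain. The function name `select_batch`, the uncertainty input and the 50/50 split are all assumptions.

```python
from typing import Dict, List


def select_batch(
    pred: Dict[str, float],
    unc: Dict[str, float],
    k: int,
    greedy_frac: float = 0.5,
) -> List[str]:
    """Pick k compound IDs: part exploit (best score), part explore (most uncertain)."""
    n_greedy = int(k * greedy_frac)
    # Exploitation: lowest (best) predicted docking scores first.
    by_score = sorted(pred, key=lambda c: pred[c])
    chosen = by_score[:n_greedy]
    taken = set(chosen)
    # Exploration: among the remaining compounds, highest model uncertainty first.
    rest = sorted(
        (c for c in pred if c not in taken), key=lambda c: unc[c], reverse=True
    )
    return chosen + rest[: k - n_greedy]


pred = {"a": -9.0, "b": -7.0, "c": -8.5, "d": -5.0, "e": -6.0}
unc = {"a": 0.1, "b": 0.9, "c": 0.2, "d": 0.5, "e": 1.2}
print(select_batch(pred, unc, k=4))  # ['a', 'c', 'e', 'b']
```

In a full active-learning loop, the selected compounds would be docked, added to the training set, and the surrogate model retrained before the next round.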

Transferable Neural Network Potential Energy Surfaces for Closed-Shell Organic Molecules: Extension to Ions

Author: Delaram Ghoreishi
Ed Harder
Farhad Ramezanghorbani
James Stevenson
Karl Leswing
Leif Jacobson
Robert Abel
Publication date: 21/02/2022

Transferable high-dimensional neural network potentials (HDNNPs) have shown great promise as an avenue to increase the accuracy and domain of applicability of existing atomistic force fields for organic systems relevant to life science. We have previously reported such a potential (Schrödinger-ANI) that has broad coverage of druglike molecules. Here we extend that work to cover ionic and zwitterionic druglike molecules expected to be relevant to drug discovery research. We report a novel HDNNP architecture, which we call QRNN, that predicts atomic charges and uses these charges as descriptors in an energy model; this model delivers conformational energies within chemical accuracy when measured against the reference theory it is trained to. Further, we find that delta learning based on a semi-empirical level of theory approximately halves the errors. We test the models on torsion energy profiles, relative conformational energies, geometric parameters and relative tautomer errors.

ChemRxiv
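Delta learning, as mentioned in the QRNN abstract above, trains the network on the difference between the reference energy and a cheaper baseline, then adds the learned correction back at prediction time. Below is a minimal sketch of that bookkeeping, with made-up energies in kcal/mol and hypothetical function names; the semi-empirical baseline calculation and the network itself are not shown.

```python
import math
from typing import List


def delta_targets(e_ref: List[float], e_base: List[float]) -> List[float]:
    """Training targets: what the baseline gets wrong relative to the reference."""
    return [r - b for r, b in zip(e_ref, e_base)]


def delta_predict(e_base: List[float], correction: List[float]) -> List[float]:
    """Final prediction: cheap baseline plus the model's learned correction."""
    return [b + c for b, c in zip(e_base, correction)]


e_base = [-10.2, -8.7, -9.9]  # semi-empirical energies (illustrative values)
e_ref = [-11.0, -9.1, -10.5]  # reference-level energies (illustrative values)
targets = delta_targets(e_ref, e_base)
# A perfect correction model would reproduce the reference energies exactly.
recovered = delta_predict(e_base, targets)
assert all(math.isclose(x, y) for x, y in zip(recovered, e_ref))
```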

FEP Protocol Builder: Optimization of Free Energy Perturbation Protocols using Active Learning

Author: Cesar de Oliveira
Karl Leswing
Rene Kanters
Robert Abel
Sathesh Bhat
Shulu Feng
Publication date: 03/05/2023

Significant improvements have been made in the past decade to methods that rapidly and accurately predict binding affinity through free energy perturbation (FEP) calculations. This has been driven by recent advances in small-molecule force fields and sampling algorithms combined with the availability of low-cost parallel computing. Predictive accuracies of ~1 kcal/mol have been regularly achieved, which are sufficient to drive potency optimization in modern drug discovery campaigns. Despite the robustness of these FEP approaches across multiple target classes, there are invariably target systems that do not display the expected performance with default FEP settings. Traditionally, these systems required labor-intensive manual protocol development to arrive at parameter settings that produce a predictive FEP model. Because of (a) the relatively large parameter space to be explored, (b) significant compute requirements, and (c) limited understanding of how combinations of parameters affect FEP performance, manual FEP protocol optimization can take weeks to months to complete, and often does not involve rigorous train-test set splits, risking overfitting. These manual protocol development timelines do not fit tight drug discovery project timelines, effectively preventing the use of FEP calculations for these target systems. Here, we describe an automated workflow termed FEP Protocol Builder (FEP-PB) to rapidly generate accurate FEP protocols for systems that do not perform well with default settings. FEP-PB uses active learning to iteratively search the protocol parameter space to develop accurate FEP protocols. To validate this approach, we applied it to pharmaceutically relevant systems where default FEP settings could not produce predictive models. We demonstrate that FEP-PB can rapidly generate accurate FEP protocols for the previously challenging MCL1 system with limited human intervention.
We also apply FEP-PB in a real-world drug discovery setting to generate an accurate FEP protocol for the p97 system. FEP-PB generates a more accurate protocol than the expert user, rapidly validating p97 as amenable to free energy calculations. Additionally, through the active learning process, we gain insight into which parameters are most important for a given system. These results suggest that FEP-PB is a robust tool that can aid in rapidly developing accurate FEP protocols and increasing the number of targets that are amenable to the technology.

ChemRxiv
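One point the FEP-PB abstract stresses is that manual protocol tuning often skips rigorous train-test splits. As an illustration only (not the FEP-PB algorithm, whose active-learning search is not reproduced here), the sketch below selects a protocol by its RMSE against experiment on training ligands and then reports its RMSE on held-out ligands. Protocol names, ligand IDs, the helper `pick_protocol` and all values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rmse(pred: List[float], expt: List[float]) -> float:
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def pick_protocol(
    results: Dict[str, Dict[str, float]],  # results[protocol][ligand] = predicted dG
    expt: Dict[str, float],                # experimental dG per ligand (kcal/mol)
    train: List[str],
    test: List[str],
) -> Tuple[str, float, float]:
    def score(proto: str, ligs: List[str]) -> float:
        return rmse([results[proto][lig] for lig in ligs], [expt[lig] for lig in ligs])

    # Select on training ligands only; test ligands are used only for the final report.
    best = min(results, key=lambda proto: score(proto, train))
    return best, score(best, train), score(best, test)


expt = {"L1": -8.0, "L2": -9.0, "L3": -7.0, "L4": -10.0}
results = {
    "default": {"L1": -6.0, "L2": -11.0, "L3": -7.0, "L4": -8.0},
    "longer_sampling": {"L1": -8.5, "L2": -9.0, "L3": -7.5, "L4": -10.0},
}
best, train_rmse, test_rmse = pick_protocol(results, expt, ["L1", "L2"], ["L3", "L4"])
print(best, round(train_rmse, 2), round(test_rmse, 2))  # longer_sampling 0.35 0.35
```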

Development of Scalable and Generalizable Machine Learned Force Field for Polymers

Author: Andrea Browning
James Stevenson
Karl Leswing
Leif Jacobson
Mathew Halls
Mohammad Atif Faiz Afzal
Shaswat Mohanty
Publication date: 07/09/2023

Understanding and predicting the properties of polymers is vital to developing tailored polymer molecules for desired applications. Classical force fields may fail to capture key properties, for example, the transport properties of certain polymer systems such as polyethylene glycol. As a solution, we present an alternative potential energy surface: a charge recursive neural network (QRNN) model trained on DFT calculations of small atomic clusters that generalizes well to oligomers comprising larger atomic clusters or longer chains. We demonstrate the validity of the polymer QRNN workflow by modeling oligomers of ethylene glycol. We apply two rounds of active learning (adding new training clusters based on current model performance) and implement a novel model training approach that uses partial charges from a semi-empirical method. Our QRNN model for polymers produces stable molecular dynamics (MD) simulation trajectories and captures the dynamics of polymer chains, as indicated by the striking agreement with experimental values. The model can be applied to much larger systems than DFT simulations allow while providing a more accurate force field than classical force fields, and it therefore offers a promising avenue for large-scale molecular simulations of polymeric systems.

ChemRxiv
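The polymer abstract above describes active learning as adding new training clusters based on current model performance, without giving the criterion. A common choice, used here purely as an assumed stand-in, is query by committee: run an ensemble of models on candidate clusters and send the ones they disagree on most to DFT. The helper `select_by_committee`, the cluster IDs, energies, threshold and budget are all hypothetical.

```python
import statistics
from typing import Dict, List


def select_by_committee(
    preds: Dict[str, List[float]],  # cluster ID -> energies from each ensemble member
    threshold: float,
    budget: int,
) -> List[str]:
    """Return up to `budget` cluster IDs whose ensemble spread exceeds `threshold`."""
    spread = {cid: statistics.pstdev(e) for cid, e in preds.items()}
    flagged = [cid for cid, s in spread.items() if s > threshold]
    flagged.sort(key=lambda cid: spread[cid], reverse=True)  # most disputed first
    return flagged[:budget]


preds = {
    "c1": [1.0, 1.0, 1.0],  # models agree: little to learn
    "c2": [0.0, 2.0, 4.0],  # large disagreement
    "c3": [1.0, 1.5, 2.0],  # moderate disagreement
}
print(select_by_committee(preds, threshold=0.3, budget=5))  # ['c2', 'c3']
```

The selected clusters would then be computed with DFT and added to the training set for the next round.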

FEP Protocol Builder: Optimization of Free Energy Perturbation Protocols Using Active Learning

Author: César de Oliveira
Karl Leswing
René Kanters
Robert Abel
Sathesh Bhat
Shulu Feng
Publication date: 18/08/2023

Significant improvements have been made in the past decade to methods that rapidly and accurately predict binding affinity through free energy perturbation (FEP) calculations. This has been driven by recent advances in small-molecule force fields and sampling algorithms combined with the availability of low-cost parallel computing. Predictive accuracies of ~1 kcal/mol have been regularly achieved, which are sufficient to drive potency optimization in modern drug discovery campaigns. Despite the robustness of these FEP approaches across multiple target classes, there are invariably target systems that do not display the expected performance with default FEP settings. Traditionally, these systems required labor-intensive manual protocol development to arrive at parameter settings that produce a predictive FEP model. Because of (a) the relatively large parameter space to be explored, (b) significant compute requirements, and (c) limited understanding of how combinations of parameters affect FEP performance, manual FEP protocol optimization can take weeks to months to complete, and often does not involve rigorous train-test set splits, risking overfitting. These manual protocol development timelines do not fit tight drug discovery project timelines, effectively preventing the use of FEP calculations for these target systems. Here, we describe an automated workflow termed FEP Protocol Builder (FEP-PB) to rapidly generate accurate FEP protocols for systems that do not perform well with default settings. FEP-PB uses an active-learning workflow to iteratively search the protocol parameter space to develop accurate FEP protocols. To validate this approach, we applied it to pharmaceutically relevant systems where default FEP settings could not produce predictive models. We demonstrate that FEP-PB can rapidly generate accurate FEP protocols for the previously challenging MCL1 system with limited human intervention.
We also apply FEP-PB in a real-world drug discovery setting to generate an accurate FEP protocol for the p97 system. FEP-PB generates a more accurate protocol than the expert user, rapidly validating p97 as amenable to free energy calculations. Additionally, through the active-learning workflow, we gain insight into which parameters are most important for a given system. These results suggest that FEP-PB is a robust tool that can aid in rapidly developing accurate FEP protocols and increasing the number of targets that are amenable to the technology.

The Francis Crick Institute

Schrodinger-ANI: An Eight-Element Neural Network Interaction Potential with Greatly Expanded Coverage of Druglike Chemical Space

Author: Chuanjie Wu
Edward Harder
James Stevenson
Jon Maple
Karl Leswing
Leif D. Jacobson
Robert Abel
Yutong Zhao
Publication date: 12/12/2019

We have developed a neural network potential energy function for use in drug discovery, extending chemical element support from 41% to 94% of druglike molecules in ChEMBL. We build on the work of Smith et al., whose highly accurate network covers the elements H, C, N and O, to create a network for H, C, N, O, S, F, Cl and P. We focus particularly on the calculation of relative conformer energies, for which our new potential energy function achieves an RMSE of 0.70 kcal/mol on prospective druglike molecule conformers, substantially better than the previous state of the art. The speed and accuracy of this model could greatly accelerate the parameterization of protein-ligand binding free energy calculations for novel druglike molecules.

ChemRxiv
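The 0.70 kcal/mol figure in the Schrödinger-ANI abstract above is an RMSE over relative conformer energies. Below is a minimal sketch of how such a metric can be computed for one molecule, assuming energies are referenced to the conformer that is lowest in the reference method; other anchoring conventions exist, and the abstract does not say which was used. The function name and values are illustrative.

```python
import math
from typing import List


def conformer_rmse(e_model: List[float], e_ref: List[float]) -> float:
    """RMSE of model vs. reference conformer energies, each relative to one anchor."""
    anchor = min(range(len(e_ref)), key=e_ref.__getitem__)  # lowest reference conformer
    rel_model = [e - e_model[anchor] for e in e_model]
    rel_ref = [e - e_ref[anchor] for e in e_ref]
    # The anchor's own error is zero by construction; it is kept in the mean here.
    sq = [(m - r) ** 2 for m, r in zip(rel_model, rel_ref)]
    return math.sqrt(sum(sq) / len(sq))


e_ref = [0.0, 1.0, 2.0]       # kcal/mol
e_model = [10.0, 11.5, 11.5]  # a constant offset of 10 kcal/mol does not matter
print(round(conformer_rmse(e_model, e_ref), 3))  # 0.408
```

Working with relative energies cancels any constant offset between the two methods, which is why the 10 kcal/mol shift in `e_model` has no effect on the result.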

Combining Cloud-Based Free Energy Calculations, Synthetically Aware Enumerations and Goal-Directed Generative Machine Learning for Rapid Large-Scale Chemical Exploration and Optimization

Author: Gabriel Marques
Joshua Staker
Karl Leswing
Kyle Konze
Kyle Marshall
Phani Ghanakota
Pieter Bos
Robert Abel
Sathesh Bhat
Publication date: 10/02/2020

The hit identification process usually involves profiling millions, and more recently billions, of compounds, either via traditional experimental high-throughput screens (HTS) or computational virtual high-throughput screens (vHTS). We have previously demonstrated that by coupling reaction-based enumeration, active learning and free energy calculations, a similarly large-scale exploration of chemical space can be extended to the hit-to-lead process. In this work, we augment that approach by coupling large-scale enumeration and cloud-based FEP profiling with goal-directed generative machine learning, which yields a higher enrichment of potent ideas than large-scale enumeration alone while staying within the bounds of a predefined drug-like property space. We achieve this by building the molecular distribution for generative machine learning from the PathFinder rules-based enumeration and optimizing a weighted-sum, QSAR-based multiparameter optimization function. We examine the utility of this combined approach by designing potent inhibitors of cyclin-dependent kinase 2 (CDK2) and demonstrate a coupled workflow that can: (1) provide a 6.4-fold enrichment improvement in identifying […] <100 nM. The reported data suggest that combining reaction-based and generative machine learning for ideation results in a higher enrichment of potent compounds than previously described approaches, and can rapidly accelerate the discovery of novel chemical matter within a predefined potency and property space.

ChemRxiv
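The weighted-sum multiparameter optimization function named in the CDK2 abstract above combines several predicted properties into one score. The abstract gives neither the properties nor the scoring curves, so this sketch assumes linear desirability ramps clipped to [0, 1] and a weighted average; the property names, bounds, weights and the helpers `desirability` and `mpo_score` are hypothetical.

```python
from typing import Dict, Tuple


def desirability(x: float, worst: float, best: float) -> float:
    """Linear ramp: 0 at `worst`, 1 at `best`, clipped; works in either direction."""
    t = (x - worst) / (best - worst)
    return max(0.0, min(1.0, t))


def mpo_score(props: Dict[str, float], spec: Dict[str, Tuple[float, float, float]]) -> float:
    """Weighted average of desirabilities; spec[name] = (weight, worst, best)."""
    total_w = sum(w for w, _, _ in spec.values())
    return sum(
        w * desirability(props[name], worst, best)
        for name, (w, worst, best) in spec.items()
    ) / total_w


spec = {
    "pIC50_pred": (2.0, 6.0, 9.0),  # higher predicted potency is better
    "logP_pred": (1.0, 5.0, 3.0),   # lower logP is better in this toy example
}
print(mpo_score({"pIC50_pred": 7.5, "logP_pred": 4.0}, spec))  # 0.5
```

In a goal-directed generative loop, a score like this would serve as the objective that the generator is optimized toward.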