17 research outputs found
Leveraging Multitask Learning to Improve the Transferability of Machine Learned Force Fields
Transferable neural network potentials have shown great promise as an avenue to increase the accuracy and applicability of existing atomistic force fields for organic molecules and inorganic materials. Training sets used to develop transferable potentials are very large, typically millions of examples, and as such, are restricted to relatively inexpensive levels of ab initio theory, such as density functional theory in a double- or triple-zeta quality basis set, which are subject to significant errors. It has been previously demonstrated using transfer learning that a model trained on a large dataset of such inexpensive calculations can be re-trained to reproduce energies of a higher level of theory using a much smaller dataset. Here, we show that more generally, one can use hard parameter sharing to successfully train to multiple levels of theory simultaneously. We demonstrate that simultaneously training to two levels of theory is an alternative to freezing layers in a neural network and re-training. Further, we show that training multiple levels of theory can improve the overall performance of all predictions and that one can transfer knowledge about a chemical domain present in only one of the datasets to all predicted levels of theory. This methodology is one way in which multiple, incompatible datasets can be combined to train a transferable model, increasing the accuracy and domain of applicability of machine learning force fields
Efficient Exploration of Chemical Space with Docking and Deep-Learning
With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening and assay has grown explosively in recent years, with several libraries eclipsing one billion compounds. Today’s screening libraries are larger and more diverse, enabling discovery of more potent hit compounds and unlocking new areas of chemical space, represented by new core scaffolds. Applying physics-based in-silico screening methods in an exhaustive manner, where every molecule in the library must be enumerated and evaluated independently, is increasingly cost-prohibitive. Here, we introduce a protocol for machine learning-enhanced molecular docking based on active learning to dramatically increase throughput over traditional docking. We leverage a novel selection protocol that strikes a balance between two objectives: (1) Identifying the best scoring compounds and (2) exploring a large region of chemical space, demonstrating superior performance compared to a purely greedy approach. Together with automated redocking of the top compounds, this method captures nearly all the high scoring scaffolds in the library found by exhaustive docking. This protocol is applied to our recent virtual screening campaigns against the D4 and AMPC targets that produced dozens of highly potent, novel inhibitors, and a blinded test against the MT1 target. Our protocol recovers more than 80% of the experimentally confirmed hits with a 14-fold reduction in compute cost, and more than 90% of the hit scaffolds in the top 5% of model predictions, preserving the diversity of the experimentally confirmed hit compounds
Transferable Neural Network Potential Energy Surfaces for Closed-Shell Organic Molecules: Extension to Ions
Transferable high dimensional neural network potentials (HDNNP) have shown great promise as an avenue to increase the accuracy and domain of applicability of existing atomistic force fields for organic systems relevant to life science. We have previously reported such a potential (Schrödinger-ANI) that has broad coverage of druglike molecules. We extend that work here to cover ionic and zwitterionic druglike molecules expected to be relevant to drug discovery research activities. We report a novel HDNNP architechture, which we call QRNN, that predicts atomic charges and uses these charges as descriptors in an energy model which delivers conformational energies within chemical accuracy when measured against the reference theory it is trained to. Further, we find that delta learning based on a semi-empirical level of theory approximately halves the errors. We test the models on torsion energy profiles, relative conformational energies, geometric parameters and relative tautomer errors
FEP Protocol Builder: Optimization of Free Energy Perturbation Protocols using Active Learning
Significant improvements have been made in the past decade to methods that rapidly and accurately predict binding affinity through free energy perturbation (FEP) calculations. This has been driven by recent advances in small molecule force fields and sampling algorithms combined with the availability of low-cost parallel computing. Predictive accuracies of ~1 kcal mol-1 have been regularly achieved, which are sufficient to drive potency optimization in modern drug discovery campaigns. Despite the robustness of these FEP approaches across multiple target classes, there are invariably target systems that do not display expected performance with default FEP settings. Traditionally, these systems required labor-intensive manual protocol development to arrive at parameter settings that produce a predictive FEP model. Due to the a) relatively large parameter space to be explored, b) significant compute requirements, and c) limited understanding of how combinations of parameters can affect FEP performance, manual FEP protocol optimization can take weeks to months to complete, and often does not involve rigorous train-test set splits, resulting in potential overfitting. These manual FEP protocol development timelines do not coincide with tight drug discovery project timelines, essentially preventing the use of FEP calculations for these target systems. Here, we describe an automated workflow termed FEP Protocol Builder (FEP-PB) to rapidly generate accurate FEP protocols for systems that do not perform well with default settings. FEP-PB uses active learning to iteratively search the protocol parameter space to develop accurate FEP protocols. To validate this approach, we applied it to pharmaceutically relevant systems where default FEP settings could not produce predictive models. We demonstrate that FEP-PB can rapidly generate accurate FEP protocols for the previously challenging MCL1 system with limited human intervention. We also apply FEP-PB in a real-world drug discovery setting to generate an accurate FEP protocol for the p97 system. FEP-PB is able to generate a more accurate protocol than the expert user, rapidly validating p97 as amenable to free energy calculations. Additionally, through the active learning process, we are able to gain insight into which parameters are most important for a given system. These results suggest that FEP-PB is a robust tool that can aid in rapidly developing accurate FEP protocols and increasing the number of targets that are amenable to the technology
Development of Scalable and Generalizable Machine Learned Force Field for Polymers
Understanding and predicting the properties of polymers is vital to developing tailored polymer molecules for desired applications. Classical force fields may fail to capture key properties, for example, the transport properties of certain polymer systems such as polyethylene glycol. As a solution, we present an alternative potential energy surface, a charge recursive neural network (QRNN) model trained on DFT calculations made on smaller atomic clusters that generalizes well to oligomers comprising larger atomic clusters or longer chains. We demonstrate the validity of the polymer QRNN workflow by modeling the oligomers of ethylene glycol. We apply two rounds of active learning (addition of new training clusters based on current model performance) and implement a novel model training approach that uses partial charges from a semi-empirical method. Our developed QRNN model for polymers produces stable molecular dynamics (MD) simulation trajectory and captures the dynamics of polymer chains as indicated by the striking agreement with experimental values. Our model allows working on much larger systems than allowed by DFT simulations, at the same time providing a more accurate force field than classical force fields which provides a promising avenue for large-scale molecular simulations of polymeric systems
FEP Protocol Builder: Optimization of Free Energy Perturbation Protocols Using Active Learning
Significant improvements have been made in the past decade
to methods
that rapidly and accurately predict binding affinity through free
energy perturbation (FEP) calculations. This has been driven by recent
advances in small-molecule force fields and sampling algorithms combined
with the availability of low-cost parallel computing. Predictive accuracies
of ∼1 kcal mol–1 have been regularly achieved,
which are sufficient to drive potency optimization in modern drug
discovery campaigns. Despite the robustness of these FEP approaches
across multiple target classes, there are invariably target systems
that do not display expected performance with default FEP settings.
Traditionally, these systems required labor-intensive manual protocol
development to arrive at parameter settings that produce a predictive
FEP model. Due to the (a) relatively large parameter space to be explored,
(b) significant compute requirements, and (c) limited understanding
of how combinations of parameters can affect FEP performance, manual
FEP protocol optimization can take weeks to months to complete, and
often does not involve rigorous train-test set splits, resulting in
potential overfitting. These manual FEP protocol development timelines
do not coincide with tight drug discovery project timelines, essentially
preventing the use of FEP calculations for these target systems. Here,
we describe an automated workflow termed FEP Protocol Builder (FEP-PB)
to rapidly generate accurate FEP protocols for systems that do not
perform well with default settings. FEP-PB uses an active-learning
workflow to iteratively search the protocol parameter space to develop
accurate FEP protocols. To validate this approach, we applied it to
pharmaceutically relevant systems where default FEP settings could
not produce predictive models. We demonstrate that FEP-PB can rapidly
generate accurate FEP protocols for the previously challenging MCL1
system with limited human intervention. We also apply FEP-PB in a
real-world drug discovery setting to generate an accurate FEP protocol
for the p97 system. FEP-PB is able to generate a more accurate protocol
than the expert user, rapidly validating p97 as amenable to free energy
calculations. Additionally, through the active-learning workflow,
we are able to gain insight into which parameters are most important
for a given system. These results suggest that FEP-PB is a robust
tool that can aid in rapidly developing accurate FEP protocols and
increasing the number of targets that are amenable to the technology
Schrodinger-ANI: An Eight-Element Neural Network Interaction Potential with Greatly Expanded Coverage of Druglike Chemical Space
We have developed a neural network potential energy function for use in drug discovery, with chemical element support extended from 41% to 94% of druglike molecules based on ChEMBL. We expand on the work of Smith et al., with their highly accurate network for the elements H, C, N, O, creating a network for H, C, N, O, S, F, Cl, P. We focus particularly on the calculation of relative conformer energies, for which we show that our new potential energy function has an RMSE of 0.70 kcal/mol for prospective druglike molecule conformers, substantially better than the previous state of the art. The speed and accuracy of this model could greatly accelerate the parameterization of protein-ligand binding free energy calculations for novel druglike molecules
Combining Cloud-Based Free Energy Calculations, Synthetically Aware Enumerations and Goal-Directed Generative Machine Learning for Rapid Large-Scale Chemical Exploration and Optimization
The hit identification process usually involves the profiling of millions to more recently billions of compounds either via traditional experimental high throughput screens (HTS) or computational virtual high throughput screens (vHTS). We have previously demonstrated that by coupling reaction-based enumeration, active learning and free energy calculations, a similarly large-scale exploration of chemical space can be extended to the hit-to-lead process. In this work, we augment that approach by coupling large scale enumeration and cloud-based FEP profiling with goal-directed generative machine learning, which results in a higher enrichment of potent ideas compared to large scale enumeration alone, while simultaneously staying within the bounds of a predefined drug-like property space. We are able to achieve this by building the molecular distribution for generative machine learning from the PathFinder rules-based enumeration and optimizing for a weighted sum QSAR based multi-parameter optimization function. We examine the utility of this combined approach by designing potent inhibitors of cyclin-dependent kinase 2 (CDK2) and demonstrate a coupled workflow that can: (1) provide a 6.4 fold enrichment improvement in identifying 50 50 <100 nM. The reported data suggest combining both reaction-based and generative machine learning for ideation results in a higher enrichment of potent compounds over previously described approaches, and can rapidly accelerate the discovery of novel chemical matter within a predefined potency and property space.<br /