53 research outputs found
Ranking docking poses by graph matching of protein–ligand interactions: lessons learned from the D3R Grand Challenge 2
A novel docking challenge has been set by the Drug Design Data Resource (D3R) in order to predict the pose and affinity ranking of a set of Farnesoid X receptor (FXR) agonists, prior to the public release of their bound X-ray structures and potencies. In a first phase, 36 agonists were docked to 26 Protein Data Bank (PDB) structures of the FXR receptor, and next rescored using the in-house developed GRIM method. GRIM aligns protein–ligand interaction patterns of docked poses to those of available PDB templates for the target protein, and rescores poses by a graph matching method. In agreement with results obtained during the previous 2015 docking challenge, we clearly show that GRIM rescoring improves the overall quality of top-ranked poses by prioritizing interaction patterns already visited in the PDB. Importantly, this challenge enables us to refine the applicability domain of the method by better defining the conditions of its success. We notably show that rescoring apolar ligands in hydrophobic pockets leads to frequent GRIM failures. In the second phase, 102 FXR agonists were ranked by decreasing affinity according to the Gibbs free energy of the corresponding GRIM-selected poses, computed by the HYDE scoring function. Interestingly, this fast and simple rescoring scheme provided the third most accurate ranking method among 57 contributions. Although the obtained ranking is still unsuitable for hit-to-lead optimization, the GRIM–HYDE scoring scheme is accurate and fast enough to post-process virtual screening dat
Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models
Protein-ligand structure prediction is an essential task in drug discovery,
predicting the binding interactions between small molecules (ligands) and
target proteins (receptors). Although conventional physics-based docking tools
are widely utilized, their accuracy is compromised by limited conformational
sampling and imprecise scoring functions. Recent advances have incorporated
deep learning techniques to improve the accuracy of structure prediction.
Nevertheless, the experimental validation of docking conformations remains
costly, which raises concerns regarding the generalizability of these deep
learning-based methods due to the limited training data. In this work, we show
that by pre-training a geometry-aware SE(3)-Equivariant neural network on
large-scale docking conformations generated by traditional physics-based docking
tools and then fine-tuning with a limited set of experimentally validated
receptor-ligand complexes, we can achieve outstanding performance. This process
involved the generation of 100 million docking conformations, consuming roughly
1 million CPU core days. The proposed model, HelixDock, aims to acquire the
physical knowledge encapsulated by the physics-based docking tools during the
pre-training phase. HelixDock has been benchmarked against both physics-based
and deep learning-based baselines, showing that it outperforms its closest
competitor by over 40% for RMSD. HelixDock also exhibits enhanced performance
on a dataset that poses a greater challenge, thereby highlighting its
robustness. Moreover, our investigation reveals the scaling laws governing
pre-trained structure prediction models, indicating a consistent enhancement in
performance with increases in model parameters and pre-training data. This
study illuminates the strategic advantage of leveraging a vast and varied
repository of generated data to advance the frontiers of AI-driven drug
discovery
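The RMSD metric cited in the abstract above can be illustrated with a minimal sketch. It assumes both poses list the same atoms in the same order and already share the receptor frame, so no superposition or symmetry correction is applied. The function names and the 2.0 Å success threshold are illustrative assumptions, not details taken from the HelixDock paper.

```python
import math

def rmsd(pred, ref):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    # Sum squared coordinate differences over all atoms and all three axes.
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))

def success_rate(rmsds, threshold=2.0):
    """Fraction of poses within `threshold` of the reference (threshold is an assumption)."""
    return sum(r <= threshold for r in rmsds) / len(rmsds)
```

A benchmark would typically report either the distribution of `rmsd` values or the `success_rate` over a test set of complexes.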
A teach-discover-treat application of ZincPharmer: An online interactive pharmacophore modeling and virtual screening tool
The 2012 Teach-Discover-Treat (TDT) community-wide experiment provided a unique opportunity to test prospective virtual screening protocols targeting the anti-malarial target dihydroorotate dehydrogenase (DHODH). Facilitated by ZincPharmer, an open access online interactive pharmacophore search of the ZINC database, the experience resulted in the development of a novel classification scheme that successfully predicted the bound structure of a non-triazolopyrimidine inhibitor, as well as an overall hit rate of 27% of tested active compounds from multiple novel chemical scaffolds. The general approach entailed exhaustively building and screening sparse pharmacophore models comprising a minimum of three features for each bound ligand in all available DHODH co-crystals and iteratively adding features that increased the number of known binders returned by the query. Collectively, the TDT experiment provided a unique opportunity to teach computational methods of drug discovery, develop innovative methodologies and prospectively discover new compounds active against DHODH.
Improved prediction of ligand-protein binding affinities by meta-modeling
The accurate screening of candidate drug ligands against target proteins
through computational approaches is of prime interest to drug development
efforts, as filtering potential candidates would save time and expenses for
finding drugs. Such virtual screening depends in part on methods to predict the
binding affinity between ligands and proteins. Given many computational models
for binding affinity prediction with varying results across targets, we herein
develop a meta-modeling framework by integrating published empirical
structure-based docking and sequence-based deep learning models. In building
this framework, we evaluate many combinations of individual models, training
databases, and linear and nonlinear meta-modeling approaches. We show that many
of our meta-models significantly improve affinity predictions over individual
base models. Our best meta-models achieve comparable performance to
state-of-the-art exclusively structure-based deep learning tools. Overall, we
demonstrate that diverse modeling approaches can be ensembled together to gain
substantial improvement in binding affinity prediction while allowing control
over input features such as physicochemical properties or molecular
descriptors.
Comment: 61 pages, 3 main tables, 6 main figures, 6 supplementary figures, and
supporting information. For 8 supplementary tables and code, see
https://github.com/Lee1701/Lee2023
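A linear meta-model of the kind the abstract above describes can be sketched as a least-squares fit over the predictions of several base models. This is only an illustration under stated assumptions: the function names, the intercept term, and the plain normal-equation solver are hypothetical and are not taken from the authors' repository.

```python
def fit_linear_meta_model(base_preds, targets):
    """Least-squares fit of targets ~ w0 + sum_j w_j * base_preds[i][j]."""
    rows = [[1.0] + list(p) for p in base_preds]  # prepend an intercept column
    n = len(rows[0])
    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, targets))]
           for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("base predictions are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def predict(weights, base_pred):
    """Combine one complex's base-model predictions into a meta-prediction."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], base_pred))
```

Each row of `base_preds` holds the affinity predicted for one complex by each base model (for example, one docking score and one sequence-based prediction). A nonlinear meta-model would replace the linear fit with another regressor trained on the same inputs.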
Solvated interaction energy: from small-molecule to antibody drug design
Scoring functions are ubiquitous in structure-based drug design as an aid to predicting binding modes and estimating binding affinities. Ideally, a scoring function should be broadly applicable, obviating the need to recalibrate and refit its parameters for every new target and class of ligands. Traditionally, drugs have been small molecules, but in recent years biologics, particularly antibodies, have become an increasingly important if not dominant class of therapeutics. This makes the goal of having a transferable scoring function, i.e., one that spans the range of small-molecule to protein ligands, even more challenging. One such broadly applicable scoring function is the Solvated Interaction Energy (SIE), which has been developed and applied in our lab for the last 15 years, leading to several important applications. This physics-based method arose from efforts to understand the physics governing binding events, with particular care given to the role played by solvation. SIE has been used by us and many independent labs worldwide for virtual screening and discovery of novel small-molecule binders or optimization of known drugs. Moreover, without any retraining, it is found to be transferable to predictions of antibody-antigen relative binding affinities and as accurate as functions trained on protein-protein binding affinities. SIE has been incorporated in conjunction with other scoring functions into ADAPT (Assisted Design of Antibody and Protein Therapeutics), our platform for affinity modulation of antibodies. Application of ADAPT resulted in the optimization of several antibodies with 10-to-100-fold improvements in binding affinity. Further applications included broadening the specificity of a single-domain antibody to be cross-reactive with virus variants of both SARS-CoV-1 and SARS-CoV-2, and the design of safer antibodies by engineering of a pH switch to make them more selective towards acidic tumors while sparing normal tissues at physiological pH
Cheminformatics Approaches to Structure Based Virtual Screening: Methodology Development and Applications
Structure-based virtual screening (VS) using 3D structures of protein targets has
become a popular in silico drug discovery approach. The success of VS relies on the quality
of the underlying scoring functions. Despite the success of structure-based VS in several
reported cases, target-dependent VS performance and poor binding affinity predictions are
well-known drawbacks in structure-based scoring functions. The goal of my dissertation is to
use cheminformatics approaches to address the above problems of the existing structure-based
scoring methods.
In Aim 1, cheminformatics practices are applied to those problems which
conventional structure-based scoring functions find difficult (anti-bacterial leads efflux study)
or fail to address (AmpC β-lactamase study). Predictive binary classification QSAR models
can be constructed to classify complex efflux properties (low vs. high) and to differentiate
AmpC β-lactamase binders from binding decoys (i.e., the false positives generated by scoring
functions). The above models are applied to virtual screening and many computational hits
are experimentally confirmed.
In Aim 2, novel statistical binding and pose scoring functions (or pose filter in Aim 3)
are developed, to accurately predict protein-ligand binding affinity and to discriminate
native-like poses of ligands from pose decoys, respectively. In my approach, the protein-ligand
interface is represented at the atomic level resolution and transformed via a special
computational geometry approach called Delaunay tessellation to a collection of atom
quadruplet motifs. Individual atom members of the motifs are characterized by
conceptual Density Functional Theory (DFT)-based atomic properties. The binding scoring
function shows acceptable prediction accuracy towards Community Structure-Activity
Resources (CSAR) data sets with diverse protein families.
In Aim 3, a two-step scoring protocol for target-specific virtual screening is
developed and validated using the challenging Directory of Useful Decoys (DUD) data sets.
In the first step our target-specific pose (-scoring) filter developed in Aim 2 is used to filter
out/penalize putative pose decoys for every compound. Then in the second step the
remaining putative native-like poses are scored with MedusaScore, which is a conventional
force-field-based scoring function. This novel screening protocol can consistently improve
MedusaScore VS performance, suggesting its possible applications to practical
pharmaceutically relevant targets
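The two-step protocol of Aim 3 (filter poses first, then score the survivors) can be sketched as follows. The callables `pose_filter` and `score`, the 0.5 cutoff, and the convention that a lower score is better are all assumptions made for illustration; the dissertation's actual pose filter and MedusaScore are not reproduced here.

```python
def two_step_screen(poses_by_compound, pose_filter, score, cutoff=0.5):
    """Return (compound, best_score) pairs, best first; lower score = better (assumed)."""
    ranked = []
    for compound, poses in poses_by_compound.items():
        # Step 1: discard poses the filter considers decoy-like.
        kept = [p for p in poses if pose_filter(p) >= cutoff]
        # Step 2: score the remaining poses and keep the best one per compound.
        if kept:
            ranked.append((compound, min(score(p) for p in kept)))
    return sorted(ranked, key=lambda item: item[1])
```

In this sketch a compound whose poses are all rejected drops out of the ranking entirely. A penalizing variant would instead keep every pose and add a penalty to the scores of rejected ones.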
3D Convolutional Neural Networks for Computational Drug Discovery
This thesis describes aspects of the implementation and application of voxel-based convolutional neural networks (CNNs) to problems in computational drug discovery. It opens by justifying the novelty of this approach by presenting a more mainstream approach to the common tasks of virtual screening and binding pose prediction, augmented with more simplistic machine learning methods, and demonstrating their suboptimal performance when applied prospectively. It then describes my contributions to our group’s development of voxel-based CNNs as we honed their implementation and training strategy, and reports our library that facilitates featurization and training using this approach. It continues with a prospective assessment of their performance, analogous to the first prospective evaluation, with the addition of a novel CNN-based pose sampling strategy. Next it makes a foray into model explanation, first in an oblique fashion, by examining the transferability of models to tasks that are distinct from but related to the tasks for which they were trained, and by a comparison with an approach based on exploiting dataset bias using other machine learning methods. Finally it describes the implementation of a more direct approach to model explanation, by using a trained network to perform optimization of inputs with respect to the network as a whole or individual nodes and analyzing the content of the result as well as its utility as a pseudo-pharmacophore
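The voxel-based featurization mentioned in the abstract above can be illustrated with a deliberately simplified sketch: atoms are binned into a cubic grid with one channel per element type. The function name, the channel set, the grid size, and the use of plain occupancy counts are assumptions for illustration. The thesis's own library is not shown here and may represent atoms differently.

```python
def voxelize(atoms, center, size=4, resolution=1.0, channels=("C", "N", "O")):
    """Return grid[channel][x][y][z] occupancy counts for atoms inside the box.

    atoms: iterable of (element, x, y, z); center: (x, y, z) of the box centre.
    """
    grid = [[[[0 for _ in range(size)] for _ in range(size)]
             for _ in range(size)] for _ in channels]
    half = size * resolution / 2.0
    for element, x, y, z in atoms:
        if element not in channels:
            continue  # element types without a channel are ignored
        # Floor division maps a coordinate to its voxel index along each axis.
        idx = [int((c - o + half) // resolution) for c, o in zip((x, y, z), center)]
        if all(0 <= i < size for i in idx):
            grid[channels.index(element)][idx[0]][idx[1]][idx[2]] += 1
    return grid
```

The resulting four-dimensional array (channels by three spatial axes) is the kind of input a 3D CNN consumes.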