53 research outputs found

    Ranking docking poses by graph matching of protein–ligand interactions: lessons learned from the D3R Grand Challenge 2

    A novel docking challenge has been set by the Drug Design Data Resource (D3R) in order to predict the pose and affinity ranking of a set of Farnesoid X receptor (FXR) agonists, prior to the public release of their bound X-ray structures and potencies. In a first phase, 36 agonists were docked to 26 Protein Data Bank (PDB) structures of the FXR receptor, and then rescored using the in-house developed GRIM method. GRIM aligns protein–ligand interaction patterns of docked poses to those of available PDB templates for the target protein, and rescores poses by a graph matching method. In agreement with results obtained during the previous 2015 docking challenge, we clearly show that GRIM rescoring improves the overall quality of top-ranked poses by prioritizing interaction patterns already visited in the PDB. Importantly, this challenge enables us to refine the applicability domain of the method by better defining the conditions of its success. We notably show that rescoring apolar ligands in hydrophobic pockets leads to frequent GRIM failures. In the second phase, 102 FXR agonists were ranked by decreasing affinity according to the Gibbs free energy of the corresponding GRIM-selected poses, computed by the HYDE scoring function. Interestingly, this fast and simple rescoring scheme provided the third most accurate ranking method among 57 contributions. Although the obtained ranking is still unsuitable for hit-to-lead optimization, the GRIM–HYDE scoring scheme is accurate and fast enough to post-process virtual screening dat
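
The second-phase ranking described above (ordering ligands by decreasing predicted affinity from a computed free energy) can be illustrated with a minimal sketch. This is not the GRIM or HYDE implementation: the ligand names and free-energy values are made up, and Kendall's tau is used here only as one common way to compare a predicted ordering with a reference ordering.

```python
from itertools import combinations


def rank_by_free_energy(dg_by_ligand):
    """Order ligand names from strongest to weakest predicted binder.

    A more negative binding free energy means a higher predicted affinity,
    so sorting in ascending order of the free energy gives decreasing affinity.
    """
    return sorted(dg_by_ligand, key=dg_by_ligand.get)


def kendall_tau(x, y):
    """Kendall tau-a between two equal-length score lists (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical free energies (illustrative values only, in kJ/mol).
predicted = {"lig_a": -42.0, "lig_b": -35.5, "lig_c": -48.1}
ranking = rank_by_free_energy(predicted)
```

Here `ranking` lists `lig_c` first because it has the most negative free energy.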

    Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models

    Protein-ligand structure prediction is an essential task in drug discovery, predicting the binding interactions between small molecules (ligands) and target proteins (receptors). Although conventional physics-based docking tools are widely utilized, their accuracy is compromised by limited conformational sampling and imprecise scoring functions. Recent advances have incorporated deep learning techniques to improve the accuracy of structure prediction. Nevertheless, the experimental validation of docking conformations remains costly, which raises concerns regarding the generalizability of these deep learning-based methods due to the limited training data. In this work, we show that by pre-training a geometry-aware SE(3)-equivariant neural network on large-scale docking conformations generated by traditional physics-based docking tools and then fine-tuning with a limited set of experimentally validated receptor-ligand complexes, we can achieve outstanding performance. This process involved the generation of 100 million docking conformations, consuming roughly 1 million CPU core days. The proposed model, HelixDock, aims to acquire the physical knowledge encapsulated by the physics-based docking tools during the pre-training phase. HelixDock has been benchmarked against both physics-based and deep learning-based baselines, showing that it outperforms its closest competitor by over 40% for RMSD. HelixDock also exhibits enhanced performance on a dataset that poses a greater challenge, thereby highlighting its robustness. Moreover, our investigation reveals the scaling laws governing pre-trained structure prediction models, indicating a consistent enhancement in performance with increases in model parameters and pre-training data. This study illuminates the strategic advantage of leveraging a vast and varied repository of generated data to advance the frontiers of AI-driven drug discovery.
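
The RMSD metric mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the predicted and reference ligand poses list the same atoms in the same order and share the receptor frame (no superposition and no symmetry correction); it is not the evaluation code used by the authors.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy two-atom ligand: the predicted pose is the reference shifted by 2 units along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
deviation = rmsd(reference, predicted)
```

For this rigid shift every atom moves by the same distance, so the RMSD equals the shift length.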

    A teach-discover-treat application of ZincPharmer: An online interactive pharmacophore modeling and virtual screening tool

    The 2012 Teach-Discover-Treat (TDT) community-wide experiment provided a unique opportunity to test prospective virtual screening protocols targeting the anti-malarial target dihydroorotate dehydrogenase (DHODH). Facilitated by ZincPharmer, an open access online interactive pharmacophore search of the ZINC database, the experience resulted in the development of a novel classification scheme that successfully predicted the bound structure of a non-triazolopyrimidine inhibitor, as well as an overall hit rate of 27% of tested active compounds from multiple novel chemical scaffolds. The general approach entailed exhaustively building and screening sparse pharmacophore models comprising a minimum of three features for each bound ligand in all available DHODH co-crystals and iteratively adding features that increased the number of known binders returned by the query. Collectively, the TDT experiment provided a unique opportunity to teach computational methods of drug discovery, develop innovative methodologies and prospectively discover new compounds active against DHODH.

    Improved prediction of ligand-protein binding affinities by meta-modeling

    The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts, as filtering potential candidates would save time and expenses for finding drugs. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Given many computational models for binding affinity prediction with varying results across targets, we herein develop a meta-modeling framework by integrating published empirical structure-based docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual models, training databases, and linear and nonlinear meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over individual base models. Our best meta-models achieve comparable performance to state-of-the-art exclusively structure-based deep learning tools. Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain substantial improvement in binding affinity prediction while allowing control over input features such as physicochemical properties or molecular descriptors. Comment: 61 pages, 3 main tables, 6 main figures, 6 supplementary figures, and supporting information. For 8 supplementary tables and code, see https://github.com/Lee1701/Lee2023
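
The idea of a linear meta-model over base-model predictions can be sketched with ordinary least squares on two inputs. This is a hedged illustration only: the function names, the restriction to two base models without an intercept, and the toy numbers are assumptions, not the framework from the linked repository.

```python
def fit_linear_meta_model(base_a, base_b, target):
    """Least-squares weights (w_a, w_b) so that w_a*a + w_b*b approximates target.

    Solves the 2x2 normal equations in closed form (no intercept term).
    """
    s_aa = sum(a * a for a in base_a)
    s_bb = sum(b * b for b in base_b)
    s_ab = sum(a * b for a, b in zip(base_a, base_b))
    s_ay = sum(a * y for a, y in zip(base_a, target))
    s_by = sum(b * y for b, y in zip(base_b, target))
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("base model predictions are collinear")
    w_a = (s_ay * s_bb - s_ab * s_by) / det
    w_b = (s_aa * s_by - s_ab * s_ay) / det
    return w_a, w_b


def predict(weights, a, b):
    """Combine two base-model predictions with the fitted weights."""
    return weights[0] * a + weights[1] * b


# Toy data: the "true" affinity is the mean of the two base predictions.
docking_scores = [1.0, 2.0, 3.0, 4.0]      # hypothetical structure-based model
sequence_scores = [2.0, 1.0, 4.0, 3.0]     # hypothetical sequence-based model
measured = [1.5, 1.5, 3.5, 3.5]
weights = fit_linear_meta_model(docking_scores, sequence_scores, measured)
```

On this toy data the fit recovers equal weights for the two base models; a nonlinear meta-model would replace the weighted sum with a more flexible learner.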

    Solvated interaction energy: from small-molecule to antibody drug design

    Scoring functions are ubiquitous in structure-based drug design as an aid to predicting binding modes and estimating binding affinities. Ideally, a scoring function should be broadly applicable, obviating the need to recalibrate and refit its parameters for every new target and class of ligands. Traditionally, drugs have been small molecules, but in recent years biologics, particularly antibodies, have become an increasingly important if not dominant class of therapeutics. This makes the goal of having a transferable scoring function, i.e., one that spans the range of small-molecule to protein ligands, even more challenging. One such broadly applicable scoring function is the Solvated Interaction Energy (SIE), which has been developed and applied in our lab for the last 15 years, leading to several important applications. This physics-based method arose from efforts to understand the physics governing binding events, with particular care given to the role played by solvation. SIE has been used by us and many independent labs worldwide for virtual screening and discovery of novel small-molecule binders or optimization of known drugs. Moreover, without any retraining, it is found to be transferable to predictions of antibody-antigen relative binding affinities and as accurate as functions trained on protein-protein binding affinities. SIE has been incorporated in conjunction with other scoring functions into ADAPT (Assisted Design of Antibody and Protein Therapeutics), our platform for affinity modulation of antibodies. Application of ADAPT resulted in the optimization of several antibodies with 10-to-100-fold improvements in binding affinity. Further applications included broadening the specificity of a single-domain antibody to be cross-reactive with virus variants of both SARS-CoV-1 and SARS-CoV-2, and the design of safer antibodies by engineering of a pH switch to make them more selective towards acidic tumors while sparing normal tissues at physiological pH.

    Cheminformatics Approaches to Structure Based Virtual Screening: Methodology Development and Applications

    Structure-based virtual screening (VS) using 3D structures of protein targets has become a popular in silico drug discovery approach. The success of VS relies on the quality of underlying scoring functions. Despite the success of structure-based VS in several reported cases, target-dependent VS performance and poor binding affinity predictions are well-known drawbacks of structure-based scoring functions. The goal of my dissertation is to use cheminformatics approaches to address the above problems of the existing structure-based scoring methods. In Aim 1, cheminformatics practices are applied to those problems which conventional structure-based scoring functions find difficult (anti-bacterial leads efflux study) or fail to address (AmpC β-lactamase study). Predictive binary classification QSAR models can be constructed to classify complex efflux properties (low vs. high) and to differentiate AmpC β-lactamase binders from binding decoys (i.e., the false positives generated by scoring functions). The above models are applied to virtual screening and many computational hits are experimentally confirmed. In Aim 2, novel statistical binding and pose scoring functions (or pose filter in Aim 3) are developed, to accurately predict protein-ligand binding affinity and to discriminate native-like poses of ligands from pose decoys, respectively. In my approach, the protein-ligand interface is represented at atomic-level resolution and transformed via a special computational geometry approach called Delaunay tessellation into a collection of atom quadruplet motifs. Individual atom members of the motifs are characterized by conceptual Density Functional Theory (DFT)-based atomic properties. The binding scoring function shows acceptable prediction accuracy on Community Structure-Activity Resources (CSAR) data sets with diverse protein families.
In Aim 3, a two-step scoring protocol for target-specific virtual screening is developed and validated using the challenging Directory of Useful Decoys (DUD) data sets. In the first step, our target-specific pose (-scoring) filter developed in Aim 2 is used to filter out/penalize putative pose decoys for every compound. Then, in the second step, the remaining putative native-like poses are scored with MedusaScore, which is a conventional force-field-based scoring function. This novel screening protocol can consistently improve MedusaScore VS performance, suggesting its possible application to practical pharmaceutically relevant targets.
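
The two-step protocol of Aim 3 (filter pose decoys first, then score the surviving poses) can be sketched generically. Everything named here is hypothetical: `pose_filter` and `scorer` stand in for the dissertation's pose filter and for MedusaScore, the threshold and pose fields are invented, and only the "filter out" variant is shown, not the penalizing one.

```python
def two_step_screen(compounds, pose_filter, scorer, threshold):
    """Rank compounds by the best score among poses that pass the filter.

    compounds: dict mapping compound name -> list of poses.
    pose_filter: callable returning a native-likeness value (higher is better).
    scorer: callable returning an energy-like score (lower is better).
    Compounds with no surviving pose are dropped from the ranking.
    """
    results = []
    for name, poses in compounds.items():
        # Step 1: discard putative pose decoys.
        kept = [pose for pose in poses if pose_filter(pose) >= threshold]
        if not kept:
            continue
        # Step 2: score the remaining poses and keep the best one.
        results.append((name, min(scorer(pose) for pose in kept)))
    return sorted(results, key=lambda item: item[1])


# Toy poses carrying precomputed (made-up) values for both steps.
library = {
    "cmpd_1": [{"native_like": 0.9, "energy": -6.0},
               {"native_like": 0.2, "energy": -9.0}],
    "cmpd_2": [{"native_like": 0.1, "energy": -8.0}],
    "cmpd_3": [{"native_like": 0.7, "energy": -7.5}],
}
hits = two_step_screen(
    library,
    pose_filter=lambda pose: pose["native_like"],
    scorer=lambda pose: pose["energy"],
    threshold=0.5,
)
```

Note how the low-energy but decoy-like pose of `cmpd_1` is removed before scoring, so it cannot inflate that compound's rank.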

    3D Convolutional Neural Networks for Computational Drug Discovery

    This thesis describes aspects of the implementation and application of voxel-based convolutional neural networks (CNNs) to problems in computational drug discovery. It opens by justifying the novelty of this approach by presenting a more mainstream approach to the common tasks of virtual screening and binding pose prediction, augmented with more simplistic machine learning methods, and demonstrating their suboptimal performance when applied prospectively. It then describes my contributions to our group’s development of voxel-based CNNs as we honed their implementation and training strategy, and reports our library that facilitates featurization and training using this approach. It continues with a prospective assessment of their performance, analogous to the first prospective evaluation, with the addition of a novel CNN-based pose sampling strategy. Next it makes a foray into model explanation, first in an oblique fashion, by examining the transferability of models to tasks that are distinct from but related to the tasks for which they were trained, and by a comparison with an approach based on exploiting dataset bias using other machine learning methods. Finally it describes the implementation of a more direct approach to model explanation, by using a trained network to perform optimization of inputs with respect to the network as a whole or individual nodes and analyzing the content of the result as well as its utility as a pseudo-pharmacophore.
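
The voxel-based featurization that such CNNs consume can be illustrated with a minimal sketch. This is an assumption-laden toy, not the thesis library: it bins atoms into a cubic grid with one channel per atom type and simple occupancy counts, whereas practical implementations typically spread each atom over neighboring voxels with a smooth density.

```python
def voxelize(atoms, center, box_size, resolution, channels):
    """Bin atoms into a grid indexed as grid[channel][ix][iy][iz].

    atoms: list of (x, y, z, atom_type) tuples.
    center: (x, y, z) of the cubic box; box_size and resolution share units.
    channels: ordered list of atom types, one grid channel each.
    Atoms outside the box or of an unlisted type are ignored.
    """
    n = int(round(box_size / resolution))
    origin = [c - box_size / 2.0 for c in center]
    grid = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in channels]
    channel_index = {atom_type: i for i, atom_type in enumerate(channels)}
    for x, y, z, atom_type in atoms:
        if atom_type not in channel_index:
            continue
        idx = [int((v - o) // resolution) for v, o in zip((x, y, z), origin)]
        if all(0 <= i < n for i in idx):
            grid[channel_index[atom_type]][idx[0]][idx[1]][idx[2]] += 1.0
    return grid


# Toy input: two atoms inside a 4-unit box and one outside it.
toy_atoms = [(0.5, 0.5, 0.5, "C"), (-1.5, 0.2, 1.9, "O"), (5.0, 0.0, 0.0, "C")]
grid = voxelize(toy_atoms, center=(0.0, 0.0, 0.0), box_size=4.0,
                resolution=1.0, channels=["C", "O"])
```

The resulting nested list has shape (channels, n, n, n) and could be converted to an array for a 3D convolutional network.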