5 research outputs found

    OnionNet-2: A Convolutional Neural Network Model for Predicting Protein-Ligand Binding Affinity based on Residue-Atom Contacting Shells

    Full text link
    One key task in virtual screening is to accurately predict the binding affinity (â–³\triangleGG) of protein-ligand complexes. Recently, deep learning (DL) has significantly increased the predicting accuracy of scoring functions due to the extraordinary ability of DL to extract useful features from raw data. Nevertheless, more efforts still need to be paid in many aspects, for the aim of increasing prediction accuracy and decreasing computational cost. In this study, we proposed a simple scoring function (called OnionNet-2) based on convolutional neural network to predict â–³\triangleGG. The protein-ligand interactions are characterized by the number of contacts between protein residues and ligand atoms in multiple distance shells. Compared to published models, the efficacy of OnionNet-2 is demonstrated to be the best for two widely used datasets CASF-2016 and CASF-2013 benchmarks. The OnionNet-2 model was further verified by non-experimental decoy structures from docking program and the CSAR NRC-HiQ data set (a high-quality data set provided by CSAR), which showed great success. Thus, our study provides a simple but efficient scoring function for predicting protein-ligand binding free energy.Comment: 7 pages, 4 figures, 1 tabl

    Combined Application of Cheminformatics- and Physical Force Field-Based Scoring Functions Improves Binding Affinity Prediction for CSAR Data Sets

    Get PDF
    The curated CSAR-NRC benchmark sets provide valuable opportunity for testing or comparing the performance of both existing and novel scoring functions. We apply two different scoring functions, both independently and in combination, to predict binding affinity of ligands in the CSAR-NRC datasets. One, reported here for the first time, employs multiple chemical-geometrical descriptors of the protein-ligand interface to develop Quantitative Structure – Binding Affinity Relationships (QSBAR) models; these models are then used to predict binding affinity of ligands in the external dataset. Second is a physical force field-based scoring function, MedusaScore. We show that both individual scoring functions achieve statistically significant prediction accuracies with the squared correlation coefficient (R2) between actual and predicted binding affinity of 0.44/0.53 (Set1/Set2) with QSBAR models and 0.34/0.47 (Set1/Set2) with MedusaScore. Importantly, we find that the combination of QSBAR models and MedusaScore into consensus scoring function affords higher prediction accuracy than any of the contributing methods achieving R2 of 0.45/0.58 (Set1/Set2). Furthermore, we identify several chemical features and non-covalent interactions that may be responsible for the inaccurate prediction of binding affinity for several ligands by the scoring functions employed in this study

    Solvated interaction energy: from small-molecule to antibody drug design

    Get PDF
    Scoring functions are ubiquitous in structure-based drug design as an aid to predicting binding modes and estimating binding affinities. Ideally, a scoring function should be broadly applicable, obviating the need to recalibrate and refit its parameters for every new target and class of ligands. Traditionally, drugs have been small molecules, but in recent years biologics, particularly antibodies, have become an increasingly important if not dominant class of therapeutics. This makes the goal of having a transferable scoring function, i.e., one that spans the range of small-molecule to protein ligands, even more challenging. One such broadly applicable scoring function is the Solvated Interaction Energy (SIE), which has been developed and applied in our lab for the last 15 years, leading to several important applications. This physics-based method arose from efforts to understand the physics governing binding events, with particular care given to the role played by solvation. SIE has been used by us and many independent labs worldwide for virtual screening and discovery of novel small-molecule binders or optimization of known drugs. Moreover, without any retraining, it is found to be transferrable to predictions of antibody-antigen relative binding affinities and as accurate as functions trained on protein-protein binding affinities. SIE has been incorporated in conjunction with other scoring functions into ADAPT (Assisted Design of Antibody and Protein Therapeutics), our platform for affinity modulation of antibodies. Application of ADAPT resulted in the optimization of several antibodies with 10-to-100-fold improvements in binding affinity. Further applications included broadening the specificity of a single-domain antibody to be cross-reactive with virus variants of both SARS-CoV-1 and SARS-CoV-2, and the design of safer antibodies by engineering of a pH switch to make them more selective towards acidic tumors while sparing normal tissues at physiological pH

    Ligand-Protein Binding Affinity Prediction Using Machine Learning Scoring Functions.

    Get PDF
    In recent years, artificial intelligence makes its appearance in extremely different fields with promising results able to produce enormous steps forward in some circumstances. In chemoinformatics the use of machine learning technique, in particular, allows the scientific community to build apparently accurate scoring functions for computational docking. These types of scoring functions can overperform classic ones, the type of scoring functions used until now. However the comparison between classic and machine learning scoring functions are based on particular tests which can favour these latter, as highlighted by some studies. In particular the machine learning scoring functions, per definition, must be trained on some data, passing to the model the instances chosen to describe the complexes and the relative ligand-protein affinity. In these conditions the scoring power of the machine learning scoring functions can be evaluated on different dataset and the scoring functions performance recorded can be different depending on it. In particular, datasets very similar to the one used for the training phase of the machine learning scoring function can facilitate in reaching high performance in the scoring power. The objective of the present study is to verify the real efficiency and the effective performances of the new born machine learning scoring functions. Our aim is to give an answer to the scientific community about the doubts on the fact that the machine learning scoring function can be or not the revolutionary road to be followed in the field of chemioinformatic and drug discovery. In order to do this many tests are conducted and a definitive test protocol to be executed to exhaustive validate a new machine learning scoring function is proposed . Here we investigate what are the circumstances in which a machine learning scoring function produces overestimated performances and why it can happen. As a possible solution we propose a tests protocol to be followed in order to guarantee a real performance descriptions of machine learning scoring functions. Eventually an effective and innovative solution in the field of machine learning scoring functions is proposed. It consists in the use of per-target scoring functions which are machine learning scoring functions created using complexes coming from a single protein and able to predict the affinity of complexes which use that target. The data used to build the model are synthetic and for this reason are easy to be created. The performances on the target chosen are better than the ones obtained with basic model of scoring functions and machine learning scoring functions trained on database composed by more than one protein

    IMPROVING RATIONAL DRUG DESIGN BY INCORPORATING NOVEL BIOPHYSICAL INSIGHT

    Get PDF
    Computer-aided drug design is a valuable and effective complement to conventional experimental drug discovery methods. In this thesis, we will discuss our contributions to advancing a number of outstanding challenges in computational drug discovery: understanding protein flexibility and dynamics, the role of water in small molecule binding and using and understanding large amounts of data. First, we describe the molecular steps involved in the induced-fit binding mechanism of p53 and MDM2. We use molecular dynamics simulations to understand the key chemistry responsible for the dynamic transition between the apo and holo structures of MDM2. This chemistry involves not only the indole side chain of the anchor residue of p53, Trp23, but surprisingly, the beta-carbon as well. We demonstrate that this chemistry plays a key role in opening the binding site by coordinating the position and orientation of MDM2 residues, Val93 and His96, through a previously undescribed transition state. We confirm these findings by observing that this chemistry is preserved in all available inhibitor-bound MDM2 co-crystal structures. Second, we discuss our advances in understanding water molecules in ligand binding sites by data mining the structural information of water molecules found in X-ray crystal structures. We examine a large set of paired bound and unbound proteins and compare the water molecules found in the binding site of the unbound structure to the functional groups on the ligand that displace them upon binding. We identify a number of generalized functional groups that are associated with characteristic clusters of water molecules. This information has been utilized in several successful and ongoing virtual screens. Third, we discuss software that we have developed that allows for very efficient exploration and selection of virtual screening results. Implemented as a PyMOL plugin, ClusterMols clusters compounds based on a user-defined level of chemical similarity. The software also provides advanced visualization tools and a number of controls for quickly navigating and selecting compounds of interest, as well as the ability to check online for available vendors. Finally, we present several published examples of modeling protein-lipid and protein-small molecules interactions for a number of important targets including ABL, c-Src and 5-LOX
    corecore