Adverse Drug Reaction Prediction Using Scores Produced by Large-Scale Drug-Protein Target Docking on High-Performance Computing Machines
<div><p>Late-stage or post-market identification of adverse drug reactions (ADRs) is a significant public health issue and a source of major economic liability for drug development. Reliable <i>in silico</i> screening of drug candidates for possible ADRs would therefore be advantageous. In this work, we introduce a computational approach that predicts ADRs by combining molecular docking results with known ADR information from DrugBank and SIDER. We employed a recently parallelized version of AutoDock Vina (VinaLC) to dock 906 small-molecule drugs to a virtual panel of 409 DrugBank protein targets. L1-regularized logistic regression models were trained on the resulting docking scores of a 560-compound subset of the initial 906 compounds to predict 85 side effects, grouped into 10 ADR phenotype groups. Only 21% (87 out of 409) of the drug-protein binding features involve known targets of the drug subset, so the panel provides a substantial probe of off-target effects. As a control, associations of this drug subset with the 555 annotated targets of these compounds, as reported in DrugBank, were used as features to train a separate group of models. The Vina off-target models and the DrugBank on-target models yielded comparable median areas under the receiver operating characteristic curve (AUCs) during 10-fold cross-validation (0.60–0.69 and 0.61–0.74, respectively). Evidence was found in the PubMed literature to support several putative ADR-protein associations identified by our analysis. These include several associations between neoplasm-related ADRs and known tumor suppressor and tumor invasiveness marker proteins. A dual role for interstitial collagenase in both neoplasms and aneurysm formation was also identified. These associations all involve off-target proteins and could not have been found using available drug/on-target interaction data. This study illustrates a path forward to comprehensive ADR virtual screening that can potentially scale, with an increasing number of CPUs, to tens of thousands of protein targets and millions of potential drug candidates.</p></div>
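<p>To make the modeling step concrete, the following is a minimal sketch of L1-regularized logistic regression fitted by proximal gradient descent (soft-thresholding). The abstract does not say which solver or penalty strength was used, so the solver, the values of <code>lam</code>, <code>step</code>, and <code>n_iter</code>, and the synthetic data are all assumptions made for illustration only.</p>

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrinks w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.05, step=0.1, n_iter=500):
    """Proximal gradient descent for L1-penalized logistic regression.

    X: list of feature rows (e.g. one row of docking scores per drug).
    y: list of 0/1 labels (drug has / lacks the ADR group).
    Returns (weights, intercept); the intercept is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

# Synthetic stand-in for a standardized drugs x proteins score matrix
# (assumption: not real docking data). Only feature 0 carries signal.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    row = [rng.gauss(0.0, 1.0) for _ in range(5)]
    label = 1 if row[0] + 0.3 * rng.gauss(0.0, 1.0) > 0 else 0
    X.append(row)
    y.append(label)

weights, intercept = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("nonzero features:", selected)
```

<p>The L1 penalty drives weights of uninformative features toward exactly zero, which is what allows a fitted model to be read as a short list of candidate ADR-protein associations.</p>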
Data integration/analysis workflow scheme.
<p>The UniProt IDs of 4,020 proteins identified in DrugBank as drug targets were extracted. We obtained 409 experimental protein structures from the Protein Data Bank (PDB) for use as a virtual panel and docked 906 FDA-approved small-molecule compounds to them using the VinaLC docking code, run on a high-performance computing machine at LLNL. Of these, 560 compounds had side effect information in the SIDER database and were used in the subsequent statistical analysis to build logistic regression models for ADR prediction.</p>
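<p>The integration step described above can be sketched as follows: docked compounds are filtered to those with side-effect annotations, and a feature matrix plus one label vector per ADR group is assembled. The function name <code>build_dataset</code>, the dictionary layouts, and the drug and protein identifiers are hypothetical, and the scores are made-up values rather than results from the study.</p>

```python
def build_dataset(docking_scores, side_effects, adr_group):
    """Assemble a feature matrix and labels for one ADR group.

    docking_scores: {drug_id: {protein_id: score}} for all docked drugs.
    side_effects:   {drug_id: set of ADR group names} for annotated drugs.
    Drugs without side-effect annotations are dropped, mirroring the
    reduction from 906 docked compounds to 560 annotated ones.
    Assumes every kept drug has a score for every protein in the panel.
    """
    drugs = sorted(d for d in docking_scores if d in side_effects)
    proteins = sorted({p for d in drugs for p in docking_scores[d]})
    X = [[docking_scores[d][p] for p in proteins] for d in drugs]
    y = [1 if adr_group in side_effects[d] else 0 for d in drugs]
    return drugs, proteins, X, y

# Toy inputs with hypothetical identifiers and made-up scores.
docking_scores = {
    "drug_a": {"P1": -7.2, "P2": -5.1},
    "drug_b": {"P1": -6.0, "P2": -8.3},  # no side-effect annotation
    "drug_c": {"P1": -4.9, "P2": -6.6},
}
side_effects = {
    "drug_a": {"Cardiac disorders"},
    "drug_c": {"Vascular disorders"},
}

drugs, proteins, X, y = build_dataset(
    docking_scores, side_effects, "Cardiac disorders")
print(drugs, proteins, X, y)
```

<p>Calling the function once per ADR group yields the one-vs.-all label vectors on which separate models can be trained.</p>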
ADR-protein association derived from models built using the 560×16 GBSA-corrected virtual screening panel.
ADR prediction using a 16-protein virtual toxicity screening panel suggested by Bowes <i>et al.</i>[6].
<p>Red boxes indicate models trained on GBSA-corrected VinaLC docking scores, while blue boxes indicate models trained on DrugBank drug-target protein associations. The boxplots show the distribution of median AUC scores for one-vs.-all L1-regularized logistic regression models trained using 10-fold cross-validation repeated ten times. The individual models were trained on ten different adverse drug reaction (ADR) groups: Neoplasms, benign, malignant, and unspecified ("Neoplasms"), Immune system disorders ("Immune system disorders"), Cardiac disorders ("Cardiac disorders"), Gastrointestinal disorders ("Gastrointestinal disorders"), Blood and lymphatic systems disorders ("Blood and lymphatic disorders"), Hepatobiliary disorders ("Liver disorders"), Vascular disorders ("Vascular disorders"), Endocrine disorders ("Endocrine disorders"), Psychiatric disorders ("Psychiatric disorders"), and Renal disorders ("Renal & urinary disorders").</p>
ADR prediction models using ‘Vina Off-Targets’ and ‘DrugBank On-Targets’.
<p>Boxplots show median AUC results for one-vs.-all L1-regularized logistic regression models trained using 10-fold cross-validation repeated ten times. The individual models were trained on ten different adverse drug reaction (ADR) groups: Vascular disorders ("Vascular disorders"), Neoplasms, benign, malignant, and unspecified ("Neoplasms"), Immune system disorders ("Immune system disorders"), Blood and lymphatic systems disorders ("Blood and lymphatic disorders"), Psychiatric disorders ("Psychiatric disorders"), Endocrine disorders ("Endocrine disorders"), Renal disorders ("Renal & urinary disorders"), Hepatobiliary disorders ("Liver disorders"), Gastrointestinal disorders ("Gastrointestinal disorders"), and Cardiac disorders ("Cardiac disorders"). Red boxes indicate models trained on the 560×409 matrix of VinaLC docking scores used as drug-protein binding features. Blue boxes indicate models trained on a 560×555 matrix containing DrugBank drug-target protein associations. VinaLC off-target models had higher AUCs than DrugBank on-target models for the “Vascular disorders” and “Neoplasms” ADR groups.</p>
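<p>The evaluation behind these boxplots can be sketched as repeated k-fold cross-validation with one median AUC recorded per repeat. The caption does not spell out exactly how the medians were aggregated, so taking one median per repeat is an assumption here. The mean-difference scorer is a deliberately simple placeholder rather than the L1-regularized model, and the data are synthetic.</p>

```python
import random
from statistics import median

def auc(scores, labels):
    # Mann-Whitney form of AUC: P(score_pos > score_neg), ties count 0.5.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def kfold_indices(n, k, rng):
    # Shuffle indices once, then deal them into k nearly equal folds.
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv_median_auc(X, y, fit, predict, k=10, repeats=10, seed=0):
    """Return one median AUC per repeat of k-fold cross-validation."""
    rng = random.Random(seed)
    medians = []
    for _ in range(repeats):
        fold_aucs = []
        for test_idx in kfold_indices(len(X), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            y_test = [y[i] for i in test_idx]
            if len(set(y_test)) < 2:
                continue  # AUC is undefined without both classes
            scores = [predict(model, X[i]) for i in test_idx]
            fold_aucs.append(auc(scores, y_test))
        if fold_aucs:
            medians.append(median(fold_aucs))
    return medians

def fit_mean_difference(X, y):
    # Placeholder model: vector from the negative to the positive class mean.
    pos = [x for x, l in zip(X, y) if l == 1]
    neg = [x for x, l in zip(X, y) if l == 0]
    return [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
            for j in range(len(X[0]))]

def predict_linear(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

# Synthetic data (assumption): only feature 0 is related to the label.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(100)]
y = [1 if row[0] + 0.5 * rng.gauss(0.0, 1.0) > 0 else 0 for row in X]

medians = repeated_cv_median_auc(X, y, fit_mean_difference, predict_linear)
print("median AUC per repeat:", [round(m, 2) for m in medians])
```

<p>Each entry of <code>medians</code> corresponds to one repeat of 10-fold cross-validation; a boxplot over such values for each ADR group would resemble the figure's layout.</p>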