11 research outputs found

    Using the Variable-Nearest Neighbor Method To Identify P‑Glycoprotein Substrates and Inhibitors

    No full text
    Permeability glycoprotein (Pgp) is an essential membrane-bound transporter that efficiently extracts compounds from a cell. As such, it is a critical determinant of the pharmacokinetic properties of drugs. Multidrug resistance in cancer is often associated with overexpression of Pgp, which increases the efflux of chemotherapeutic agents from the cell. This, in turn, may prevent effective treatment by reducing the intracellular concentrations of such agents. Consequently, identifying compounds that can either be transported out of the cell by Pgp (substrates) or impair Pgp function (inhibitors) is of great interest. Herein, using publicly available data, we developed quantitative structure–activity relationship (QSAR) models of Pgp substrates and inhibitors. These models employed a variable-nearest neighbor (v-NN) method that calculated the structural similarity between molecules and hence possessed an applicability domain, that is, they used all nearest neighbors that met a minimum similarity constraint. The performance characteristics of these v-NN-based models were comparable or at times superior to those of other model constructs. The best v-NN models for identifying either Pgp substrates or inhibitors showed overall accuracies of >80% and κ values of >0.60 when tested on external data sets with candidate Pgp substrates and inhibitors. The v-NN prediction model with a well-defined applicability domain gave accurate and reliable results. The v-NN method is computationally efficient and requires no retraining of the prediction model when new assay information becomes available, an important feature when keeping QSAR models up-to-date and maintaining their performance at high levels.
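
The neighbor-selection idea in this abstract can be illustrated with a minimal sketch. The abstract only states that all neighbors meeting a minimum similarity constraint are used; the Gaussian distance weighting, the parameter names `d0` and `h`, their default values, and the representation of fingerprints as sets of on-bit indices are all assumptions made here for illustration, not the authors' implementation.

```python
import math


def tanimoto_distance(fp_a, fp_b):
    """Tanimoto distance between fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def vnn_predict(query_fp, training, d0=0.6, h=0.3):
    """Weighted average activity over ALL training molecules within distance d0.

    training: list of (fingerprint, activity) pairs, activity being 1 or 0.
    d0 (distance cutoff) and h (smoothing width) are hypothetical parameters.
    Returns None when no training molecule lies within d0, i.e. the query
    falls outside the applicability domain.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    n_neighbors = 0
    for fp, activity in training:
        d = tanimoto_distance(query_fp, fp)
        if d <= d0:
            w = math.exp(-((d / h) ** 2))  # assumed Gaussian weighting
            weighted_sum += w * activity
            weight_total += w
            n_neighbors += 1
    if n_neighbors == 0:
        return None
    return weighted_sum / weight_total


# Toy data: two "substrates" and one "non-substrate".
training = [({1, 2, 3, 4}, 1), ({1, 2, 3, 5}, 1), ({7, 8, 9}, 0)]
print(vnn_predict({1, 2, 3, 4}, training))  # both qualifying neighbors are active
print(vnn_predict({20, 21}, training))      # no neighbor within d0
```

Because the prediction is computed directly from the stored training pairs, adding new assay results in this sketch amounts to appending to `training`, which is consistent with the no-retraining property the abstract describes.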

    General Purpose 2D and 3D Similarity Approach to Identify hERG Blockers

    No full text
    Screening compounds for human ether-à-go-go-related gene (hERG) channel inhibition is an important component of early stage drug development and assessment. In this study, we developed a high-confidence (p-value < 0.01) hERG prediction model based on a combined two-dimensional (2D) and three-dimensional (3D) modeling approach. We developed a 3D similarity conformation approach (SCA) based on examining a limited fixed number of pairwise 3D similarity scores between a query molecule and a set of known hERG blockers. By combining 3D SCA with 2D similarity ensemble approach (SEA) methods, we achieved a maximum sensitivity in hERG inhibition prediction with an accuracy not achieved by either method separately. The combined model achieved 69% sensitivity and 95% specificity on an independent external data set. Further validation showed that the model correctly picked up documented hERG inhibition or interactions among the Food and Drug Administration-approved drugs with the highest similarity scores, with 18 of 20 correctly identified. The combination of ascertaining 2D and 3D similarity of compounds allowed us to synergistically use 2D fingerprint matching with 3D shape and chemical complementarity matching.
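
For readers less familiar with the two headline metrics quoted in this abstract, the following sketch shows how sensitivity and specificity are computed from binary labels. The function name and the toy labels are illustrative only and are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = blocker)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity: fraction of true blockers that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of non-blockers that were correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels: 3 blockers, 4 non-blockers.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))
```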

    Using Chemical-Induced Gene Expression in Cultured Human Cells to Predict Chemical Toxicity

    No full text
    Chemical toxicity is conventionally evaluated in animal models. However, animal models are resource intensive; moreover, they face ethical and scientific challenges because the outcomes obtained by animal testing may not correlate with human responses. To develop an alternative method for assessing chemical toxicity, we investigated the feasibility of using chemical-induced genome-wide expression changes in cultured human cells to predict the potential of a chemical to cause specific organ injuries in humans. We first created signatures of chemical-induced gene expression in a vertebral-cancer of the prostate cell line for ∼15,000 chemicals tested in the US National Institutes of Health Library of Integrated Network-Based Cellular Signatures program. We then used the signatures to create naïve Bayesian prediction models for chemical-induced human liver cholestasis, interstitial nephritis, and long QT syndrome. Detailed cross-validation analyses indicated that the models were robust with respect to false positives and false negatives in the samples we used to train the models and could predict the likelihood that chemicals would cause specific organ injuries. In addition, we performed a literature search for drugs and dietary supplements, not formally categorized as causing organ injuries in humans but predicted by our models to be most likely to do so. We found a high percentage of these compounds associated with case reports of relevant organ injuries, lending support to the idea that in vitro cell-based experiments can be used to predict the toxic potential of chemicals. We believe that this approach, combined with a robust technique to model human exposure to chemicals, may serve as a promising alternative to animal-based chemical toxicity assessment.

    Critically Assessing the Predictive Power of QSAR Models for Human Liver Microsomal Stability

    No full text
    To lower the possibility of late-stage failures in the drug development process, an up-front assessment of absorption, distribution, metabolism, elimination, and toxicity is commonly implemented through a battery of in silico and in vitro assays. As in vitro data is accumulated, in silico quantitative structure–activity relationship (QSAR) models can be trained and used to assess compounds even before they are synthesized. Even though it is generally recognized that QSAR model performance deteriorates over time, rigorous independent studies of model performance deterioration are typically hindered by the lack of publicly available large data sets of structurally diverse compounds. Here, we investigated predictive properties of QSAR models derived from an assembly of publicly available human liver microsomal (HLM) stability data using variable nearest neighbor (v-NN) and random forest (RF) methods. In particular, we evaluated the degree of time-dependent model performance deterioration. Our results show that when evaluated by 10-fold cross-validation with all available HLM data randomly distributed among 10 equal-sized validation groups, we achieved high-quality model performance from both machine-learning methods. However, when we developed HLM models based on when the data appeared and tried to predict data published later, we found that neither method produced predictive models and that their applicability was dramatically reduced. On the other hand, when a small percentage of randomly selected compounds from data published later were included in the training set, performance of both machine-learning methods improved significantly. The implication is that 1) QSAR model quality should be analyzed in a time-dependent manner to assess the models' true predictive power and 2) it is imperative to retrain models with any up-to-date experimental data to ensure maximum applicability.
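
The three evaluation set-ups contrasted in this abstract (random 10-fold groups, a strict time split, and a time split with a small share of later data mixed into training) can be sketched as follows. This is only an illustration of the splitting logic: the record layout with a `"year"` key, the function names, and the `leak_fraction` parameter are hypothetical and do not come from the study.

```python
import random


def random_folds(records, n_folds=10, seed=0):
    """Shuffle all records and deal them into n_folds near-equal groups."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def time_split(records, cutoff_year, leak_fraction=0.0, seed=0):
    """Train on records published up to cutoff_year, test on later ones.

    leak_fraction moves that share of the later records (chosen at random)
    into the training set, mimicking the inclusion of a small percentage
    of newer compounds.
    """
    train = [r for r in records if r["year"] <= cutoff_year]
    later = [r for r in records if r["year"] > cutoff_year]
    random.Random(seed).shuffle(later)
    n_leak = int(round(leak_fraction * len(later)))
    return train + later[:n_leak], later[n_leak:]


# Toy records: one compound per publication year.
records = [{"id": i, "year": 2000 + i} for i in range(20)]
folds = random_folds(records)
train, test = time_split(records, cutoff_year=2009)
train_mixed, test_mixed = time_split(records, cutoff_year=2009, leak_fraction=0.2)
print(len(folds), len(train), len(test), len(train_mixed), len(test_mixed))
```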

    2D SMARTCyp Reactivity-Based Site of Metabolism Prediction for Major Drug-Metabolizing Cytochrome P450 Enzymes

    No full text
    Cytochrome P450 (CYP) 3A4, 2D6, 2C9, 2C19, and 1A2 are the most important drug-metabolizing enzymes in the human liver. Knowledge of which parts of a drug molecule are subject to metabolic reactions catalyzed by these enzymes is crucial for rational drug design to mitigate ADME/toxicity issues. SMARTCyp, a recently developed 2D ligand structure-based method, is able to predict site-specific metabolic reactivity of CYP3A4 and CYP2D6 substrates with an accuracy that rivals the best and more computationally demanding 3D structure-based methods. In this article, the SMARTCyp approach was extended to predict the metabolic hotspots for CYP2C9, CYP2C19, and CYP1A2 substrates. This was accomplished by taking into account the impact of a key substrate-receptor recognition feature of each enzyme as a correction term to the SMARTCyp reactivity. The corrected reactivity was then used to rank order the likely sites of CYP-mediated metabolic reactions. For 60 CYP1A2 substrates, the observed major sites of CYP1A2-catalyzed metabolic reactions were among the top-ranked 1, 2, and 3 positions in 67%, 80%, and 83% of the cases, respectively. The results were similar to those obtained by MetaSite and the reactivity + docking approach. For 70 CYP2C9 substrates, the observed sites of CYP2C9 metabolism were among the top-ranked 1, 2, and 3 positions in 66%, 86%, and 87% of the cases, respectively. These results were better than the corresponding results of StarDrop version 5.0, which were 61%, 73%, and 77%, respectively. For 36 compounds metabolized by CYP2C19, the observed sites of metabolism were found to be among the top-ranked 1, 2, and 3 sites in 78%, 89%, and 94% of the cases, respectively. The computational procedure was implemented as an extension to the program SMARTCyp 2.0. With the extension, the program can now predict the site of metabolism for all five major drug-metabolizing enzymes with an accuracy similar to or better than that achieved by the best 3D structure-based methods. Both the Java source code and the binary executable of the program are freely available to interested users.
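
The "top-ranked 1, 2, and 3" percentages reported in this abstract are top-k success rates. A minimal sketch of that bookkeeping is given below; it does not reproduce SMARTCyp's scoring, and the dictionary layout, function name, and toy atom indices are assumptions for illustration.

```python
def top_k_success_rate(ranked_sites, observed_sites, k):
    """Fraction of substrates with an observed site among the top-k ranked atoms.

    ranked_sites:   dict substrate -> list of atom indices, most likely site first
    observed_sites: dict substrate -> set of experimentally observed atom indices
    """
    hits = 0
    for name, ranking in ranked_sites.items():
        if set(ranking[:k]) & observed_sites[name]:
            hits += 1
    return hits / len(ranked_sites)


# Toy data for three hypothetical substrates.
ranked = {"a": [3, 1, 2], "b": [5, 4, 6], "c": [9, 8, 7]}
observed = {"a": {3}, "b": {4}, "c": {0}}
for k in (1, 2, 3):
    print(k, top_k_success_rate(ranked, observed, k))
```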

    Molecular Structure-Based Large-Scale Prediction of Chemical-Induced Gene Expression Changes

    No full text
    The quantitative structure–activity relationship (QSAR) approach has been used to model a wide range of chemical-induced biological responses. However, it had not been utilized to model chemical-induced genomewide gene expression changes until very recently, owing to the complexity of training and evaluating a very large number of models. To address this issue, we examined the performance of a variable nearest neighbor (v-NN) method that uses information on near neighbors conforming to the principle that similar structures have similar activities. Using a data set of gene expression signatures of 13,150 compounds derived from cell-based measurements in the NIH Library of Integrated Network-based Cellular Signatures program, we were able to make predictions for 62% of the compounds in a 10-fold cross-validation test, with a correlation coefficient of 0.61 between the predicted and experimentally derived signatures, a reproducibility rivaling that of high-throughput gene expression measurements. To evaluate the utility of the predicted gene expression signatures, we compared the predicted and experimentally derived signatures in their ability to identify drugs known to cause specific liver, kidney, and heart injuries. Overall, the predicted and experimentally derived signatures had similar receiver operating characteristics, whose areas under the curve ranged from 0.71 to 0.77 and 0.70 to 0.73, respectively, across the three organ injury models. However, detailed analyses of enrichment curves indicate that signatures predicted from multiple near neighbors outperformed those derived from experiments, suggesting that averaging information from near neighbors may help improve the signal from gene expression measurements. Our results demonstrate that the v-NN method can serve as a practical approach for modeling large-scale, genomewide, chemical-induced, gene expression changes.

    General Approach to Estimate Error Bars for Quantitative Structure–Activity Relationship Predictions of Molecular Activity

    No full text
    Key requirements for quantitative structure–activity relationship (QSAR) models to gain acceptance by regulatory authorities include a defined domain of applicability (DA) and appropriate measures of goodness-of-fit, robustness, and predictivity. Hence, many DA metrics have been developed over the past two decades. The most intuitive are perhaps distance-to-model metrics, which are most commonly defined in terms of the mean distance between a molecule and its k nearest training samples. Detailed evaluations have shown that the variance of predictions by an ensemble of QSAR models may serve as a DA metric and can outperform distance-to-model metrics. Intriguingly, the performance of the ensemble variance metric has led researchers to conclude that the error of predicting a new molecule does not depend on the input descriptors or machine-learning methods but on its distance to the training molecules. This implies that the distance to training samples may serve as the basis for developing a high-performance DA metric. In this article, we introduce a new Tanimoto distance-based DA metric called the sum of distance-weighted contributions (SDC), which takes into account contributions from all molecules in a training set. Using four acute chemical toxicity data sets of varying sizes and four other molecular property data sets, we demonstrate that SDC correlates well with the prediction error for all data sets regardless of the machine-learning methods and molecular descriptors used to build the QSAR models. Using the acute toxicity data sets, we compared the distribution of prediction errors with respect to SDC, the mean distance to k-nearest training samples, and the variance of random forest predictions. The results showed that the correlation with the prediction error was highest for SDC. We also demonstrate that SDC allows for the development of robust root mean squared error (RMSE) models and makes it possible to not only give a QSAR prediction but also provide an individual RMSE estimate for each molecule. Because SDC does not depend on a specific machine-learning method, it represents a canonical measure that can be widely used to estimate individual molecule prediction errors for any machine-learning method.
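
To make the idea of a sum of distance-weighted contributions concrete, here is a minimal sketch. The abstract does not state the weighting function, so the form exp(-a·d/(1-d)) and the default `a = 3.0` used below are assumptions chosen so that close training molecules dominate and distant ones contribute almost nothing; fingerprints are represented as sets of on-bit indices purely for illustration.

```python
import math


def tanimoto_distance(fp_a, fp_b):
    """Tanimoto distance between fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def sdc(query_fp, training_fps, a=3.0):
    """Sum over ALL training molecules of an assumed distance-based weight."""
    total = 0.0
    for fp in training_fps:
        d = tanimoto_distance(query_fp, fp)
        if d < 1.0:  # d == 1 would divide by zero; its contribution is taken as 0
            total += math.exp(-a * d / (1.0 - d))
    return total


# An identical training molecule contributes 1; a disjoint one contributes 0.
print(sdc({1, 2, 3}, [{1, 2, 3}, {4, 5}]))
# A near neighbor at distance 0.25 contributes exp(-1) under the assumed weighting.
print(sdc({1, 2, 3}, [{1, 2, 3, 4}]))
```

Under this sketch a larger value indicates a query that is better covered by the training set, which is the property a per-molecule error model would build on.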