Using the Variable-Nearest Neighbor Method To Identify P‑Glycoprotein Substrates and Inhibitors
Permeability glycoprotein (Pgp) is an essential membrane-bound
transporter that efficiently extracts compounds from a cell. As such,
it is a critical determinant of the pharmacokinetic properties of
drugs. Multidrug resistance in cancer is often associated with overexpression
of Pgp, which increases the efflux of chemotherapeutic agents from
the cell. This, in turn, may prevent effective treatment by reducing
the effective intracellular concentrations of such agents. Consequently,
identifying compounds that can either be transported out of the cell
by Pgp (substrates) or impair Pgp function (inhibitors) is of great
interest. Herein, using publicly available data, we developed quantitative
structure–activity relationship (QSAR) models of Pgp substrates
and inhibitors. These models employed a variable-nearest neighbor
(v-NN) method that calculated the structural similarity between molecules
and hence possessed an applicability domain; that is, they used all
nearest neighbors that met a minimum similarity constraint. The performance
characteristics of these v-NN-based models were comparable to, or at times
superior to those of other model constructs. The best v-NN models
for identifying either Pgp substrates or inhibitors showed overall
accuracies of >80% and κ values of >0.60 when tested on external
data sets with candidate Pgp substrates and inhibitors. The v-NN prediction
model with a well-defined applicability domain gave accurate and reliable
results. The v-NN method is computationally efficient and requires
no retraining of the prediction model when new assay information becomes
available, an important feature when keeping QSAR models up-to-date
and maintaining their performance at high levels.
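The v-NN idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: fingerprints are assumed to be sets of on-bit indices, and the Gaussian weighting exp(-(d/h)^2), the distance threshold `d0`, and the smoothing width `h` (defaults 0.6 and 0.3) are assumptions chosen for illustration.

```python
import math
from typing import Optional, Sequence, Set, Tuple


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def vnn_predict(query: Set[int],
                training: Sequence[Tuple[Set[int], float]],
                d0: float = 0.6,
                h: float = 0.3) -> Optional[float]:
    """Distance-weighted average over ALL training molecules within d0 (assumed weighting).

    Returns None when no neighbor is close enough, i.e. the query lies
    outside the applicability domain.
    """
    num = den = 0.0
    for fp, y in training:
        d = tanimoto_distance(query, fp)
        if d <= d0:
            w = math.exp(-(d / h) ** 2)
            num += w * y
            den += w
    return num / den if den > 0 else None


# Toy data: label 1.0 = substrate, 0.0 = non-substrate (made up).
training = [({1, 2, 3, 4}, 1.0), ({1, 2, 3, 5}, 1.0), ({7, 8, 9}, 0.0)]
print(vnn_predict({1, 2, 3, 4}, training))  # 1.0
print(vnn_predict({10, 11}, training))      # None
```

With substrates coded as 1 and non-substrates as 0, a value of 0.5 or more could be read as a predicted substrate, while `None` marks a query outside the applicability domain. Because the "model" is just the stored training set, appending newly assayed compounds to `training` updates predictions without any retraining step.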
General Purpose 2D and 3D Similarity Approach to Identify hERG Blockers
Screening compounds for human ether-à-go-go-related gene
(hERG) channel inhibition is an important component of early-stage
drug development and assessment. In this study, we developed a high-confidence
(p-value < 0.01) hERG prediction model based on a combined two-dimensional
(2D) and three-dimensional (3D) modeling approach. We developed a
3D similarity conformation approach (SCA) based on examining a limited
fixed number of pairwise 3D similarity scores between a query molecule
and a set of known hERG blockers. By combining 3D SCA with 2D similarity
ensemble approach (SEA) methods, we maximized the sensitivity of
hERG inhibition prediction, at an accuracy that neither
method reached separately. The combined model achieved 69% sensitivity and
95% specificity on an independent external data set. Further validation
showed that the model correctly flagged documented hERG inhibition
or interactions among the Food and Drug Administration-approved drugs
with the highest similarity scores, with 18 of 20 correctly
identified. Assessing both the 2D and 3D similarity of
compounds allowed us to use 2D fingerprint matching synergistically
with 3D shape and chemical complementarity matching.
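The abstract does not say exactly how the 2D SEA and 3D SCA calls are merged. One simple rule that favors sensitivity, assumed here purely for illustration, is to flag a compound if either method flags it. The sketch below (function names are hypothetical) applies that rule and computes sensitivity and specificity, the two metrics reported for the external data set.

```python
from typing import List, Sequence, Tuple


def or_combine(flags_2d: Sequence[bool], flags_3d: Sequence[bool]) -> List[bool]:
    """Hypothetical rule: call a compound a hERG blocker if either method flags it."""
    return [a or b for a, b in zip(flags_2d, flags_3d)]


def sensitivity_specificity(pred: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 known blockers followed by 4 non-blockers (made-up calls).
truth = [True, True, True, True, False, False, False, False]
calls_2d = [True, False, False, False, False, False, False, True]
calls_3d = [False, True, False, False, False, False, False, False]
print(sensitivity_specificity(or_combine(calls_2d, calls_3d), truth))  # (0.5, 0.75)
```

Because the OR rule flags a superset of what each input model flags, it can never lower sensitivity and can never raise specificity relative to either model alone. For reference, 18 of 20 correct calls corresponds to a 90% hit rate.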
Using Chemical-Induced Gene Expression in Cultured Human Cells to Predict Chemical Toxicity
Chemical toxicity is conventionally evaluated in animal models. However, animal
models are resource intensive; moreover, they face ethical and scientific
challenges because the outcomes obtained by animal testing may not
correlate with human responses. To develop an alternative method for
assessing chemical toxicity, we investigated the feasibility of using
chemical-induced genome-wide expression changes in cultured human
cells to predict the potential of a chemical to cause specific organ
injuries in humans. We first created signatures of chemical-induced
gene expression in a vertebral-cancer of the prostate cell line for ∼15,000
chemicals tested in the US National Institutes of Health Library of
Integrated Network-Based Cellular Signatures program. We then used
the signatures to create naïve Bayesian prediction models
for chemical-induced human liver cholestasis, interstitial nephritis,
and long QT syndrome. Detailed cross-validation analyses indicated
that the models were robust with respect to false positives and false
negatives in the samples we used to train the models and could predict
the likelihood that chemicals would cause specific organ injuries.
In addition, we performed a literature search for drugs and dietary
supplements that were not formally categorized as causing organ injuries in
humans but predicted by our models to be most likely to do so. We
found a high percentage of these compounds associated with case reports
of relevant organ injuries, lending support to the idea that in vitro cell-based experiments can be used to predict the
toxic potential of chemicals. We believe that this approach, combined
with a robust technique to model human exposure to chemicals, may
serve as a promising alternative to animal-based chemical toxicity
assessment.
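The abstract names naïve Bayesian models trained on gene expression signatures but does not give the feature encoding. The sketch below assumes, hypothetically, that each signature has been binarized (1 if a gene passes some up-regulation cutoff, else 0) and fits a Bernoulli naïve Bayes classifier with Laplace smoothing; it is an illustration of the technique, not the published implementation.

```python
import math
from typing import Sequence


def train_bernoulli_nb(X: Sequence[Sequence[int]], y: Sequence[int], alpha: float = 1.0):
    """Return class log-priors and per-feature P(x_j = 1 | class), with Laplace smoothing."""
    classes = sorted(set(y))
    n_features = len(X[0])
    log_prior, p_on = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        p_on[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                   for j in range(n_features)]
    return log_prior, p_on


def predict_log_odds(x: Sequence[int], log_prior, p_on, pos: int = 1, neg: int = 0) -> float:
    """log P(pos | x) - log P(neg | x) under the feature-independence assumption."""
    def log_joint(c):
        return log_prior[c] + sum(math.log(p if xj else 1.0 - p)
                                  for xj, p in zip(x, p_on[c]))
    return log_joint(pos) - log_joint(neg)


# Toy binarized signatures (3 genes); label 1 = causes the injury (made up).
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]
prior, p_on = train_bernoulli_nb(X, y)
print(round(predict_log_odds([1, 0, 0], prior, p_on), 3))  # 2.197
```

Positive log-odds favor the injury class. Ranking untested drugs and supplements by this score is one way to pick the "most likely" candidates for a follow-up literature check.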
Critically Assessing the Predictive Power of QSAR Models for Human Liver Microsomal Stability
To lower the possibility of late-stage failures in the drug development
process, an up-front assessment of absorption, distribution, metabolism,
elimination, and toxicity is commonly implemented through a battery
of in silico and in vitro assays.
As in vitro data accumulate, in silico quantitative structure–activity relationship (QSAR) models
can be trained and used to assess compounds even before they are synthesized.
Even though it is generally recognized that QSAR model performance
deteriorates over time, rigorous independent studies of model performance
deterioration are typically hindered by the lack of publicly available
large data sets of structurally diverse compounds. Here, we investigated
the predictive properties of QSAR models derived from an assembly of publicly
available human liver microsomal (HLM) stability data using variable
nearest neighbor (v-NN) and random forest (RF) methods.
In particular, we evaluated the degree of time-dependent model performance
deterioration. Our results show that when evaluated by 10-fold cross-validation
with all available HLM data randomly distributed among 10 equal-sized
validation groups, we achieved high-quality model performance from
both machine-learning methods. However, when we developed HLM models
based on when the data appeared and tried to predict data published
later, we found that neither method produced predictive models and
that their applicability was dramatically reduced. On the other hand,
when a small percentage of randomly selected compounds from data published
later were included in the training set, performance of both machine-learning
methods improved significantly. The implication is that (1) QSAR model
quality should be analyzed in a time-dependent manner to assess the models'
true predictive power and (2) it is imperative to retrain models with any up-to-date experimental data to ensure maximum applicability.
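The contrast between random 10-fold cross-validation and a time-based split is easy to express in code. In this sketch, publication year stands in for "when the data appeared", and the cutoff year and the 10% seeding fraction are hypothetical choices. The abstract only says "a small percentage".

```python
import random
from typing import List, Sequence, Tuple


def random_folds(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them round-robin into k near-equal validation groups."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def time_split(years: Sequence[int], cutoff: int) -> Tuple[List[int], List[int]]:
    """Train on compounds reported before `cutoff`; test on those reported at or after it."""
    train = [i for i, yr in enumerate(years) if yr < cutoff]
    test = [i for i, yr in enumerate(years) if yr >= cutoff]
    return train, test


def time_split_with_seeding(years: Sequence[int], cutoff: int,
                            frac: float = 0.1, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Time split, then move a random fraction of the later compounds into training."""
    train, test = time_split(years, cutoff)
    moved = set(random.Random(seed).sample(test, round(frac * len(test))))
    return train + sorted(moved), [i for i in test if i not in moved]


print(time_split([2010, 2012, 2015, 2016], 2014))  # ([0, 1], [2, 3])
```

Random folds mix early and late chemistry in every training set, which is why they can overstate predictive power. The seeded variant mirrors the finding above that a few later compounds in training noticeably help.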
2D SMARTCyp Reactivity-Based Site of Metabolism Prediction for Major Drug-Metabolizing Cytochrome P450 Enzymes
Cytochrome P450 (CYP) 3A4, 2D6, 2C9, 2C19, and 1A2 are the most
important drug-metabolizing enzymes in the human liver. Knowledge
of which parts of a drug molecule are subject to metabolic reactions
catalyzed by these enzymes is crucial for rational drug design to
mitigate ADME/toxicity issues. SMARTCyp, a recently developed 2D ligand
structure-based method, is able to predict site-specific metabolic
reactivity of CYP3A4 and CYP2D6 substrates with an accuracy that rivals
the best and more computationally demanding 3D structure-based methods.
In this article, the SMARTCyp approach was extended to predict the
metabolic hotspots for CYP2C9, CYP2C19, and CYP1A2 substrates. This
was accomplished by taking into account the impact of a key substrate-receptor
recognition feature of each enzyme as a correction term to the SMARTCyp
reactivity. The corrected reactivity was then used to rank order the
likely sites of CYP-mediated metabolic reactions. For 60 CYP1A2 substrates,
the observed major sites of CYP1A2-catalyzed metabolic reactions were
among the top-ranked 1, 2, and 3 positions in 67%, 80%, and 83% of
the cases, respectively. The results were similar to those obtained
by MetaSite and the reactivity + docking approach. For 70 CYP2C9 substrates,
the observed sites of CYP2C9 metabolism were among the top-ranked
1, 2, and 3 positions in 66%, 86%, and 87% of the cases, respectively.
These results were better than the corresponding results of StarDrop
version 5.0, which were 61%, 73%, and 77%, respectively. For 36 compounds
metabolized by CYP2C19, the observed sites of metabolism were found
to be among the top-ranked 1, 2, and 3 sites in 78%, 89%, and 94%
of the cases, respectively. The computational procedure was implemented
as an extension to the program SMARTCyp 2.0. With the extension, the
program can now predict the site of metabolism for all five major
drug-metabolizing enzymes with an accuracy similar to or better than
that achieved by the best 3D structure-based methods. Both the Java
source code and the binary executable of the program are freely available
to interested users.
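The ranking and top-k evaluation described above can be sketched as follows. Two assumptions are made for illustration: the enzyme-specific correction is simply added to a per-atom reactivity score, and a lower score is assumed to mark a more likely site of metabolism (following the SMARTCyp scoring convention). All atom scores below are made up.

```python
from typing import Dict, List, Sequence, Set


def corrected_scores(reactivity: Dict[int, float], correction: Dict[int, float]) -> Dict[int, float]:
    """Add a hypothetical enzyme-specific correction term to each atom's reactivity score."""
    return {atom: score + correction.get(atom, 0.0) for atom, score in reactivity.items()}


def rank_sites(scores: Dict[int, float]) -> List[int]:
    """Order atoms from most to least likely site, assuming lower score = more likely."""
    return sorted(scores, key=scores.get)


def top_k_rate(rankings: Sequence[List[int]], observed: Sequence[Set[int]], k: int) -> float:
    """Fraction of molecules with at least one observed site among the top-k ranked atoms."""
    hits = sum(1 for ranking, sites in zip(rankings, observed) if sites & set(ranking[:k]))
    return hits / len(rankings)


# Made-up scores for two molecules (atom index -> score).
mol_a = rank_sites(corrected_scores({0: 50.0, 1: 40.0, 2: 60.0}, {2: -25.0}))  # [2, 1, 0]
mol_b = rank_sites(corrected_scores({0: 10.0, 1: 20.0}, {}))                  # [0, 1]
observed = [{1}, {0}]
print([top_k_rate([mol_a, mol_b], observed, k) for k in (1, 2, 3)])  # [0.5, 1.0, 1.0]
```

Percentages such as "67%, 80%, and 83%" for the top 1, 2, and 3 positions are this kind of cumulative top-k rate, which is why they can only stay level or rise as k grows.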
Molecular Structure-Based Large-Scale Prediction of Chemical-Induced Gene Expression Changes
The quantitative structure–activity relationship (QSAR) approach
has been used to model a wide range of chemical-induced biological
responses. However, it had not been utilized to model chemical-induced
genomewide gene expression changes until very recently, owing to the
complexity of training and evaluating a very large number of models.
To address this issue, we examined the performance of a variable nearest
neighbor (v-NN) method that uses information on near
neighbors conforming to the principle that similar structures have
similar activities. Using a data set of gene expression signatures
of 13,150 compounds derived from cell-based measurements in
the NIH Library of Integrated Network-based Cellular Signatures program,
we were able to make predictions for 62% of the compounds in a 10-fold
cross validation test, with a correlation coefficient of 0.61 between
the predicted and experimentally derived signatures, a reproducibility
rivaling that of high-throughput gene expression measurements. To
evaluate the utility of the predicted gene expression signatures,
we compared the predicted and experimentally derived signatures in
their ability to identify drugs known to cause specific liver, kidney,
and heart injuries. Overall, the predicted and experimentally derived
signatures had similar receiver operating characteristics, whose areas
under the curve ranged from 0.71 to 0.77 and 0.70 to 0.73, respectively,
across the three organ injury models. However, detailed analyses of
enrichment curves indicated that signatures predicted from multiple
near neighbors outperformed those derived from experiments, suggesting
that averaging information from near neighbors may help improve the
signal from gene expression measurements. Our results demonstrate
that the v-NN method can serve as a practical approach
for modeling large-scale, genomewide, chemical-induced gene expression
changes.
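Applied to gene expression, a v-NN prediction becomes a weighted average of whole neighbor signatures, which can then be scored by its Pearson correlation with the measured signature. The sketch reuses the assumed Gaussian weighting, threshold `d0`, and width `h` purely as an illustration, and takes the Tanimoto distances to training compounds as precomputed inputs.

```python
import math
from typing import List, Optional, Sequence


def predict_signature(distances: Sequence[float],
                      signatures: Sequence[Sequence[float]],
                      d0: float = 0.6, h: float = 0.3) -> Optional[List[float]]:
    """Weighted average of neighbor signatures; None if no neighbor lies within d0."""
    weights = [math.exp(-(d / h) ** 2) if d <= d0 else 0.0 for d in distances]
    total = sum(weights)
    if total == 0.0:
        return None
    n_genes = len(signatures[0])
    return [sum(w * sig[g] for w, sig in zip(weights, signatures)) / total
            for g in range(n_genes)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between a predicted and a measured signature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


pred = predict_signature([0.1, 0.9], [[1.0, -2.0, 0.5], [9.0, 9.0, 9.0]])
print(pred)  # [1.0, -2.0, 0.5]: the second neighbor is outside d0
```

Compounds for which `predict_signature` returns `None` have no neighbor within `d0`. The share of compounds that do receive a prediction corresponds to the coverage figure (62%) quoted above, and averaging several close neighbors is the mechanism the enrichment-curve result points to.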
General Approach to Estimate Error Bars for Quantitative Structure–Activity Relationship Predictions of Molecular Activity
Key requirements for quantitative structure–activity relationship
(QSAR) models to gain acceptance by regulatory authorities include
a defined domain of applicability (DA) and appropriate measures of
goodness-of-fit, robustness, and predictivity. Hence, many DA metrics
have been developed over the past two decades. The most intuitive
are perhaps distance-to-model metrics, which are most commonly defined
in terms of the mean distance between a molecule and its k nearest training samples. Detailed evaluations have shown that the
variance of predictions by an ensemble of QSAR models may serve as
a DA metric and can outperform distance-to-model metrics. Intriguingly,
the performance of the ensemble variance metric has led researchers to
conclude that the error of predicting a new molecule does not depend
on the input descriptors or machine-learning methods but on its distance
to the training molecules. This implies that the distance to training
samples may serve as the basis for developing a high-performance DA
metric. In this article, we introduce a new Tanimoto distance-based
DA metric called the sum of distance-weighted contributions (SDC),
which takes into account contributions from all molecules in a training
set. Using four acute chemical toxicity data sets of varying sizes
and four other molecular property data sets, we demonstrate that SDC
correlates well with the prediction error for all data sets regardless
of the machine-learning methods and molecular descriptors used to
build the QSAR models. Using the acute toxicity data sets, we compared
the distribution of prediction errors with respect to SDC, the mean
distance to the k nearest training samples, and the variance
of random forest predictions. The results showed that the correlation
with the prediction error was highest for SDC. We also demonstrate
that SDC allows for the development of robust root mean squared error
(RMSE) models and makes it possible not only to give a QSAR prediction
but also provide an individual RMSE estimate for each molecule. Because
SDC does not depend on a specific machine-learning method, it represents
a canonical measure that can be widely used to estimate individual
molecule prediction errors for any machine-learning method.
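The abstract describes SDC only as a Tanimoto distance-based sum over all training molecules. The sketch below assumes each training molecule contributes exp(-a·TD/(1 - TD)) with a = 3, so an identical molecule contributes 1 and contributions vanish as TD approaches 1. Treat this weighting, the set-of-bits fingerprints, and the hypothetical `binned_rmse` helper as illustrative assumptions.

```python
import math
from typing import Iterable, List, Sequence, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def sdc(query: Set[int], training: Iterable[Set[int]], a: float = 3.0) -> float:
    """Sum of distance-weighted contributions, assuming the weight exp(-a*TD/(1-TD))."""
    total = 0.0
    for fp in training:
        td = tanimoto_distance(query, fp)
        if td < 1.0:  # the weight tends to 0 as TD -> 1, so TD = 1 contributes nothing
            total += math.exp(-a * td / (1.0 - td))
    return total


def binned_rmse(sdc_values: Sequence[float], errors: Sequence[float],
                edges: Sequence[float]) -> List[float]:
    """RMSE of prediction errors for molecules whose SDC falls in [edges[i], edges[i+1])."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        errs = [e for s, e in zip(sdc_values, errors) if lo <= s < hi]
        out.append(math.sqrt(sum(e * e for e in errs) / len(errs)) if errs else float("nan"))
    return out


print(sdc({1, 2, 3}, [{1, 2, 3}, {4, 5}]))  # 1.0
```

Fitting RMSE against SDC bins on held-out predictions gives a lookup from a new molecule's SDC to an expected error. Because only fingerprints enter `sdc`, the same error model can sit on top of any machine-learning method.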
Additional file 1: of Correlation of increased corrected TIMI frame counts and the topographical extent of isolated coronary artery ectasia
Table S1. Correlation Models of CTFCindex and Topological Parameters. (DOC 64 kb)
Additional file 1: of Data-driven prediction of adverse drug reactions induced by drug-drug interactions
Figure S1. and Tables S1. and S2. (DOCX 7762 kb)