14 research outputs found
Supplementary_File_13.zip
Supplemental files 13-15 for "Computer-Aided Discovery of Novel Ebola Virus Inhibitors" manuscript. These files contain all
QSAR models from Chembench, HiT QSAR, and GUSAR
Phantom PAINS: Problems with the Utility of Alerts for Pan‑Assay INterference CompoundS
The use of substructural alerts to identify Pan-Assay INterference compoundS (PAINS)
has become a common component of the triage process in biological
screening campaigns. These alerts, however, were originally derived
from a proprietary library tested in just six assays measuring protein–protein
interaction (PPI) inhibition using the AlphaScreen detection technology
only; moreover, 68% (328 out of the 480 alerts) were derived from
four or fewer compounds. In an effort to assess the reliability of
these alerts as indicators of pan-assay interference, we performed
a large-scale analysis of the impact of PAINS alerts on compound promiscuity
in bioassays using publicly available data in PubChem. We found that
the majority (97%) of all compounds containing PAINS alerts were actually
infrequent hitters in AlphaScreen assays measuring PPI inhibition.
We also found that the presence of PAINS alerts, contrary to expectations,
did not reflect any heightened assay activity trends across all assays
in PubChem including AlphaScreen, luciferase, beta-lactamase, or fluorescence-based
assays. In addition, 109 PAINS alerts were present in 3570 extensively
assayed, but consistently inactive compounds called Dark Chemical
Matter. Finally, we observed that 87 small molecule FDA-approved drugs
contained PAINS alerts and profiled their bioassay activity. Based
on this detailed analysis of PAINS alerts in nonproprietary compound
libraries, we caution against the blind use of PAINS filters to detect
and triage compounds with possible PAINS liabilities and recommend
that such conclusions should be drawn only by conducting orthogonal
experiments.
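The kind of promiscuity check described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the compound identifiers, and the 0.25 hit-rate threshold for calling a compound an "infrequent hitter" are all assumptions made for the example. The last line only re-derives the 68% figure from the counts quoted in the abstract (328 of 480 alerts).

```python
def hit_rate(outcomes):
    """Fraction of assays in which a compound was active (1 = active, 0 = inactive)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def infrequent_hitter_fraction(records, threshold=0.25):
    """Share of alert-flagged compounds whose hit rate is below `threshold`.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    flagged = [r for r in records if r["has_alert"]]
    if not flagged:
        return 0.0
    infrequent = [r for r in flagged if hit_rate(r["outcomes"]) < threshold]
    return len(infrequent) / len(flagged)


# Toy records with made-up identifiers and outcomes.
records = [
    {"id": "cpd-1", "has_alert": True, "outcomes": [1, 0, 0, 0, 0, 0]},
    {"id": "cpd-2", "has_alert": True, "outcomes": [1, 1, 1, 0, 1, 1]},
    {"id": "cpd-3", "has_alert": True, "outcomes": [0, 0, 0, 0, 0, 0]},
    {"id": "cpd-4", "has_alert": False, "outcomes": [1, 1, 0, 0, 0, 0]},
]

fraction = infrequent_hitter_fraction(records)

# Share of alerts derived from four or fewer compounds, from the abstract's counts.
alert_share = round(100 * 328 / 480)
```

In this toy set, two of the three alert-flagged compounds fall below the assumed threshold, so flagging by alert alone would have discarded mostly well-behaved compounds.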
CHEMINFORMATICS: AN INTRODUCTION
Cheminformatics is an interdisciplinary field between chemistry and informatics, which has evolved considerably since its inception in the 1960s. Initially, the cheminformatics community dealt primarily with practical and technical aspects of chemical structure representation, manipulation, and processing, while modern research explores a new role: the exploration and interpretation of large chemical databases and the discovery of new compounds with desired activity and safety profiles. Despite the recent release of several hallmark reviews addressing methods and applications of cheminformatics written in Portuguese, so far there are no scientific articles presenting cheminformatics research to the Brazilian scientific community. To address this gap, we aim to introduce the field of cheminformatics to both students and researchers in a simple and didactic way by narrating important historical facts and contextualizing information within the scope of various applications.
Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling?
Prior to using a quantitative structure–activity relationship (QSAR)
model for external predictions, its predictive power should be established
and validated. In the absence of a true external data set, the best
way to validate the predictive ability of a model is to perform its
statistical external validation. In statistical external validation,
the overall data set is divided into training and test sets. Commonly,
this splitting is performed using random division. Rational splitting
methods can divide data sets into training and test sets in an intelligent
fashion. The purpose of this study was to determine whether rational
division methods lead to more predictive models compared to random
division. A special data splitting procedure was used to facilitate
the comparison between random and rational division methods. For each
toxicity end point, the overall data set was divided into a modeling
set (80% of the overall set) and an external evaluation set (20% of
the overall set) using random division. The modeling set was then
subdivided into a training set (80% of the modeling set) and a test
set (20% of the modeling set) using rational division methods and
by using random division. The Kennard-Stone, minimal test set dissimilarity,
and sphere exclusion algorithms were used as the rational division
methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop
QSAR models based on the training sets. For kNN QSAR,
multiple training and test sets were generated, and multiple QSAR
models were built. The results of this study indicate that models
based on rational division methods generate better statistical results
for the test sets than models based on random division, but the predictive
power of both types of models is comparable.
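The two-level splitting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the descriptors are random toy 2-D points, Euclidean distance is assumed, and the helper names `random_split` and `kennard_stone` are hypothetical. The Kennard-Stone step follows the usual textbook form: seed with the two most distant points, then repeatedly add the point whose nearest already-selected neighbour is farthest away.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def random_split(n_items, fraction, seed=0):
    """Randomly split indices 0..n_items-1; `fraction` of them go to the first list."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    cut = round(fraction * n_items)
    return sorted(idx[:cut]), sorted(idx[cut:])


def kennard_stone(points, n_select):
    """Return indices of `n_select` points chosen by the Kennard-Stone rule."""
    n = len(points)
    if n_select < 2:
        raise ValueError("n_select must be at least 2")
    if n_select >= n:
        return list(range(n))
    # Seed with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: euclidean(points[p[0]], points[p[1]]),
    )
    selected = [first, second]
    remaining = set(range(n)) - set(selected)
    while len(selected) < n_select:
        # Add the point whose closest selected neighbour is farthest away.
        nxt = max(
            sorted(remaining),
            key=lambda i: min(euclidean(points[i], points[s]) for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy data set: 25 compounds with two made-up descriptors each.
rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(25)]

# Level 1: random 80/20 split into modeling and external evaluation sets.
modeling_idx, external_idx = random_split(len(data), 0.8, seed=1)

# Level 2: rational 80/20 split of the modeling set into training and test sets.
modeling = [data[i] for i in modeling_idx]
train_local = kennard_stone(modeling, round(0.8 * len(modeling)))
train_idx = [modeling_idx[i] for i in train_local]
test_idx = [i for i in modeling_idx if i not in train_idx]
```

Because the external evaluation set is drawn at random before any rational selection, it is untouched by the choice of splitting method, which is what makes the random and rational variants comparable on the same external compounds.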
Pred-Skin: A Fast and Reliable Web Application to Assess Skin Sensitization Effect of Chemicals
Chemically induced skin sensitization is a complex immunological
disease with a profound impact on quality of life and working ability.
Despite some progress in developing alternative methods for assessing
the skin sensitization potential of chemical substances, there is
no in vitro test that correlates well with human data. Computational
QSAR models provide a rapid screening approach and contribute valuable
information for the assessment of chemical toxicity. We describe the
development of a freely accessible web-based and mobile application
for the identification of potential skin sensitizers. The application
is based on previously developed binary QSAR models of skin sensitization
potential from human (109 compounds) and murine local lymph node assay
(LLNA, 515 compounds) data with good external correct classification
rate (0.70–0.81 and 0.72–0.84, respectively). We also
included a multiclass skin sensitization potency model based on LLNA
data (accuracy ranging between 0.73 and 0.76). When a user evaluates
a compound in the web app, the outputs are (i) binary predictions
of human and murine skin sensitization potential; (ii) multiclass
prediction of murine skin sensitization; and (iii) probability maps
illustrating the predicted contribution of chemical fragments. The
app is the first tool available that incorporates quantitative structure–activity
relationship (QSAR) models based on human data as well as multiclass
models for LLNA. The Pred-Skin web app version 1.0 is freely available
for the web, iOS, and Android (in development) at the LabMol web portal
(http://labmol.com.br/predskin/), in the Apple Store, and on Google Play, respectively. We will
continuously update the app as new skin sensitization data and respective
models become available.
Chembench: A Publicly Accessible, Integrated Cheminformatics Portal
The enormous increase in the amount of publicly available chemical
genomics data and the growing emphasis on data sharing and open science
mandate that cheminformaticians also make their models publicly available
for broad use by the scientific community. Chembench is one of the
first publicly accessible, integrated cheminformatics Web portals.
It has been extensively used by researchers from different fields
for curation, visualization, analysis, and modeling of chemogenomics
data. Since its launch in 2008, Chembench has been accessed more than
1 million times by more than 5000 users from a total of 98 countries.
We report on the recent updates and improvements that increase the
simplicity of use, computational efficiency, accuracy, and accessibility
of a broad range of tools and services for computer-assisted drug
design and computational toxicology available on Chembench. Chembench
remains freely accessible at https://chembench.mml.unc.ed
Chemotext: A Publicly Available Web Server for Mining Drug–Target–Disease Relationships in PubMed
Elucidation of the
mechanistic relationships between drugs, their
targets, and diseases is at the core of modern drug discovery research.
Thousands of studies relevant to the drug–target–disease
(DTD) triangle have been published and annotated in the Medline/PubMed
database. Mining this database affords rapid identification of all
published studies that confirm connections between vertices of this
triangle or enable new inferences of such connections. To this end,
we describe the development of Chemotext, a publicly available Web
server that mines the entire compendium of published literature in
PubMed annotated by Medline Subject Heading (MeSH) terms. The goal
of Chemotext is to identify all known DTD relationships and infer
missing links between vertices of the DTD triangle. As a proof-of-concept,
we show that Chemotext could be instrumental in generating new drug
repurposing hypotheses or annotating clinical outcomes pathways for
known drugs. The Chemotext Web server is freely available at http://chemotext.mml.unc.edu
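One simple way to picture "inferring missing links" in the DTD triangle is a co-occurrence search: a drug and a disease that never appear in the same article, but both co-occur with the same target, form a candidate link. The sketch below illustrates that idea only. It is an assumption about the general approach, not Chemotext's actual implementation, and the term names are placeholders rather than real MeSH headings.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(articles):
    """Map each term to the set of terms it shares at least one article with."""
    links = defaultdict(set)
    for terms in articles:
        for a, b in combinations(sorted(set(terms)), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def infer_drug_disease(articles, drugs, targets, diseases):
    """Return {(drug, disease): bridging targets} for pairs with no direct co-occurrence."""
    links = cooccurrence(articles)
    inferred = defaultdict(set)
    for drug in sorted(drugs):
        for target in links[drug] & targets:
            for disease in links[target] & diseases:
                # Keep only links that are not already documented directly.
                if disease not in links[drug]:
                    inferred[(drug, disease)].add(target)
    return dict(inferred)


# Each article is represented by the set of annotated terms (placeholder names).
articles = [
    {"drug-A", "target-X"},
    {"target-X", "disease-Y"},
    {"drug-B", "target-X", "disease-Y"},
]

hypotheses = infer_drug_disease(
    articles,
    drugs={"drug-A", "drug-B"},
    targets={"target-X"},
    diseases={"disease-Y"},
)
```

Here drug-A is linked to disease-Y through target-X, while drug-B is skipped because its connection to disease-Y is already documented in a single article.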
Multi-Descriptor Read Across (MuDRA): A Simple and Transparent Approach for Developing Accurate Quantitative Structure–Activity Relationship Models
Multiple approaches to quantitative structure–activity relationship
(QSAR) modeling using various statistical or machine learning techniques
and different types of chemical descriptors have been developed over
the years. Oftentimes models are used in consensus to make more accurate
predictions at the expense of model interpretation. We propose a simple,
fast, and reliable method termed Multi-Descriptor Read Across (MuDRA)
for developing both accurate and interpretable models. The method
is conceptually related to the well-known kNN approach but uses different
types of chemical descriptors simultaneously for similarity assessment.
To benchmark the new method, we have built MuDRA models for six different
end points (Ames mutagenicity, aquatic toxicity, hepatotoxicity, hERG
liability, skin sensitization, and endocrine disruption) and compared
the results with those generated with conventional consensus QSAR
modeling. We find that models built with MuDRA show consistently high
external accuracy similar to that of conventional QSAR models. However,
MuDRA models excel in terms of transparency, interpretability, and
computational efficiency. We posit that due to its methodological
simplicity and reliable predictive accuracy, MuDRA provides a powerful
alternative to much more complex consensus QSAR modeling. MuDRA
is implemented and freely available at the Chembench web portal (https://chembench.mml.unc.edu/mudra)
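The idea of a kNN-like read-across that consults several descriptor types at once can be sketched as follows. This is an illustration under assumptions rather than the published MuDRA algorithm: descriptors are modelled as plain feature sets, similarity is Tanimoto, and the prediction is a similarity-weighted average over the nearest neighbours found in each descriptor space. The function name `mudra_like_predict` and the descriptor space names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mudra_like_predict(query, training, k=1):
    """Predict activity from the k nearest neighbours in every descriptor space.

    query:    {space_name: set_of_features}
    training: list of ({space_name: set_of_features}, activity) tuples
    Returns (prediction, neighbours), where neighbours lists (space, index, similarity).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    neighbours = []
    for space, q_feats in query.items():
        scored = sorted(
            ((tanimoto(q_feats, desc[space]), i) for i, (desc, _) in enumerate(training)),
            reverse=True,
        )
        for sim, i in scored[:k]:
            weighted_sum += sim * training[i][1]
            weight_total += sim
            neighbours.append((space, i, sim))
    prediction = weighted_sum / weight_total if weight_total else None
    return prediction, neighbours


# Toy training set with two made-up descriptor spaces.
training = [
    ({"fragments": {"a", "b", "c"}, "paths": {"p1", "p2"}}, 1.0),
    ({"fragments": {"x", "y"}, "paths": {"p1", "p2", "p3"}}, 0.0),
]
query = {"fragments": {"a", "b"}, "paths": {"p1", "p2", "p3"}}

prediction, neighbours = mudra_like_predict(query, training)
```

Returning the neighbours alongside the prediction is what keeps such a model inspectable: each prediction can be traced to specific training compounds in specific descriptor spaces.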
QSAR Modeling and Prediction of Drug–Drug Interactions
Severe adverse drug reactions (ADRs) are the fourth leading cause
of fatality in the U.S. with more than 100 000 deaths per year.
As up to 30% of all ADRs are believed to be caused by drug–drug
interactions (DDIs), typically mediated by cytochrome P450s, possibilities
to predict DDIs from existing knowledge are important. We collected
data from public sources on 1485, 2628, 4371, and 27 966 possible
DDIs mediated by four cytochrome P450 isoforms 1A2, 2C9, 2D6, and
3A4 for 55, 73, 94, and 237 drugs, respectively. For each of these
data sets, we developed and validated QSAR models for the prediction
of DDIs. As a unique feature of our approach, the interacting drug
pairs were represented as binary chemical mixtures in a 1:1 ratio.
We used two types of chemical descriptors: quantitative neighborhoods
of atoms (QNA) and simplex descriptors. Radial basis functions with
self-consistent regression (RBF-SCR) and random forest (RF) were utilized
to build QSAR models predicting the likelihood of DDIs for any pair
of drug molecules. Our models showed balanced accuracy of 72–79%
for the external test sets with a coverage of 81.36–100% when
a conservative threshold for the model’s applicability domain
was applied. We generated virtually all possible binary combinations
of marketed drugs and employed our models to identify drug pairs predicted
to be instances of DDI. More than 4500 of these predicted DDIs that
were not found in our training sets were confirmed by data from the
DrugBank database.
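Three quantities from this abstract lend themselves to a short sketch: a descriptor for a 1:1 drug pair, balanced accuracy, and coverage under an applicability domain. The code below is a simplified illustration. Representing a 1:1 mixture as the element-wise mean of two descriptor vectors is an assumption made for the example (the QNA and simplex mixture descriptors used in the study are more elaborate), and all values are toy data.

```python
def mixture_descriptor(desc_a, desc_b):
    """Order-independent descriptor for a 1:1 pair: element-wise mean (an assumption)."""
    return [(a + b) / 2 for a, b in zip(desc_a, desc_b)]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = DDI, 0 = no DDI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return 0.5 * (tp / positives + tn / negatives)


def coverage(in_domain_flags):
    """Fraction of pairs that fall inside the model's applicability domain."""
    return sum(in_domain_flags) / len(in_domain_flags)


# Toy pair descriptor: the same vector regardless of drug order.
pair_ab = mixture_descriptor([1.0, 0.0], [3.0, 2.0])
pair_ba = mixture_descriptor([3.0, 2.0], [1.0, 0.0])

# Toy external set: 3 interacting pairs, 5 non-interacting pairs.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
ba = balanced_accuracy(y_true, y_pred)

# Toy applicability-domain flags for ten predicted pairs.
cov = coverage([True] * 9 + [False])
```

Balanced accuracy is the natural metric here because interacting pairs are typically a minority class; plain accuracy would reward a model that predicts "no interaction" for everything.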