Computational methods for prediction of in vitro effects of new chemical structures
Background: With a constant increase in the number of new chemicals synthesized
every year, it becomes important to employ the most reliable and fast in
silico screening methods to predict their safety and activity profiles. In
recent years, in silico prediction methods have received great attention in an
attempt to reduce animal experiments for the evaluation of various
toxicological endpoints, complementing the theme of replace, reduce and
refine. Various computational approaches have been proposed for the prediction
of compound toxicity ranging from quantitative structure activity relationship
modeling to molecular similarity-based methods and machine learning. Within
the “Toxicology in the 21st Century” screening initiative, a crowd-sourcing
platform was established for the development and validation of computational
models to predict the interference of chemical compounds with nuclear receptor
and stress response pathways based on a training set containing more than
10,000 compounds tested in high-throughput screening assays. Results: Here, we
present the results of various molecular similarity-based and machine-learning-based
methods on an independent evaluation set containing 647 compounds as
provided by the Tox21 Data Challenge 2014. It was observed that the Random
Forest approach based on MACCS molecular fingerprints and a subset of 13
molecular descriptors selected based on statistical and literature analysis
performed best in terms of the area under the receiver operating
characteristic curve values. Further, we compared the individual and combined
performance of different methods. In retrospect, we also discuss the reasons
behind the superior performance of an ensemble approach, combining a
similarity search method with the Random Forest algorithm, compared to
individual methods while explaining the intrinsic limitations of the latter.
Conclusions: Our results suggest that, although prediction methods were
optimized individually for each modelled target, an ensemble of similarity and
machine-learning approaches provides promising performance, indicating its
broad applicability in toxicity prediction.
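A minimal sketch of the kind of ensemble the abstract describes: a similarity-search score combined with a machine-learning probability and evaluated by ROC AUC. The abstract does not say how the two were combined, so the equal-weight average, the nearest-neighbor scoring rule, and all function names here are assumptions made for illustration. Fingerprints are represented as plain sets of on-bit indices rather than real MACCS keys.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_score(query, training):
    """Score a query by its nearest training neighbor (hypothetical rule).

    `training` is a list of (fingerprint, label) pairs with label 1 = active,
    0 = inactive. The score is the neighbor's label weighted by the similarity.
    """
    best_sim, best_label = max((tanimoto(query, fp), label) for fp, label in training)
    return best_sim * best_label


def ensemble_score(sim_score, model_prob):
    """Equal-weight average of the two scores (an assumed combination rule)."""
    return 0.5 * (sim_score + model_prob)


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice `model_prob` would come from a trained classifier such as a Random Forest; here it is simply a number passed in, so the sketch stays independent of any particular library.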
The Catch-22 of Predicting hERG Blockade Using Publicly Accessible Bioactivity Data
Drug-induced inhibition of the human ether-à-go-go-related gene
(hERG)-encoded potassium ion channels can lead to fatal cardiotoxicity.
Several marketed drugs and promising drug candidates were recalled
because of this concern. Diverse modeling methods ranging from molecular
similarity assessment to quantitative structure–activity relationship
analysis employing machine learning techniques have been applied to
data sets of varying size and composition (number of blockers and
nonblockers). In this study, we highlight the challenges involved
in the development of a robust classifier for predicting the hERG
end point using bioactivity data extracted from the public domain.
To this end, three different modeling methods, nearest neighbors,
random forests, and support vector machines, were employed to develop
predictive models using different molecular descriptors, activity
thresholds, and training set compositions. Our models demonstrated
superior performance in external validations in comparison with those
reported in the previous studies from which the data sets were extracted.
The choice of descriptors had little influence on the model performance,
with minor exceptions. The criteria used to filter bioactivity data,
the activity threshold settings used to separate blockers from nonblockers,
and the structural diversity of blockers in the training data set were
found to be the crucial indicators of model performance. Training
sets based on a binary threshold of 1 μM/10 μM to separate
blockers (IC50/Ki ≤ 1 μM) from nonblockers (IC50/Ki > 10 μM)
provided superior performance in comparison
with those defined using a single threshold (1 μM or 10 μM).
A major limitation in using the public domain hERG activity data is
the abundance of blockers in comparison with nonblockers at usual
activity thresholds, since not many studies report the latter.
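The two-threshold labeling scheme described above can be sketched as follows. The cutoffs (1 μM and 10 μM) come from the abstract; the function names, the record layout, and the choice to drop compounds that fall between the two cutoffs are assumptions made for illustration.

```python
def label_herg(activity_um, blocker_max=1.0, nonblocker_min=10.0):
    """Assign a class from an IC50 or Ki value in micromolar.

    Values <= blocker_max are blockers, values > nonblocker_min are
    nonblockers, and anything in between is left unlabeled (None).
    """
    if activity_um <= blocker_max:
        return "blocker"
    if activity_um > nonblocker_min:
        return "nonblocker"
    return None


def build_training_set(records):
    """Keep only compounds with an unambiguous label.

    `records` is a list of (compound_id, activity_um) pairs; this layout
    is hypothetical and not taken from the study.
    """
    labeled = []
    for compound_id, activity_um in records:
        label = label_herg(activity_um)
        if label is not None:
            labeled.append((compound_id, label))
    return labeled
```

Setting `blocker_max` and `nonblocker_min` to the same value reproduces a single-threshold scheme, which makes it straightforward to compare the two set-ups on the same data.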
TCRD and Pharos 2021: mining the human proteome for disease biology
In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins by aggregating a multitude of data sources, ranking targets based on the amount of data available, and presenting data in a machine-learning-ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos, in order to empower users to find new areas of study in the druggable genome.
Pharos 2023: an integrated resource for the understudied human proteome
The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products from the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for users to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.
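To illustrate what an enrichment calculation on a subset of targets can look like, here is a minimal sketch using a one-sided hypergeometric test. The abstract does not state which statistic Pharos computes, so the choice of test, the function name, and the parameter names are all assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, subset, overlap):
    """Probability of seeing at least `overlap` annotated items in the subset.

    population: total number of targets considered
    annotated:  targets in the population carrying the annotation
    subset:     size of the user-selected target list
    overlap:    targets in the subset carrying the annotation
    """
    total = comb(population, subset)
    upper = min(annotated, subset)
    # Sum the hypergeometric probabilities for overlap, overlap + 1, ..., upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, subset - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

A small p-value suggests the annotation is over-represented in the selected subset relative to the background. When many annotations are tested at once, a multiple-testing correction would normally be applied on top of this.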