3 research outputs found
KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development
Risk assessment of newly synthesised chemicals is a prerequisite for regulatory approval. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing, as they make use of the ever-growing amount of available toxicity data. Here, KnowTox is presented, a novel pipeline that combines three different in silico toxicology approaches to allow for confident prediction of potentially toxic effects of query compounds: machine learning models for 88 endpoints, alerts for 919 toxic substructures, and computational support for read-across. It is mainly based on the ToxCast dataset, which after preprocessing contains a sparse matrix of 7912 compounds tested against 985 endpoints. When applying machine learning models, the applicability and reliability of predictions for new chemicals are of utmost importance. Therefore, first, the conformal prediction technique was deployed, which comprises an additional calibration step and by definition creates internally valid predictors at a given significance level. Second, to further improve validity and information efficiency, two adaptations are suggested, exemplified on the androgen receptor antagonism endpoint. An absolute increase in validity of 23% on the in-house dataset of 534 compounds was achieved by introducing KNNRegressor normalisation. This increase in validity comes at the cost of efficiency, which in turn was improved by 20% for the initial ToxCast model by balancing the dataset during model training. Finally, the value of the developed pipeline for risk assessment is discussed using two in-house triazole molecules. Compared to a single toxicity prediction method, combining the outputs of different approaches can have a higher impact on guiding toxicity testing and on deselecting likely harmful development-candidate compounds early in the development process.
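The conformal prediction step mentioned above can be illustrated with a minimal inductive (split) conformal classifier: a model is fit on a proper training set, nonconformity scores are computed on a held-out calibration set, and each test compound receives a p-value per class, yielding prediction sets that are valid at a chosen significance level. This is only a generic sketch on synthetic data, not the KnowTox implementation; the model choice, nonconformity measure, and significance level here are assumptions.

```python
# Sketch of inductive (split) conformal classification -- illustrative only,
# not the KnowTox pipeline. Data, model, and epsilon are hypothetical choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Nonconformity score: 1 - predicted probability of the (candidate) class.
cal_scores = 1.0 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

def p_values(x):
    """p-value per candidate label: smoothed fraction of calibration scores
    at least as nonconforming as the test score."""
    scores = 1.0 - clf.predict_proba(x.reshape(1, -1))[0]
    return [(np.sum(cal_scores >= s) + 1) / (len(cal_scores) + 1) for s in scores]

eps = 0.2  # significance level: long-run error of prediction sets is <= eps
pred_set = [label for label, p in enumerate(p_values(X_test[0])) if p > eps]
```

By the conformal validity guarantee, the fraction of test compounds whose true class falls outside the prediction set stays below `eps` on average, which is what makes the calibration step attractive for risk-sensitive toxicity screening.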
Generalized Workflow for Generating Highly Predictive in Silico Off‑Target Activity Models
Chemical structure data and corresponding measured bioactivities of compounds are nowadays easily available from public and commercial databases. However, these databases contain heterogeneous data from different laboratories determined under different protocols and, in addition, sometimes even erroneous entries. In this study, we evaluated the use of data from bioactivity databases for the generation of high-quality in silico models for off-target-mediated toxicity as a decision support in early drug discovery and crop-protection research. We chose human acetylcholinesterase (hAChE) inhibition as an exemplary endpoint for our case study. A standardized and thorough quality-management routine for input data consisting of more than 2,200 chemical entities from bioactivity databases was established. This procedure finally enables the development of predictive QSAR models based on heterogeneous in vitro data from multiple laboratories. An extended applicability domain approach was used, and regression results were refined by an error estimation routine. Subsequent classification, augmented by special consideration of borderline candidates, achieved 96% correct predictive classification in external validation. The standardized process described herein is implemented as a (semi)automated workflow and is thus easily transferable to other off-targets and assay readouts.
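A typical quality-management step of the kind described, reconciling replicate measurements of the same compound from different laboratories, can be sketched as follows. This is a generic illustration, not the authors' exact workflow: the compound IDs, IC50 values, unit conversion, and the 0.5 log-unit disagreement threshold are all assumptions.

```python
# Illustrative curation of heterogeneous bioactivity data: convert IC50 (nM)
# to pIC50, then keep only compounds whose replicates agree. Hypothetical data;
# the 0.5 log-unit threshold is an assumed cutoff, not from the paper.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "compound_id": ["C1", "C1", "C1", "C2", "C2", "C3"],
    "ic50_nM":     [120.0, 150.0, 90.0, 5000.0, 40.0, 15.0],
})

# pIC50 = -log10(IC50 in molar); 1 nM = 1e-9 M.
raw["pIC50"] = -np.log10(raw["ic50_nM"] * 1e-9)

agg = raw.groupby("compound_id")["pIC50"].agg(["median", "std", "count"])
# Reject compounds whose replicate measurements disagree by more than the
# threshold; singletons (std is NaN) are kept here by treating NaN as 0.
agg["keep"] = agg["std"].fillna(0.0) <= 0.5
curated = agg.loc[agg["keep"], "median"]
```

In this toy run, C2 is dropped because its two entries (5000 nM vs. 40 nM) differ by more than two log units, exactly the kind of inter-laboratory inconsistency that motivates a standardized curation routine before QSAR model training.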