High-Throughput Screening (HTS) is a common approach in life sciences
to discover chemical matter that modulates a biological target or
phenotype. However, low assay throughput, reagent costs, or a flowchart
that can deal with only a limited number of hits may prevent screening
large numbers of compounds. In this case, a subset of compounds is
assayed, and <i>in silico</i> models are utilized to aid
in iterative screening design, usually to expand around the found
hits and enrich subsequent rounds for relevant chemical matter. However,
this may lead to an overly narrow focus, and the diversity of compounds
sampled in subsequent iterations may suffer. Active learning has
recently been applied successfully in drug discovery with the goal of sampling
diverse chemical space to improve model performance. Here we introduce
a robust and straightforward iterative screening protocol based on
naïve Bayes models. Instead of following up on the compounds
with the highest scores in the <i>in silico</i> model, we
pursue compounds with very low but positive values. This includes
unique chemotypes of weakly active compounds that enhance the applicability
domain of the model and increase the cumulative hit rates. We show
in a retrospective application to 81 Novartis assays that this protocol
leads to consistently higher compound and scaffold hit rates compared
to a standard expansion around hits or an active learning approach.
We recommend using the weak reinforcement strategy introduced herein
for iterative screening workflows.
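
The selection rule described above can be illustrated with a minimal sketch; it is not the authors' implementation. The snippet contrasts a standard expansion around hits (picking the top-scoring compounds) with the weak reinforcement idea (picking the compounds with the lowest positive scores). The function names, the `batch_size` parameter, and the example scores are hypothetical, and the scores are assumed to come from a naïve Bayes model in which a positive value indicates predicted activity.

```python
def select_top_scoring(scores, batch_size):
    """Standard expansion: indices of the highest-scoring compounds."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


def select_weak_reinforcement(scores, batch_size):
    """Weak reinforcement (illustrative): indices of the compounds whose
    scores are positive but as low as possible."""
    # Keep only compounds with a positive model score (assumed to mean
    # "predicted active"), then take the smallest scores first.
    positive = sorted((s, i) for i, s in enumerate(scores) if s > 0)
    return [i for _, i in positive[:batch_size]]


# Hypothetical model scores for seven untested compounds.
scores = [12.4, 0.3, -1.7, 5.1, 0.05, -0.2, 2.2]

top = select_top_scoring(scores, batch_size=3)
weak = select_weak_reinforcement(scores, batch_size=3)
print("top-scoring picks:      ", top)
print("weak reinforcement picks:", weak)
```

In an iterative workflow, the selected compounds would be assayed, their results added to the training set, and the model retrained before the next round; those steps are omitted here.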