Interpretable, Probability-Based Confidence Metric for Continuous Quantitative Structure–Activity Relationship Models
Abstract
In recent years, a great deal of research has gone into developing
robust confidence-in-prediction and applicability domain (AD) measures
for quantitative structure–activity relationship (QSAR) models. Much
of that attention has focused on structural similarity, which can be
defined in many ways.
A factor that is frequently overlooked in the QSAR applicability
domain is the effect of the local activity landscape on the accuracy
of a prediction. In this work, we describe an approach that
combines information about both the chemical similarity and the activity
landscape of a test compound’s neighborhood into a single calculated
confidence value. We also present an approach for converting this
value into an interpretable confidence metric that has a simple and
informative meaning across data sets. We introduce the approach
in the context of models built on four diverse literature
data sets. The steps we outline are the definition of similarity
used to determine nearest neighbors (NN), the incorporation of the NN
activity landscape through a similarity-weighted root-mean-square distance
(wRMSD) value, and the calibration of that value to generate an
intuitive confidence metric for prospective application. Finally,
we illustrate the prospective performance of the approach on
five proprietary models whose predictions and confidence metrics have
been tracked for more than a year.
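
The abstract names the wRMSD and calibration steps without giving formulas, so the Python sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes that wRMSD is the similarity-weighted root-mean-square difference between the nearest neighbors' measured activities and the model's prediction for the test compound, and that calibration means binning held-out wRMSD values and recording the fraction of predictions whose absolute error falls within a chosen tolerance. The function names, `bin_edges`, and `tolerance` are hypothetical.

```python
import bisect
import math


def weighted_rmsd(similarities, neighbor_activities, predicted_activity):
    """Assumed form: sqrt(sum_i s_i * (y_i - y_pred)^2 / sum_i s_i)."""
    if len(similarities) != len(neighbor_activities):
        raise ValueError("one similarity is required per neighbor activity")
    total_weight = sum(similarities)
    if total_weight <= 0:
        raise ValueError("similarities must sum to a positive value")
    weighted_sq = sum(
        s * (y - predicted_activity) ** 2
        for s, y in zip(similarities, neighbor_activities)
    )
    return math.sqrt(weighted_sq / total_weight)


def calibrate_confidence(wrmsd_values, abs_errors, bin_edges, tolerance):
    """Assumed calibration: per wRMSD bin, the fraction of held-out
    predictions with absolute error <= tolerance.

    bin_edges must be sorted ascending; bin k holds values w with
    bin_edges[k-1] < w <= bin_edges[k], plus one overflow bin at the end.
    Empty bins yield None.
    """
    counts = [0] * (len(bin_edges) + 1)
    hits = [0] * (len(bin_edges) + 1)
    for w, err in zip(wrmsd_values, abs_errors):
        k = bisect.bisect_left(bin_edges, w)
        counts[k] += 1
        if err <= tolerance:
            hits[k] += 1
    return [h / c if c else None for h, c in zip(hits, counts)]


# Illustrative, made-up numbers (not data from the paper).
w = weighted_rmsd([0.9, 0.1], [6.0, 8.0], predicted_activity=6.0)
table = calibrate_confidence(
    wrmsd_values=[0.2, 0.4, 0.8, 1.5],
    abs_errors=[0.1, 0.9, 0.3, 1.2],
    bin_edges=[0.5, 1.0],
    tolerance=0.5,
)
```

Under these assumptions, a low wRMSD means that similar neighbors have activities close to the prediction, and the calibrated table turns a raw wRMSD into a statement of the form "predictions in this bin fell within the tolerance this fraction of the time", which is one way to obtain a meaning that carries across data sets.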