Interpretable, Probability-Based Confidence Metric for Continuous Quantitative Structure–Activity Relationship Models

Abstract

A great deal of research has gone into the development of robust confidence in prediction and applicability domain (AD) measures for quantitative structure–activity relationship (QSAR) models in recent years. Much of the attention has historically focused on structural similarity, which can be defined in many forms and flavors. A concept that is frequently overlooked in the realm of the QSAR applicability domain is how the local activity landscape plays a role in how accurate a prediction is or is not. In this work, we describe an approach that pairs information about both the chemical similarity and activity landscape of a test compound’s neighborhood into a single calculated confidence value. We also present an approach for converting this value into an interpretable confidence metric that has a simple and informative meaning across data sets. The approach will be introduced to the reader in the context of models built upon four diverse literature data sets. The steps we will outline include the definition of similarity used to determine nearest neighbors (NN), how we incorporate the NN activity landscape with a similarity-weighted root-mean-square distance (wRMSD) value, and how that value is then calibrated to generate an intuitive confidence metric for prospective application. Finally, we will illustrate the prospective performance of the approach on five proprietary models whose predictions and confidence metrics have been tracked for more than a year

    Similar works

    Full text

    thumbnail-image

    Available Versions