The Catch-22 of Predicting hERG Blockade Using Publicly Accessible Bioactivity Data
Drug-induced inhibition of the potassium ion channels encoded by the human
ether-à-go-go-related gene (hERG) can lead to fatal cardiotoxicity.
Several marketed drugs and promising drug candidates have been recalled
because of this concern. Diverse modeling methods, ranging from molecular
similarity assessment to quantitative structure–activity relationship
analysis employing machine learning techniques, have been applied to
data sets of varying size and composition (number of blockers and
nonblockers). In this study, we highlight the challenges involved
in developing a robust classifier for predicting the hERG
end point from bioactivity data extracted from the public domain.
To this end, three modeling methods (nearest neighbors, random forests,
and support vector machines) were employed to develop predictive models
using different molecular descriptors, activity thresholds, and training
set compositions. In external validation, our models outperformed those
reported in the earlier studies from which the data sets were extracted.
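
The abstract does not describe how the nearest-neighbor models were implemented. As a minimal sketch, assuming binary fingerprints stored as sets of on-bit indices and Tanimoto similarity as the neighbor metric (both assumptions, not taken from the study), a k-nearest-neighbor blocker/nonblocker classifier could look like this. The fingerprints and labels in `TRAINING` are hypothetical toy values.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits (assumed encoding)."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(query: frozenset, training: list, k: int = 3) -> str:
    """Majority vote among the k training compounds most similar to the query.

    `training` is a list of (fingerprint, label) pairs. Python's sort is stable,
    so compounds with equal similarity keep their order in `training`.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy fingerprints; real models would use computed molecular descriptors.
TRAINING = [
    (frozenset({1, 2, 3, 4}), "blocker"),
    (frozenset({1, 2, 3, 5}), "blocker"),
    (frozenset({10, 11, 12}), "nonblocker"),
    (frozenset({10, 11, 13}), "nonblocker"),
    (frozenset({2, 3, 10}), "nonblocker"),
]

print(knn_predict(frozenset({1, 2, 3}), TRAINING))   # blocker
print(knn_predict(frozenset({10, 11}), TRAINING))    # nonblocker
```

An odd `k` avoids most voting ties in a two-class problem; with heavily imbalanced training data, as the abstract notes for public hERG data, the majority class can still dominate the vote.
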
The choice of descriptors had little influence on model performance,
with minor exceptions. The criteria used to filter bioactivity data,
the activity thresholds used to separate blockers from nonblockers,
and the structural diversity of blockers in the training data set were
found to be the crucial determinants of model performance. Training
sets based on a two-level (binary) threshold of 1 μM/10 μM to separate
blockers (IC50/Ki ≤ 1 μM) from nonblockers (IC50/Ki > 10 μM) performed
better than those defined using a single threshold (1 μM or 10 μM).
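
The two labeling schemes can be contrasted with a small sketch. It follows the 1 μM/10 μM rule stated above (blocker if IC50/Ki ≤ 1 μM, nonblocker if IC50/Ki > 10 μM, compounds in between left out); for the single-threshold case it assumes, without confirmation from the abstract, that values ≤ the threshold are blockers. The compound names and potencies in `RECORDS` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def label_dual(ic50_um: float, blocker_max: float = 1.0,
               nonblocker_min: float = 10.0) -> Optional[str]:
    """Two-level threshold: compounds between the cutoffs get no label (excluded)."""
    if ic50_um <= blocker_max:
        return "blocker"
    if ic50_um > nonblocker_min:
        return "nonblocker"
    return None


def label_single(ic50_um: float, threshold: float) -> str:
    """Single threshold (assumed convention: <= threshold means blocker)."""
    return "blocker" if ic50_um <= threshold else "nonblocker"


def build_training_set(records: List[Tuple[str, float]],
                       labeler: Callable[[float], Optional[str]]) -> List[Tuple[str, str]]:
    """Apply a labeling rule and drop compounds it leaves unlabeled."""
    labeled = []
    for name, ic50_um in records:
        label = labeler(ic50_um)
        if label is not None:
            labeled.append((name, label))
    return labeled


# Hypothetical IC50/Ki values in μM.
RECORDS = [("cpd_A", 0.5), ("cpd_B", 1.0), ("cpd_C", 3.0), ("cpd_D", 10.0), ("cpd_E", 25.0)]

print(build_training_set(RECORDS, label_dual))
print(build_training_set(RECORDS, lambda v: label_single(v, 10.0)))
```

With the dual rule, `cpd_C` and `cpd_D` fall in the 1–10 μM gap and are excluded, so the training set keeps only compounds whose class is unambiguous; a single threshold labels every compound but forces borderline potencies into one class or the other.
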
A major limitation in using public-domain hERG activity data is
the abundance of blockers relative to nonblockers at the usual
activity thresholds, since few studies report the latter