Machine Learning Model Analysis and Data Visualization
with Small Molecules Tested in a Mouse Model of Mycobacterium
tuberculosis Infection (2014–2015)
The renewed urgency to develop new treatments for Mycobacterium
tuberculosis (<i>Mtb</i>)
infection has resulted in large-scale phenotypic screening and thousands
of new active compounds <i>in vitro</i>. The next challenge
is to identify candidates to pursue in a mouse <i>in vivo</i> efficacy model as a step to predicting clinical efficacy. We previously
analyzed over 70 years of this mouse <i>in vivo</i> efficacy
data, which we used to generate and validate machine learning models.
We curated 60 additional small molecules with <i>in vivo</i> data published in 2014 and 2015 to further test these
models. This is a much larger test set than was used for the previous
models. Several computational approaches have now been applied to
analyze these molecules and compare their molecular properties beyond
those attempted previously. Our previous machine learning models have
been updated, and a novel aspect has been added in the form of mouse
liver microsomal half-life (MLM <i>t</i><sub>1/2</sub>)
and <i>in vitro</i>-based <i>Mtb</i> models incorporating
cytotoxicity data that were used to predict <i>in vivo</i> activity for comparison. Our best <i>Mtb</i> <i>in vivo</i> models
possess fivefold cross-validation ROC values > 0.7, sensitivity
> 80%, and concordance > 60%, while the best specificity value
is > 40%. Use of an MLM <i>t</i><sub>1/2</sub> Bayesian model
affords comparable results for scoring the 60 compounds tested. Combining
MLM stability and <i>in vitro</i> <i>Mtb</i> models
in a novel consensus workflow yields, in the best cases, a positive predictive
value (hit rate) > 77%. Our results indicate that Bayesian models
constructed with literature <i>in vivo</i> <i>Mtb</i> data generated by different laboratories in various mouse models
can have predictive value and may be used alongside MLM <i>t</i><sub>1/2</sub> and <i>in vitro</i>-based <i>Mtb</i> models to assist in selecting antitubercular compounds with desirable <i>in vivo</i> efficacy. We demonstrate for the first time that
consensus models of any kind can be used to predict <i>in vivo</i> activity for <i>Mtb</i>. In addition, we describe a new
clustering method for data visualization and apply this to the <i>in vivo</i> training and test data, ultimately making the method
accessible in a mobile app.
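
For readers less familiar with the statistics quoted above, the following is a minimal sketch of how sensitivity, specificity, concordance, and positive predictive value (hit rate) are conventionally computed from binary predictions. The consensus rule shown (a compound is called active only when both the MLM stability model and the in vitro Mtb model call it favorable) is an assumption made for illustration, not necessarily the workflow used in this study, and all function names and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # predicted active, observed active
    fp: int  # predicted active, observed inactive
    tn: int  # predicted inactive, observed inactive
    fn: int  # predicted inactive, observed active


def confusion(actual, predicted):
    """Count the four outcomes for paired boolean labels."""
    pairs = list(zip(actual, predicted))
    return Confusion(
        tp=sum(a and p for a, p in pairs),
        fp=sum((not a) and p for a, p in pairs),
        tn=sum((not a) and (not p) for a, p in pairs),
        fn=sum(a and (not p) for a, p in pairs),
    )


def _ratio(num, den):
    # Return NaN rather than raising when a class is empty.
    return num / den if den else float("nan")


def sensitivity(c):
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c):
    return _ratio(c.tn, c.tn + c.fp)


def concordance(c):
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def ppv(c):
    """Positive predictive value, i.e. the hit rate among predicted actives."""
    return _ratio(c.tp, c.tp + c.fp)


def consensus(mlm_stable, mtb_active):
    """Hypothetical consensus rule: both models must agree on a favorable call."""
    return [m and t for m, t in zip(mlm_stable, mtb_active)]


# Toy data (invented for illustration; not from the study).
observed = [True, True, True, False, False, False]
mlm_calls = [True, True, False, True, False, True]
mtb_calls = [True, True, True, True, False, False]

result = confusion(observed, consensus(mlm_calls, mtb_calls))
```

Requiring agreement between models generally trades sensitivity for a higher hit rate, which is why the positive predictive value is the natural statistic to report for a consensus workflow.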