1,931 research outputs found
An Interpretable Stroke Prediction Model using Rules and Bayesian Analysis
We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (for example, if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily inter-
pretable decision statements. We introduce a generative model called the Bayesian List Machine which yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that the Bayesian List Machine has predictive accuracy on par with the current top algorithms
for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS2 score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial brillation. Our model is as interpretable as CHADS2, but more accurate
Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model
We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if … then. . . statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS₂ score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS₂, but more accurate.National Science Foundation (U.S.) (Grant IIS-1053407
Preterm Birth Prediction: Deriving Stable and Interpretable Rules from High Dimensional Data
Preterm births occur at an alarming rate of 10-15%. Preemies have a higher
risk of infant mortality, developmental retardation and long-term disabilities.
Predicting preterm birth is difficult, even for the most experienced
clinicians. The most well-designed clinical study thus far reaches a modest
sensitivity of 18.2-24.2% at specificity of 28.6-33.3%. We take a different
approach by exploiting databases of normal hospital operations. We aims are
twofold: (i) to derive an easy-to-use, interpretable prediction rule with
quantified uncertainties, and (ii) to construct accurate classifiers for
preterm birth prediction. Our approach is to automatically generate and select
from hundreds (if not thousands) of possible predictors using stability-aware
techniques. Derived from a large database of 15,814 women, our simplified
prediction rule with only 10 items has sensitivity of 62.3% at specificity of
81.5%.Comment: Presented at 2016 Machine Learning and Healthcare Conference (MLHC
2016), Los Angeles, C
Interpretable multiclass classification by MDL-based rule lists
Interpretable classifiers have recently witnessed an increase in attention
from the data mining community because they are inherently easier to understand
and explain than their more complex counterparts. Examples of interpretable
classification models include decision trees, rule sets, and rule lists.
Learning such models often involves optimizing hyperparameters, which typically
requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate
probabilistic rule lists for multiclass classification. Specifically, we
propose a novel formalization based on probabilistic rule lists and the minimum
description length (MDL) principle. This results in virtually parameter-free
model selection that naturally allows to trade-off model complexity with
goodness of fit, by which overfitting and the need for hyperparameter tuning
are effectively avoided. Finally, we introduce the Classy algorithm, which
greedily finds rule lists according to the proposed criterion. We empirically
demonstrate that Classy selects small probabilistic rule lists that outperform
state-of-the-art classifiers when it comes to the combination of predictive
performance and interpretability. We show that Classy is insensitive to its
only parameter, i.e., the candidate set, and that compression on the training
set correlates with classification performance, validating our MDL-based
selection criterion
- …