Scalable Bayesian Rule Lists
We present an algorithm for building probabilistic rule lists that is two
orders of magnitude faster than previous work. Rule list algorithms are
competitors for decision tree algorithms. They are associative classifiers, in
that they are built from pre-mined association rules. They have a logical
structure that is a sequence of IF-THEN rules, identical to a decision list or
one-sided decision tree. Instead of using greedy splitting and pruning like
decision tree algorithms, we fully optimize over rule lists, striking a
practical balance between accuracy, interpretability, and computational speed.
The algorithm presented here uses a mixture of theoretical bounds (tight enough
to have practical implications as a screening or bounding procedure),
computational reuse, and highly tuned language libraries to achieve
computational efficiency. Currently, for many practical problems, this method
achieves better accuracy and sparsity than decision trees; further, in many
cases, the computational time is practical and often less than that of decision
trees. The result is a probabilistic classifier (which estimates P(y = 1|x) for
each x) that optimizes the posterior of a Bayesian hierarchical model over rule
lists.
Comment: 31 pages, 19 figures
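The logical structure described above, a sequence of IF-THEN rules ending in a default (ELSE) prediction, can be sketched as follows. This is an illustrative toy, not the paper's SBRL implementation; the rules, feature names, and probabilities are hypothetical.

```python
# A rule list as an ordered sequence of (antecedent, probability) pairs:
# the first matching antecedent supplies the estimate of P(y = 1 | x),
# and a default probability acts as the final ELSE clause.

def predict_proba(rule_list, default_p, x):
    """Return P(y = 1 | x) from the first rule whose antecedent holds."""
    for antecedent, p in rule_list:
        if antecedent(x):
            return p
    return default_p  # ELSE clause: no rule matched

# Hypothetical rule list for a toy credit-risk example (features as a dict).
rules = [
    (lambda x: x["income"] < 20_000, 0.85),     # IF low income THEN p = 0.85
    (lambda x: x["late_payments"] >= 3, 0.70),  # ELSE IF ... THEN p = 0.70
]

print(predict_proba(rules, 0.10, {"income": 50_000, "late_payments": 0}))  # 0.1
print(predict_proba(rules, 0.10, {"income": 10_000, "late_payments": 0}))  # 0.85
```

Unlike greedy tree induction, the methods in the abstract search over orderings of such pre-mined rules to optimize a Bayesian posterior.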
Learning Certifiably Optimal Rule Lists for Categorical Data
We present the design and implementation of a custom discrete optimization
technique for building rule lists over a categorical feature space. Our
algorithm produces rule lists with optimal training performance, according to
the regularized empirical risk, with a certificate of optimality. By leveraging
algorithmic bounds, efficient data structures, and computational reuse, we
achieve several orders of magnitude speedup in time and a massive reduction of
memory consumption. We demonstrate that our approach produces optimal rule
lists on practical problems in seconds. Our results indicate that it is
possible to construct optimal sparse rule lists that are approximately as
accurate as the COMPAS proprietary risk prediction tool on data from Broward
County, Florida, but that are completely interpretable. This framework is a
novel alternative to CART and other decision tree methods for interpretable
modeling.
Comment: A short version of this work appeared in KDD '17 as "Learning
Certifiably Optimal Rule Lists"
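The regularized empirical risk mentioned above (the objective the algorithm certifiably minimizes) is, in CORELS-style formulations, the training misclassification error plus a per-rule penalty. The sketch below illustrates that objective on a toy dataset; the helper names and data are hypothetical, not the paper's API.

```python
# Regularized empirical risk of a rule list: misclassification error on the
# training set plus lambda times the number of rules. A branch-and-bound
# search over rule orderings minimizes this quantity with a certificate.

def regularized_risk(rule_list, default_label, X, y, lam):
    """R(d) = (training error rate) + lam * (number of rules in d)."""
    errors = 0
    for x, label in zip(X, y):
        pred = default_label
        for antecedent, consequent in rule_list:
            if antecedent(x):  # first matching rule decides the label
                pred = consequent
                break
        errors += (pred != label)
    return errors / len(y) + lam * len(rule_list)

# Toy categorical data: one binary feature, one rule, correct default.
X = [{"age<25": 1}, {"age<25": 0}, {"age<25": 1}]
y = [1, 0, 1]
rules = [(lambda x: x["age<25"] == 1, 1)]
print(regularized_risk(rules, 0, X, y, lam=0.01))  # 0.01: zero errors, one rule
```

The penalty term is what drives the "sparse" in sparse rule lists: a longer list must reduce error by more than lambda per added rule to be worth its length.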
A Fast Way to Produce Optimal Fixed-Depth Decision Trees
Decision trees play an essential role in many classification tasks. In some
circumstances, we only want to consider fixed-depth trees. Unfortunately,
finding the optimal depth-d decision tree can require time exponential in d.
This paper presents a fast way to produce a fixed-depth decision tree that is
optimal under the Naïve Bayes (NB) assumption. Here, we prove that the optimal
final-layer feature essentially depends only on the posterior probability of
the class label given the tests previously performed, and on neither the
identity nor the outcomes of those tests. We can therefore precompute, in a
fast pre-processing step, which features to use at the final layer. This
results in a speedup of O(n / log n), where n is the number of features. We
apply this technique to learning fixed-depth decision trees on standard
datasets from the UCI repository, and find that it reduces the computational
cost significantly. Surprisingly, this approach still yields relatively high
classification accuracy, despite the NB assumption.
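The key observation above, that under the NB assumption the best final-layer test depends only on the posterior P(y = 1 | tests so far), can be illustrated as follows. The per-feature Bernoulli likelihoods here are hypothetical; this is a sketch of the idea, not the paper's algorithm.

```python
# Under Naive Bayes, each binary feature f is summarized by its class-
# conditional likelihoods (P(f=1|y=1), P(f=1|y=0)). Given only the current
# posterior p = P(y=1 | path), we can score every candidate final test and
# precompute, for each posterior value, which feature to place at the leaf.

def expected_accuracy(p, theta1, theta0):
    """Expected accuracy of testing a feature with P(f=1|y=1)=theta1 and
    P(f=1|y=0)=theta0, starting from posterior p = P(y=1)."""
    acc = 0.0
    for out1, out0 in ((theta1, theta0), (1 - theta1, 1 - theta0)):
        j1 = p * out1          # joint P(y=1, this outcome)
        j0 = (1 - p) * out0    # joint P(y=0, this outcome)
        acc += max(j1, j0)     # predict the more probable label
    return acc

def best_final_feature(p, likelihoods):
    """Best leaf-level test for posterior p; note it never inspects which
    tests produced p -- only p itself, as the theorem states."""
    return max(likelihoods, key=lambda f: expected_accuracy(p, *likelihoods[f]))

# Hypothetical likelihoods: feature "A" is far more class-informative than "B".
features = {"A": (0.9, 0.2), "B": (0.6, 0.5)}
print(best_final_feature(0.5, features))  # A
```

Because the lookup is a function of p alone, it can be built once in pre-processing and shared across all leaves, which is the source of the speedup the abstract describes.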
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges
Interpretability in machine learning (ML) is crucial for high stakes
decisions and troubleshooting. In this work, we provide fundamental principles
for interpretable ML, and dispel common misunderstandings that dilute the
importance of this crucial topic. We also identify 10 technical challenge areas
in interpretable machine learning and provide history and background on each
problem. Some of these problems are classically important, and some are recent
problems that have arisen in the last few years. These problems are: (1)
Optimizing sparse logical models such as decision trees; (2) Optimization of
scoring systems; (3) Placing constraints into generalized additive models to
encourage sparsity and better interpretability; (4) Modern case-based
reasoning, including neural networks and matching for causal inference; (5)
Complete supervised disentanglement of neural networks; (6) Complete or even
partial unsupervised disentanglement of neural networks; (7) Dimensionality
reduction for data visualization; (8) Machine learning models that can
incorporate physics and other generative or causal constraints; (9)
Characterization of the "Rashomon set" of good models; and (10) Interpretable
reinforcement learning. This survey is suitable as a starting point for
statisticians and computer scientists interested in working in interpretable
machine learning.