24,555 research outputs found
Optimal Sparse Decision Trees
Decision tree algorithms have been among the most popular algorithms for
interpretable (transparent) machine learning since the early 1980's. The
problem that has plagued decision tree algorithms since their inception is
their lack of optimality, or lack of guarantees of closeness to optimality:
decision tree algorithms are often greedy or myopic, and sometimes produce
unquestionably suboptimal models. Hardness of decision tree optimization is
both a theoretical and practical obstacle, and even careful mathematical
programming approaches have not been able to solve these problems efficiently.
This work introduces the first practical algorithm for optimal decision trees
for binary variables. The algorithm is a co-design of analytical bounds that
reduce the search space and modern systems techniques, including data
structures and a custom bit-vector library. Our experiments highlight
advantages in scalability, speed, and proof of optimality.Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS
2019), Vancouver, Canad
Discovering the Interpretability-Performance Pareto Front of Decision Trees with Dynamic Programming
Decision trees are known to be intrinsically interpretable as they can be
inspected and interpreted by humans. Furthermore, recent hardware advances have
rekindled an interest for optimal decision tree algorithms, that produce more
accurate trees than the usual greedy approaches. However, these optimal
algorithms return a single tree optimizing a hand defined
interpretability-performance trade-off, obtained by specifying a maximum number
of decision nodes, giving no further insights about the quality of this
trade-off. In this paper, we propose a new Markov Decision Problem (MDP)
formulation for finding optimal decision trees. The main interest of this
formulation is that we can compute the optimal decision trees for several
interpretability-performance trade-offs by solving a single dynamic program,
letting the user choose a posteriori the tree that best suits their needs.
Empirically, we show that our method is competitive with state-of-the-art
algorithms in terms of accuracy and runtime while returning a whole set of
trees on the interpretability-performance Pareto front
SAT-Based Approach for Learning Optimal Decision Trees with Non-Binary Features
Decision trees are a popular classification model in machine learning due to their interpretability and performance. Traditionally, decision-tree classifiers are constructed using greedy heuristic algorithms, however these algorithms do not provide guarantees on the quality of the resultant trees. Instead, a recent line of work has studied the use of exact optimization approaches for constructing optimal decision trees. Most of the recent approaches that employ exact optimization are designed for datasets with binary features. While numeric and categorical features can be transformed to binary features, this transformation can introduce a large number of binary features and may not be efficient in practice. In this work, we present a novel SAT-based encoding for decision trees that supports non-binary features and demonstrate how it can be used to solve two well-studied variants of the optimal decision tree problem. We perform an extensive empirical analysis that shows our approach obtains superior performance and is often an order of magnitude faster than the current state-of-the-art exact techniques on non-binary datasets
TreeGrad: Transferring Tree Ensembles to Neural Networks
Gradient Boosting Decision Tree (GBDT) are popular machine learning
algorithms with implementations such as LightGBM and in popular machine
learning toolkits like Scikit-Learn. Many implementations can only produce
trees in an offline manner and in a greedy manner. We explore ways to convert
existing GBDT implementations to known neural network architectures with
minimal performance loss in order to allow decision splits to be updated in an
online manner and provide extensions to allow splits points to be altered as a
neural architecture search problem. We provide learning bounds for our neural
network.Comment: Technical Report on Implementation of Deep Neural Decision Forests
Algorithm. To accompany implementation here:
https://github.com/chappers/TreeGrad. Update: Please cite as: Siu, C. (2019).
"Transferring Tree Ensembles to Neural Networks". International Conference on
Neural Information Processing. Springer, 2019. arXiv admin note: text overlap
with arXiv:1909.1179
Approximation Algorithms for Stochastic Boolean Function Evaluation and Stochastic Submodular Set Cover
Stochastic Boolean Function Evaluation is the problem of determining the
value of a given Boolean function f on an unknown input x, when each bit of x_i
of x can only be determined by paying an associated cost c_i. The assumption is
that x is drawn from a given product distribution, and the goal is to minimize
the expected cost. This problem has been studied in Operations Research, where
it is known as "sequential testing" of Boolean functions. It has also been
studied in learning theory in the context of learning with attribute costs. We
consider the general problem of developing approximation algorithms for
Stochastic Boolean Function Evaluation. We give a 3-approximation algorithm for
evaluating Boolean linear threshold formulas. We also present an approximation
algorithm for evaluating CDNF formulas (and decision trees) achieving a factor
of O(log kd), where k is the number of terms in the DNF formula, and d is the
number of clauses in the CNF formula. In addition, we present approximation
algorithms for simultaneous evaluation of linear threshold functions, and for
ranking of linear functions.
Our function evaluation algorithms are based on reductions to the Stochastic
Submodular Set Cover (SSSC) problem. This problem was introduced by Golovin and
Krause. They presented an approximation algorithm for the problem, called
Adaptive Greedy. Our main technical contribution is a new approximation
algorithm for the SSSC problem, which we call Adaptive Dual Greedy. It is an
extension of the Dual Greedy algorithm for Submodular Set Cover due to Fujito,
which is a generalization of Hochbaum's algorithm for the classical Set Cover
Problem. We also give a new bound on the approximation achieved by the Adaptive
Greedy algorithm of Golovin and Krause
A survey of cost-sensitive decision tree induction algorithms
The past decade has seen a significant interest on the problem of inducing decision trees that take account of costs of misclassification and costs of acquiring the features used for decision making. This survey identifies over 50 algorithms including approaches that are direct adaptations of accuracy based methods, use genetic algorithms, use anytime methods and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy, a historical timeline of how the field has developed and should provide a useful reference point for future research in this field
- …