
    SAT-Based Approach for Learning Optimal Decision Trees with Non-Binary Features

    Decision trees are a popular classification model in machine learning due to their interpretability and performance. Traditionally, decision-tree classifiers are constructed using greedy heuristic algorithms; however, these algorithms provide no guarantees on the quality of the resulting trees. A recent line of work has instead studied exact optimization approaches for constructing optimal decision trees. Most of these approaches are designed for datasets with binary features. While numeric and categorical features can be transformed into binary features, this transformation can introduce a large number of binary features and may not be efficient in practice. In this work, we present a novel SAT-based encoding for decision trees that supports non-binary features and demonstrate how it can be used to solve two well-studied variants of the optimal decision tree problem. We perform an extensive empirical analysis showing that our approach obtains superior performance and is often an order of magnitude faster than the current state-of-the-art exact techniques on non-binary datasets.
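To illustrate why binarizing numeric features can inflate the feature count, here is a minimal sketch of one common scheme: threshold binarization, with one binary feature per midpoint between consecutive distinct values. The abstract does not say which transformation it has in mind, so the scheme, the function names, and the sample data are all assumptions made for illustration.

```python
from typing import List


def candidate_thresholds(values: List[float]) -> List[float]:
    """Midpoints between consecutive distinct values of one numeric feature."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def binarize_column(values: List[float], thresholds: List[float]) -> List[List[int]]:
    """Replace each numeric value by one 0/1 feature per threshold (value <= t)."""
    return [[int(v <= t) for t in thresholds] for v in values]


# Hypothetical numeric column with 5 distinct values.
ages = [23, 35, 35, 41, 58, 62]
thresholds = candidate_thresholds(ages)
rows = binarize_column(ages, thresholds)

# One numeric feature becomes len(thresholds) binary features.
print(len(thresholds), "binary features replace 1 numeric feature")
```

Under this scheme a numeric feature with k distinct values turns into k - 1 binary features, so a dataset with many continuous columns can grow by orders of magnitude before any exact solver sees it.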

    Submodularity and Optimality of Fusion Rules in Balanced Binary Relay Trees

    We study the distributed detection problem in a balanced binary relay tree, where the leaves of the tree are sensors generating binary messages. The root of the tree is a fusion center that makes the overall decision. Every other node in the tree is a fusion node that fuses two binary messages from its child nodes into a new binary message and sends it to the parent node at the next level. We assume that the fusion nodes at the same level use the same fusion rule. We call a string of fusion rules used at different levels a fusion strategy. We consider the problem of finding a fusion strategy that maximizes the reduction in the total error probability between the sensors and the fusion center. We formulate this problem as a deterministic dynamic program and express the solution in terms of Bellman's equations. We introduce the notion of string-submodularity and show that the reduction in the total error probability is a string-submodular function. Consequently, we show that the greedy strategy, which only maximizes the level-wise reduction in the total error probability, is within a constant factor of the optimal strategy in terms of reduction in the total error probability.
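The greedy strategy described in this abstract can be sketched in a few lines. The sketch below is not the paper's formulation: it assumes identical sensors with conditionally independent observations, restricts each level to the AND and OR fusion rules, and takes the total error probability to be the sum of the Type I (false alarm) and Type II (missed detection) probabilities. All function names are hypothetical.

```python
from typing import List, Tuple


def fuse_and(alpha: float, beta: float) -> Tuple[float, float]:
    """AND rule: the parent reports 1 only if both children report 1."""
    return alpha * alpha, 1 - (1 - beta) ** 2


def fuse_or(alpha: float, beta: float) -> Tuple[float, float]:
    """OR rule: the parent reports 1 if at least one child reports 1."""
    return 1 - (1 - alpha) ** 2, beta * beta


def greedy_strategy(alpha: float, beta: float, levels: int) -> Tuple[List[str], float, float]:
    """At each level, pick the rule giving the smallest total error alpha + beta."""
    rules: List[str] = []
    for _ in range(levels):
        candidates = {"AND": fuse_and(alpha, beta), "OR": fuse_or(alpha, beta)}
        # Minimizing the next total error is the same as maximizing the
        # level-wise reduction in total error.
        name = min(candidates, key=lambda k: sum(candidates[k]))
        alpha, beta = candidates[name]
        rules.append(name)
    return rules, alpha, beta


# Hypothetical sensor error probabilities: 0.3 false alarm, 0.1 missed detection.
rules, alpha, beta = greedy_strategy(0.3, 0.1, levels=2)
print(rules, alpha + beta)
```

The returned list of rule names is a fusion strategy in the abstract's sense: one rule per level, applied by every fusion node at that level.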

    Robust Decision Trees Against Adversarial Examples

    Although adversarial examples and model robustness have been extensively studied in the context of linear models and neural networks, research on this issue in tree-based models, and on how to make tree-based models robust against adversarial examples, is still limited. In this paper, we show that tree-based models are also vulnerable to adversarial examples and develop a novel algorithm to learn robust trees. At its core, our method aims to optimize the performance under the worst-case perturbation of input features, which leads to a max-min saddle point problem. Incorporating this saddle point objective into the decision tree building procedure is non-trivial due to the discrete nature of trees: a naive approach to finding the best split according to this saddle point objective will take exponential time. To make our approach practical and scalable, we propose efficient tree building algorithms by approximating the inner minimizer in this saddle point problem, and present efficient implementations for classical information-gain-based trees as well as state-of-the-art tree boosting models such as XGBoost. Experimental results on real-world datasets demonstrate that the proposed algorithms can substantially improve the robustness of tree-based models against adversarial examples.
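To make the worst-case objective concrete, the following sketch scores a single threshold split (a decision stump) on one feature when an adversary may shift each input by up to `eps`. It is only an illustration of the max-min idea, not the paper's algorithm: for a lone stump the worst case can be computed exactly, whereas the abstract describes approximating the inner minimizer for full trees. The function names, the 0/1 labels, and the toy data are assumptions.

```python
from typing import List, Optional, Tuple


def worst_case_errors(xs: List[float], ys: List[int], threshold: float,
                      eps: float, left_label: int, right_label: int) -> int:
    """Count points an adversary can force the stump to misclassify.

    The stump predicts left_label if x' < threshold, else right_label,
    and the adversary may pick any x' in [x - eps, x + eps].
    """
    errors = 0
    for x, y in zip(xs, ys):
        can_go_left = x - eps < threshold
        can_go_right = x + eps >= threshold
        if (can_go_left and left_label != y) or (can_go_right and right_label != y):
            errors += 1
    return errors


def best_robust_stump(xs: List[float], ys: List[int],
                      eps: float) -> Optional[Tuple[int, float, int, int]]:
    """Return (errors, threshold, left_label, right_label) minimizing worst-case errors."""
    distinct = sorted(set(xs))
    best = None
    for t in ((a + b) / 2 for a, b in zip(distinct, distinct[1:])):
        for left, right in ((0, 1), (1, 0)):
            e = worst_case_errors(xs, ys, t, eps, left, right)
            if best is None or e < best[0]:
                best = (e, t, left, right)
    return best  # None when there are fewer than two distinct values


# Hypothetical one-dimensional dataset.
xs, ys = [0.0, 1.0, 2.0, 3.0], [0, 0, 1, 1]
print(best_robust_stump(xs, ys, eps=0.3))
print(best_robust_stump(xs, ys, eps=0.6))
```

With a small perturbation budget the split between the two classes stays error-free; once `eps` exceeds the margin around the threshold, points on both sides can be pushed across it and the worst-case error rises.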