750 research outputs found
Robust Decision Trees Against Adversarial Examples
Although adversarial examples and model robustness have been extensively
studied in the context of linear models and neural networks, research on this
issue in tree-based models and how to make tree-based models robust against
adversarial examples is still limited. In this paper, we show that tree based
models are also vulnerable to adversarial examples and develop a novel
algorithm to learn robust trees. At its core, our method aims to optimize the
performance under the worst-case perturbation of input features, which leads to
a max-min saddle point problem. Incorporating this saddle point objective into
the decision tree building procedure is non-trivial due to the discrete nature
of trees --- a naive approach to finding the best split according to this
saddle point objective will take exponential time. To make our approach
practical and scalable, we propose efficient tree building algorithms by
approximating the inner minimizer in this saddle point problem, and present
efficient implementations for classical information gain based trees as well as
state-of-the-art tree boosting models such as XGBoost. Experimental results on
real world datasets demonstrate that the proposed algorithms can substantially
improve the robustness of tree-based models against adversarial examples
Interpretable Categorization of Heterogeneous Time Series Data
Understanding heterogeneous multivariate time series data is important in
many applications ranging from smart homes to aviation. Learning models of
heterogeneous multivariate time series that are also human-interpretable is
challenging and not adequately addressed by the existing literature. We propose
grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs
extend decision trees with a grammar framework. Logical expressions derived
from a context-free grammar are used for branching in place of simple
thresholds on attributes. The added expressivity enables support for a wide
range of data types while retaining the interpretability of decision trees. In
particular, when a grammar based on temporal logic is used, we show that GBDTs
can be used for the interpretable classi cation of high-dimensional and
heterogeneous time series data. Furthermore, we show how GBDTs can also be used
for categorization, which is a combination of clustering and generating
interpretable explanations for each cluster. We apply GBDTs to analyze the
classic Australian Sign Language dataset as well as data on near mid-air
collisions (NMACs). The NMAC data comes from aircraft simulations used in the
development of the next-generation Airborne Collision Avoidance System (ACAS
X).Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data
Mining (SDM) 201
- …