13,790 research outputs found
Interpretable Categorization of Heterogeneous Time Series Data
Understanding heterogeneous multivariate time series data is important in
many applications ranging from smart homes to aviation. Learning models of
heterogeneous multivariate time series that are also human-interpretable is
challenging and not adequately addressed by the existing literature. We propose
grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs
extend decision trees with a grammar framework. Logical expressions derived
from a context-free grammar are used for branching in place of simple
thresholds on attributes. The added expressivity enables support for a wide
range of data types while retaining the interpretability of decision trees. In
particular, when a grammar based on temporal logic is used, we show that GBDTs
can be used for the interpretable classi cation of high-dimensional and
heterogeneous time series data. Furthermore, we show how GBDTs can also be used
for categorization, which is a combination of clustering and generating
interpretable explanations for each cluster. We apply GBDTs to analyze the
classic Australian Sign Language dataset as well as data on near mid-air
collisions (NMACs). The NMAC data comes from aircraft simulations used in the
development of the next-generation Airborne Collision Avoidance System (ACAS
X).Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data
Mining (SDM) 201
QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules
The need to prediscretize numeric attributes before they can be used in
association rule learning is a source of inefficiencies in the resulting
classifier. This paper describes several new rule tuning steps aiming to
recover information lost in the discretization of numeric (quantitative)
attributes, and a new rule pruning strategy, which further reduces the size of
the classification models. We demonstrate the effectiveness of the proposed
methods on postoptimization of models generated by three state-of-the-art
association rule classification algorithms: Classification based on
Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al, 2016),
and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from
the UCI repository show that the postoptimized models are consistently smaller
-- typically by about 50% -- and have better classification performance on most
datasets
Mining data streams using option trees (revised edition, 2004)
The data stream model for data mining places harsh restrictions on a learning algorithm. A model must be induced following the briefest interrogation of the data, must use only available memory and must update itself over time within these constraints. Additionally, the model must be able to be used for data mining at any point in time.
This paper describes a data stream classi_cation algorithm using an ensemble of option trees. The ensemble of trees is induced by boosting and iteratively combined into a single interpretable model. The algorithm is evaluated using benchmark datasets for accuracy against state-of-the-art algorithms that make use of the entire dataset
- …