130,438 research outputs found
Interpretable Categorization of Heterogeneous Time Series Data
Understanding heterogeneous multivariate time series data is important in
many applications ranging from smart homes to aviation. Learning models of
heterogeneous multivariate time series that are also human-interpretable is
challenging and not adequately addressed by the existing literature. We propose
grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs
extend decision trees with a grammar framework. Logical expressions derived
from a context-free grammar are used for branching in place of simple
thresholds on attributes. The added expressivity enables support for a wide
range of data types while retaining the interpretability of decision trees. In
particular, when a grammar based on temporal logic is used, we show that GBDTs
can be used for the interpretable classi cation of high-dimensional and
heterogeneous time series data. Furthermore, we show how GBDTs can also be used
for categorization, which is a combination of clustering and generating
interpretable explanations for each cluster. We apply GBDTs to analyze the
classic Australian Sign Language dataset as well as data on near mid-air
collisions (NMACs). The NMAC data comes from aircraft simulations used in the
development of the next-generation Airborne Collision Avoidance System (ACAS
X).Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data
Mining (SDM) 201
A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review
Data mining: a tool for detecting cyclical disturbances in supply networks.
Disturbances in supply chains may be either exogenous or endogenous. The ability automatically to detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers in order to avoid uncertainty. The spectral principal component analysis (SPCA) technique has been utilized to distinguish between real and rogue disturbances in a steel supply network. The data set used was collected from four different business units in the network and consists of 43 variables; each is described by 72 data points. The present paper will utilize the same data set to test an alternative approach to SPCA in detecting the disturbances. The new approach employs statistical data pre-processing, clustering, and classification learning techniques to analyse the supply network data. In particular, the incremental k-means
clustering and the RULES-6 classification rule-learning algorithms, developed by the present authors’ team, have been applied to identify important patterns in the data set. Results show that the proposed approach has the capability automatically to detect and characterize network-wide cyclical disturbances and generate hypotheses about their root cause
Rough sets theory for travel demand analysis in Malaysia
This study integrates the rough sets theory into tourism demand analysis. Originated from the area of Artificial Intelligence, the rough sets theory was introduced to disclose important structures and to classify objects. The Rough Sets methodology provides definitions and methods for finding which attributes separates one class or classification from another. Based on this theory can propose a formal framework for the automated transformation of data into knowledge. This makes the rough sets approach a useful classification and pattern recognition technique. This study introduces a new rough sets approach for deriving rules from information table of tourist in Malaysia. The induced rules were able to forecast change in demand with certain accuracy
- …