A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. Outliers may be instances of error or may indicate events of interest. The
task of outlier detection is to identify such outliers in order to improve the analysis of
data and to discover interesting and useful knowledge about unusual events in numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees for selecting the most suitable technique for a given data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
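To make the definition above concrete, here is a minimal sketch of one common class of unsupervised technique, a distance-based score; it is an illustration, not a method taken from the paper. Each observation is scored by its Euclidean distance to its k-th nearest neighbor, so points that are far from the rest of the data set score highest. The function name `knn_outlier_scores`, the choice k = 5 and the synthetic data are assumptions for illustration.

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Score each row of X by the Euclidean distance to its k-th nearest neighbor."""
    X = np.asarray(X, dtype=float)
    if not 1 <= k < len(X):
        raise ValueError("k must satisfy 1 <= k < number of observations")
    # Pairwise distance matrix (n x n); adequate for small data sets.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # After sorting each row, column k-1 holds the k-th nearest-neighbor distance.
    return np.sort(dist, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
# 50 inliers around the origin plus one planted outlier at (8, 8).
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), [[8.0, 8.0]]])
scores = knn_outlier_scores(X, k=5)
print(int(np.argmax(scores)))  # prints 50, the index of the planted outlier
```

The all-pairs distance matrix keeps the sketch short but costs O(n^2) memory; a spatial index would be the usual choice for large data sets.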
Robot Introspection with Bayesian Nonparametric Vector Autoregressive Hidden Markov Models
Robot introspection, as opposed to the anomaly detection typical in process
monitoring, helps a robot understand what it is doing at all times. A robot
should be able to identify its actions not only when failure or novelty occurs,
but also as it executes any number of sub-tasks. As robots continue their quest
to function in unstructured environments, they must understand
what it is that they are actually doing in order to become more robust. This work
investigates the modeling ability of Bayesian nonparametric techniques applied to
Markov switching processes for learning the complex dynamics typical of robot contact
tasks. We study whether the Markov switching process, together with Bayesian
priors, can outperform the modeling ability of its counterparts: an HMM with
and without Bayesian priors. The work was tested on a snap assembly task
characterized by high elastic forces. The task includes an insertion subtask
with very complex dynamics. Our approach showed a stronger ability to
generalize and was able to better model the subtask with complex dynamics in a
computationally efficient way. The modeling technique is also used to learn a
growing library of robot skills that, when integrated with low-level
control, enables online robot decision making.
Comment: final version submitted to humanoids 201
Multi-criteria Anomaly Detection using Pareto Depth Analysis
We consider the problem of identifying patterns in a data set that exhibit
anomalous behavior, often referred to as anomaly detection. In most anomaly
detection algorithms, the dissimilarity between data samples is calculated by a
single criterion, such as Euclidean distance. However, in many cases there may
not exist a single dissimilarity measure that captures all possible anomalous
patterns. In such a case, multiple criteria can be defined, and one can test
for anomalies by scalarizing the multiple criteria using a linear combination
of them. If the importance of the different criteria is not known in advance,
the algorithm may need to be executed multiple times with different choices of
weights in the linear combination. In this paper, we introduce a novel
non-parametric multi-criteria anomaly detection method using Pareto depth
analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies
under multiple criteria without having to run an algorithm multiple times with
different choices of weights. The proposed PDA approach scales linearly in the
number of criteria and is provably better than using linear combinations of the
criteria.
Comment: Removed an unnecessary line from Algorithm
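The abstract does not spell out the algorithm, so the sketch below only illustrates the underlying idea under stated assumptions: pairwise dissimilarity vectors between training samples (one entry per criterion, called dyads here) are sorted into Pareto fronts by non-dominated sorting, and a test sample is scored by the mean front depth of its dyads to nearby training samples, with deeper meaning more anomalous. The two criteria (absolute difference in each feature), the neighbor selection (union of the k nearest samples under each criterion) and all function names are hypothetical simplifications, not the paper's exact procedure.

```python
import numpy as np

def pareto_fronts(D):
    """Non-dominated sorting of dissimilarity vectors D (m x L), smaller is better.

    Returns index arrays F_1, F_2, ... from the shallowest front to the deepest.
    """
    le = np.all(D[:, None, :] <= D[None, :, :], axis=2)
    lt = np.any(D[:, None, :] < D[None, :, :], axis=2)
    dom = le & lt  # dom[j, i] is True when vector j Pareto-dominates vector i
    remaining = np.ones(len(D), dtype=bool)
    fronts = []
    while remaining.any():
        # A remaining vector is on the next front if no remaining vector dominates it.
        front = remaining & ~dom[remaining].any(axis=0)
        fronts.append(np.flatnonzero(front))
        remaining &= ~front
    return fronts

def dyad_depth(d, D, fronts):
    """1-based index of the first front containing no vector that dominates d."""
    for depth, idx in enumerate(fronts, start=1):
        F = D[idx]
        if not np.any(np.all(F <= d, axis=1) & np.any(F < d, axis=1)):
            return depth
    return len(fronts) + 1

def dyads(A, B, criteria):
    """One column per criterion: dissimilarity between row-aligned pairs A[r], B[r]."""
    return np.stack([c(A, B) for c in criteria], axis=1)

def fit_pda(X, criteria):
    i, j = np.triu_indices(len(X), k=1)  # every unordered training pair
    D = dyads(X[i], X[j], criteria)
    return D, pareto_fronts(D)

def pda_score(x, X, criteria, D, fronts, k=5):
    """Mean Pareto depth of x's dyads to its k nearest training samples per criterion."""
    d_test = dyads(np.broadcast_to(x, X.shape), X, criteria)
    nbrs = set()
    for col in range(d_test.shape[1]):
        nbrs.update(np.argsort(d_test[:, col])[:k].tolist())
    return float(np.mean([dyad_depth(d_test[r], D, fronts) for r in sorted(nbrs)]))

# Hypothetical criteria: absolute difference in each of two features.
criteria = [lambda A, B: np.abs(A[:, 0] - B[:, 0]),
            lambda A, B: np.abs(A[:, 1] - B[:, 1])]
rng = np.random.default_rng(1)
X_train = rng.normal(size=(30, 2))
D, fronts = fit_pda(X_train, criteria)
print(pda_score(np.array([0.0, 0.0]), X_train, criteria, D, fronts))   # nominal point
print(pda_score(np.array([6.0, -6.0]), X_train, criteria, D, fronts))  # far-away point
```

No weights are chosen anywhere: the criteria are compared only through Pareto dominance, which is the property the abstract contrasts with linear scalarization.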
Detection and localization of change-points in high-dimensional network traffic data
We propose a novel and efficient method, which we call TopRank, for detecting
change-points in high-dimensional data. This
issue is of growing concern to the network security community since network
anomalies such as Denial of Service (DoS) attacks lead to changes in Internet
traffic. Our method consists of a data reduction stage based on record
filtering, followed by a nonparametric change-point detection test based on
U-statistics. Using this approach, we can address massive data streams and
perform anomaly detection and localization on the fly. We show how it applies
to some real Internet traffic provided by France-Télécom (a French Internet
service provider) in the framework of the ANR-RNRT OSCAR project. This approach
is attractive because it has a low computational load and can
detect and localize several types of network anomalies. We also assess the
performance of the TopRank algorithm using synthetic data and compare it with
alternative approaches based on random aggregation.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS232 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
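The abstract names a nonparametric change-point test based on U-statistics without giving its form. As a simplified illustration only, the sketch below locates a single change-point in one univariate series using the Mann-Whitney statistic, a classic two-sample U-statistic: for every candidate split it counts pairs (before, after) in which the later value is larger, standardizes the count under the no-change null, and returns the split with the largest absolute z-score. The record-filtering stage and the multivariate test of TopRank are not reproduced; the function name, `min_seg`, and the synthetic data are assumptions.

```python
import numpy as np

def mann_whitney_changepoint(x, min_seg=5):
    """Locate one change-point by maximizing a standardized Mann-Whitney statistic.

    For each split t, U(t) counts pairs (i < t <= j) with x[i] < x[j], ties counting 1/2.
    Returns (t_hat, z) where t_hat is the first index of the second segment.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    best_t, best_z = None, 0.0
    for t in range(min_seg, n - min_seg + 1):
        before, after = x[:t], x[t:]
        less = np.sum(before[:, None] < after[None, :])
        ties = np.sum(before[:, None] == after[None, :])
        u = less + 0.5 * ties
        m1, m2 = t, n - t
        # Null mean and standard deviation of U (no-ties formula).
        z = (u - m1 * m2 / 2.0) / np.sqrt(m1 * m2 * (n + 1) / 12.0)
        if best_t is None or abs(z) > abs(best_z):
            best_t, best_z = t, z
    return best_t, float(best_z)

rng = np.random.default_rng(0)
# Hypothetical series: the mean jumps from 0 to 4 at index 60.
x = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(4.0, 1.0, 40)])
t_hat, z = mann_whitney_changepoint(x)
print(t_hat, round(z, 2))
```

Because the statistic uses only the ordering of values, it does not assume Gaussian traffic counts; the naive loop is O(n^2) per split, which is acceptable for a sketch but not for streaming use.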