An Evaluation of Classification and Outlier Detection Algorithms
This paper evaluates the accuracy of classification and outlier detection
algorithms on temporal data. We focus on algorithms that train and classify
rapidly and can be used for systems that need to incorporate new data
regularly. Hence, we compare the accuracy of six fast algorithms using a range
of well-known time-series datasets. The analyses demonstrate that the choice of
algorithm is task- and data-specific, but that heuristics for choosing can be
derived. Gradient Boosting Machines are generally best for classification, but
there is no single winner for outlier detection, although Gradient Boosting
Machines (again) and Random Forest perform better than the rest. Hence, we
recommend running evaluations of a number of algorithms using our heuristics.
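The recommendation to evaluate several candidate algorithms on the same data can be sketched as a small comparison harness. This is a minimal illustration only: the two classifiers (a majority-class baseline and 1-nearest-neighbour) are toy stand-ins chosen to keep the example dependency-free, not the Gradient Boosting Machines or Random Forest models the paper evaluates, and the data, function names, and labels are all hypothetical.

```python
from collections import Counter

def majority_classifier(train_x, train_y):
    """Baseline: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbour_classifier(train_x, train_y):
    """1-NN using squared Euclidean distance between equal-length series."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_x]
        return train_y[dists.index(min(dists))]
    return predict

def accuracy(fit, train_x, train_y, test_x, test_y):
    """Train one candidate and return its fraction of correct test predictions."""
    model = fit(train_x, train_y)
    hits = sum(model(x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)

# Hypothetical toy time series (three samples each) with made-up labels.
train_x = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [5, 6, 5], [6, 5, 6]]
train_y = ["low", "low", "low", "high", "high"]
test_x = [[0, 1, 1], [5, 5, 6], [6, 6, 6]]
test_y = ["low", "high", "high"]

# Run every candidate on the same split and keep the best scorer.
candidates = {
    "majority": majority_classifier,
    "1-NN": nearest_neighbour_classifier,
}
scores = {
    name: accuracy(fit, train_x, train_y, test_x, test_y)
    for name, fit in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

In practice the same loop would be repeated per dataset and per task, since the abstract stresses that the best choice depends on both.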
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. Outliers may be instances of error or may indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and to discover interesting and useful knowledge about unusual events across numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees for selecting the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
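The definition of an outlier as an observation "significantly different from the other values" can be made concrete with a simple unsupervised rule. The sketch below uses the modified z-score based on the median absolute deviation (MAD); it is one common univariate technique, not a method taken from the paper's taxonomy. The constant 0.6745 and the cut-off of 3.5 follow a widely used convention, and the function name and readings are hypothetical.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    Modified z-score: 0.6745 * |x - median| / MAD, where MAD is the
    median of the absolute deviations from the median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical readings with one obviously deviating value.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0, 10.2]
flagged = mad_outliers(readings)
print(flagged)
```

Using the median rather than the mean keeps the reference point from being pulled towards the very outliers the rule is trying to find.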
Outlier Detection Techniques For Wireless Sensor Networks: A Survey
In the field of wireless sensor networks, measurements that
significantly deviate from the normal pattern of sensed data are
considered outliers. The potential sources of outliers include
noise and errors, events, and malicious attacks on the network.
Traditional outlier detection techniques are not directly
applicable to wireless sensor networks due to the multivariate
nature of sensor data and the specific requirements and limitations
of such networks. This survey provides a comprehensive
overview of existing outlier detection techniques specifically
developed for wireless sensor networks. Additionally, it
presents a technique-based taxonomy and a decision tree to be used
as a guideline to select a technique suitable for the application
at hand based on characteristics such as data type, outlier type,
outlier degree
Local Subspace-Based Outlier Detection using Global Neighbourhoods
Outlier detection in high-dimensional data is a challenging yet important
task, as it has applications in, e.g., fraud detection and quality control.
State-of-the-art density-based algorithms perform well because they 1) take the
local neighbourhoods of data points into account and 2) consider feature
subspaces. In highly complex and high-dimensional data, however, existing
methods are likely to overlook important outliers because they do not
explicitly take into account that the data is often a mixture distribution of
multiple components.
We therefore introduce GLOSS, an algorithm that performs local subspace
outlier detection using global neighbourhoods. Experiments on synthetic data
demonstrate that GLOSS more accurately detects local outliers in mixed data
than its competitors. Moreover, experiments on real-world data show that our
approach identifies relevant outliers overlooked by existing methods,
confirming that one should keep an eye on the global perspective even when
doing local outlier detection.
Comment: Short version accepted at IEEE BigData 201
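The idea of scoring a point against its local neighbourhood, which the abstract attributes to density-based methods, can be illustrated with a simplified k-nearest-neighbour score. This is explicitly not GLOSS and not a full LOF implementation: it ignores feature subspaces and global neighbourhoods, and simply compares how sparse a point's surroundings are relative to those of its neighbours. All names and data are hypothetical, and the sketch assumes the points are distinct.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding itself."""
    others = (j for j in range(len(points)) if j != i)
    order = sorted(others, key=lambda j: math.dist(points[i], points[j]))
    return order[:k]

def mean_knn_dist(points, i, k):
    """Average distance from points[i] to its k nearest neighbours."""
    return sum(math.dist(points[i], points[j]) for j in knn(points, i, k)) / k

def local_scores(points, k=2):
    """Ratio of each point's sparsity to the mean sparsity of its neighbours.

    Scores near 1 mean the point is about as dense as its neighbourhood;
    much larger scores suggest a local outlier. Assumes distinct points,
    so that no neighbourhood has zero sparsity.
    """
    sparsity = [mean_knn_dist(points, i, k) for i in range(len(points))]
    scores = []
    for i in range(len(points)):
        neighbours = knn(points, i, k)
        neighbour_sparsity = sum(sparsity[j] for j in neighbours) / k
        scores.append(sparsity[i] / neighbour_sparsity)
    return scores

# Two tight hypothetical clusters plus one point stranded between them.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
scores = local_scores(points, k=2)
most_outlying = points[scores.index(max(scores))]
print(most_outlying)
```

Because the data here is a mixture of two components, the stranded point is judged only against whichever cluster members happen to be nearest, which hints at why the abstract argues for also keeping a global perspective.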