6,543 research outputs found
How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms?
When sufficient labeled data are available, classical criteria based on
Receiver Operating Characteristic (ROC) or Precision-Recall (PR) curves can be
used to compare the performance of un-supervised anomaly detection algorithms.
However , in many situations, few or no data are labeled. This calls for
alternative criteria one can compute on non-labeled data. In this paper, two
criteria that do not require labels are empirically shown to discriminate
accurately (w.r.t. ROC or PR based criteria) between algorithms. These criteria
are based on existing Excess-Mass (EM) and Mass-Volume (MV) curves, which
generally cannot be well estimated in large dimension. A methodology based on
feature sub-sampling and aggregating is also described and tested, extending
the use of these criteria to high-dimensional datasets and solving major
drawbacks inherent to standard EM and MV curves
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review
- …