40,552 research outputs found
Local Subspace-Based Outlier Detection using Global Neighbourhoods
Outlier detection in high-dimensional data is a challenging yet important
task, as it has applications in, e.g., fraud detection and quality control.
State-of-the-art density-based algorithms perform well because they 1) take the
local neighbourhoods of data points into account and 2) consider feature
subspaces. In highly complex and high-dimensional data, however, existing
methods are likely to overlook important outliers because they do not
explicitly take into account that the data is often a mixture distribution of
multiple components.
We therefore introduce GLOSS, an algorithm that performs local subspace
outlier detection using global neighbourhoods. Experiments on synthetic data
demonstrate that GLOSS more accurately detects local outliers in mixed data
than its competitors. Moreover, experiments on real-world data show that our
approach identifies relevant outliers overlooked by existing methods,
confirming that one should keep an eye on the global perspective even when
doing local outlier detection.Comment: Short version accepted at IEEE BigData 201
A Local Density-Based Approach for Local Outlier Detection
This paper presents a simple but effective density-based outlier detection
approach with the local kernel density estimation (KDE). A Relative
Density-based Outlier Score (RDOS) is introduced to measure the local
outlierness of objects, in which the density distribution at the location of an
object is estimated with a local KDE method based on extended nearest neighbors
of the object. Instead of using only nearest neighbors, we further consider
reverse nearest neighbors and shared nearest neighbors of an object for density
distribution estimation. Some theoretical properties of the proposed RDOS
including its expected value and false alarm probability are derived. A
comprehensive experimental study on both synthetic and real-life data sets
demonstrates that our approach is more effective than state-of-the-art outlier
detection methods.Comment: 22 pages, 14 figures, submitted to Pattern Recognition Letter
Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma
A novel algorithm and implementation of real-time identification and tracking
of blob-filaments in fusion reactor data is presented. Similar spatio-temporal
features are important in many other applications, for example, ignition
kernels in combustion and tumor cells in a medical image. This work presents an
approach for extracting these features by dividing the overall task into three
steps: local identification of feature cells, grouping feature cells into
extended feature, and tracking movement of feature through overlapping in
space. Through our extensive work in parallelization, we demonstrate that this
approach can effectively make use of a large number of compute nodes to detect
and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion
simulation data, we observed linear speedup on 1024 processes and completed
blob detection in less than three milliseconds using Edison, a Cray XC30 system
at NERSC.Comment: 14 pages, 40 figure
Detecting Outliers in Data with Correlated Measures
Advances in sensor technology have enabled the collection of large-scale
datasets. Such datasets can be extremely noisy and often contain a significant
amount of outliers that result from sensor malfunction or human operation
faults. In order to utilize such data for real-world applications, it is
critical to detect outliers so that models built from these datasets will not
be skewed by outliers.
In this paper, we propose a new outlier detection method that utilizes the
correlations in the data (e.g., taxi trip distance vs. trip time). Different
from existing outlier detection methods, we build a robust regression model
that explicitly models the outliers and detects outliers simultaneously with
the model fitting.
We validate our approach on real-world datasets against methods specifically
designed for each dataset as well as the state of the art outlier detectors.
Our outlier detection method achieves better performances, demonstrating the
robustness and generality of our method. Last, we report interesting case
studies on some outliers that result from atypical events.Comment: 10 page
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection
In recent years, there have been many practical applications of anomaly
detection such as in predictive maintenance, detection of credit fraud, network
intrusion, and system failure. The goal of anomaly detection is to identify in
the test data anomalous behaviors that are either rare or unseen in the
training data. This is a common goal in predictive maintenance, which aims to
forecast the imminent faults of an appliance given abundant samples of normal
behaviors. Local outlier factor (LOF) is one of the state-of-the-art models
used for anomaly detection, but the predictive performance of LOF depends
greatly on the selection of hyperparameters. In this paper, we propose a novel,
heuristic methodology to tune the hyperparameters in LOF. A tuned LOF model
that uses the proposed method shows good predictive performance in both
simulations and real data sets.Comment: 15 pages, 5 figure
- …