10,531 research outputs found
A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data
Detecting outliers which are grossly different from or inconsistent with the
remaining dataset is a major challenge in real-world KDD applications. Existing
outlier detection methods are ineffective on scattered real-world datasets due
to implicit data patterns and parameter setting issues. We define a novel
"Local Distance-based Outlier Factor" (LDOF) to measure the {outlier-ness} of
objects in scattered datasets which addresses these issues. LDOF uses the
relative location of an object to its neighbours to determine the degree to
which the object deviates from its neighbourhood. Properties of LDOF are
theoretically analysed including LDOF's lower bound and its false-detection
probability, as well as parameter settings. In order to facilitate parameter
settings in real-world applications, we employ a top-n technique in our outlier
detection approach, where only the objects with the highest LDOF values are
regarded as outliers. Compared to conventional approaches (such as top-n KNN
and top-n LOF), our method top-n LDOF is more effective at detecting outliers
in scattered data. It is also easier to set parameters, since its performance
is relatively stable over a large range of parameter values, as illustrated by
experimental results on both real-world and synthetic datasets.Comment: 15 LaTeX pages, 7 figures, 2 tables, 1 algorithm, 2 theorem
A Local Density-Based Approach for Local Outlier Detection
This paper presents a simple but effective density-based outlier detection
approach with the local kernel density estimation (KDE). A Relative
Density-based Outlier Score (RDOS) is introduced to measure the local
outlierness of objects, in which the density distribution at the location of an
object is estimated with a local KDE method based on extended nearest neighbors
of the object. Instead of using only nearest neighbors, we further consider
reverse nearest neighbors and shared nearest neighbors of an object for density
distribution estimation. Some theoretical properties of the proposed RDOS
including its expected value and false alarm probability are derived. A
comprehensive experimental study on both synthetic and real-life data sets
demonstrates that our approach is more effective than state-of-the-art outlier
detection methods.Comment: 22 pages, 14 figures, submitted to Pattern Recognition Letter
- …