14,489 research outputs found
Detecting outlying subspaces for high-dimensional data: a heuristic search approach
[Abstract]: In this paper, we identify a new task for studying the out-lying degree of high-dimensional data, i.e. finding the sub-spaces (subset of features) in which given points are out-liers, and propose a novel detection algorithm, called High-D Outlying subspace Detection (HighDOD). We measure the outlying degree of the point using the sum of distances between this point and its k nearest neighbors. Heuristic pruning strategies are proposed to realize fast pruning in the subspace search and an efficient dynamic subspace search
method with a sample-based learning process has been im-
plemented. Experimental results show that HighDOD is efficient and outperforms other searching alternatives such as the naive top-down, bottom-up and random search methods. Points in these sparse subspaces are assumed to be
the outliers. While knowing which data points are the
outliers can be useful, in many applications, it is more
important to identify the subspaces in which a given
point is an outlier, which motivates the proposal of a
new technique in this paper to handle this new task
CLUSTERED HIERARCHICAL ANOMALY AND OUTLIER DETECTION ALGORITHMS
Anomaly and outlier detection is a long-standing problem in machine learning. In some cases, anomaly detection is easy, such as when data are drawn from well-characterized distributions such as the Gaussian. However, when data occupy high-dimensional spaces, anomaly detection becomes more difficult. We present CLAM (Clustered Learning of Approximate Manifolds), a manifold mapping technique in any metric space. CLAM begins with a fast hierarchical clustering technique and then induces a graph from the cluster tree, based on overlapping clusters as selected using several geometric and topological features. Using these graphs, we implement CHAODA (Clustered Hierarchical Anomaly and Outlier Detection Algorithms), exploring various properties of the graphs and their constituent clusters to find outliers. CHAODA employs a form of transfer learning based on a training set of datasets, and applies this knowledge to a separate test set of datasets of different cardinalities, dimensionalities, and domains. On 24 publicly available datasets, we compare CHAODA (by measure of ROC AUC) to a variety of state-of-the-art unsupervised anomaly-detection algorithms. Six of the datasets are used for training. CHAODA outperforms other approaches on 16 of the remaining 18 datasets. CLAM and CHAODA scale to large, high-dimensional “big data” anomalydetection problems, and generalize across datasets and distance functions. Source code to CLAM and CHAODA are freely available on GitHub1
Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance
[Abstract]: In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data, i.e. finding the subspaces (subsets of features)
in which the given points are outliers, which are called their outlying subspaces. Since the state-of-the-art outlier detection techniques fail to handle this
new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of
high-dimensional data efficiently. The intuitive idea of HighDOD is that we measure the OD of the point using the sum of distances between this point and its k nearest neighbors. Two heuristic pruning strategies are proposed to realize fast pruning in the subspace search and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other searching alternatives such as the naive top–down, bottom–up and random search methods, and the existing
outlier detection methods cannot fulfill this new task effectively
HOS-Miner: a system for detecting outlying subspaces of high-dimensional data
[Abstract]: We identify a new and interesting high-dimensional outlier detection problem in this paper that is, detecting the subspaces in which given data points are outliers. We call the subspaces in which a data point is an outlier as its Outlying Subspaces. In this paper, we will propose the prototype of a dynamic subspace search system, called HOS-Miner (HOS stands for High-dimensional Outlying Subspaces) that utilizes a sample-based learning process to effectively identify the outlying subspaces of a given point
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
- …