Search CORE

14,489 research outputs found

Detecting outlying subspaces for high-dimensional data: a heuristic search approach

Author: Zhang Ji
Publication venue
Publication date: 01/01/2005
Field of study

[Abstract]: In this paper, we identify a new task for studying the out-lying degree of high-dimensional data, i.e. finding the sub-spaces (subset of features) in which given points are out-liers, and propose a novel detection algorithm, called High-D Outlying subspace Detection (HighDOD). We measure the outlying degree of the point using the sum of distances between this point and its k nearest neighbors. Heuristic pruning strategies are proposed to realize fast pruning in the subspace search and an efficient dynamic subspace search method with a sample-based learning process has been im- plemented. Experimental results show that HighDOD is efficient and outperforms other searching alternatives such as the naive top-down, bottom-up and random search methods. Points in these sparse subspaces are assumed to be the outliers. While knowing which data points are the outliers can be useful, in many applications, it is more important to identify the subspaces in which a given point is an outlier, which motivates the proposal of a new technique in this paper to handle this new task

University of Southern Queensland ePrints

CLUSTERED HIERARCHICAL ANOMALY AND OUTLIER DETECTION ALGORITHMS

Author: Daniels Noah M.
Howard Thomas J., III
Ishaq Najib
Publication venue: DigitalCommons@URI
Publication date: 24/11/2021
Field of study

Anomaly and outlier detection is a long-standing problem in machine learning. In some cases, anomaly detection is easy, such as when data are drawn from well-characterized distributions such as the Gaussian. However, when data occupy high-dimensional spaces, anomaly detection becomes more difficult. We present CLAM (Clustered Learning of Approximate Manifolds), a manifold mapping technique in any metric space. CLAM begins with a fast hierarchical clustering technique and then induces a graph from the cluster tree, based on overlapping clusters as selected using several geometric and topological features. Using these graphs, we implement CHAODA (Clustered Hierarchical Anomaly and Outlier Detection Algorithms), exploring various properties of the graphs and their constituent clusters to find outliers. CHAODA employs a form of transfer learning based on a training set of datasets, and applies this knowledge to a separate test set of datasets of different cardinalities, dimensionalities, and domains. On 24 publicly available datasets, we compare CHAODA (by measure of ROC AUC) to a variety of state-of-the-art unsupervised anomaly-detection algorithms. Six of the datasets are used for training. CHAODA outperforms other approaches on 16 of the remaining 18 datasets. CLAM and CHAODA scale to large, high-dimensional “big data” anomalydetection problems, and generalize across datasets and distance functions. Source code to CLAM and CHAODA are freely available on GitHub1

DigitalCommons@URI

Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance

Author: Wang Hai
Zhang Ji
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/10/2006
Field of study

[Abstract]: In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data, i.e. finding the subspaces (subsets of features) in which the given points are outliers, which are called their outlying subspaces. Since the state-of-the-art outlier detection techniques fail to handle this new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of high-dimensional data efficiently. The intuitive idea of HighDOD is that we measure the OD of the point using the sum of distances between this point and its k nearest neighbors. Two heuristic pruning strategies are proposed to realize fast pruning in the subspace search and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other searching alternatives such as the naive top–down, bottom–up and random search methods, and the existing outlier detection methods cannot fulfill this new task effectively

University of Southern Queensland ePrints

HOS-Miner: a system for detecting outlying subspaces of high-dimensional data

Author: Ling Tok Wang
Lou Meng
Wang Hai
Zhang Ji
Publication venue: Morgan Kaufmann Publishers Inc.
Publication date: 01/01/2004
Field of study

[Abstract]: We identify a new and interesting high-dimensional outlier detection problem in this paper that is, detecting the subspaces in which given data points are outliers. We call the subspaces in which a data point is an outlier as its Outlying Subspaces. In this paper, we will propose the prototype of a dynamic subspace search system, called HOS-Miner (HOS stands for High-dimensional Outlying Subspaces) that utilizes a sample-based learning process to effectively identify the outlying subspaces of a given point

University of Southern Queensland ePrints

A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

Author: Havinga P.J.M.
Meratnia N.
Zhang Yang
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007
Field of study

The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

University of Twente Research Information