Recognisation of Outlier using Distance based method for Large Scale Database

Author: Madhav Bokare, V.M Thakare
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/05/2017

This paper studies the difficulties of outlier detection on uncertain data. We study the normal instances of each uncertain object using the instances of objects with similar properties. Outlier detection is a significant research problem in data mining that aims to discover useful abnormal and irregular patterns hidden in large data sets. Most existing outlier detection approaches deal only with static data of comparatively low dimensionality. Recently, outlier detection for high-dimensional stream data has become a new emerging research problem. A key observation that motivates this research is that outliers in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers from high-dimensional stream data is a very challenging task for several reasons. The paper also presents a detailed study of outlier detection algorithms and their results.

International Journal on Recent and Innovation Trends in Computing and Communication

HOS-Miner: a system for detecting outlying subspaces of high-dimensional data

Author: Ling Tok Wang, Lou Meng, Wang Hai, Zhang Ji
Publication venue: Morgan Kaufmann Publishers Inc.
Publication date: 01/01/2004

[Abstract]: In this paper, we identify a new and interesting high-dimensional outlier detection problem: detecting the subspaces in which given data points are outliers. We call the subspaces in which a data point is an outlier its outlying subspaces. We propose a prototype of a dynamic subspace search system, called HOS-Miner (HOS stands for High-dimensional Outlying Subspaces), that uses a sample-based learning process to identify the outlying subspaces of a given point effectively.

University of Southern Queensland ePrints

Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy

Author: Zhang Ji
Publication date: 01/12/2008

[Abstract]: Outlier detection is an important research problem in data mining that aims to discover useful abnormal and irregular patterns hidden in large data sets. Most existing outlier detection methods deal only with static data of relatively low dimensionality. Recently, outlier detection for high-dimensional stream data has become a new emerging research problem. A key observation that motivates this research is that outliers in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers from high-dimensional stream data is a very challenging task for several reasons. First, detecting projected outliers is difficult even for high-dimensional static data. The exhaustive search for the outlying subspaces where projected outliers are embedded is an NP problem. Second, algorithms for handling data streams are constrained to take only one pass over the streaming data, under limited space and time-critical conditions. The currently existing methods for outlier detection are found to be ineffective for detecting projected outliers in high-dimensional data streams. In this thesis, we present a new technique, called the Stream Projected Outlier deTector (SPOT), which attempts to detect projected outliers in high-dimensional data streams. SPOT employs an innovative window-based time model to capture dynamic statistics from stream data, and a novel data structure containing a set of top sparse subspaces to detect projected outliers effectively. SPOT also employs a multi-objective genetic algorithm as an effective search method for finding the outlying subspaces where most projected outliers are embedded. The experimental results demonstrate that SPOT is efficient and effective in detecting projected outliers in high-dimensional data streams.
The main contribution of this thesis is that it provides a backbone for tackling the challenging problem of outlier detection in high-dimensional data streams. SPOT can facilitate the discovery of useful abnormal patterns and can potentially be applied to a variety of high-demand applications, such as sensor network data monitoring, online transaction protection, etc.
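
To illustrate why an exhaustive search over subspaces is impractical, the following minimal Python sketch counts and enumerates the non-empty attribute subsets of a d-dimensional space. It is not part of SPOT; the function names `count_subspaces` and `enumerate_subspaces` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def count_subspaces(d: int) -> int:
    """Number of non-empty attribute subsets of a d-dimensional space."""
    return 2 ** d - 1


def enumerate_subspaces(d: int):
    """Yield every non-empty subspace as a tuple of attribute indices."""
    for size in range(1, d + 1):
        yield from combinations(range(d), size)


# The count doubles (plus one) with every added attribute.
for d in (3, 10, 20, 50):
    print(f"d={d}: {count_subspaces(d)} candidate subspaces")
```

Because the number of candidates grows as 2^d - 1, checking each one is only feasible for small d, which is consistent with the abstract's use of a heuristic search (a genetic algorithm) instead.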

University of Southern Queensland ePrints

Outlier detection and ranking based on subspace clustering

Author: Assent Ira, Seidl Thomas, Steinhausen Uwe
Publication venue: Dagstuhl Seminar Proceedings. 08421 - Uncertainty Management in Information Systems
Publication date: 01/01/2009

Detecting outliers is an important task for many applications, including fraud detection and consistency validation in real-world data. Particularly in the presence of uncertain or imprecise data, similar objects regularly deviate in their attribute values. The notion of outliers thus has to be defined carefully. When outlier detection is considered as a task complementary to clustering, a binary decision on whether or not an object is an outlier seems the obvious choice. For high-dimensional data, however, objects may belong to different clusters in different subspaces. More fine-grained concepts for defining outliers are therefore needed. With our new OutRank approach, we address outlier detection in heterogeneous high-dimensional data and propose a novel scoring function that provides a consistent model for ranking outliers in the presence of different attribute types. Preliminary experiments demonstrate the potential for successful detection and reasonable ranking of outliers in high-dimensional data sets.

Dagstuhl Research Online Publication Server

Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance

Author: Wang Hai, Zhang Ji
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/10/2006

[Abstract]: In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data, i.e., finding the subspaces (subsets of features) in which the given points are outliers, which are called their outlying subspaces. Since state-of-the-art outlier detection techniques fail to handle this new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of high-dimensional data efficiently. The intuitive idea of HighDOD is to measure the OD of a point as the sum of the distances between this point and its k nearest neighbors. Two heuristic pruning strategies are proposed to realize fast pruning in the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other search alternatives such as the naive top–down, bottom–up and random search methods, and that the existing outlier detection methods cannot fulfill this new task effectively.
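
The outlying degree described above can be sketched in a few lines of Python. This is a minimal illustration, not the HighDOD implementation: it assumes Euclidean distance, uses a naive exhaustive search rather than the paper's pruning strategies, and the names `outlying_degree`, `outlying_subspaces` and the `threshold` parameter are hypothetical.

```python
import math
from itertools import combinations


def outlying_degree(point, data, subspace, k):
    """Sum of distances from `point` to its k nearest neighbours,
    measured only on the attribute indices listed in `subspace`."""
    dists = sorted(
        math.sqrt(sum((point[i] - other[i]) ** 2 for i in subspace))
        for other in data
        if other is not point  # skip the query point itself
    )
    return sum(dists[:k])


def outlying_subspaces(point, data, k, threshold):
    """Naive search: test every non-empty subspace and keep those in
    which the OD reaches the (assumed) threshold."""
    d = len(point)
    return [
        s
        for size in range(1, d + 1)
        for s in combinations(range(d), size)
        if outlying_degree(point, data, s, k) >= threshold
    ]


# Toy data: the last point deviates only in attribute 1.
data = [
    (1.0, 1.0, 1.0),
    (1.1, 0.9, 1.0),
    (0.9, 1.1, 1.1),
    (1.0, 1.0, 0.9),
    (1.0, 5.0, 1.0),
]
p = data[4]
print(outlying_subspaces(p, data, k=2, threshold=5.0))
# prints [(1,), (0, 1), (1, 2), (0, 1, 2)]
```

In this toy example every reported subspace contains attribute 1, the only attribute in which the point deviates. With Euclidean distance, adding attributes to a subspace can never decrease a distance, so the OD of a point cannot shrink as the subspace grows.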

University of Southern Queensland ePrints