Recognisation of Outlier using Distance based method for Large Scale Database
This paper studies the difficulties of outlier detection on inexact data. We study the normal instances of each uncertain object using the instances of objects with analogous properties. Outlier detection is a significant research problem in data mining that aims to uncover valuable abnormal and irregular patterns hidden in vast data sets. Most existing outlier detection approaches deal only with static data of comparatively low dimensionality. Recently, outlier detection for high-dimensional stream data has become a new emerging research problem. A key observation that inspires this research is that outliers in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers from high-dimensional stream data is a very challenging task for numerous reasons. The paper presents a detailed study of outlier detection algorithms and its results also
HOS-Miner: a system for detecting outlying subspaces of high-dimensional data
[Abstract]: In this paper, we identify a new and interesting high-dimensional outlier detection problem: detecting the subspaces in which given data points are outliers. We call the subspaces in which a data point is an outlier its Outlying Subspaces. We propose the prototype of a dynamic subspace search system, called HOS-Miner (HOS stands for High-dimensional Outlying Subspaces), that uses a sample-based learning process to effectively identify the outlying subspaces of a given point
Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy
[Abstract]: Outlier detection is an important research problem in data mining that aims to discover useful abnormal and irregular patterns hidden in large data sets. Most existing outlier detection methods only deal with static data with relatively low dimensionality. Recently, outlier detection for high-dimensional stream data became a new emerging research problem. A key observation that motivates this research is that outliers in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers from high-dimensional stream data is a very challenging task for several reasons. First, detecting projected outliers is difficult even for high-dimensional static data. The exhaustive search for the outlying subspaces where projected outliers are embedded is an NP problem. Second, the algorithms for handling data streams are constrained to take only one pass to process the streaming data with the conditions of space limitation and time criticality. The currently existing methods for outlier detection are found to be ineffective for detecting projected outliers in high-dimensional data streams. In this thesis, we present a new technique, called the Stream Project Outlier deTector (SPOT), which attempts to detect projected outliers in high-dimensional data streams. SPOT employs an innovative window-based time model in capturing dynamic statistics from stream data, and a novel data structure containing a set of top sparse subspaces to detect projected outliers effectively. SPOT also employs a multi-objective genetic algorithm as an effective search method for finding the outlying subspaces where most projected outliers are embedded. The experimental results demonstrate that SPOT is efficient and effective in detecting projected outliers for high-dimensional data streams. The main contribution of this thesis is that it provides a backbone in tackling the challenging problem of outlier detection for high-dimensional data streams. SPOT can facilitate the discovery of useful abnormal patterns and can be potentially applied to a variety of high demand applications, such as for sensor network data monitoring, online transaction protection, etc
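To illustrate why the exhaustive subspace search mentioned in this abstract becomes impractical, note that a data set with d features has 2^d - 1 non-empty subspaces. The following is a minimal sketch of that enumeration only; the function name `all_subspaces` is hypothetical and is not part of SPOT.

```python
from itertools import combinations


def all_subspaces(num_features):
    """Yield every non-empty subset of feature indices (hypothetical helper)."""
    features = range(num_features)
    for size in range(1, num_features + 1):
        # combinations() yields each subset of the given size exactly once
        yield from combinations(features, size)


if __name__ == "__main__":
    for d in (3, 10, 20):
        count = sum(1 for _ in all_subspaces(d))
        print(d, count)  # count equals 2**d - 1
```

The count doubles with every added feature, which is one reason a heuristic search is attractive for high-dimensional data.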
Outlier detection and ranking based on subspace clustering
Detecting outliers is an important task for many applications
including fraud detection or consistency validation in real world
data. Particularly in the presence of uncertain data or imprecise data,
similar objects regularly deviate in their attribute values. The notion
of outliers has thus to be defined carefully. When considering outlier
detection as a task which is complementary to clustering, binary decisions
whether an object is regarded to be an outlier or not seem to be
near at hand. For high-dimensional data, however, objects may belong
to different clusters in different subspaces. More fine-grained concepts to
define outliers are therefore demanded. By our new OutRank approach,
we address outlier detection in heterogeneous high dimensional data and
propose a novel scoring function that provides a consistent model for
ranking outliers in the presence of different attribute types. Preliminary
experiments demonstrate the potential for successful detection and reasonable ranking of outliers in high dimensional data sets
Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance
[Abstract]: In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data, i.e. finding the subspaces (subsets of features) in which the given points are outliers, which are called their outlying subspaces. Since the state-of-the-art outlier detection techniques fail to handle this new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of high-dimensional data efficiently. The intuitive idea of HighDOD is that we measure the OD of the point using the sum of distances between this point and its k nearest neighbors. Two heuristic pruning strategies are proposed to realize fast pruning in the subspace search and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other searching alternatives such as the naive top-down, bottom-up and random search methods, and the existing outlier detection methods cannot fulfill this new task effectively
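The outlying degree described in this abstract (the sum of distances from a point to its k nearest neighbors within a chosen subspace) can be sketched as follows. This is an illustration under stated assumptions, not the HighDOD implementation: the Euclidean metric, the function name `outlying_degree`, and the convention that `data` excludes the query point are all assumptions, since the abstract does not specify them.

```python
import math


def outlying_degree(data, point, subspace, k):
    """Sum of distances from `point` to its k nearest neighbors in `subspace`.

    Assumptions (not from the abstract): Euclidean distance, and `data`
    does not contain the query point itself.
    """
    def project(row):
        # Keep only the feature indices that define the subspace
        return [row[i] for i in subspace]

    target = project(point)
    distances = sorted(math.dist(project(row), target) for row in data)
    return sum(distances[:k])


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    point = (0.0, 10.0)
    print(outlying_degree(data, point, [0], k=2))  # small: point is ordinary in feature 0
    print(outlying_degree(data, point, [1], k=2))  # large: point is unusual in feature 1
```

In this toy data the point has a low OD in the subspace made of feature 0 and a high OD in the subspace made of feature 1, which is the kind of per-subspace contrast an outlying-subspace search looks for.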