Clustering high dimensional data using subspace and projected clustering algorithms
Problem statement: Many clustering techniques have been developed in
statistics, pattern recognition, data mining, and other fields.
Subspace clustering enumerates clusters of objects in all subspaces of a
dataset. It tends to produce many overlapping clusters. Approach: Subspace
clustering and projected clustering are research areas for clustering in high
dimensional spaces. In this research we experiment with three clustering-oriented
algorithms: PROCLUS, P3C and STATPC. Results: In general, PROCLUS performs
better in terms of computation time and produces the fewest unclustered points,
while STATPC outperforms PROCLUS and P3C in the accuracy of both the cluster
points and the relevant attributes found. Conclusions/Recommendations:
In this study, we analyze in detail the properties of different data clustering
methods. Comment: 9 pages, 6 figures
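To make "enumerates clusters of objects in all subspaces" and the resulting overlap concrete, here is a minimal sketch assuming a CLIQUE-style grid scheme, which is a swapped-in illustration and not PROCLUS, P3C or STATPC. The function name `dense_units` and the parameters `xi` (intervals per attribute), `tau` (density threshold) and `max_dim` are hypothetical choices for this example, and data are assumed to lie in [0, 1].

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi=4, tau=3, max_dim=2, lo=0.0, hi=1.0):
    """Enumerate dense grid cells in every subspace of up to max_dim attributes.

    Each attribute range [lo, hi] is split into xi equal-width intervals; a cell
    of a subspace is 'dense' if at least tau points fall into it.
    Returns {subspace (tuple of attribute ids): {cell (interval ids): count}}.
    """
    d = len(points[0])
    width = (hi - lo) / xi

    def interval(v):
        # clamp so that v == hi lands in the last interval
        return min(int((v - lo) / width), xi - 1)

    result = {}
    for size in range(1, max_dim + 1):
        for dims in combinations(range(d), size):  # every subspace of this size
            counts = Counter(tuple(interval(p[j]) for j in dims) for p in points)
            dense = {cell: c for cell, c in counts.items() if c >= tau}
            if dense:
                result[dims] = dense
    return result


# Four points are packed together in attributes 0 and 1; attribute 2 is noise.
points = [
    (0.10, 0.12, 0.05), (0.12, 0.15, 0.55), (0.14, 0.11, 0.95),
    (0.11, 0.13, 0.30), (0.90, 0.50, 0.70), (0.60, 0.85, 0.40),
]
print(dense_units(points))
# {(0,): {(0,): 4}, (1,): {(0,): 4}, (0, 1): {(0, 0): 4}}
```

The same four points are reported in three different subspaces, which is the overlap the abstract refers to. A full grid-based algorithm would also merge adjacent dense cells into clusters and prune candidate subspaces bottom-up.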
A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets
Many real applications require detecting outliers in high dimensional data sets. The major difficulty of mining outliers lies in the fact that outliers are often embedded in subspaces. In general, no efficient methods are available for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormally sparse density units in subspaces. In this paper, we present a novel approach for finding outliers in the ‘interesting’ subspaces, which are strongly correlated with ‘good’ clusters. The approach first groups the meaningful subspaces and then identifies outliers in the projected subspaces. To do so, an extension to a subspace-based clustering algorithm is proposed to find the ‘good’ subspaces, and outliers are then identified in the projected subspaces using classical outlier detection techniques such as distance-based and density-based algorithms. Comprehensive case studies are conducted using various types of subspace clustering and outlier detection algorithms. The experimental results demonstrate that the proposed method can detect outliers effectively and efficiently in high dimensional data sets.
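The second step of this approach (classical distance-based detection inside a projected subspace) can be sketched as follows. This is an illustration under stated assumptions, not the paper's method. The score is the distance to the k-th nearest neighbour, one common distance-based measure. The subspace `(0, 1)` is supplied by hand here, whereas the described approach would obtain it from its extended subspace clustering step. `knn_outlier_scores` and `top_outliers` are hypothetical helper names.

```python
import math


def knn_outlier_scores(points, dims, k=2):
    """Distance from each point to its k-th nearest neighbour, measured only
    on the attributes listed in dims (i.e. in the projected subspace)."""
    proj = [[p[j] for j in dims] for p in points]
    scores = []
    for i, a in enumerate(proj):
        dists = sorted(math.dist(a, b) for j, b in enumerate(proj) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, dims, k=2, n=1):
    """Indices of the n points with the largest k-NN distance in subspace dims."""
    scores = knn_outlier_scores(points, dims, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


# Point 4 is far from the rest in attributes 0 and 1; attribute 2 is noise.
points = [
    (0.10, 0.10, 0.0), (0.12, 0.11, 5.0), (0.11, 0.13, 9.0),
    (0.13, 0.12, 2.0), (0.90, 0.90, 6.0),
]
print(top_outliers(points, dims=(0, 1)))     # [4]
print(top_outliers(points, dims=(0, 1, 2)))  # [0]  (noise attribute dominates)
```

In the full attribute space the noisy third attribute hides point 4 and a different point is flagged. This illustrates why outliers "embedded in subspaces" call for projecting onto the relevant attributes first.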
A Survey on Soft Subspace Clustering
Subspace clustering (SC) is a promising clustering technology that identifies
clusters based on their associations with subspaces in high dimensional spaces.
SC can be classified into hard subspace clustering (HSC) and soft subspace
clustering (SSC). While HSC algorithms have been extensively studied and are well
accepted by the scientific community, SSC algorithms are relatively new but have
gained more attention in recent years due to their better adaptability. In this
paper, a comprehensive survey of existing SSC algorithms and their recent
development is presented. The SSC algorithms are classified systematically
into three main categories, namely, conventional SSC (CSSC), independent SSC
(ISSC) and extended SSC (XSSC). The characteristics of these algorithms are
highlighted and the potential future development of SSC is also discussed. Comment: This paper has been published in Information Sciences Journal in 201
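Soft subspace clustering differs from hard subspace clustering in that each cluster gets a continuous weight for every attribute instead of a fixed subset of attributes. The sketch below is a swapped-in example in the style of entropy-weighted k-means. It is not taken from the survey, and it is not claimed to match any specific algorithm the survey lists. The function name `ewkm`, the entropy parameter `gamma` and the initial centres are assumptions chosen for illustration.

```python
import math


def ewkm(points, centers, gamma=1.0, iters=10):
    """Soft subspace clustering sketch in the style of entropy-weighted k-means.

    Every cluster l keeps a weight vector weights[l] over the d attributes
    (summing to 1); attributes along which the cluster is compact get larger
    weights, so each cluster effectively lives in its own soft subspace.
    """
    k, d = len(centers), len(points[0])
    centers = [list(c) for c in centers]
    weights = [[1.0 / d] * d for _ in range(k)]

    def wdist(p, l):
        # attribute-weighted squared Euclidean distance to centre l
        return sum(weights[l][j] * (p[j] - centers[l][j]) ** 2 for j in range(d))

    labels = []
    for _ in range(iters):
        # 1. assign each point to the cluster with the smallest weighted distance
        labels = [min(range(k), key=lambda l: wdist(p, l)) for p in points]
        for l in range(k):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if not members:
                continue  # keep an empty cluster's centre and weights unchanged
            # 2. move the centre to the mean of its members
            centers[l] = [sum(p[j] for p in members) / len(members) for j in range(d)]
            # 3. weights proportional to exp(-dispersion / gamma), normalised to 1
            disp = [sum((p[j] - centers[l][j]) ** 2 for p in members) for j in range(d)]
            m = min(disp)  # shift by the minimum for numerical stability
            expo = [math.exp(-(D - m) / gamma) for D in disp]
            total = sum(expo)
            weights[l] = [e / total for e in expo]
    return labels, centers, weights


points = [
    (0.0, 0.0), (0.1, 3.0), (-0.1, 6.0),      # tight in attribute 0, spread in 1
    (10.0, 10.0), (13.0, 10.1), (16.0, 9.9),  # tight in attribute 1, spread in 0
]
labels, centers, weights = ewkm(points, centers=[(0.0, 3.0), (13.0, 10.0)])
print(labels)                                           # [0, 0, 0, 1, 1, 1]
print([[round(w, 3) for w in row] for row in weights])  # [[1.0, 0.0], [0.0, 1.0]]
```

Each cluster ends up weighting the attribute along which it is compact, which is the "soft" counterpart of selecting a hard subspace. A larger `gamma` spreads the weights more evenly across attributes.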
- …