397 research outputs found

Automatic Subspace Clustering of High Dimensional Data

Author: D. Franzblau
Dimitrios Gunopulos
Johannes Gehrke
L. Lovász
M. Berger
M. Zait
P. Arabie
P. Schroeter
Prabhakar Raghavan
R. Chhikara
Rakesh Agrawal
S. Wharton
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A novel algorithm for fast and scalable subspace clustering of high-dimensional data

Author: A Geiger
C Huttenhower
CC Aggarwal
CD Manning
CH Cheng
D Jiang
E Elhamifar
E Müller
F Pedregosa
H-PH Kriegel
HP Kriegel
IT Joliffe
J Fan
J Jun
Jianqing Zhu
K Beyer
K Eren
K Sim
KG Woo
L Parsons
M Ester
M Hall
M Steinbach
MB Eisen
MM Babu
P Erdös
PE Dewdney
R Agrawal
R Basri
R Vidal
R Xu
RE Bellman
S Günnemann
S Jahirabadkar
S Yoon
T Barrett
T Li
T Zhang
W Jang
Y Cheng
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Cluster Evaluation of Density Based Subspace Clustering

Author: Sembiring Rahmat Widia
Zain Jasni Mohamad
Publication date: 01/01/2010

Clustering real-world data often runs into the curse of dimensionality, because such data typically have many dimensions. Clustering of multidimensional data can be evaluated through a density-based approach, following the paradigm introduced by DBSCAN. In this approach, the density of each object's neighbourhood is calculated with respect to MinPoints, and clusters change as the density of each object's neighbourhood changes. The neighbours of each object are typically determined using a distance function, for example the Euclidean distance. In this paper, the SUBCLU, FIRES and INSCY methods are applied to cluster 6x1595 dimension synthetic datasets. IO Entropy, F1 Measure, coverage, accuracy and time consumption are used as performance evaluation parameters. The evaluation results show that SUBCLU requires considerable time to perform subspace clustering; however, its coverage is better. Meanwhile, INSCY is more accurate than the other two methods, although at the cost of longer computation time.

Comment: 6 pages, 15 figure
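The density test described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation of SUBCLU, FIRES or INSCY; it only shows the DBSCAN-style idea of counting the Euclidean neighbours of each object and comparing the count with MinPoints. The function names, the `eps` radius parameter and the optional `dims` argument (used here to restrict the distance to a subspace) are assumptions made for illustration.

```python
import math


def eps_neighbours(points, index, eps, dims=None):
    """Return indices of points within Euclidean distance eps of points[index].

    dims is a hypothetical parameter: a list of dimension indices that
    defines the subspace in which the distance is measured. When it is
    None, all dimensions are used.
    """
    p = points[index]
    dims = list(range(len(p))) if dims is None else list(dims)
    result = []
    for j, q in enumerate(points):
        dist = math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))
        if dist <= eps:
            result.append(j)  # the point itself counts as its own neighbour
    return result


def core_points(points, eps, min_points, dims=None):
    """Return indices of objects whose eps-neighbourhood holds at least min_points objects."""
    return [
        i
        for i in range(len(points))
        if len(eps_neighbours(points, i, eps, dims)) >= min_points
    ]
```

With points such as `[(0.0, 0.0), (0.1, 5.0), (0.2, 9.0), (5.0, 0.1)]`, `eps=0.5` and `min_points=3`, no object is dense in the full space, while the first three objects are dense when only dimension 0 is considered. This is the kind of structure that subspace methods look for.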

arXiv.org e-Print Archive

UMP Institutional Repository

A Survey on Soft Subspace Clustering

Author: Choi Kup-Sze
Deng Zhaohong
Jiang Yizhang
Wang Jun
Wang Shitong
Publication venue: 'Elsevier BV'
Publication date: 07/04/2016

Subspace clustering (SC) is a promising clustering technology for identifying clusters based on their associations with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have gained more attention in recent years due to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and recent developments is presented. The SSC algorithms are classified systematically into three main categories, namely conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted and the potential future development of SSC is also discussed.

Comment: This paper has been published in Information Sciences Journal in 201

arXiv.org e-Print Archive

The Hong Kong Polytechnic University Pao Yue-kong Library

PolyU Institutional Repository

Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval

Author: R. Pushpalatha
Sundaram K. Meenakshi
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2018

Data mining is an essential process for identifying patterns in large datasets through machine learning techniques and database systems. Clustering high-dimensional data is becoming a very challenging process due to the curse of dimensionality. In addition, existing approaches have not improved space complexity or data retrieval performance. To overcome these limitations, a Spectral Clustering Based VP Tree Indexing Technique is introduced. The technique clusters and indexes densely populated high-dimensional data points for effective data retrieval based on a user query. A Normalized Spectral Clustering Algorithm is used to group similar high-dimensional data points. A Vantage Point Tree is then constructed to index the clustered data points with minimum space complexity. Finally, the indexed data are retrieved based on the user query using a Vantage Point Tree based Data Retrieval Algorithm. This in turn helps to improve the true positive rate with minimum retrieval time. Performance is measured in terms of space complexity, true positive rate and data retrieval time on the El Nino weather data sets from the UCI Machine Learning Repository. Experimental results show that the proposed technique reduces space complexity by 33% and data retrieval time by 24% compared to state-of-the-art works.
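The indexing step mentioned in this abstract can be sketched as a small vantage point tree with a nearest-neighbour query. This is a generic illustration under stated assumptions, not the authors' algorithm: the spectral clustering stage is omitted, the first point of each subset is taken as the vantage point, and the names `VPNode`, `build_vp_tree` and `nearest` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPNode:
    """One tree node: a vantage point, a split radius and two subtrees."""

    def __init__(self, point, radius, inside, outside):
        self.point = point
        self.radius = radius
        self.inside = inside    # points closer than radius to the vantage point
        self.outside = outside  # points at distance >= radius


def build_vp_tree(points):
    """Recursively split points around the median distance to a vantage point."""
    points = list(points)
    if not points:
        return None
    vantage, rest = points[0], points[1:]  # assumption: first point is the vantage point
    if not rest:
        return VPNode(vantage, 0.0, None, None)
    dists = [euclidean(vantage, p) for p in rest]
    radius = sorted(dists)[len(dists) // 2]  # median distance as split radius
    inside = [p for p, d in zip(rest, dists) if d < radius]
    outside = [p for p, d in zip(rest, dists) if d >= radius]
    return VPNode(vantage, radius, build_vp_tree(inside), build_vp_tree(outside))


def nearest(node, query, best=None):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    d = euclidean(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d < node.radius:
        # Query lies inside the ball: search inside first, then outside
        # only if the current best ball can reach beyond the radius.
        best = nearest(node.inside, query, best)
        if d + best[0] >= node.radius:
            best = nearest(node.outside, query, best)
    else:
        # Query lies outside the ball: search outside first, then inside
        # only if the current best ball overlaps the inner region.
        best = nearest(node.outside, query, best)
        if d - best[0] < node.radius:
            best = nearest(node.inside, query, best)
    return best
```

A query such as `nearest(build_vp_tree(points), q)` returns the same point as a linear scan, but the triangle-inequality checks let it skip subtrees that cannot contain a closer point. In the setting the abstract describes, one such tree would presumably be built over the points of the clustered data.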

IAES journal

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science