
7,879 research outputs found

SUBSPACE CLUSTERING ON MULTIDIMENSIONAL DATA USING THE FINDIT ALGORITHM

Author: Hutama A B
Publication venue: Universitas Telkom
Publication date: 01/01/2006

ABSTRACT: With the widespread use of computers in business, government, and science, the efficient and effective discovery of interesting patterns in large databases has become essential. Data mining has emerged as a solution to the data analysis problems faced by many organizations. One data mining functionality is clustering, which groups data into clusters according to the similarity of their characteristics. Subspace clustering extends the clustering method: it finds clusters in a dataset by selecting the most relevant dimensions for each cluster separately. FINDIT forms clusters using two key ideas: a dimension-oriented distance measure, which fully exploits per-dimension difference information, and a dimension voting policy. In this final project, the FINDIT algorithm was implemented and its performance was analysed in two ways: running time as a function of the number of data points and the dimensionality of the dataset, and accuracy of the resulting clusters as a function of the Dmindist parameter. Dmindist, a user parameter, affects the performance of the software. If Dmindist is too small or too large, the accuracy of the resulting clusters drops, as shown by one or more subspaces of the original clusters being lost. Increasing the number of data points increases the time needed to find clusters, and increasing the number of dimensions does the same.
Keywords: data mining, subspace clustering, FINDIT algorithm, dimension-oriented distance, dimension voting, Dmindist
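The two key ideas named in the abstract can be illustrated with a minimal Python sketch. It assumes the formulation of the dimension-oriented distance commonly given for FINDIT. Two points are compared dimension by dimension, and their distance is the size of the larger subspace minus the number of shared dimensions on which they differ by at most a tolerance ε (here `eps`). Dimension voting is simplified to a brute-force nearest-neighbour search. The names `dod`, `vote_dimensions`, `eps`, `v` and `threshold` are hypothetical. The sketch does not model Dmindist, FINDIT's sampling step, or cluster formation and merging.

```python
from typing import Iterable, List, Optional, Sequence, Set

# Hypothetical helper names; a simplified illustration, not the thesis implementation.
Point = Sequence[float]


def dod(p: Point, q: Point, eps: float,
        dims_p: Optional[Iterable[int]] = None,
        dims_q: Optional[Iterable[int]] = None) -> int:
    """Dimension-oriented distance between p and q.

    Size of the larger subspace minus the number of shared dimensions on
    which |p_d - q_d| <= eps. Subspaces default to the full space.
    """
    full = set(range(len(p)))
    dp = set(dims_p) if dims_p is not None else full
    dq = set(dims_q) if dims_q is not None else full
    agreeing = sum(1 for d in dp & dq if abs(p[d] - q[d]) <= eps)
    return max(len(dp), len(dq)) - agreeing


def vote_dimensions(point: Point, data: List[Point], eps: float,
                    v: int, threshold: int) -> Set[int]:
    """Simplified dimension voting.

    The v nearest neighbours of `point` under dod each vote for every
    dimension on which they lie within eps of `point`; dimensions with at
    least `threshold` votes are kept. If `point` itself is in `data`, it
    counts as its own neighbour, so remove it first if that is unwanted.
    """
    neighbours = sorted(data, key=lambda q: dod(point, q, eps))[:v]
    votes = [0] * len(point)
    for q in neighbours:
        for d in range(len(point)):
            if abs(point[d] - q[d]) <= eps:
                votes[d] += 1
    return {d for d, count in enumerate(votes) if count >= threshold}


if __name__ == "__main__":
    p = (0.0, 0.0, 9.0)
    sample = [(0.1, 0.2, 1.0), (0.2, -0.1, 5.0), (-0.1, 0.1, 20.0),
              (0.0, 0.3, -7.0), (8.0, 8.0, 9.0)]
    print(dod(p, sample[0], eps=0.5))                             # 1
    print(vote_dimensions(p, sample, eps=0.5, v=4, threshold=3))  # {0, 1}
```

Votes are counted per dimension. A point embedded in a low-dimensional cluster therefore selects only the dimensions its neighbours share, which is what allows each cluster to receive its own subspace.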

Open Library

Cluster Evaluation of Density Based Subspace Clustering

Author: Sembiring Rahmat Widia
Zain Jasni Mohamad
Publication date: 01/01/2010

Clustering real-world data is often confronted with the curse of dimensionality, since real-world data often consist of many dimensions. Multidimensional data clustering can be evaluated through a density-based approach. Density-based approaches follow the paradigm introduced by DBSCAN clustering. In this approach, the density of each object's neighbourhood is calculated with respect to MinPoints. Clusters change in accordance with changes in the density of each object's neighbourhood. The neighbours of each object are typically determined using a distance function, for example the Euclidean distance. In this paper, the SUBCLU, FIRES and INSCY methods are applied to clustering 6x1595-dimension synthetic datasets. IO Entropy, F1 measure, coverage, accuracy and time consumption are used as performance evaluation parameters. The evaluation results show that SUBCLU requires considerable time to process subspace clustering, although its coverage value is better. INSCY, meanwhile, achieves better accuracy than the other two methods, at the cost of a longer computation time.
Comment: 6 pages, 15 figures
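The DBSCAN-style density notion described above can be shown in a short Python sketch. Here the Euclidean distance is evaluated only over a chosen subset of dimensions, which is how density-based subspace methods such as SUBCLU reuse DBSCAN inside individual subspaces. The function names and the parameters `eps` and `min_pts` (standing in for MinPoints) are illustrative assumptions. The sketch does not reproduce SUBCLU's subspace search, FIRES or INSCY.

```python
import math
from typing import List, Sequence

# Illustrative helpers (hypothetical names): DBSCAN's core-object test
# evaluated inside a single subspace.
Point = Sequence[float]


def subspace_distance(p: Point, q: Point, dims: Sequence[int]) -> float:
    """Euclidean distance using only the coordinates listed in dims."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))


def eps_neighbourhood(data: List[Point], i: int, dims: Sequence[int],
                      eps: float) -> List[int]:
    """Indices of all objects within eps of data[i] in the subspace dims.

    Following the DBSCAN convention, the object itself is included.
    """
    return [j for j, q in enumerate(data)
            if subspace_distance(data[i], q, dims) <= eps]


def is_core(data: List[Point], i: int, dims: Sequence[int],
            eps: float, min_pts: int) -> bool:
    """An object is a core object if its eps-neighbourhood holds >= min_pts objects."""
    return len(eps_neighbourhood(data, i, dims, eps)) >= min_pts


if __name__ == "__main__":
    data = [(0.0, 0.0, 5.0), (0.1, 0.1, -3.0), (0.2, 0.0, 9.0), (5.0, 5.0, 5.0)]
    # Dense in the subspace {0, 1}, sparse once noisy dimension 2 is included.
    print(is_core(data, 0, dims=[0, 1], eps=0.5, min_pts=3))     # True
    print(is_core(data, 0, dims=[0, 1, 2], eps=0.5, min_pts=3))  # False
```

The example shows why the curse of dimensionality matters here. Adding one noisy dimension is enough to turn a core object into a non-core object, so density has to be assessed per subspace.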

arXiv.org e-Print Archive

UMP Institutional Repository

A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets

Author: Leng Jinsong
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2010

Many real applications require detecting outliers in high-dimensional data sets. The major difficulty in mining outliers lies in the fact that outliers are often embedded in subspaces. In general, no efficient methods are available for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormally sparse density units in subspaces. In this paper, we present a novel approach for finding outliers in ‘interesting’ subspaces. The interesting subspaces are strongly correlated with ‘good’ clusters. This approach aims to group the meaningful subspaces and then identify outliers in the projected subspaces. To do so, an extension of a subspace-based clustering algorithm is proposed to find the ‘good’ subspaces. Outliers are then identified in the projected subspaces using classical outlier detection techniques such as distance-based and density-based algorithms. Comprehensive case studies are conducted using various types of subspace clustering and outlier detection algorithms. The experimental results demonstrate that the proposed method can detect outliers effectively and efficiently in high-dimensional data sets.
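The second stage described above applies a classical outlier detector inside a projected subspace. The following minimal Python sketch shows one such distance-based technique: it scores each object by the distance to its k-th nearest neighbour and reports the top n. The subspace is supplied by hand. It stands in for the ‘good’ subspaces that the paper obtains through its clustering extension, which this sketch does not implement. The names `knn_distance`, `top_n_outliers`, `k` and `n` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: a classical k-NN distance outlier score, computed
# in a projected subspace chosen in advance.
Point = Sequence[float]


def knn_distance(data: List[Point], i: int, dims: Sequence[int], k: int) -> float:
    """Distance from data[i] to its k-th nearest neighbour (excluding itself),
    measured with Euclidean distance over the projected dimensions dims."""
    dists = sorted(
        math.sqrt(sum((data[i][d] - q[d]) ** 2 for d in dims))
        for j, q in enumerate(data) if j != i
    )
    return dists[k - 1]


def top_n_outliers(data: List[Point], dims: Sequence[int], k: int, n: int) -> List[int]:
    """Indices of the n objects with the largest k-NN distance in the subspace."""
    scores = [(knn_distance(data, i, dims, k), i) for i in range(len(data))]
    scores.sort(reverse=True)
    return [i for _, i in scores[:n]]


if __name__ == "__main__":
    # Dimension 2 is noise; in the subspace {0, 1}, object 4 is far from the rest.
    data = [(1.0, 1.0, 7.0), (1.1, 0.9, -4.0), (0.9, 1.0, 2.0),
            (1.0, 1.2, 9.0), (4.0, 4.0, 3.0)]
    print(top_n_outliers(data, dims=[0, 1], k=2, n=1))  # [4]
```

Projecting first keeps the noisy dimension from masking the outlier. In the full space, the large spread on dimension 2 would inflate every object's k-NN distance.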

CiteSeerX

Research Online @ ECU

A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

Author: Havinga P.J.M.
Meratnia N.
Zhang Yang
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007

The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error, or they may indicate events. The task of outlier detection aims to identify such outliers in order to improve data analysis and to discover interesting and useful knowledge about unusual events in numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets. We also provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique for a given data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.

University of Twente Research Information

A Survey on Soft Subspace Clustering

Author: Choi Kup-Sze
Deng Zhaohong
Jiang Yizhang
Wang Jun
Wang Shitong
Publication venue: Elsevier BV
Publication date: 07/04/2016

Subspace clustering (SC) is a promising clustering technology for identifying clusters based on their associations with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). HSC algorithms have been extensively studied and are well accepted by the scientific community. SSC algorithms are relatively new, but they have gained more attention in recent years owing to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and their recent development is presented. The SSC algorithms are classified systematically into three main categories: conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted, and potential future developments of SSC are also discussed.
Comment: This paper has been published in the Information Sciences journal in 201
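The hard/soft distinction can be made concrete with a small example. An HSC algorithm assigns each cluster a subset of dimensions. An SSC algorithm instead gives each cluster a continuous weight for every dimension. The Python sketch below shows one well-known soft weighting rule, the entropy-weighting update used by EWKM-style algorithms, for a single cluster whose members and centre are held fixed. It is offered only as an illustration and is not taken from the survey. `gamma` is an assumed smoothing parameter. A full algorithm would alternate this weight update with object assignment and centre updates.

```python
import math
from typing import List, Sequence


def entropy_weights(members: List[Sequence[float]], centre: Sequence[float],
                    gamma: float) -> List[float]:
    """Soft dimension weights for one cluster (EWKM-style entropy weighting).

    D_j is the within-cluster dispersion on dimension j, and w_j is
    proportional to exp(-D_j / gamma), normalised so the weights sum to 1.
    Dimensions with low dispersion receive high weight.
    """
    m = len(centre)
    dispersion = [sum((x[j] - centre[j]) ** 2 for x in members) for j in range(m)]
    # Subtract the minimum before exponentiating for numerical stability;
    # the shift cancels out in the normalisation.
    lo = min(dispersion)
    raw = [math.exp(-(d - lo) / gamma) for d in dispersion]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_distance(x: Sequence[float], centre: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted squared distance, as used to assign objects to soft-subspace clusters."""
    return sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, centre))


if __name__ == "__main__":
    members = [(1.0, 0.0), (1.1, 4.0), (0.9, -4.0)]
    centre = (1.0, 0.0)
    w = entropy_weights(members, centre, gamma=1.0)
    print([round(v, 3) for v in w])  # [1.0, 0.0]
```

Where HSC would simply drop dimension 1, the soft rule keeps it with a near-zero weight. A larger `gamma` spreads the weight more evenly across dimensions.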

arXiv.org e-Print Archive

The Hong Kong Polytechnic University Pao Yue-kong Library

PolyU Institutional Repository