7,879 research outputs found
SUBSPACE CLUSTERING OF MULTIDIMENSIONAL DATA USING THE FINDIT ALGORITHM
ABSTRACT: With the widespread use of computers in business, government, and science, the efficient and effective discovery of interesting patterns in large databases has become essential. Data mining has emerged as a solution to the data analysis problems faced by many organizations. One data mining function is clustering, which groups data into clusters according to the similarity of their characteristics. Subspace clustering extends clustering by finding clusters in a dataset while selecting the most relevant dimensions for each cluster separately. FINDIT forms clusters using two key ideas: a dimension-oriented distance measure, which fully utilizes dimensional difference information, and a dimension voting policy. In this final project, the FINDIT algorithm was implemented and its performance analysed: running time as a function of the number of data points and the number of dimensions, and the accuracy of the resulting clusters as a function of the Dmindist parameter. Dmindist, a user-supplied parameter, affects the performance of the software. When Dmindist is too small or too large, cluster accuracy degrades, as shown by the loss of one or more subspaces of the original clusters. Increasing the number of data points increases the time needed to find clusters, as does increasing the number of dimensions.
Keywords: data mining, subspace clustering, FINDIT algorithm, dimension-oriented distance, dimension voting, Dmindist
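The two ideas named in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly described form of the dimension-oriented distance, namely the number of dimensions in which two points differ by more than a threshold, and a simplified voting rule in which a dimension is kept when enough neighbours agree with the point in that dimension. The function names and the parameters `eps` and `min_votes` are hypothetical and are not taken from the thesis.

```python
def dod(p, q, eps):
    """Dimension-oriented distance (assumed form): the number of
    dimensions in which p and q differ by more than eps."""
    return sum(1 for a, b in zip(p, q) if abs(a - b) > eps)


def vote_dimensions(point, neighbours, eps, min_votes):
    """Simplified dimension voting (illustrative assumption): each
    neighbour votes for every dimension in which it lies within eps of
    the point; dimensions with at least min_votes votes are kept."""
    votes = [
        sum(1 for n in neighbours if abs(n[i] - point[i]) <= eps)
        for i in range(len(point))
    ]
    return [i for i, v in enumerate(votes) if v >= min_votes]


if __name__ == "__main__":
    point = (0.0, 0.0, 0.0)
    neighbours = [(0.01, 5.0, 0.02), (0.02, -3.0, 0.5), (0.0, 9.0, 0.03)]
    print(dod((1.0, 2.0, 3.0), (1.05, 5.0, 3.0), eps=0.1))
    print(vote_dimensions(point, neighbours, eps=0.1, min_votes=3))
```

Counting differing dimensions, rather than summing the differences, keeps a single noisy dimension from dominating the distance. That matters when clusters live in only a few of the dimensions.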
Cluster Evaluation of Density Based Subspace Clustering
Clustering real-world data often runs into the curse of dimensionality, because
real-world data often consist of many dimensions. Multidimensional data
clustering can be evaluated through a density-based approach. Density-based
approaches follow the paradigm introduced by DBSCAN clustering. In this
approach, the density of each object's neighbourhood is evaluated against
MinPoints, and cluster membership changes as the density of that neighbourhood
changes. The neighbours of each object are typically determined using a distance
function, for example the Euclidean distance. In this paper, the SUBCLU, FIRES
and INSCY methods are applied to cluster 6x1595 dimension synthetic
datasets. IO Entropy, F1 Measure, coverage, accuracy and time consumption are
used as performance evaluation parameters. The results show that the SUBCLU
method requires considerable time for subspace clustering; however, its
coverage is better. Meanwhile, the INSCY method achieves better accuracy than
the other two methods, although its computation time is longer.
Comment: 6 pages, 15 figures
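As a rough illustration of the density test described above, the following Python sketch checks whether an object is a DBSCAN-style core object when the Euclidean distance is restricted to a chosen subspace. The function names, the `dims` argument and the sample data are assumptions made for this example; they are not the implementations of SUBCLU, FIRES or INSCY evaluated in the paper.

```python
import math


def subspace_distance(p, q, dims):
    """Euclidean distance between p and q using only the dimensions in dims."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))


def is_core_object(index, data, dims, eps, min_points):
    """Return True if the eps-neighbourhood of data[index] in the given
    subspace contains at least min_points objects. The object itself is
    counted, following the usual DBSCAN convention."""
    p = data[index]
    neighbours = sum(1 for q in data if subspace_distance(p, q, dims) <= eps)
    return neighbours >= min_points


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 5.0), (0.2, -4.0), (3.0, 0.0)]
    # Dense along dimension 0 only; sparse once dimension 1 is included.
    print(is_core_object(0, data, dims=[0], eps=0.25, min_points=3))
    print(is_core_object(0, data, dims=[0, 1], eps=0.25, min_points=3))
```

In this sample the same object is dense in one subspace and sparse in the full space, which is the situation subspace clustering methods are designed to detect.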
A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets
Many real applications need to detect outliers in high dimensional data sets. The major difficulty of mining outliers lies in the fact that outliers are often embedded in subspaces. No efficient methods are available in general for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormally sparse density units in subspaces. In this paper, we present a novel approach for finding outliers in the ‘interesting’ subspaces. The interesting subspaces are strongly correlated with ‘good’ clusters. This approach aims to group the meaningful subspaces and then identify outliers in the projected subspaces. To do so, an extension to the subspace-based clustering algorithm is proposed to find the ‘good’ subspaces, and outliers are then identified in the projected subspaces using classical outlier detection techniques such as distance-based and density-based algorithms. Comprehensive case studies are conducted using various types of subspace clustering and outlier detection algorithms. The experimental results demonstrate that the proposed method can detect outliers effectively and efficiently in high dimensional data sets.
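The second stage described above, applying a classical distance-based detector inside a projected subspace, might look like the following Python sketch. It assumes the subspace has already been selected and scores each object by the distance to its k-th nearest neighbour in that subspace. The function name, parameters and sample data are illustrative and are not taken from the paper.

```python
import math


def knn_outlier_scores(data, dims, k):
    """Score each object by the Euclidean distance to its k-th nearest
    neighbour, measured only in the dimensions listed in dims.
    Larger scores indicate more isolated objects."""
    projected = [[p[d] for d in dims] for p in data]
    scores = []
    for i, p in enumerate(projected):
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(projected) if j != i
        )
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    data = [(0.0, 7.0), (0.1, -3.0), (0.2, 9.0), (5.0, 1.0)]
    scores = knn_outlier_scores(data, dims=[0], k=1)
    # Index of the object with the highest outlier score in the subspace.
    print(max(range(len(scores)), key=scores.__getitem__))
```

The sketch uses `math.dist`, which requires Python 3.8 or later.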
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and to discover further interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
A Survey on Soft Subspace Clustering
Subspace clustering (SC) is a promising clustering technology to identify
clusters based on their associations with subspaces in high dimensional spaces.
SC can be classified into hard subspace clustering (HSC) and soft subspace
clustering (SSC). While HSC algorithms have been extensively studied and well
accepted by the scientific community, SSC algorithms are relatively new but
gaining more attention in recent years due to their better adaptability. In this
paper, a comprehensive survey of existing SSC algorithms and recent
developments is presented. The SSC algorithms are classified systematically
into three main categories, namely, conventional SSC (CSSC), independent SSC
(ISSC) and extended SSC (XSSC). The characteristics of these algorithms are
highlighted and the potential future development of SSC is also discussed.
Comment: This paper has been published in Information Sciences Journal in 201
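As a small illustration of the soft subspace idea, the Python sketch below assumes that each cluster carries its own weight vector over the dimensions and that objects are assigned to the cluster with the smallest weighted squared distance. The function names, centres and weights are hypothetical. The sketch shows only the assignment step and does not correspond to any specific algorithm covered by the survey.

```python
def weighted_sq_distance(x, centre, weights):
    """Squared distance in which each dimension is scaled by the
    cluster-specific weight for that dimension."""
    return sum(w * (a - c) ** 2 for a, c, w in zip(x, centre, weights))


def assign(x, centres, weights):
    """Return the index of the cluster whose weighted distance to x is smallest."""
    return min(
        range(len(centres)),
        key=lambda k: weighted_sq_distance(x, centres[k], weights[k]),
    )


if __name__ == "__main__":
    centres = [(0.0, 0.0), (10.0, 10.0)]
    # Cluster 0 relies almost entirely on dimension 0; cluster 1 weights both equally.
    weights = [(0.99, 0.01), (0.5, 0.5)]
    print(assign((1.0, 9.0), centres, weights))
```

Unlike hard subspace clustering, no dimension is discarded outright here; a low weight only reduces its influence on the assignment.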