
    SUBSPACE CLUSTERING OF MULTIDIMENSIONAL DATA USING THE FINDIT ALGORITHM

    ABSTRACT: With the widespread use of computers in business, government, and science, the efficient and effective discovery of interesting patterns in large databases has become essential. Data mining has emerged as a solution to the data analysis problems faced by many organizations. One data mining task is clustering, which groups data into clusters according to the similarity of their characteristics. Subspace clustering extends clustering by finding clusters in a dataset while selecting the most relevant dimensions for each cluster separately. FINDIT forms clusters based on two key ideas: a dimension-oriented distance measure, which fully utilizes dimensional difference information, and a dimension voting policy. In this final project, the FINDIT algorithm was implemented and its performance was analysed: running time as a function of the amount of data and the number of dimensions, and the accuracy of the resulting clusters as a function of the Dmindist parameter. Dmindist, a user parameter, influences the performance of the software. When Dmindist is too small or too large, the accuracy of the resulting clusters drops, as shown by the loss of one or more subspaces of the original clusters. Increasing the amount of data increases the time needed to find clusters, and so does increasing the number of dimensions. Keywords: data mining, subspace clustering, FINDIT algorithm, dimension-oriented distance, dimension voting, Dmindist.
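The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give FINDIT's formulas, so this assumes a simplified reading: the dimension-oriented distance counts the dimensions in which two points differ by more than a threshold, and dimension voting counts, per dimension, how many neighbours lie close to a point. The names `epsilon`, `dimension_oriented_distance` and `dimension_votes` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def dimension_oriented_distance(p, q, epsilon):
    """Count the dimensions in which p and q differ by more than epsilon.

    Assumption: a simplified full-space reading of the measure.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return int(np.sum(np.abs(p - q) > epsilon))

def dimension_votes(point, neighbours, epsilon):
    """For each dimension, count the neighbours within epsilon of point.

    Dimensions that collect many votes are candidates for the
    subspace of the cluster containing the point.
    """
    point = np.asarray(point, dtype=float)
    neighbours = np.asarray(neighbours, dtype=float)
    return np.sum(np.abs(neighbours - point) <= epsilon, axis=0)

# Two points that agree in dimensions 0 and 2 but not in dimension 1.
print(dimension_oriented_distance([0, 0, 0], [0.1, 5, 0.2], epsilon=0.5))

# Votes per dimension from three neighbours of the origin.
print(dimension_votes([0, 0], [[0.1, 3], [0.2, 0.1], [4, 4]], epsilon=0.5))
```

Counting differing dimensions rather than summing coordinate differences keeps a single irrelevant dimension from dominating the distance, which is the motivation usually given for measures of this kind.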

    Cluster Evaluation of Density Based Subspace Clustering

    Clustering real-world data often runs into the curse of dimensionality, because such data often consist of many dimensions. The evaluation of multidimensional data clustering can be done through a density-based approach. Density-based approaches follow the paradigm introduced by DBSCAN clustering. In this approach, the density of each object's neighbourhood is calculated with respect to MinPoints, and clusters change in accordance with changes in the density of each object's neighbourhood. The neighbours of each object are typically determined using a distance function, for example the Euclidean distance. In this paper, the SUBCLU, FIRES and INSCY methods are applied to cluster 6x1595 dimension synthetic datasets. IO Entropy, F1 Measure, coverage, accuracy and time consumption are used as performance evaluation parameters. The evaluation results show that the SUBCLU method requires considerable time for subspace clustering; however, its coverage is better. The INSCY method achieves better accuracy than the other two methods, although its computation time is longer. Comment: 6 pages, 15 figures
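The density test described above can be sketched as follows, assuming the usual DBSCAN convention that an object is dense (a core object) when at least MinPoints objects, itself included, lie within a radius of it under the Euclidean distance. The function name `core_objects` and the parameters `eps` and `dims` are illustrative choices, not identifiers from the paper; `dims` restricts the distance to a subspace.

```python
import numpy as np

def core_objects(data, eps, min_points, dims=None):
    """Return a boolean mask marking the dense (core) objects.

    data:       array of shape (n_objects, n_dimensions)
    eps:        neighbourhood radius (Euclidean)
    min_points: minimum neighbourhood size, counting the object itself
    dims:       optional list of dimension indices defining a subspace
    """
    data = np.asarray(data, dtype=float)
    if dims is not None:
        data = data[:, dims]
    # Pairwise Euclidean distances via broadcasting.
    diffs = data[:, None, :] - data[None, :, :]
    dist = np.sqrt(np.sum(diffs ** 2, axis=-1))
    neighbour_counts = np.sum(dist <= eps, axis=1)
    return neighbour_counts >= min_points

points = [[0, 0], [0.1, 0], [0, 0.1], [5, 5]]
print(core_objects(points, eps=0.5, min_points=3))
```

The pairwise distance matrix needs memory quadratic in the number of objects, so this form only suits small datasets; it is meant to show the neighbourhood test, not how SUBCLU, FIRES or INSCY are implemented.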

    A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets

    Many real applications are required to detect outliers in high dimensional data sets. The major difficulty of mining outliers lies in the fact that outliers are often embedded in subspaces. No efficient methods are available in general for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormally sparse density units in subspaces. In this paper, we present a novel approach for finding outliers in the 'interesting' subspaces. The interesting subspaces are strongly correlated with 'good' clusters. This approach aims to group the meaningful subspaces and then identify outliers in the projected subspaces. In doing so, an extension to the subspace-based clustering algorithm is proposed so as to find the 'good' subspaces, and then outliers are identified in the projected subspaces using some classical outlier detection techniques such as distance-based and density-based algorithms. Comprehensive case studies are conducted using various types of subspace clustering and outlier detection algorithms. The experimental results demonstrate that the proposed method can detect outliers effectively and efficiently in high dimensional data sets.
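The second stage described above, scoring outliers inside a projected subspace, might look like the sketch below. It assumes the subspace has already been selected and uses one classical distance-based score, the distance to the k-th nearest neighbour; the abstract does not say which detectors the authors used, and `knn_outlier_scores` is a hypothetical name.

```python
import numpy as np

def knn_outlier_scores(data, dims, k):
    """Score each object by the distance to its k-th nearest neighbour,
    measured only in the dimensions listed in dims.

    Larger scores indicate objects that are more isolated in the
    projected subspace.
    """
    projected = np.asarray(data, dtype=float)[:, dims]
    diffs = projected[:, None, :] - projected[None, :, :]
    dist = np.sqrt(np.sum(diffs ** 2, axis=-1))
    # After sorting each row, column 0 is the zero distance of an
    # object to itself, so column k is its k-th nearest neighbour.
    return np.sort(dist, axis=1)[:, k]

# The last object is isolated in dimension 0, while dimension 1 is noise.
data = [[0, 9], [1, -3], [2, 50], [10, 0]]
scores = knn_outlier_scores(data, dims=[0], k=1)
print(scores, int(np.argmax(scores)))
```

Restricting the distance to the selected dimensions is what lets an object that looks ordinary in the full space stand out in the subspace.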

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events within numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.

    A Survey on Soft Subspace Clustering

    Subspace clustering (SC) is a promising clustering technology that identifies clusters based on their associations with subspaces in high dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have gained more attention in recent years due to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and recent developments is presented. The SSC algorithms are classified systematically into three main categories, namely, conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted and the potential future development of SSC is also discussed. Comment: This paper has been published in Information Sciences Journal in 201