    Partitional Clustering

    People live in a world full of data. Humans collect data from many measurements and observations in their daily work, and organizing these large amounts of data is necessary for analysis, reasoning, and decision-making. For this reason, clustering has been applied in many areas and has become increasingly important in recent years. Feature selection and the way data are divided into subsets vary from one dataset to another, and several clustering methods have emerged from these feature selection approaches. Hierarchical clustering, partitional clustering, artificial system clustering, kernel-based clustering, and sequential data clustering have been identified as distinct clustering strategies. This chapter examines some popular partitional clustering techniques and algorithms. Partitional clustering assigns a set of data points to k clusters through an iterative process: a predefined criterion function J determines which of the k sets each datum is assigned to, and clustering is achieved by optimizing (maximizing or minimizing) the value of this criterion function over the k sets. The chapter begins with the criterion function for the clustering process, and applications are presented for each algorithm.
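The abstract does not say which criterion function J the chapter uses. As a minimal sketch, assuming J is the common within-cluster sum of squared Euclidean distances (the k-means objective), the iterative assign-and-update process looks like this in Python with NumPy; the function names `kmeans` and `_nearest` and their parameters are illustrative, not taken from the chapter.

```python
import numpy as np


def _nearest(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Partition the rows of X into k clusters by iteratively decreasing
    J = sum_i ||x_i - c_{label(i)}||^2 (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _nearest(X, centroids)  # assignment step
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = _nearest(X, centroids)
    J = float(((X - centroids[labels]) ** 2).sum())  # final criterion value
    return labels, centroids, J
```

Neither step can increase J, which is why the loop stops once the centroids no longer move; different seeds can still converge to different local minima.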

    A new distance measure for model-based sequence clustering

    We review the existing alternatives for defining model-based distances for clustering sequences and propose a new one based on the Kullback-Leibler divergence. This distance is shown to be especially useful in combination with spectral clustering. For improved performance in real-world scenarios, a model selection scheme is also proposed.
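The abstract does not define the proposed distance, so the following is only a minimal sketch of the general recipe it builds on, under stated assumptions: each sequence gets its own fitted model (here a smoothed first-order Markov chain, standing in for richer models such as HMMs), the KL divergence between two models is estimated from cross log-likelihoods of the sequences and symmetrized, and the resulting distances feed normalized spectral clustering. This is a classic likelihood-based approximation, not the paper's new distance; all function names and the bandwidth `sigma` are hypothetical.

```python
import numpy as np


def fit_markov(seq, n_states, alpha=1.0):
    """First-order Markov transition matrix with additive (Laplace) smoothing."""
    counts = np.full((n_states, n_states), alpha)
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def avg_loglik(seq, P):
    """Mean log-probability per transition of seq under transition matrix P."""
    return float(np.mean([np.log(P[a, b]) for a, b in zip(seq[:-1], seq[1:])]))


def model_distance_matrix(seqs, n_states):
    """Symmetrized likelihood-based estimate of KL(model_i || model_j)."""
    models = [fit_markov(s, n_states) for s in seqs]
    # L[i, j] = average log-likelihood of sequence i under the model of sequence j
    L = np.array([[avg_loglik(s, P) for P in models] for s in seqs])
    D = np.maximum(np.diag(L)[:, None] - L, 0.0)  # smoothing can make entries slightly negative
    return 0.5 * (D + D.T)


def _kmeans_rows(U, k, n_iter=50):
    """Plain k-means on the rows of U with farthest-first initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[dist.argmax()])
    C = np.array(centers)
    for _ in range(n_iter):
        labels = ((U[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([U[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels


def spectral_clustering(D, k, sigma=1.0):
    """Normalized spectral clustering on a precomputed distance matrix D."""
    A = np.exp(-D / sigma)             # affinity from distance
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    N = A / np.sqrt(np.outer(d, d))    # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(N)        # eigenvalues in ascending order
    U = vecs[:, -k:]                   # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return _kmeans_rows(U, k)
```

Sequences that rarely switch state and sequences that alternate at every step yield very different transition matrices, so their cross log-likelihoods drop and spectral clustering separates the two groups.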

    Detection and Evaluation of Clusters within Sequential Data

    Motivated by theoretical advances in dimensionality reduction techniques, we use a recent model, called Block Markov Chains, to conduct a practical study of clustering in real-world sequential data. Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees and can be deployed in sparse data regimes. Despite these favorable theoretical properties, a thorough evaluation of these algorithms in realistic settings has been lacking. We address this issue and investigate the suitability of these clustering algorithms in exploratory data analysis of real-world sequential data. In particular, our sequential data are derived from human DNA, written text, animal movement data, and financial markets. To evaluate the detected clusters and the associated Block Markov Chain model, we further develop a set of evaluation tools. These tools include benchmarking, spectral noise analysis, and statistical model selection tools. An efficient implementation of the clustering algorithm and the new evaluation tools is made available together with this paper. Practical challenges associated with real-world data are encountered and discussed. We ultimately find that the Block Markov Chain model assumption, together with the tools developed here, can indeed produce meaningful insights in exploratory data analyses despite the complexity and sparsity of real-world data.
    Comment: 37 pages, 12 figures
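The abstract does not spell out the clustering algorithm. Spectral methods for Block Markov Chains generally count the observed transitions between states, take a low-rank approximation of the count matrix, and group states whose transition profiles are similar. The sketch below follows that outline under simplifying assumptions (a single trajectory, no trimming of over-visited states, no iterative improvement step); it is not the paper's released implementation, and `cluster_bmc`, `simulate_bmc`, and `_kmeans_rows` are hypothetical names.

```python
import numpy as np


def _kmeans_rows(F, k, n_iter=50):
    """Plain k-means on the rows of F with farthest-first initialization."""
    centers = [F[0]]
    for _ in range(1, k):
        dist = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[dist.argmax()])
    C = np.array(centers)
    for _ in range(n_iter):
        labels = ((F[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([F[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels


def cluster_bmc(trajectory, n_states, k):
    """Spectral clustering of the states visited by one observed trajectory."""
    N = np.zeros((n_states, n_states))
    for a, b in zip(trajectory[:-1], trajectory[1:]):
        N[a, b] += 1                          # empirical transition counts
    U, s, Vt = np.linalg.svd(N)
    N_k = (U[:, :k] * s[:k]) @ Vt[:k]         # best rank-k approximation
    profiles = np.hstack([N_k, N_k.T])        # outgoing and incoming profile per state
    return _kmeans_rows(profiles, k)


def simulate_bmc(cluster_of, p, length, seed=0):
    """Simulate a Block Markov Chain: a jump from a state in cluster a lands in
    cluster b with probability p[a, b], uniformly over the states of b."""
    rng = np.random.default_rng(seed)
    cluster_of = np.asarray(cluster_of)
    sizes = np.bincount(cluster_of)
    P = p[cluster_of][:, cluster_of] / sizes[cluster_of][None, :]
    x = [0]
    for _ in range(length - 1):
        x.append(int(rng.choice(len(cluster_of), p=P[x[-1]])))
    return x
```

Stacking the outgoing and incoming profiles reflects the model assumption that states in the same cluster share transition behavior in both directions.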

    Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data

    Similarity-based approaches represent a promising direction for time series analysis. However, many such methods rely on parameter tuning, and some have shortcomings if the time series are multivariate (MTS), due to dependencies between attributes, or if the time series contain missing data. In this paper, we address these challenges within the powerful context of kernel methods by proposing the robust time series cluster kernel (TCK). The approach leverages the missing-data handling properties of Gaussian mixture models (GMMs) augmented with informative prior distributions. An ensemble learning approach is used to ensure robustness to parameters by combining the clustering results of many GMMs to form the final kernel. We evaluate the TCK on synthetic and real data and compare it to other state-of-the-art techniques. The experimental results demonstrate that the TCK is robust to parameter choices, provides competitive results for MTS without missing data, and provides outstanding results for MTS with missing data.
    Comment: 23 pages, 6 figures
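The following is a simplified sketch of the ensemble idea, assuming each (multivariate) series is flattened into one vector with `np.nan` marking missing values. Many diagonal Gaussian mixtures are fitted with EM using only the observed entries, each on a random subset of dimensions with a random number of components, and the kernel averages the inner products of the normalized posterior vectors. The shrinkage toward global statistics is a crude stand-in for the informative priors mentioned in the abstract; the published method uses its own prior construction and randomization scheme, which this sketch does not reproduce, so the names and defaults here are hypothetical.

```python
import numpy as np


def masked_gmm_posteriors(X, n_comp, rng, n_iter=30, prior=1.0):
    """EM for a diagonal GMM that ignores missing (NaN) entries.

    Means and variances are shrunk toward the global statistics with `prior`
    pseudo-observations. Returns the (n, n_comp) posterior probabilities.
    """
    Mf = (~np.isnan(X)).astype(float)                 # 1 where observed, 0 where missing
    X0 = np.where(Mf > 0, X, 0.0)
    g_mean = np.nanmean(X, axis=0)
    g_var = np.nanvar(X, axis=0) + 1e-3
    R = rng.dirichlet(np.ones(n_comp), size=len(X))   # random initial responsibilities
    for _ in range(n_iter):
        # M-step over observed entries only.
        W = R.T @ Mf                                  # effective counts per (component, dim)
        S1 = R.T @ X0
        S2 = R.T @ (X0 ** 2)
        mu = (S1 + prior * g_mean) / (W + prior)
        ss = np.maximum(S2 - 2 * mu * S1 + mu ** 2 * W, 0.0)
        var = (ss + prior * g_var) / (W + prior)
        weights = R.mean(axis=0)
        # E-step: log-likelihood summed over observed dimensions only.
        diff2 = (X0[:, None, :] - mu[None]) ** 2 / var[None]
        ll = -0.5 * (np.log(2 * np.pi * var)[None] + diff2)
        logp = np.log(weights + 1e-12)[None] + (ll * Mf[:, None, :]).sum(axis=2)
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
    return R


def tck_kernel(X, n_members=20, max_comp=4, seed=0):
    """Average posterior-inner-product kernel over a randomized GMM ensemble."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = np.zeros((n, n))
    for _ in range(n_members):
        n_comp = int(rng.integers(2, max_comp + 1))
        n_dims = int(rng.integers(max(1, d // 2), d + 1))
        dims = rng.choice(d, size=n_dims, replace=False)   # random subset of dimensions
        R = masked_gmm_posteriors(X[:, dims], n_comp, rng)
        R /= np.linalg.norm(R, axis=1, keepdims=True)
        K += R @ R.T
    return K / n_members
```

Because every normalized posterior vector has unit length, the kernel has ones on its diagonal, and two series score highly only when the ensemble members tend to assign them to the same components.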

    Self-Organizing Map Using the Davies-Bouldin Index for Grouping Indonesian Regions Based on Food Consumption

    Adequate food consumption is one of the foundations for developing high-quality human resources, which is a focus of Indonesia's development policy. One way to help ensure that food consumption needs are met is to group regions according to their food consumption. This study groups the regions of Indonesia using data on average daily per capita calorie consumption from various food commodities. The grouping is performed with the self-organizing map (SOM) method, after first determining the optimal number of clusters as the one with the smallest Davies-Bouldin Index (DBI) value. The results show that the optimal solution has 4 clusters: cluster 1 contains 22 provinces, cluster 2 contains 10 provinces, and clusters 3 and 4 each contain 1 province.
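To make the model-selection step concrete, here is a minimal sketch assuming a one-dimensional SOM with k nodes, so that each node acts as one cluster, and the usual Davies-Bouldin definition: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation against any other cluster, then average. The grid shape, learning-rate schedule, and function names are assumptions, not details from the study.

```python
import numpy as np


def train_som(X, n_nodes, n_epochs=100, lr0=0.5, seed=0):
    """Online training of a 1-D SOM with a Gaussian neighborhood."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=n_nodes, replace=n_nodes > len(X))].astype(float)
    grid = np.arange(n_nodes)
    sigma0 = max(n_nodes / 2.0, 1.0)
    total, t = n_epochs * len(X), 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            frac = t / total
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-2      # shrinking neighborhood radius
            bmu = int(((W - X[i]) ** 2).sum(axis=1).argmin())  # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)
            t += 1
    return W


def som_labels(X, W):
    """Assign each row of X to its best-matching SOM node."""
    return ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def davies_bouldin(X, labels):
    """Davies-Bouldin Index over the non-empty clusters (lower is better)."""
    ids = np.unique(labels)
    if len(ids) < 2:
        return np.inf
    C = np.array([X[labels == c].mean(axis=0) for c in ids])
    S = np.array([np.linalg.norm(X[labels == c] - C[j], axis=1).mean()
                  for j, c in enumerate(ids)])       # mean distance to centroid
    M = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
    np.fill_diagonal(M, np.inf)                      # exclude j == i
    R = (S[:, None] + S[None, :]) / M
    return float(R.max(axis=1).mean())


def select_k(X, k_values, seed=0):
    """Train one SOM per candidate k and keep the k with the smallest DBI."""
    scores = {}
    for k in k_values:
        labels = som_labels(X, train_som(X, k, seed=seed))
        scores[k] = davies_bouldin(X, labels)
    best = min(scores, key=scores.get)
    return best, scores
```

A lower DBI means compact, well-separated clusters, so the loop keeps the candidate k with the smallest score, which is the selection rule the study describes.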

    Comparison of Fuzzy C-Means and Fuzzy Gustafson-Kessel Clustering Methods in Provincial Grouping in Indonesia Based on Criminality-Related Factors

    Indonesia's population density increases every year, and the crime rate in Indonesia has been rising along with it. Criminal acts arise from underlying factors that contribute to crime. To help improve the security and welfare of the Indonesian people, the authors grouped the provinces of Indonesia according to the factors that influence crime. This study compares the Fuzzy C-Means (FCM) and Fuzzy Gustafson-Kessel (FGK) clustering methods, using the Davies-Bouldin Index as the validation index for determining the optimal number of clusters. The data are secondary data consisting of variables representing factors that affect the crime rate in Indonesia, obtained from the website of the Central Statistics Agency (BPS). In this study, FGK performed better than FCM because it produced a smaller standard deviation ratio. Using FGK, the optimal number of clusters was found to be 5: cluster 1 consists of 6 provinces, cluster 2 of 4 provinces, cluster 3 of 11 provinces, cluster 4 of 5 provinces, and cluster 5 of 8 provinces.
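A minimal sketch of both methods, assuming the common fuzzifier m = 2 and, for Gustafson-Kessel, unit cluster volumes (rho_j = 1) plus a small ridge term that keeps the fuzzy covariance invertible. FCM measures Euclidean distances to the cluster centers, whereas GK gives each cluster its own norm-inducing matrix A_j = (det F_j)^(1/p) F_j^(-1) built from its fuzzy covariance F_j, so clusters can take ellipsoidal shapes. The function name `fuzzy_cluster` and its defaults are illustrative; the study's own settings are not given in the abstract.

```python
import numpy as np


def fuzzy_cluster(X, c, m=2.0, gk=False, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-Means (gk=False) or a basic Gustafson-Kessel variant (gk=True).

    Returns memberships U (n, c), rows summing to 1, and centers V (c, p).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    U = rng.dirichlet(np.ones(c), size=n)             # random initial memberships
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]       # fuzzy cluster centers
        D2 = np.empty((n, c))
        for j in range(c):
            diff = X - V[j]
            if gk:
                # Fuzzy covariance of cluster j, lightly regularized.
                F = (Um[:, j, None] * diff).T @ diff / Um[:, j].sum() + 1e-6 * np.eye(p)
                A = np.linalg.det(F) ** (1.0 / p) * np.linalg.inv(F)
                D2[:, j] = np.einsum("ij,jk,ik->i", diff, A, diff)  # (x-v)^T A (x-v)
            else:
                D2[:, j] = (diff ** 2).sum(axis=1)     # squared Euclidean distance
        D2 = np.maximum(D2, 1e-12)                     # avoid division by zero
        # u_ij = 1 / sum_k (d_ij^2 / d_ik^2)^(1/(m-1))
        inv = D2 ** (-1.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V
```

To pick the number of clusters as the study describes, one could harden the memberships with `U.argmax(axis=1)` for each candidate c and keep the c with the smallest Davies-Bouldin Index.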