24,374 research outputs found

    Analysis of Agglomerative Clustering

    The diameter k-clustering problem is the problem of partitioning a finite subset of R^d into k subsets called clusters such that the maximum diameter of the clusters is minimized. One early clustering algorithm that computes a hierarchy of approximate solutions to this problem (for all values of k) is the agglomerative clustering algorithm with the complete linkage strategy. For decades, this algorithm has been widely used by practitioners. However, it is not well studied theoretically. In this paper, we analyze the agglomerative complete linkage clustering algorithm. Assuming that the dimension d is a constant, we show that for any k the solution computed by this algorithm is an O(log k)-approximation to the diameter k-clustering problem. Our analysis does not only hold for the Euclidean distance but for any metric that is based on a norm. Furthermore, we analyze the closely related k-center and discrete k-center problem. For the corresponding agglomerative algorithms, we deduce an approximation factor of O(log k) as well.
    Comment: A preliminary version of this article appeared in Proceedings of the 28th International Symposium on Theoretical Aspects of Computer Science (STACS '11), March 2011, pp. 308-319. This article also appeared in Algorithmica. The final publication is available at http://link.springer.com/article/10.1007/s00453-012-9717-
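    The abstract above analyzes the standard agglomerative complete-linkage procedure. As a point of reference, here is a minimal Python sketch of that procedure under the diameter objective: start from singleton clusters, repeatedly merge the pair of clusters with the smallest complete-linkage distance, stop at k clusters, and report the largest cluster diameter. This is an illustrative, naive (roughly cubic-time) implementation, not the authors' code; the synthetic points and function names are assumptions.

```python
import numpy as np
from itertools import combinations

def complete_linkage(points, k):
    """Naive agglomerative clustering with the complete linkage strategy.

    Starts from singleton clusters and repeatedly merges the pair of clusters
    with the smallest complete-linkage distance (the largest pairwise distance
    between their points) until k clusters remain. Returns clusters as lists
    of point indices.
    """
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = [[i] for i in range(n)]

    def link(a, b):
        # Complete linkage: worst-case distance between a point of a and a point of b.
        return max(dist[i, j] for i in a for j in b)

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest complete-linkage distance.
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: link(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

def max_diameter(points, clusters):
    """Objective of the diameter k-clustering problem: the largest cluster diameter."""
    def diam(c):
        return max((np.linalg.norm(points[i] - points[j])
                    for i, j in combinations(c, 2)), default=0.0)
    return max(diam(c) for c in clusters)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(60, 2))   # synthetic points standing in for real data
    cl = complete_linkage(pts, k=4)
    print(len(cl), "clusters, max diameter:", max_diameter(pts, cl))
```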

    Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm

    The Ward error sum of squares hierarchical clustering method has been very widely used since its first description by Ward in a 1963 publication. It has also been generalized in various ways. However, there are different interpretations in the literature and different implementations of the Ward agglomerative algorithm in commonly used software systems, including differing expressions of the agglomerative criterion. Our survey work and case studies will be useful for all those involved in developing software for data analysis using Ward's hierarchical clustering method.
    Comment: 20 pages, 21 citations, 4 figures
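    For readers comparing implementations, one common formulation of Ward's agglomerative criterion is the increase in the error sum of squares (ESS) caused by a merge, delta = |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2. The sketch below applies this formulation greedily; whether a package works with squared or unsquared input dissimilarities, or uses a different Lance-Williams update, is exactly the kind of discrepancy the survey examines. The code is an illustrative assumption, not taken from the paper.

```python
import numpy as np

def ward_merge_cost(A, B):
    """Increase in the error sum of squares caused by merging clusters A and B:
    |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2  (one common formulation of
    Ward's criterion; software packages differ in how they express it)."""
    nA, nB = len(A), len(B)
    d = A.mean(axis=0) - B.mean(axis=0)
    return nA * nB / (nA + nB) * float(d @ d)

def ward_agglomerate(points, k):
    """Greedy Ward clustering: repeatedly merge the pair of clusters whose
    merge causes the smallest ESS increase, until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                cost = ward_merge_cost(points[clusters[p]], points[clusters[q]])
                if best is None or cost < best[0]:
                    best = (cost, p, q)
        _, p, q = best
        clusters[p] += clusters[q]
        del clusters[q]
    return clusters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(30, 2))
    print([sorted(c) for c in ward_agglomerate(pts, k=3)])
```

    On small examples, the merge order produced here can be cross-checked against a library routine such as scipy.cluster.hierarchy.linkage(X, method='ward'), keeping in mind that library variants may differ in exactly the ways the paper describes.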

    Face Data Clustering Using the Agglomerative Clustering Method with Principal Component Analysis

    Altien J. Rindengan and Deiby Tineke Salaki (Program Studi Matematika, FMIPA, Universitas Sam Ratulangi, Manado 95115). In this research, face data are clustered by first applying principal component analysis to extract a few eigenvalues (characteristic roots) that represent the data sufficiently well, and then grouping the reduced data with the agglomerative clustering method. Using a Matlab program, the face data, consisting of 6 people with 10 images each, can be clustered in agreement with the original data; the clustering needs only 3 eigenvalues at the 68 % interval. Keywords: agglomerative clustering, principal component analysis, face data
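    The pipeline described above (principal component analysis to keep a few eigenvalues, then agglomerative clustering) was run in Matlab; the following Python sketch reproduces the same idea under stated assumptions: the "68 % interval" is read as a cumulative share of the eigenvalue sum, average linkage stands in for the unspecified linkage, and a random synthetic array stands in for the face images.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def pca_project(X, target_share=0.68):
    """Project centered data onto the leading principal components.

    Keeps the smallest number of components whose eigenvalues account for at
    least `target_share` of the total eigenvalue sum (an assumption about what
    the abstract's "68 % interval" refers to)."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]                 # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(share, target_share)) + 1
    return Xc @ eigvecs[:, :m], m

# Synthetic stand-in for the face data: 6 people x 10 images, 50 features each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 50))
               for c in rng.normal(size=(6, 50))])

scores, m = pca_project(X)
Z = linkage(scores, method='average')             # agglomerative clustering on PCA scores
labels = fcluster(Z, t=6, criterion='maxclust')   # cut the tree into 6 groups
print(f"{m} components kept; cluster sizes:", np.bincount(labels)[1:])
```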

    Agglomerative Hierarchical Clustering: An Introduction to Essentials (1) Proximity Coefficients and Creation of a Vector-Distance Matrix and (2) Construction of the Hierarchical Tree and a Selection of Methods

    The article is on a particular type of cluster analysis, agglomerative hierarchical analysis, and is organized as a series of five main parts. The first part deals with proximity coefficients and the creation of a vector-distance matrix. The second part deals with the construction of the hierarchical tree and introduces a selection of clustering methods. The third deals with a variety of ways to transform data prior to agglomerative cluster analysis. The fourth deals with measures and methods of cluster validity. The fifth and final part deals with hypothesis generation. The present article covers the first and second parts only. It explains how agglomerative cluster analysis works by implementing it on a data matrix step by step. Different types of agglomerative hierarchical clustering methods are applied to a purposely-made data matrix, so different types of cluster structures are obtained from that same dataset. The last three parts will be covered in the next publication(s). There are many articles, tutorials and books on this subject. The article has two main objectives: (1) to keep the discussion short and easy to understand by, hopefully, any reader, and (2) to develop the motivation for using agglomerative hierarchical clustering to analyse any high-dimensional data of interest with respect to some research question.
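    To make parts (1) and (2) concrete, the sketch below computes proximity coefficients for a small, purposely simple data matrix, turns them into a vector-distance matrix, and then constructs the hierarchical tree with a selection of agglomerative methods so the effect of the method choice on the resulting structure can be compared. The data matrix and the chosen methods are illustrative assumptions, not the article's own example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, cophenet, fcluster

# A small, purposely simple data matrix (rows = objects, columns = variables).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, size=(5, 3)),
               rng.normal(4, 0.5, size=(5, 3)),
               rng.normal(8, 2.0, size=(5, 3))])

# Part (1): proximity coefficients -> condensed vector-distance matrix.
D = pdist(X, metric='euclidean')

# Part (2): construct the hierarchical tree with a selection of agglomerative methods.
for method in ('single', 'complete', 'average', 'ward'):
    Z = linkage(D, method=method)
    c, _ = cophenet(Z, D)                         # how faithfully the tree preserves D
    labels = fcluster(Z, t=3, criterion='maxclust')
    print(f"{method:>8}: cophenetic corr = {c:.3f}, "
          f"3-cluster sizes = {np.bincount(labels)[1:]}")
```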

    Cluster Analysis as a Tool of Interpretation of Complex Systems

    This paper deals with several problems in cluster analysis; the solutions suggested here do not appear to have been considered in the current literature. First, the author proposes the use of a permuted matrix as a tool for interpreting clusters generated by hierarchical agglomerative clustering algorithms. Second, a new method of defining the similarity between a pair of clusters is shown; this method leads to a new class of hierarchical agglomerative clustering. Third, two criteria are defined to optimize dendrograms that are outputs of hierarchical clustering. This paper was presented at the Task Force Seminar Session on New Advances in Decision Support Systems, Laxenburg, Austria, November 3-5, 1986.
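    The abstract does not spell out how the permuted matrix is constructed, so the following sketch only illustrates the generic idea it evokes: reorder the rows and columns of a distance matrix by the dendrogram's leaf order so that clusters appear as low-distance blocks along the diagonal and can be interpreted visually. The data, the linkage choice, and this reading of the method are assumptions, not the author's construction.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(3)
# Two well-separated groups, shuffled so the raw distance matrix shows no obvious structure.
X = np.vstack([rng.normal(0, 1, size=(8, 4)), rng.normal(6, 1, size=(8, 4))])
X = X[rng.permutation(len(X))]

dcond = pdist(X)                          # condensed pairwise distances
D = squareform(dcond)                     # full symmetric distance matrix
Z = linkage(dcond, method='average')      # hierarchical agglomerative clustering
order = leaves_list(Z)                    # leaf order of the dendrogram

# Permute rows and columns of the distance matrix into dendrogram order;
# clusters then appear as low-distance blocks along the diagonal.
D_perm = D[np.ix_(order, order)]

print("overall mean distance        :", round(D.mean(), 2))
print("mean distance within block 1 :", round(D_perm[:8, :8].mean(), 2))
print("mean distance within block 2 :", round(D_perm[8:, 8:].mean(), 2))
```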