24,374 research outputs found

    Analysis of Agglomerative Clustering

    The diameter k-clustering problem is the problem of partitioning a finite subset of R^d into k subsets called clusters such that the maximum diameter of the clusters is minimized. One early clustering algorithm that computes a hierarchy of approximate solutions to this problem (for all values of k) is the agglomerative clustering algorithm with the complete linkage strategy. For decades, this algorithm has been widely used by practitioners. However, it is not well studied theoretically. In this paper, we analyze the agglomerative complete linkage clustering algorithm. Assuming that the dimension d is a constant, we show that for any k the solution computed by this algorithm is an O(log k)-approximation to the diameter k-clustering problem. Our analysis does not only hold for the Euclidean distance but for any metric that is based on a norm. Furthermore, we analyze the closely related k-center and discrete k-center problem. For the corresponding agglomerative algorithms, we deduce an approximation factor of O(log k) as well.
    Comment: A preliminary version of this article appeared in Proceedings of the 28th International Symposium on Theoretical Aspects of Computer Science (STACS '11), March 2011, pp. 308-319. This article also appeared in Algorithmica. The final publication is available at http://link.springer.com/article/10.1007/s00453-012-9717-
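    The abstract above analyzes the standard agglomerative complete-linkage procedure. As a point of reference, here is a minimal Python sketch of that procedure under the diameter objective: start from singleton clusters, repeatedly merge the pair of clusters with the smallest complete-linkage distance, stop at k clusters, and report the largest cluster diameter. This is an illustrative, naive (roughly cubic-time) implementation, not the authors' code; the synthetic points and function names are assumptions.

```python
import numpy as np
from itertools import combinations

def complete_linkage(points, k):
    """Naive agglomerative clustering with the complete linkage strategy.

    Starts from singleton clusters and repeatedly merges the pair of clusters
    with the smallest complete-linkage distance (the largest pairwise distance
    between their points) until k clusters remain. Returns clusters as lists
    of point indices.
    """
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = [[i] for i in range(n)]

    def link(a, b):
        # Complete linkage: worst-case distance between a point of a and a point of b.
        return max(dist[i, j] for i in a for j in b)

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest complete-linkage distance.
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: link(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

def max_diameter(points, clusters):
    """Objective of the diameter k-clustering problem: the largest cluster diameter."""
    def diam(c):
        return max((np.linalg.norm(points[i] - points[j])
                    for i, j in combinations(c, 2)), default=0.0)
    return max(diam(c) for c in clusters)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(60, 2))   # synthetic points standing in for real data
    cl = complete_linkage(pts, k=4)
    print(len(cl), "clusters, max diameter:", max_diameter(pts, cl))
```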

    Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm

    The Ward error sum of squares hierarchical clustering method has been very widely used since its first description by Ward in a 1963 publication. It has also been generalized in various ways. However, there are different interpretations in the literature and different implementations of the Ward agglomerative algorithm in commonly used software systems, including differing expressions of the agglomerative criterion. Our survey work and case studies will be useful for all those involved in developing software for data analysis using Ward's hierarchical clustering method.
    Comment: 20 pages, 21 citations, 4 figures
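    For readers comparing implementations, one common formulation of Ward's agglomerative criterion is the increase in the error sum of squares (ESS) caused by a merge, delta = |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2. The sketch below applies this formulation greedily; whether a package works with squared or unsquared input dissimilarities, or uses a different Lance-Williams update, is exactly the kind of discrepancy the survey examines. The code is an illustrative assumption, not taken from the paper.

```python
import numpy as np

def ward_merge_cost(A, B):
    """Increase in the error sum of squares caused by merging clusters A and B:
    |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2  (one common formulation of
    Ward's criterion; software packages differ in how they express it)."""
    nA, nB = len(A), len(B)
    d = A.mean(axis=0) - B.mean(axis=0)
    return nA * nB / (nA + nB) * float(d @ d)

def ward_agglomerate(points, k):
    """Greedy Ward clustering: repeatedly merge the pair of clusters whose
    merge causes the smallest ESS increase, until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                cost = ward_merge_cost(points[clusters[p]], points[clusters[q]])
                if best is None or cost < best[0]:
                    best = (cost, p, q)
        _, p, q = best
        clusters[p] += clusters[q]
        del clusters[q]
    return clusters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(30, 2))
    print([sorted(c) for c in ward_agglomerate(pts, k=3)])
```

    On small examples, the merge order produced here can be cross-checked against a library routine such as scipy.cluster.hierarchy.linkage(X, method='ward'), keeping in mind that library variants may differ in exactly the ways the paper describes.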

    Face Data Clustering Using the Agglomerative Clustering Method with Principal Component Analysis

    Altien J. Rindengan and Deiby Tineke Salaki (Program Studi Matematika, FMIPA, Universitas Sam Ratulangi, Manado 95115). In this research, face data are clustered by first applying principal component analysis to extract a few eigenvalues (characteristic roots) that represent the data sufficiently well, and then grouping the reduced data with the agglomerative clustering method. Using a Matlab program, the face data, consisting of 6 people with 10 images each, can be clustered in agreement with the original data; the clustering needs only 3 eigenvalues at the 68 % interval. Keywords: agglomerative clustering, principal component analysis, face data
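    The pipeline described above (principal component analysis to keep a few eigenvalues, then agglomerative clustering) was run in Matlab; the following Python sketch reproduces the same idea under stated assumptions: the "68 % interval" is read as a cumulative share of the eigenvalue sum, average linkage stands in for the unspecified linkage, and a random synthetic array stands in for the face images.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def pca_project(X, target_share=0.68):
    """Project centered data onto the leading principal components.

    Keeps the smallest number of components whose eigenvalues account for at
    least `target_share` of the total eigenvalue sum (an assumption about what
    the abstract's "68 % interval" refers to)."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]                 # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(share, target_share)) + 1
    return Xc @ eigvecs[:, :m], m

# Synthetic stand-in for the face data: 6 people x 10 images, 50 features each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 50))
               for c in rng.normal(size=(6, 50))])

scores, m = pca_project(X)
Z = linkage(scores, method='average')             # agglomerative clustering on PCA scores
labels = fcluster(Z, t=6, criterion='maxclust')   # cut the tree into 6 groups
print(f"{m} components kept; cluster sizes:", np.bincount(labels)[1:])
```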

    Agglomerative Hierarchical Clustering: An Introduction to Essentials (1) Proximity Coefficients and Creation of a Vector-Distance Matrix and (2) Construction of the Hierarchical Tree and a Selection of Methods

    The article is on a particular type of cluster analysis, agglomerative hierarchical analysis, and is organized as a series of five main parts. The first part deals with proximity coefficients and the creation of a vector-distance matrix. The second part deals with the construction of the hierarchical tree and introduces a selection of clustering methods. The third deals with a variety of ways to transform data prior to agglomerative cluster analysis. The fourth deals with measures and methods of cluster validity. The fifth and final part deals with hypothesis generation. The present article covers the first and second parts only. It explains how agglomerative cluster analysis works by implementing it on a data matrix step by step. Different types of agglomerative hierarchical clustering methods are applied to a purposely-made data matrix, so different types of cluster structures are obtained from that same dataset. The last three parts will be covered in the next publication(s). There are many articles, tutorials and books on this subject. The article has two main objectives: (1) to keep the discussion short and easy to understand by, hopefully, any reader, and (2) to develop the motivation for using agglomerative hierarchical clustering to analyse any high-dimensional data of interest with respect to some research question.
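    To make parts (1) and (2) concrete, the sketch below computes proximity coefficients for a small, purposely simple data matrix, turns them into a vector-distance matrix, and then constructs the hierarchical tree with a selection of agglomerative methods so the effect of the method choice on the resulting structure can be compared. The data matrix and the chosen methods are illustrative assumptions, not the article's own example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, cophenet, fcluster

# A small, purposely simple data matrix (rows = objects, columns = variables).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, size=(5, 3)),
               rng.normal(4, 0.5, size=(5, 3)),
               rng.normal(8, 2.0, size=(5, 3))])

# Part (1): proximity coefficients -> condensed vector-distance matrix.
D = pdist(X, metric='euclidean')

# Part (2): construct the hierarchical tree with a selection of agglomerative methods.
for method in ('single', 'complete', 'average', 'ward'):
    Z = linkage(D, method=method)
    c, _ = cophenet(Z, D)                         # how faithfully the tree preserves D
    labels = fcluster(Z, t=3, criterion='maxclust')
    print(f"{method:>8}: cophenetic corr = {c:.3f}, "
          f"3-cluster sizes = {np.bincount(labels)[1:]}")
```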

    Cluster Analysis as a Tool of Interpretation of Complex Systems

    This paper deals with several problems in cluster analysis; the solutions suggested here do not appear to have been considered in the current literature. First, the author proposes the use of a permuted matrix as a tool for interpreting clusters generated by hierarchical agglomerative clustering algorithms. Second, a new method of defining the similarity between a pair of clusters is shown; this method leads to a new class of hierarchical agglomerative clustering. Third, two criteria are defined to optimize dendrograms that are outputs of hierarchical clustering. This paper was presented at the Task Force Seminar Session on New Advances in Decision Support Systems, Laxenburg, Austria, November 3-5, 1986.
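    The abstract does not spell out how the permuted matrix is constructed, so the following sketch only illustrates the generic idea it evokes: reorder the rows and columns of a distance matrix by the dendrogram's leaf order so that clusters appear as low-distance blocks along the diagonal and can be interpreted visually. The data, the linkage choice, and this reading of the method are assumptions, not the author's construction.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(3)
# Two well-separated groups, shuffled so the raw distance matrix shows no obvious structure.
X = np.vstack([rng.normal(0, 1, size=(8, 4)), rng.normal(6, 1, size=(8, 4))])
X = X[rng.permutation(len(X))]

dcond = pdist(X)                          # condensed pairwise distances
D = squareform(dcond)                     # full symmetric distance matrix
Z = linkage(dcond, method='average')      # hierarchical agglomerative clustering
order = leaves_list(Z)                    # leaf order of the dendrogram

# Permute rows and columns of the distance matrix into dendrogram order;
# clusters then appear as low-distance blocks along the diagonal.
D_perm = D[np.ix_(order, order)]

print("overall mean distance        :", round(D.mean(), 2))
print("mean distance within block 1 :", round(D_perm[:8, :8].mean(), 2))
print("mean distance within block 2 :", round(D_perm[8:, 8:].mean(), 2))
```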