Anytime Hierarchical Clustering
We propose a new anytime hierarchical clustering method that iteratively
transforms an arbitrary initial hierarchy on the configuration of measurements
along a sequence of trees which, we prove, must for a fixed data set terminate
in a chain of nested partitions that satisfies a natural homogeneity
requirement.
Each recursive step re-edits the tree so as to improve a local measure of
cluster homogeneity that is compatible with a number of commonly used (e.g.,
single, average, complete) linkage functions. As an alternative to the standard
batch algorithms, we present numerical evidence to suggest that appropriate
adaptations of this method can yield decentralized, scalable algorithms
suitable for distributed/parallel computation of clustering hierarchies and
online tracking of clustering trees applicable to large, dynamically changing
databases and anomaly detection. Comment: 13 pages, 6 figures, 5 tables, in
preparation for submission to a conference
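The local homogeneity measure above is stated to be compatible with the standard single, average, and complete linkage functions. As context, here is a minimal pure-Python sketch of the batch agglomerative baseline with pluggable linkage (the anytime tree-editing procedure itself is not reproduced here; function and variable names are illustrative):

```python
from math import dist

def agglomerate(points, linkage="single"):
    """Naive O(n^3) batch agglomerative clustering with pluggable
    linkage; returns the merge history as (cluster_a, cluster_b, distance)."""
    link = {
        "single": min,                            # nearest pair of members
        "complete": max,                          # farthest pair of members
        "average": lambda ds: sum(ds) / len(ds),  # mean pairwise distance
    }[linkage]
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link([dist(points[i], points[j])
                          for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

The anytime method, by contrast, starts from an arbitrary tree and improves it incrementally, so it can be interrupted at any point with a valid hierarchy.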
Fast redshift clustering with the Baire (ultra) metric
The Baire metric induces an ultrametric on a dataset and is of linear
computational complexity, contrasted with the standard quadratic time
agglomerative hierarchical clustering algorithm. We apply the Baire distance to
spectrometric and photometric redshifts from the Sloan Digital Sky Survey
using, in this work, about half a million astronomical objects. We want to know
how well the (more costly to determine) spectrometric redshifts can predict
the (more easily obtained) photometric redshifts, i.e. we seek to regress the
spectrometric on the photometric redshifts, and we develop a clusterwise
nearest neighbor regression procedure for this. Comment: 14 pages, 6 figures
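For concreteness, the Baire distance between two numbers written as digit strings is base^(-k), where k is the length of their longest common prefix, and grouping values by shared prefixes yields the induced hierarchy in a single linear pass. A minimal sketch, with illustrative (not the paper's) function names:

```python
def baire_distance(x: str, y: str, base: int = 10) -> float:
    """Baire distance: base^(-k), where k is the length of the longest
    common prefix of the two digit strings; 0 if they are identical."""
    if x == y:
        return 0.0
    k = 0
    for a, b in zip(x, y):
        if a != b:
            break
        k += 1
    return base ** (-k)

def prefix_clusters(values, digits=3):
    """Linear-time grouping by shared leading digits -- one level of
    the hierarchy induced by the Baire ultrametric."""
    groups = {}
    for v in values:
        groups.setdefault(v[:digits], []).append(v)
    return groups
```

Because the distance depends only on the common prefix, it satisfies the strong (ultrametric) triangle inequality d(x, z) <= max(d(x, y), d(y, z)), which is why prefix bucketing recovers the full hierarchy without any pairwise comparisons.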
Probabilistic Hierarchical Clustering with Labeled and Unlabeled Data
This paper presents hierarchical probabilistic clustering methods for unsupervised and supervised learning in data mining applications, where supervised learning is performed using both labeled and unlabeled examples. The probabilistic clustering is based on the previously suggested Generalizable Gaussian Mixture model and is extended using a modified Expectation-Maximization procedure for learning with both unlabeled and labeled examples. The proposed hierarchical scheme is agglomerative and based on probabilistic similarity measures. Here, we compare an L2 dissimilarity measure, an error confusion similarity, and an accumulated posterior cluster probability measure. The unsupervised and supervised schemes are successfully tested on artificial data and on e-mail segmentation.
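The idea of EM over mixed labeled and unlabeled data can be illustrated with a deliberately simplified sketch: a 1-D, two-component Gaussian mixture in which labeled points keep fixed one-hot responsibilities while unlabeled points receive posterior responsibilities in the E-step. This is a generic semi-supervised EM sketch, not the paper's Generalizable Gaussian Mixture model or its modified procedure:

```python
import math

def semi_supervised_em(labeled, unlabeled, n_iter=50):
    """1-D, 2-component Gaussian mixture fit by EM. `labeled` is a list
    of (x, class) pairs with class in {0, 1}; `unlabeled` is a list of x.
    Labeled points have fixed one-hot responsibilities; unlabeled points
    get posterior responsibilities (a simplified illustration only)."""
    # Initialize means from the labeled class means.
    mu = [sum(x for x, c in labeled if c == k)
          / max(1, sum(1 for _, c in labeled if c == k)) for k in (0, 1)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    xs = [x for x, _ in labeled] + list(unlabeled)
    resp_fixed = [[1.0 if c == k else 0.0 for k in (0, 1)] for _, c in labeled]
    for _ in range(n_iter):
        # E-step: posterior responsibilities for unlabeled points only.
        resp = list(resp_fixed)
        for x in unlabeled:
            p = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in (0, 1)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: weighted parameter updates over all points.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(1e-6, sum(r[k] * (x - mu[k]) ** 2
                                   for r, x in zip(resp, xs)) / nk)
            pi[k] = nk / len(xs)
    return mu, var, pi
```

The labeled examples anchor the components to the known classes, while the unlabeled examples refine the parameter estimates; this is the basic mechanism that agglomerative merging over probabilistic similarities can then build on.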
Solving non-uniqueness in agglomerative hierarchical clustering using multidendrograms
In agglomerative hierarchical clustering, pair-group methods suffer from a
problem of non-uniqueness when two or more distances between different clusters
coincide during the amalgamation process. The traditional approach for solving
this drawback has been to take any arbitrary criterion in order to break ties
between distances, which results in different hierarchical classifications
depending on the criterion followed. In this article we propose a
variable-group algorithm that consists in grouping more than two clusters at
the same time when ties occur. We give a tree representation for the results of
the algorithm, which we call a multidendrogram, as well as a generalization of
the Lance and Williams' formula which enables the implementation of the
algorithm in a recursive way. Comment: Free Software for Agglomerative Hierarchical Clustering using
Multidendrograms available at
http://deim.urv.cat/~sgomez/multidendrograms.ph
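The variable-group idea above, amalgamating every cluster involved in a tie at the minimal distance rather than breaking the tie arbitrarily, can be sketched as a single amalgamation step. This sketch omits the distance update via the generalized Lance and Williams formula, and the names are illustrative, not from the MultiDendrograms software:

```python
def multidendrogram_step(dists):
    """One variable-group amalgamation step. `dists` maps
    frozenset({i, j}) -> inter-cluster distance. Returns the minimal
    distance and the groups formed by merging ALL clusters linked by
    ties at that distance, so the result is order-independent."""
    dmin = min(dists.values())
    tied = [pair for pair, d in dists.items() if abs(d - dmin) < 1e-12]
    # Merge overlapping tied pairs into connected groups.
    groups = []
    for pair in tied:
        merged = set(pair)
        rest = []
        for g in groups:
            if g & merged:
                merged |= g   # this tied pair links into an existing group
            else:
                rest.append(g)
        groups = rest + [merged]
    return dmin, groups
```

Grouping all tied clusters at once is what makes the resulting multidendrogram unique, where pair-group methods would produce different trees depending on which tied pair is merged first.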