Anytime Hierarchical Clustering
We propose a new anytime hierarchical clustering method that iteratively
transforms an arbitrary initial hierarchy on the configuration of measurements
along a sequence of trees which, we prove, must for a fixed data set terminate
in a chain of nested partitions that satisfies a natural homogeneity
requirement.
Each recursive step re-edits the tree so as to improve a local measure of
cluster homogeneity that is compatible with a number of commonly used (e.g.,
single, average, complete) linkage functions. As an alternative to the standard
batch algorithms, we present numerical evidence to suggest that appropriate
adaptations of this method can yield decentralized, scalable algorithms
suitable for distributed/parallel computation of clustering hierarchies and
online tracking of clustering trees applicable to large, dynamically changing
databases and anomaly detection. Comment: 13 pages, 6 figures, 5 tables, in
preparation for submission to a conference
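The local homogeneity measure above is stated to be compatible with the standard single, average, and complete linkage functions. As context, here is a minimal pure-Python sketch of the batch agglomerative baseline with pluggable linkage (the anytime tree-editing procedure itself is not reproduced here; function and variable names are illustrative):

```python
from math import dist

def agglomerate(points, linkage="single"):
    """Naive O(n^3) batch agglomerative clustering with pluggable
    linkage; returns the merge history as (cluster_a, cluster_b, distance)."""
    link = {
        "single": min,                            # nearest pair of members
        "complete": max,                          # farthest pair of members
        "average": lambda ds: sum(ds) / len(ds),  # mean pairwise distance
    }[linkage]
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link([dist(points[i], points[j])
                          for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

The anytime method, by contrast, starts from an arbitrary tree and improves it incrementally, so it can be interrupted at any point with a valid hierarchy.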
Fast redshift clustering with the Baire (ultra) metric
The Baire metric induces an ultrametric on a dataset and is of linear
computational complexity, contrasted with the standard quadratic time
agglomerative hierarchical clustering algorithm. We apply the Baire distance to
spectrometric and photometric redshifts from the Sloan Digital Sky Survey
using, in this work, about half a million astronomical objects. We want to know
how well the (more costly to determine) spectrometric redshifts can predict
the (more easily obtained) photometric redshifts, i.e. we seek to regress the
spectrometric on the photometric redshifts, and we develop a clusterwise
nearest neighbor regression procedure for this. Comment: 14 pages, 6 figures
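For concreteness, the Baire distance between two numbers written as digit strings is base^(-k), where k is the length of their longest common prefix, and grouping values by shared prefixes yields the induced hierarchy in a single linear pass. A minimal sketch, with illustrative (not the paper's) function names:

```python
def baire_distance(x: str, y: str, base: int = 10) -> float:
    """Baire distance: base^(-k), where k is the length of the longest
    common prefix of the two digit strings; 0 if they are identical."""
    if x == y:
        return 0.0
    k = 0
    for a, b in zip(x, y):
        if a != b:
            break
        k += 1
    return base ** (-k)

def prefix_clusters(values, digits=3):
    """Linear-time grouping by shared leading digits -- one level of
    the hierarchy induced by the Baire ultrametric."""
    groups = {}
    for v in values:
        groups.setdefault(v[:digits], []).append(v)
    return groups
```

Because the distance depends only on the common prefix, it satisfies the strong (ultrametric) triangle inequality d(x, z) <= max(d(x, y), d(y, z)), which is why prefix bucketing recovers the full hierarchy without any pairwise comparisons.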
Probabilistic Hierarchical Clustering with Labeled and Unlabeled Data
This paper presents hierarchical probabilistic clustering methods for unsupervised and supervised learning in data mining applications, where supervised learning is performed using both labeled and unlabeled examples. The probabilistic clustering is based on the previously suggested Generalizable Gaussian Mixture model and is extended using a modified Expectation-Maximization procedure for learning with both unlabeled and labeled examples. The proposed hierarchical scheme is agglomerative and based on probabilistic similarity measures. Here, we compare an L2 dissimilarity measure, an error confusion similarity, and an accumulated posterior cluster probability measure. The unsupervised and supervised schemes are successfully tested on artificial data and on e-mail segmentation.
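The idea of EM over mixed labeled and unlabeled data can be illustrated with a deliberately simplified sketch: a 1-D, two-component Gaussian mixture in which labeled points keep fixed one-hot responsibilities while unlabeled points receive posterior responsibilities in the E-step. This is a generic semi-supervised EM sketch, not the paper's Generalizable Gaussian Mixture model or its modified procedure:

```python
import math

def semi_supervised_em(labeled, unlabeled, n_iter=50):
    """1-D, 2-component Gaussian mixture fit by EM. `labeled` is a list
    of (x, class) pairs with class in {0, 1}; `unlabeled` is a list of x.
    Labeled points have fixed one-hot responsibilities; unlabeled points
    get posterior responsibilities (a simplified illustration only)."""
    # Initialize means from the labeled class means.
    mu = [sum(x for x, c in labeled if c == k)
          / max(1, sum(1 for _, c in labeled if c == k)) for k in (0, 1)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    xs = [x for x, _ in labeled] + list(unlabeled)
    resp_fixed = [[1.0 if c == k else 0.0 for k in (0, 1)] for _, c in labeled]
    for _ in range(n_iter):
        # E-step: posterior responsibilities for unlabeled points only.
        resp = list(resp_fixed)
        for x in unlabeled:
            p = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in (0, 1)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: weighted parameter updates over all points.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(1e-6, sum(r[k] * (x - mu[k]) ** 2
                                   for r, x in zip(resp, xs)) / nk)
            pi[k] = nk / len(xs)
    return mu, var, pi
```

The labeled examples anchor the components to the known classes, while the unlabeled examples refine the parameter estimates; this is the basic mechanism that agglomerative merging over probabilistic similarities can then build on.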
Solving non-uniqueness in agglomerative hierarchical clustering using multidendrograms
In agglomerative hierarchical clustering, pair-group methods suffer from a
problem of non-uniqueness when two or more distances between different clusters
coincide during the amalgamation process. The traditional approach for solving
this drawback has been to take any arbitrary criterion in order to break ties
between distances, which results in different hierarchical classifications
depending on the criterion followed. In this article we propose a
variable-group algorithm that consists in grouping more than two clusters at
the same time when ties occur. We give a tree representation for the results of
the algorithm, which we call a multidendrogram, as well as a generalization of
the Lance and Williams' formula which enables the implementation of the
algorithm in a recursive way. Comment: Free Software for Agglomerative Hierarchical Clustering using
Multidendrograms available at
http://deim.urv.cat/~sgomez/multidendrograms.ph
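The variable-group idea above, amalgamating every cluster involved in a tie at the minimal distance rather than breaking the tie arbitrarily, can be sketched as a single amalgamation step. This sketch omits the distance update via the generalized Lance and Williams formula, and the names are illustrative, not from the MultiDendrograms software:

```python
def multidendrogram_step(dists):
    """One variable-group amalgamation step. `dists` maps
    frozenset({i, j}) -> inter-cluster distance. Returns the minimal
    distance and the groups formed by merging ALL clusters linked by
    ties at that distance, so the result is order-independent."""
    dmin = min(dists.values())
    tied = [pair for pair, d in dists.items() if abs(d - dmin) < 1e-12]
    # Merge overlapping tied pairs into connected groups.
    groups = []
    for pair in tied:
        merged = set(pair)
        rest = []
        for g in groups:
            if g & merged:
                merged |= g   # this tied pair links into an existing group
            else:
                rest.append(g)
        groups = rest + [merged]
    return dmin, groups
```

Grouping all tied clusters at once is what makes the resulting multidendrogram unique, where pair-group methods would produce different trees depending on which tied pair is merged first.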