103 research outputs found

Fast, Linear Time Hierarchical Clustering using the Baire Metric

Author: A Fernández-Soto
ACM Van Rooij
AK Seda
BA Davey
F Murtagh
Fionn Murtagh
IC Lerman
J-P Benzécri
JA Hartigan
JK Adelman-Mccarthy
MF Janowitz
P Contreras
P Hitzler
PE Bradley
Pedro Contreras
R D’abrusco
SC Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/06/2011

The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we empirically evaluate this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this. Comment: 27 pages, 6 tables, 10 figures
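The longest-common-prefix idea behind the Baire distance can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the restriction to values in [0, 1), the truncation to a fixed number of decimal digits, and the default base of 2 are all assumptions made for illustration.

```python
from decimal import Decimal, ROUND_DOWN


def fractional_digits(x, precision):
    """Return the first `precision` decimal digits of x (assumed in [0, 1)), truncated."""
    step = Decimal(1).scaleb(-precision)  # e.g. 0.00001 for precision=5
    d = Decimal(str(x)).quantize(step, rounding=ROUND_DOWN)
    return format(d, "f").split(".")[1]


def common_prefix_length(a, b):
    """Count how many leading characters two digit strings share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def baire_distance(x, y, precision=5, base=2.0):
    """Hypothetical helper: base ** (-n), where n is the shared digit-prefix length.

    With finite precision, identical values get base ** (-precision), not 0.
    """
    n = common_prefix_length(fractional_digits(x, precision),
                             fractional_digits(y, precision))
    return base ** (-n)
```

Under these assumptions, two values that agree in their first three decimal digits are at distance 2^-3, and values that differ in the first digit are at the maximum distance 1. Any three values then satisfy the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)), which is what makes the distance an ultrametric.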

arXiv.org e-Print Archive

Royal Holloway Research Online

Crossref

Royal Holloway - Pure

De Montfort University Open Research Archive

Fast redshift clustering with the Baire (ultra) metric

Author: Contreras Pedro
Murtagh Fionn
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 20/04/2011

The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we develop a clusterwise nearest neighbor regression procedure for this. Comment: 14 pages, 6 figures

arXiv.org e-Print Archive

Crossref

Methods of Hierarchical Clustering

Author: Contreras Pedro
Murtagh Fionn
Publication date: 01/01/2011

We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps, and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. Comment: 21 pages, 2 figures, 1 table, 69 references

arXiv.org e-Print Archive

Royal Holloway Research Online

Royal Holloway - Pure

Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement

Author: A. K. Seda
D. Achlioptas
D. Dutta
D. Fradkin
E. Bingham
F. Murtagh
I. C. Lerman
J. Benois-Pineau
J. P. Benzécri
J.-W. Chang
M. F. Janowitz
M. L. Miller
N. H. Park
P. Contreras
P. E. Bradley
P. Frankl
P. Grabusts
P. Hall
P. Li
R. Buaba
R. Rammal
S. Dasgupta
S. Deegalla
W. B. Johnson
X. Z. Fern
Publication venue: 'Pleiades Publishing Ltd'
Publication date: 27/11/2011

We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We also offer insights on practical implications of precision of data measurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections. Comment: 17 pages, 45 citations, 2 figures
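One way to picture reading a hierarchy in a single pass, without computing any pairwise distances, is to bucket each item by its digit prefixes of increasing length. The Python sketch below is only an illustration of that idea, not the authors' implementation; the function name `prefix_hierarchy` and the representation of the data as equal-length digit strings are assumptions.

```python
from collections import defaultdict


def prefix_hierarchy(digit_strings, depth):
    """Bucket items by shared digit prefix, one level per prefix length.

    Returns a list of dicts: level k-1 maps each length-k prefix to the
    indices of the items that start with it. Each item is visited once,
    so the cost is proportional to len(digit_strings) * depth.
    """
    levels = [defaultdict(list) for _ in range(depth)]
    for idx, digits in enumerate(digit_strings):
        for k in range(1, depth + 1):
            levels[k - 1][digits[:k]].append(idx)
    return [dict(level) for level in levels]
```

For example, `prefix_hierarchy(["123", "124", "130", "200"], depth=2)` groups the first three items under prefix "1" at the top level and then splits them into "12" and "13" one level down. The buckets at each level are nested inside those of the level above, which is what gives the tree structure.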

arXiv.org e-Print Archive

Crossref

De Montfort University Open Research Archive

Mumford dendrograms and discrete p-adic symmetries

Author: A. Yu. Khrennikov
B. Dragovich
D. Mumford
F. Kato
F. Murtagh
J. Benois-Pineau
J. Tate
L. O. Chekhov
P. E. Bradley
Publication venue: 'Pleiades Publishing Ltd'
Publication date: 09/09/2008

In this article, we present an effective encoding of dendrograms by embedding them into the Bruhat-Tits trees associated to $p$-adic number fields. As an application, we show how strings over a finite alphabet can be encoded in cyclotomic extensions of $\mathbb{Q}_p$ and discuss $p$-adic DNA encoding. The application leads to fast $p$-adic agglomerative hierarchic algorithms similar to the ones recently used e.g. by A. Khrennikov and others. From the viewpoint of $p$-adic geometry, to encode a dendrogram $X$ in a $p$-adic field $K$ means to fix a set $S$ of $K$-rational punctures on the $p$-adic projective line $\mathbb{P}^1$. To $\mathbb{P}^1\setminus S$ is associated in a natural way a subtree inside the Bruhat-Tits tree which recovers $X$, a method first used by F. Kato in 1999 in the classification of discrete subgroups of $\textrm{PGL}_2(K)$. Next, we show how the $p$-adic moduli space $\mathfrak{M}_{0,n}$ of $\mathbb{P}^1$ with $n$ punctures can be applied to the study of time series of dendrograms and those symmetries arising from hyperbolic actions on $\mathbb{P}^1$. In this way, we can associate to certain classes of dynamical systems a Mumford curve, i.e. a $p$-adic algebraic curve with totally degenerate reduction modulo $p$. Finally, we indicate some of our results in the study of general discrete actions on $\mathbb{P}^1$, and their relation to $p$-adic Hurwitz spaces. Comment: 14 pages, 6 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Clustering through High Dimensional Data Scaling: Applications and Implementations

Author: Contreras Pedro
Murtagh Fionn
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 27/02/2017

To analyse very high dimensional data, or large data volumes, we study random projection. Since hierarchically clustered data can be scaled in one dimension, seriation or unidimensional scaling is our primary objective. Having determined a unidimensional scaling of the multidimensional data cloud, this is followed by clustering. In many past case studies we carried out such clustering, using the Baire, or longest common prefix, metric, which is simultaneously an ultrametric. In this paper, we examine properties of the seriation, and of the induction of the clustering on the data summarization, through seriation. Simulations are described as well as a small, illustrative example using Fisher’s iris data.
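The pipeline described above (random projection to one dimension, then ordering the points along that axis) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the procedure from the paper: the function names are hypothetical, the random axis has uniform components in [0, 1), and the projected values are rescaled to [0, 1] so that a digit-prefix clustering could be applied afterwards.

```python
import random


def random_projection_1d(rows, seed=0):
    """Project each row onto one random axis and rescale the result to [0, 1]."""
    rng = random.Random(seed)
    dim = len(rows[0])
    axis = [rng.random() for _ in range(dim)]  # assumed: uniform random axis
    proj = [sum(v * a for v, a in zip(row, axis)) for row in rows]
    lo, hi = min(proj), max(proj)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all projections coincide
    return [(p - lo) / span for p in proj]


def seriation_order(values):
    """Return the indices of the points sorted by their projected value."""
    return sorted(range(len(values)), key=values.__getitem__)
```

A single random axis is the simplest possible choice here; averaging several projections, or using a different distribution for the axis, are equally plausible variants that this sketch does not attempt to cover.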

KITopen

Algorithms for Hierarchical Clustering: An Overview, II

Author: Benzécri JP
Bruynooghe M
Gan G
Gordon AD
Jain AK
Juan J
Kohonen T
Le Roux B
Lerman IC
Murtagh F
Rapoport A
Rham C
Rijsbergen CJ
Rohlf FJ
Sneath PHA
Vicente D
White HD
Wishart D
Publication venue: 'Wiley'
Publication date: 05/09/2017

We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps, and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. This review adds to the earlier version, Murtagh and Contreras (2012).

Crossref

University of Huddersfield Repository

Huddersfield Research Portal