Search CORE

65 research outputs found

Ultrametric embedding: application to data fingerprinting and to fast data clustering

Author: Murtagh Fionn
Publication date: 28/01/2007

We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How the extent or degree of ultrametricity can be quantified leads us to a discussion of varied practical cases in which ultrametricity can be partially or locally present in data. We show how ultrametricity can be assessed in text or document collections, and in time series signals. An important aspect here is that, to benefit from this perspective, the data may need to be recoded. Such data recoding can also be powerful in proximity searching, as we will show, where the data is embedded globally and not locally in an ultrametric space. Comment: 14 pages, 1 figure. New content and modified title compared to the 19 May 2006 version.
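As a rough illustration of what "quantifying the degree of ultrametricity" can mean, the sketch below samples triangles from a point set and counts how many are approximately isosceles with a small base, which is the shape every triangle takes in an ultrametric space. This is a simplified stand-in, not the coefficient defined in the paper; the function names `ultrametric_fraction` and `random_points`, the Euclidean distance, and the 5% tolerance are all assumptions made for the example.

```python
import math
import random


def random_points(n, dim, seed=0):
    """Draw n points uniformly from the unit hypercube of dimension dim."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


def ultrametric_fraction(points, n_triplets=2000, rel_tol=0.05, seed=0):
    """Estimate the share of sampled triangles that look ultrametric.

    A triangle is counted when its two largest sides agree to within
    rel_tol (relative to the largest side). This tolerance is an
    assumption of the sketch, not a value taken from the paper.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_triplets):
        a, b, c = rng.sample(points, 3)
        sides = sorted([math.dist(a, b), math.dist(a, c), math.dist(b, c)])
        if sides[2] - sides[1] <= rel_tol * sides[2]:
            hits += 1
    return hits / n_triplets
```

Comparing `ultrametric_fraction(random_points(200, 2))` with `ultrametric_fraction(random_points(200, 500))` would be expected to show a larger fraction in the high-dimensional case, in line with the abstract's remark that high dimensionality makes ultrametricity pervasive.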

arXiv.org e-Print Archive

CiteSeerX

Royal Holloway Research Online

Crossref

Royal Holloway - Pure

De Montfort University Open Research Archive

Fast redshift clustering with the Baire (ultra) metric

Author: Contreras Pedro
Murtagh Fionn
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 20/04/2011

The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, in contrast to the standard quadratic-time agglomerative hierarchical clustering algorithm. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we develop a clusterwise nearest neighbor regression procedure for this. Comment: 14 pages, 6 figures.
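To make the linear-time claim concrete, here is a minimal sketch of a longest-common-prefix (Baire-style) distance on decimal digits, plus a single-pass grouping of values by shared prefix. It assumes values lie in [0, 1), that digits are truncated rather than rounded, and that the distance is `base ** -n` for a common prefix of length `n`. The helper names `digits`, `baire_distance` and `baire_clusters` are hypothetical, and the sketch does not reproduce the paper's regression procedure.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_DOWN


def digits(x, precision):
    """Return the first `precision` decimal digits of x in [0, 1), truncated."""
    step = Decimal(1).scaleb(-precision)  # e.g. Decimal("1E-4") for precision 4
    q = Decimal(str(x)).quantize(step, rounding=ROUND_DOWN)
    return f"{q:f}".split(".")[1]


def baire_distance(x, y, precision=4, base=10.0):
    """Distance base**(-n), where n is the length of the shared digit prefix.

    With no shared leading digit, n is 0 and the distance is 1.
    """
    n = 0
    for a, b in zip(digits(x, precision), digits(y, precision)):
        if a != b:
            break
        n += 1
    return base ** -n


def baire_clusters(values, prefix_len, precision=4):
    """Group values by their first `prefix_len` digits in one pass, O(n)."""
    clusters = defaultdict(list)
    for v in values:
        clusters[digits(v, precision)[:prefix_len]].append(v)
    return dict(clusters)
```

Because each value is assigned to a bucket from its own digits alone, no pairwise distances are computed, which is where the linear cost comes from in this sketch. Two values in the same bucket at `prefix_len = k` are at distance at most `base ** -k` from each other.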

arXiv.org e-Print Archive

Crossref

Mean field spin glasses treated with PDE techniques

Author: Barra Adriano
Del Ferraro Gino
Tantari Daniele
Publication date: 01/01/2013

Following an original idea of F. Guerra, in these notes we analyze the Sherrington-Kirkpatrick (SK) model from different perspectives, all sharing an underlying approach that consists in linking the resolution of the statistical mechanics of the model (e.g. solving for the free energy) to well-known partial differential equation (PDE) problems (in suitable spaces). The plan is then to solve the related PDE using techniques from their native field and, lastly, to bring the solution back into the proper statistical mechanics framework. Within this strand, after a streamlined test case on the Curie-Weiss model to highlight the methods more than the physics behind them, we solve the SK model at both the replica symmetric and the 1-RSB level, obtaining the correct expression for the free energy via an analogy to a Fourier equation, and for the self-consistencies via an analogy to a Burgers equation, whose shock wave develops exactly at the critical noise level (triggering the phase transition). Our approach, beyond acting as a new alternative method (with respect to the standard routes) for tackling the complexity of spin glasses, links symmetries in PDE theory with constraints in statistical mechanics. As a novel result from the theoretical physics perspective, we obtain a new class of polynomial identities (namely of Aizenman-Contucci type, but merged within Guerra's broken replica measures), whose interest lies in understanding, via the recent Panchenko breakthroughs, how to force the overlap organization to the ultrametric tree predicted by Parisi.

arXiv.org e-Print Archive

Publikationer från KTH

CiteSeerX

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Archivio della ricerca- Università di Roma La Sapienza

Archivio Istituzionale della Ricerca- Università del Salento

Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement

Author: A. K. Seda
D. Achlioptas
D. Dutta
D. Fradkin
E. Bingham
F. Murtagh
I. C. Lerman
J. Benois-Pineau
J. P. Benzécri
J.-W. Chang
M. F. Janowitz
M. L. Miller
N. H. Park
P. Contreras
P. E. Bradley
P. Frankl
P. Grabusts
P. Hall
P. Li
R. Buaba
R. Rammal
S. Dasgupta
S. Deegalla
W. B. Johnson
X. Z. Fern
Publication venue: 'Pleiades Publishing Ltd'
Publication date: 27/11/2011

We describe many vantage points on the Baire metric and its use in clustering data, or in preprocessing and structuring data to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We also offer insights on the practical implications of the precision of data measurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections. Comment: 17 pages, 45 citations, 2 figures.
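The idea of reading a hierarchy off the data in one pass can be sketched as a digit prefix tree: each value in [0, 1) is inserted once, digit by digit, and each tree level corresponds to one more digit of precision. The sketch also includes a single random Gaussian projection to reduce multidimensional vectors to one coordinate first. All names (`random_projection`, `digit_string`, `build_prefix_tree`), the min-max rescaling, and the choice of a single projection axis are assumptions for illustration, not the paper's exact procedure.

```python
import random


def random_projection(vectors, seed=0):
    """Project vectors onto one random Gaussian axis and rescale into [0, 1)."""
    rng = random.Random(seed)
    axis = [rng.gauss(0.0, 1.0) for _ in range(len(vectors[0]))]
    proj = [sum(a * x for a, x in zip(axis, v)) for v in vectors]
    lo, hi = min(proj), max(proj)
    span = (hi - lo) or 1.0  # avoid division by zero when all values coincide
    # The factor just below 1 keeps the largest value strictly under 1.
    return [0.999999 * (p - lo) / span for p in proj]


def digit_string(u, precision):
    """First `precision` decimal digits of u in [0, 1), truncated, zero-padded."""
    return f"{int(u * 10 ** precision):0{precision}d}"


def build_prefix_tree(keys, precision=3):
    """Build a 10-way digit tree in one pass; each level is one digit deeper.

    Every node stores how many keys pass through it; leaves also store
    the indices of the keys that end there.
    """
    root = {"count": 0, "children": {}}
    for idx, u in enumerate(keys):
        node = root
        node["count"] += 1
        for d in digit_string(u, precision):
            node = node["children"].setdefault(d, {"count": 0, "children": {}})
            node["count"] += 1
        node.setdefault("members", []).append(idx)
    return root
```

A typical use would be `build_prefix_tree(random_projection(vectors))`. Cutting the tree at depth k gives the clusters that share their first k digits, so the `precision` argument directly controls how deep the hierarchy can go, which is one way to see the abstract's point about measurement precision.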

arXiv.org e-Print Archive

Crossref

De Montfort University Open Research Archive