Ultrametric embedding: application to data fingerprinting and to fast data clustering
We begin with pervasive ultrametricity due to high dimensionality and/or
spatial sparsity. The question of how the extent or degree of ultrametricity can be quantified
leads us to discuss various practical cases in which ultrametricity is
partially or locally present in data. We show how ultrametricity can be
assessed in text or document collections, and in time series signals. An important
point is that, to benefit from this perspective, the data may
need to be recoded. As we will show, such recoding can also be powerful in proximity
searching, where the data are embedded globally, not locally,
in an ultrametric space. Comment: 14 pages, 1 figure. New content and modified title compared to the 19
May 2006 version.
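One simple way to quantify "how ultrametric" a set of points is uses triangles: in an ultrametric space every triangle is isosceles with the two largest sides equal, so the share of triangles meeting that condition serves as a rough degree of ultrametricity. The sketch below is an illustrative proxy under that assumption, not the coefficient defined in the paper; the names `ultrametric_fraction`, `prefix_distance`, and the tolerance `rel_tol` are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def common_prefix_len(x: str, y: str) -> int:
    """Length of the longest common prefix of two strings."""
    k = 0
    for a, b in zip(x, y):
        if a != b:
            break
        k += 1
    return k


def prefix_distance(x: str, y: str, base: int = 2) -> float:
    """Hypothetical example metric: base**-k, k = length of the shared prefix.

    Distances defined by the first differing symbol are ultrametric.
    """
    if x == y:
        return 0.0
    return float(base) ** -common_prefix_len(x, y)


def ultrametric_fraction(
    points: Sequence[T],
    dist: Callable[[T, T], float],
    rel_tol: float = 1e-9,
) -> float:
    """Share of triangles whose two largest sides are equal within rel_tol.

    An ultrametric satisfies d(a, c) <= max(d(a, b), d(b, c)), which forces
    the two largest sides of every triangle to be equal, so the share is 1.0
    for an ultrametric. This loops over all triangles (cubic cost); sample
    triangles instead for large inputs.
    """
    total = ultra = 0
    for a, b, c in combinations(points, 3):
        _, mid, large = sorted((dist(a, b), dist(b, c), dist(a, c)))
        total += 1
        if math.isclose(mid, large, rel_tol=rel_tol, abs_tol=1e-12):
            ultra += 1
    return ultra / total if total else float("nan")


codes = ["000", "001", "010", "100"]
print(ultrametric_fraction(codes, prefix_distance))  # 1.0
print(ultrametric_fraction([0.0, 1.0, 3.0, 7.0], lambda x, y: abs(x - y)))  # 0.0
```

Points on a line score 0.0 here because every non-degenerate Euclidean triangle on a line has its largest side equal to the sum of the other two; recoding data (for example into digit strings) is what can move such a score towards 1.0.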
Fast redshift clustering with the Baire (ultra) metric
The Baire metric induces an ultrametric on a dataset and is of linear
computational complexity, in contrast to the quadratic time of the standard
agglomerative hierarchical clustering algorithm. We apply the Baire distance to
spectrometric and photometric redshifts from the Sloan Digital Sky Survey,
using, in this work, about half a million astronomical objects. We want to know
how well the (more costly to determine) spectrometric redshifts can predict
the (more easily obtained) photometric redshifts, i.e. we seek to regress the
spectrometric on the photometric redshifts, and we develop a clusterwise
nearest neighbor regression procedure for this. Comment: 14 pages, 6 figures
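The Baire distance compares numbers by their leading digits: two values that share their first k decimal digits are at distance 10^-k. The sketch below assumes redshifts lying in [0, 1), written in base 10 and truncated to a fixed precision; `digits`, `baire_distance`, and `bucket_by_prefix` are hypothetical names. Grouping values by a shared prefix needs only one pass over the data, which illustrates why this approach scales linearly, unlike agglomerative clustering over all pairwise distances.

```python
from collections import defaultdict
from decimal import ROUND_DOWN, Decimal


def digits(x: float, precision: int) -> str:
    """First `precision` decimal digits of x (truncated, not rounded).

    Assumes 0 <= x < 1, as for the low-redshift values in this sketch.
    """
    if not 0 <= x < 1 or precision < 1:
        raise ValueError("sketch assumes 0 <= x < 1 and precision >= 1")
    q = Decimal(str(x)).quantize(Decimal(1).scaleb(-precision), rounding=ROUND_DOWN)
    return format(q, "f").split(".")[1]


def baire_distance(x: float, y: float, precision: int = 6) -> float:
    """Base-10 Baire distance: 10**-k, where k is the number of leading
    digits shared by x and y; 0.0 if all `precision` digits agree."""
    dx, dy = digits(x, precision), digits(y, precision)
    k = 0
    while k < precision and dx[k] == dy[k]:
        k += 1
    return 0.0 if k == precision else 10.0 ** -k


def bucket_by_prefix(values, k: int, precision: int = 6) -> dict:
    """One linear pass: group indices whose first k digits coincide.

    Any two members of a bucket are at Baire distance <= 10**-k.
    """
    buckets = defaultdict(list)
    for i, v in enumerate(values):
        buckets[digits(v, precision)[:k]].append(i)
    return dict(buckets)


z = [0.4251, 0.4269, 0.4312, 0.9]
print(baire_distance(z[0], z[1]))  # shares the prefix "42", so 10**-2
print(bucket_by_prefix(z, k=2))    # {'42': [0, 1], '43': [2], '90': [3]}
```

Truncation via `decimal` is used instead of float arithmetic so that, for example, 0.29 is not read as 0.28999... and placed in the wrong bucket.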
Mean field spin glasses treated with PDE techniques
Following an original idea of F. Guerra, in these notes we analyze the
Sherrington-Kirkpatrick model from different perspectives, all sharing the
same underlying approach, which consists of linking the solution of the statistical
mechanics of the model (e.g. solving for the free energy) to well-known partial
differential equation (PDE) problems (in suitable spaces). The plan is then to
solve the related PDE using techniques from its native field and
finally to bring the solution back into the proper statistical mechanics
framework. Within this strand, after a streamlined test case on the Curie-Weiss
model that highlights the methods rather than the underlying physics, we solve the SK model
at both the replica symmetric and the 1-RSB level, obtaining the correct
expression for the free energy via an analogy to a Fourier equation and for the
self-consistencies via an analogy to a Burgers equation, whose shock wave
develops exactly at the critical noise level (triggering the phase transition). Our
approach, besides acting as a new alternative method (with respect to the
standard routes) for tackling the complexity of spin glasses, links symmetries
in PDE theory with constraints in statistical mechanics. As a novel result
from the theoretical physics perspective, we obtain a new class of polynomial
identities (namely of Aizenman-Contucci type but merged within Guerra's
broken replica measures), whose interest lies in understanding, via the recent
Panchenko breakthroughs, how to force the overlap organization to the
ultrametric tree predicted by Parisi.
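For the Curie-Weiss test case, the link between the free energy and a Fourier (heat) equation, and between the self-consistency and a Burgers equation, can be sketched with a standard Guerra-style interpolation. The notation (Z_N, alpha_N, u_N, with t playing the role of the inverse temperature beta at unit coupling and x that of beta h) is an illustrative assumption and not necessarily the convention of the notes; the SK case requires considerably more work.

```latex
% N Ising spins, magnetization m = (1/N) \sum_i \sigma_i, Boltzmann average \langle\cdot\rangle
Z_N(t,x) = \sum_{\sigma \in \{-1,+1\}^N} \exp\!\Big( \tfrac{tN}{2}\, m^2 + xNm \Big),
\qquad \alpha_N(t,x) = \tfrac{1}{N} \log Z_N(t,x), \qquad \alpha_N(0,x) = \log(2\cosh x).

% Derivatives of the interpolating free energy
\partial_t \alpha_N = \tfrac12 \langle m^2 \rangle, \qquad
\partial_x \alpha_N = \langle m \rangle, \qquad
\partial_x^2 \alpha_N = N\big(\langle m^2 \rangle - \langle m \rangle^2\big).

% Hence a viscous Hamilton-Jacobi equation
\partial_t \alpha_N = \tfrac12 (\partial_x \alpha_N)^2 + \tfrac{1}{2N}\, \partial_x^2 \alpha_N .

% Cole-Hopf: \psi_N = e^{N\alpha_N} = Z_N solves the heat (Fourier) equation
\partial_t \psi_N = \tfrac{1}{2N}\, \partial_x^2 \psi_N .

% u_N = \partial_x \alpha_N = \langle m \rangle solves a viscous Burgers equation
\partial_t u_N = u_N\, \partial_x u_N + \tfrac{1}{2N}\, \partial_x^2 u_N .

% Inviscid limit N -> infinity with u(0,x) = \tanh x; along characteristics
u(t,x) = \tanh\big(x + t\,u(t,x)\big), \qquad
\partial_x u \big|_{x=0,\,u=0} = \frac{1}{1-t} \to \infty \quad \text{as } t \to 1^- .
```

The characteristic solution is the familiar Curie-Weiss self-consistency m = tanh(beta(m + h)), and the gradient blow-up at t = 1 is where characteristics first cross, i.e. where the shock forms at the critical point beta = 1.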
Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement
We describe many vantage points on the Baire metric and its use in clustering
data, or its use in preprocessing and structuring data in order to support
search and retrieval operations. In some cases, we proceed directly to clusters
without explicitly determining the distances. We show how a hierarchical
clustering can be read directly from one pass through the data. We also offer
insights into the practical implications of the precision of data measurement. As a
mechanism for treating multidimensional data, including very high dimensional
data, we use random projections. Comment: 17 pages, 45 citations, 2 figures
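Reading a hierarchy off one pass through the data can be sketched as follows, assuming multidimensional data are first reduced to one dimension by a single Gaussian random projection and min-max rescaled to [0, 1), after which level k of the hierarchy groups the points that share their first k decimal digits. The function names, the single projection, and the rescaling step are illustrative assumptions, not the paper's exact procedure.

```python
import random
from collections import defaultdict
from decimal import ROUND_DOWN, Decimal


def random_projection_1d(data, seed=0):
    """Project rows of `data` onto one Gaussian random direction and
    rescale the projections into [0, 1)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(len(data[0]))]
    proj = [sum(wi * xi for wi, xi in zip(w, row)) for row in data]
    lo, hi = min(proj), max(proj)
    span = (hi - lo) or 1.0
    # Clamp the maximum just below 1 so every value has the form 0.ddd...
    return [min((p - lo) / span, 1.0 - 1e-9) for p in proj]


def digits(x, precision):
    """First `precision` decimal digits of x in [0, 1), truncated."""
    if precision < 1:
        raise ValueError("precision must be >= 1")
    q = Decimal(str(x)).quantize(Decimal(1).scaleb(-precision), rounding=ROUND_DOWN)
    return format(q, "f").split(".")[1]


def baire_hierarchy(values, precision=3):
    """levels[k] maps each k-digit prefix to the indices sharing it.

    A single pass over the data; each level refines the one above it,
    so the levels form a hierarchy with level 0 as the root.
    """
    levels = [defaultdict(list) for _ in range(precision + 1)]
    for i, v in enumerate(values):
        d = digits(v, precision)
        for k in range(precision + 1):
            levels[k][d[:k]].append(i)
    return [dict(level) for level in levels]


points = [[0.0, 0.1], [0.05, 0.12], [5.0, 4.9], [5.1, 5.0]]
tree = baire_hierarchy(random_projection_1d(points, seed=1), precision=2)
for k, level in enumerate(tree):
    print(k, sorted(level.values()))
```

Each point is touched once per level, so for a fixed number of digits the cost is linear in the number of points; the number of digits kept plays the role of the measurement precision mentioned above.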
- …