65 research outputs found

    Ultrametric embedding: application to data fingerprinting and to fast data clustering

    We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How the extent or degree of ultrametricity can be quantified leads us to a discussion of varied practical cases in which ultrametricity can be partially or locally present in data. We show how ultrametricity can be assessed in text or document collections, and in time series signals. An important point here is that, to benefit from this perspective, the data may need to be recoded. Such data recoding can also be powerful in proximity searching, as we will show, where the data is embedded globally and not locally in an ultrametric space. Comment: 14 pages, 1 figure. New content and modified title compared to the 19 May 2006 version.

    Fast redshift clustering with the Baire (ultra) metric

    The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, in contrast with the standard quadratic-time agglomerative hierarchical clustering algorithm. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we develop a clusterwise nearest neighbor regression procedure for this. Comment: 14 pages, 6 figures.
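As a rough illustration of the Baire distance mentioned in this abstract, the sketch below uses one common convention: two values are written as equal-length digit strings, and their distance is `base ** -k`, where `k` is the length of their longest common prefix. The function names and the digit-string encoding (for example, a redshift of 0.1234 written as `"1234"`) are assumptions made for this example, not taken from the paper.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    k = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        k += 1
    return k


def baire_distance(a: str, b: str, base: int = 10) -> float:
    """Baire-style distance between two equal-length digit strings.

    Assumed convention: 0 for identical strings, otherwise base ** -k,
    where k is the longest common prefix length.
    """
    if len(a) != len(b):
        raise ValueError("digit strings must have the same length")
    if a == b:
        return 0.0
    return 1.0 / base ** common_prefix_length(a, b)


# Hypothetical redshift-like values 0.1234, 0.1239, 0.1300 as digit strings.
x, y, z = "1234", "1239", "1300"
d_xy = baire_distance(x, y)  # shares "123"
d_yz = baire_distance(y, z)  # shares "1"
d_xz = baire_distance(x, z)  # shares "1"

# Strong triangle (ultrametric) inequality for this triple.
assert d_xz <= max(d_xy, d_yz)
```

Each distance needs only one scan of the digits, and values sharing a prefix of length `k` can be grouped by that prefix without computing any pairwise distances.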

    Mean field spin glasses treated with PDE techniques

    Following an original idea of F. Guerra, in these notes we analyze the Sherrington-Kirkpatrick (SK) model from different perspectives, all sharing the same underlying approach: linking the resolution of the statistical mechanics of the model (e.g. solving for the free energy) to well-known partial differential equation (PDE) problems (in suitable spaces). The plan is then to solve the related PDE using techniques from its native field and finally to bring the solution back into the proper statistical mechanics framework. Within this strand, after a streamlined test case on the Curie-Weiss model to highlight the methods more than the physics behind them, we solve the SK model at both the replica symmetric and the 1-RSB level, obtaining the correct expression for the free energy via an analogy to a Fourier equation and for the self-consistencies via an analogy to a Burgers equation, whose shock wave develops exactly at the critical noise level (triggering the phase transition). Our approach, beyond acting as a new alternative method (with respect to the standard routes) for tackling the complexity of spin glasses, links symmetries in PDE theory with constraints in statistical mechanics. As a novel result from the theoretical physics perspective, we obtain a new class of polynomial identities (namely of Aizenman-Contucci type but merged within Guerra's broken replica measures), whose interest lies in understanding, via the recent Panchenko breakthroughs, how to force the overlap organization to the ultrametric tree predicted by Parisi.

    Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement

    We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We also offer insights on practical implications of precision of data measurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections. Comment: 17 pages, 45 citations, 2 figures.
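The idea of reading a hierarchy from a single pass can be sketched with a prefix tree over digit strings. This is a minimal illustration under assumptions, not the authors' implementation: each value is taken to be already encoded as a fixed-precision digit string, every tree node at depth `k` stands for the cluster of values sharing the same first `k` digits, and the names `build_prefix_tree` and `cluster_at` are hypothetical.

```python
def new_node() -> dict:
    """A tree node: indices of member values plus child nodes keyed by digit."""
    return {"members": [], "children": {}}


def build_prefix_tree(keys: list[str]) -> dict:
    """Build the hierarchy in one pass over the data.

    Each key is visited once and walked digit by digit, so the cost is
    linear in the number of keys for a fixed precision.
    """
    root = new_node()
    for idx, key in enumerate(keys):
        node = root
        node["members"].append(idx)
        for digit in key:
            if digit not in node["children"]:
                node["children"][digit] = new_node()
            node = node["children"][digit]
            node["members"].append(idx)
    return root


def cluster_at(root: dict, prefix: str) -> list[int]:
    """Indices of all keys that start with the given prefix."""
    node = root
    for digit in prefix:
        node = node["children"].get(digit)
        if node is None:
            return []
    return node["members"]


# Hypothetical data: three values at three-digit precision.
keys = ["123", "129", "456"]
tree = build_prefix_tree(keys)
coarse = cluster_at(tree, "1")   # indices of keys starting with "1"
fine = cluster_at(tree, "123")   # indices of keys equal to "123"
```

No pairwise distances are computed here; clusters at every level of the hierarchy fall out of the shared prefixes alone, and longer prefixes correspond to tighter clusters.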