    Clustering by compression

    We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows: First, we determine a universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is universal in that it is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal but uses the non-computable notion of Kolmogorov complexity. We propose precise notions of similarity metric and normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates universality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (binary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block sorting compressors. In genomics we present new evidence for major questions in Mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis. Comment: LaTeX, 27 pages, 20 figures
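The first step described above, computing a distance from compressed lengths of files singly and in pairwise concatenation, can be sketched as follows. This is a minimal illustration, not the authors' public software: it assumes the commonly stated form NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), uses `zlib` as a stand-in compressor, and the helper names (`compressed_len`, `ncd`, `distance_matrix`) and sample data are hypothetical. The quartet-based tree construction is not reproduced here.

```python
import random
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Assumed form: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)  # pairwise concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD matrix, the input to a later clustering step."""
    return [[ncd(a, b) for b in items] for a in items]


# Hypothetical sample data: repetitive text versus pseudo-random bytes.
text = b"the quick brown fox jumps over the lazy dog. " * 40
noise = bytes(random.Random(0).randrange(256) for _ in range(2000))
matrix = distance_matrix([text, noise])
```

With a real compressor the distance of an object to itself is small but generally not exactly zero, which is one reason the abstract speaks of approximating the ideal metric. Any hierarchical clustering routine could consume `matrix`; the quartet method named in the abstract is the authors' own choice.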

    Universality classes for horizon instabilities

    We introduce a notion of universality classes for the Gregory-Laflamme instability and determine, in the supergravity approximation, the stability of a variety of solutions, including the non-extremal D3-brane, M2-brane, and M5-brane. These three non-dilatonic branes cross over from instability to stability at a certain non-extremal mass. Numerical analysis suggests that the wavelength of the shortest unstable mode diverges as one approaches the cross-over point from above, with a simple critical exponent which is the same in all three cases. Comment: 23 pages, latex2e, 4 figures

    Universality and scaling of correlations between zeros on complex manifolds

    We study the limit as $N\to\infty$ of the correlations between simultaneous zeros of random sections of the powers $L^N$ of a positive holomorphic line bundle $L$ over a compact complex manifold $M$, when distances are rescaled so that the average density of zeros is independent of $N$. We show that the limit correlation is independent of the line bundle and depends only on the dimension of $M$ and the codimension of the zero sets. We also provide some explicit formulas for pair correlations. In particular, we provide an alternate derivation of Hannay's limit pair correlation function for SU(2) polynomials, and we show that this correlation function holds for all compact Riemann surfaces. Comment: 3 figures

    Universality of citation distributions: towards an objective measure of scientific impact

    We study the distributions of citations received by a single publication within several disciplines, spanning broad areas of science. We show that the probability that an article is cited $c$ times has large variations between different disciplines, but all distributions are rescaled on a universal curve when the relative indicator $c_f = c/c_0$ is considered, where $c_0$ is the average number of citations per article for the discipline. In addition we show that the same universal behavior occurs when citation distributions of articles published in the same field, but in different years, are compared. These findings provide a strong validation of $c_f$ as an unbiased indicator for citation performance across disciplines and years. Based on this indicator, we introduce a generalization of the h-index suitable for comparing scientists working in different fields. Comment: 7 pages, 5 figures. Accepted for publication in Proc. Natl Acad. Sci. USA
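The rescaling in this abstract, dividing each article's citation count c by the discipline average c_0, can be illustrated with a short sketch. The function name `relative_indicator` and the two sets of citation counts are hypothetical, and c_0 is estimated here from the sample itself rather than from a full discipline. The generalized h-index mentioned in the abstract is not reproduced.

```python
from statistics import mean


def relative_indicator(citations: list[int]) -> list[float]:
    """Rescale each count c to c_f = c / c_0, with c_0 the sample mean."""
    c0 = mean(citations)
    if c0 == 0:
        raise ValueError("average citation count is zero; c_f is undefined")
    return [c / c0 for c in citations]


# Hypothetical counts for two fields whose averages differ by a factor of 10.
field_a = [0, 2, 4, 10]      # c_0 = 4
field_b = [0, 20, 40, 100]   # c_0 = 40

rescaled_a = relative_indicator(field_a)
rescaled_b = relative_indicator(field_b)
```

In this toy case the raw counts differ by an order of magnitude while the rescaled values coincide, which is the kind of collapse onto a common curve that the abstract reports for real citation data.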