Clustering by compression
We present a new method for clustering based on compression. The method
doesn't use subject-specific features or background knowledge, and works as
follows: First, we determine a universal similarity distance, the normalized
compression distance or NCD, computed from the lengths of compressed data files
(singly and in pairwise concatenation). Second, we apply a hierarchical
clustering method. The NCD is universal in that it is not restricted to a
specific application area, and works across application area boundaries. A
theoretical precursor, the normalized information distance, co-developed by one
of the authors, is provably optimal but uses the non-computable notion of
Kolmogorov complexity. We propose precise notions of similarity metric and normal
compressor, and show that the NCD based on a normal compressor is a similarity
metric that approximates universality. To extract a hierarchy of clusters from
the distance matrix, we determine a dendrogram (binary tree) by a new quartet
method and a fast heuristic to implement it. The method is implemented,
available as public software, and robust under the choice of different
compressors. To substantiate our claims of universality and robustness, we
report evidence of successful application in areas as diverse as genomics,
virology, languages, literature, music, handwritten digits, astronomy, and
combinations of objects from completely different domains, using statistical,
dictionary, and block-sorting compressors. In genomics, we present new
evidence for major questions in mammalian evolution, based on
whole-mitochondrial genomic analysis: the Eutherian orders, and the Marsupionta
hypothesis versus the Theria hypothesis.
Comment: LaTeX, 27 pages, 20 figures
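The NCD step and the quartet idea can be illustrated with a small sketch. It uses the standard formula NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C(.) is compressed length, and assumes zlib (a dictionary compressor) as the compressor C. The function names `compressed_size`, `ncd`, `distance_matrix`, and `best_quartet` are hypothetical, and this is not the authors' public software. `best_quartet` only picks the cheapest of the three topologies of a single quartet, assuming the cost of topology uv|wx is d(u,v) + d(w,x); the paper's full quartet tree search and its fast heuristic are not reproduced.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """C(x): byte length of zlib-compressed data (zlib is an assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the pairwise concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(objects: list[bytes]) -> list[list[float]]:
    """Symmetric NCD matrix with zeros on the diagonal."""
    n = len(objects)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = ncd(objects[i], objects[j])
    return d


def best_quartet(d, p, q, r, s):
    """Return the cheapest of the three topologies on {p, q, r, s}.

    Topology ((u, v), (w, x)) means u, v are siblings on one side and w, x on the
    other; its cost is assumed to be d[u][v] + d[w][x].
    """
    topologies = [((p, q), (r, s)), ((p, r), (q, s)), ((p, s), (q, r))]
    return min(topologies, key=lambda t: d[t[0][0]][t[0][1]] + d[t[1][0]][t[1][1]])


# Usage on four toy byte strings forming two near-duplicate pairs.
texts = [
    b"the quick brown fox jumps over the lazy dog " * 20,
    b"the quick brown fox leaps over the lazy dog " * 20,
    b"lorem ipsum dolor sit amet consectetur " * 20,
    b"lorem ipsum dolor sit amet adipiscing " * 20,
]
d = distance_matrix(texts)
print(best_quartet(d, 0, 1, 2, 3))  # expected to pair the near-duplicates: ((0, 1), (2, 3))
```

Because the concatenation of two similar strings compresses almost as well as either string alone, similar pairs get small NCD values, which is what lets the quartet costs separate the two groups.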
Universality classes for horizon instabilities
We introduce a notion of universality classes for the Gregory-Laflamme
instability and determine, in the supergravity approximation, the stability of
a variety of solutions, including the non-extremal D3-brane, M2-brane, and
M5-brane. These three non-dilatonic branes cross over from instability to
stability at a certain non-extremal mass. Numerical analysis suggests that the
wavelength of the shortest unstable mode diverges as one approaches the
cross-over point from above, with a simple critical exponent which is the same
in all three cases.
Comment: 23 pages, latex2e, 4 figures
Universality and scaling of correlations between zeros on complex manifolds
We study the limit as $N \to \infty$ of the correlations between simultaneous
zeros of random sections of the powers $L^N$ of a positive holomorphic line
bundle $L$ over a compact complex manifold $M$, when distances are rescaled so
that the average density of zeros is independent of $N$. We show that the limit
correlation is independent of the line bundle and depends only on the dimension
of $M$ and the codimension of the zero sets. We also provide some explicit
formulas for pair correlations. In particular, we provide an alternate
derivation of Hannay's limit pair correlation function for SU(2) polynomials,
and we show that this correlation function holds for all compact Riemann
surfaces.
Comment: 3 figures
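For orientation, Hannay's limit pair correlation on Riemann surfaces is usually quoted in the closed form kappa(r) = ((sinh^2 v + v^2) cosh v - 2v sinh v) / sinh^3 v. The sketch below evaluates that form numerically. The rescaling v = pi r^2 / 2 is an assumption (normalization conventions differ between papers), and the function name is hypothetical; check both against the paper before relying on specific values.

```python
import math


def hannay_pair_correlation(r: float) -> float:
    """Closed form usually quoted for Hannay's limit pair correlation.

    kappa(r) = ((sinh^2 v + v^2) cosh v - 2 v sinh v) / sinh^3 v,
    with the ASSUMED scaling v = pi r^2 / 2.
    """
    v = math.pi * r * r / 2
    if v < 1e-3:
        # kappa is odd in v with kappa = v + O(v^3); use the leading term to avoid 0/0.
        return v
    if v > 50:
        # kappa - 1 is of order v^2 exp(-2v) here; returning 1 also avoids overflow of sinh^3.
        return 1.0
    s, c = math.sinh(v), math.cosh(v)
    return ((s * s + v * v) * c - 2 * v * s) / s**3


for r in (0.1, 0.5, 1.0, 2.0):
    print(r, hannay_pair_correlation(r))
```

With this form, kappa(r) tends to 1 at large r (nearly uncorrelated zeros) and vanishes like r^2 as r tends to 0, reflecting repulsion between nearby zeros.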
Universality of citation distributions: towards an objective measure of scientific impact
We study the distributions of citations received by a single publication
within several disciplines, spanning broad areas of science. We show that the
probability that an article is cited $c$ times varies widely between
different disciplines, but all distributions collapse onto a universal curve
when the relative indicator $c_f = c/c_0$ is considered, where $c_0$ is the
average number of citations per article for the discipline. In addition, we show
that the same universal behavior occurs when citation distributions of articles
published in the same field, but in different years, are compared. These
findings provide strong validation of $c_f$ as an unbiased indicator for
citation performance across disciplines and years. Based on this indicator, we
introduce a generalization of the h-index suitable for comparing scientists
working in different fields.
Comment: 7 pages, 5 figures. Accepted for publication in Proc. Natl. Acad. Sci. USA
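The rescaling by the relative indicator $c_f = c/c_0$ is simple arithmetic, so a minimal sketch follows. The function name and the two citation lists are hypothetical, chosen only so that one discipline's counts are ten times the other's; they are not data from the paper. The sketch does not implement the paper's generalized h-index, whose definition is not given here.

```python
from statistics import mean


def relative_indicator(citations: list[int]) -> list[float]:
    """Rescale raw counts c to c_f = c / c_0, where c_0 is the discipline's mean citation count."""
    c0 = mean(citations)
    if c0 == 0:
        raise ValueError("c_0 is zero: no citations in this set, so c_f is undefined")
    return [c / c0 for c in citations]


# Hypothetical counts: discipline B is cited ten times as heavily as discipline A.
field_a = [0, 2, 4, 10]
field_b = [0, 20, 40, 100]
print(relative_indicator(field_a))  # [0.0, 0.5, 1.0, 2.5]
print(relative_indicator(field_b))  # [0.0, 0.5, 1.0, 2.5]
```

This toy case collapses exactly by construction; the paper's claim concerns the shape of empirical citation distributions across disciplines and years.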