6,708 research outputs found
Trees and Markov convexity
We show that an infinite weighted tree admits a bi-Lipschitz embedding into
Hilbert space if and only if it does not contain arbitrarily large complete
binary trees with uniformly bounded distortion. We also introduce a new metric
invariant called Markov convexity, and show how it can be used to compute the
Euclidean distortion of any metric tree up to universal factors
Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition
We provide efficient constant factor approximation algorithms for the
problems of finding a hierarchical clustering of a point set in any metric
space, minimizing the sum of minimimum spanning tree lengths within each
cluster, and in the hyperbolic or Euclidean planes, minimizing the sum of
cluster perimeters. Our algorithms for the hyperbolic and Euclidean planes can
also be used to provide a pants decomposition, that is, a set of disjoint
simple closed curves partitioning the plane minus the input points into subsets
with exactly three boundary components, with approximately minimum total
length. In the Euclidean case, these curves are squares; in the hyperbolic
case, they combine our Euclidean square pants decomposition with our tree
clustering method for general metric spaces.Comment: 22 pages, 14 figures. This version replaces the proof of what is now
Lemma 5.2, as the previous proof was erroneou
Scalable Routing Easy as PIE: a Practical Isometric Embedding Protocol (Technical Report)
We present PIE, a scalable routing scheme that achieves 100% packet delivery
and low path stretch. It is easy to implement in a distributed fashion and
works well when costs are associated to links. Scalability is achieved by using
virtual coordinates in a space of concise dimensionality, which enables greedy
routing based only on local knowledge. PIE is a general routing scheme, meaning
that it works on any graph. We focus however on the Internet, where routing
scalability is an urgent concern. We show analytically and by using simulation
that the scheme scales extremely well on Internet-like graphs. In addition, its
geometric nature allows it to react efficiently to topological changes or
failures by finding new paths in the network at no cost, yielding better
delivery ratios than standard algorithms. The proposed routing scheme needs an
amount of memory polylogarithmic in the size of the network and requires only
local communication between the nodes. Although each node constructs its
coordinates and routes packets locally, the path stretch remains extremely low,
even lower than for centralized or less scalable state-of-the-art algorithms:
PIE always finds short paths and often enough finds the shortest paths.Comment: This work has been previously published in IEEE ICNP'11. The present
document contains an additional optional mechanism, presented in Section
III-D, to further improve performance by using route asymmetry. It also
contains new simulation result
Clustering by compression
We present a new method for clustering based on compression. The method
doesn't use subject-specific features or background knowledge, and works as
follows: First, we determine a universal similarity distance, the normalized
compression distance or NCD, computed from the lengths of compressed data files
(singly and in pairwise concatenation). Second, we apply a hierarchical
clustering method. The NCD is universal in that it is not restricted to a
specific application area, and works across application area boundaries. A
theoretical precursor, the normalized information distance, co-developed by one
of the authors, is provably optimal but uses the non-computable notion of
Kolmogorov complexity. We propose precise notions of similarity metric, normal
compressor, and show that the NCD based on a normal compressor is a similarity
metric that approximates universality. To extract a hierarchy of clusters from
the distance matrix, we determine a dendrogram (binary tree) by a new quartet
method and a fast heuristic to implement it. The method is implemented and
available as public software, and is robust under choice of different
compressors. To substantiate our claims of universality and robustness, we
report evidence of successful application in areas as diverse as genomics,
virology, languages, literature, music, handwritten digits, astronomy, and
combinations of objects from completely different domains, using statistical,
dictionary, and block sorting compressors. In genomics we presented new
evidence for major questions in Mammalian evolution, based on
whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta
hypothesis against the Theria hypothesis.Comment: LaTeX, 27 pages, 20 figure
Indexability, concentration, and VC theory
Degrading performance of indexing schemes for exact similarity search in high
dimensions has long since been linked to histograms of distributions of
distances and other 1-Lipschitz functions getting concentrated. We discuss this
observation in the framework of the phenomenon of concentration of measure on
the structures of high dimension and the Vapnik-Chervonenkis theory of
statistical learning.Comment: 17 pages, final submission to J. Discrete Algorithms (an expanded,
improved and corrected version of the SISAP'2010 invited paper, this e-print,
v3
- …