Metric Embedding into the Hamming Space with the n-Simplex Projection
Transformations of data objects into the Hamming space are often exploited to speed up similarity search in metric spaces. Techniques applicable in generic metric spaces require expensive learning, e.g., selection of pivoting objects. However, when searching in the common Euclidean space, the best performance is usually achieved by transformations specifically designed for this space. We propose a novel transformation technique that provides a good trade-off between applicability and the quality of the space approximation. It uses the n-Simplex projection to transform metric objects into a low-dimensional Euclidean space, and then transforms this space into the Hamming space. We compare our approach theoretically and experimentally with several techniques for metric embedding into the Hamming space. We focus on applicability, learning cost, and the quality of the search space approximation.
Density of Spherically-Embedded Stiefel and Grassmann Codes
The density of a code is the fraction of the coding space covered by packing
balls centered around the codewords. This paper investigates the density of
codes in the complex Stiefel and Grassmann manifolds equipped with the chordal
distance. The choice of distance enables the treatment of the manifolds as
subspaces of Euclidean hyperspheres. In this geometry, the densest packings are
not necessarily equivalent to maximum-minimum-distance codes. Computing a
code's density follows from computing: i) the normalized volume of a metric
ball and ii) the kissing radius, the radius of the largest balls one can pack
around the codewords without overlapping. First, the normalized volume of a
metric ball is evaluated by asymptotic approximations. The volume of a small
ball can be well-approximated by the volume of a locally-equivalent tangential
ball. In order to properly normalize this approximation, the precise volumes of
the manifolds induced by their spherical embedding are computed. For larger
balls, a hyperspherical cap approximation is used, which is justified by a
volume comparison theorem showing that the normalized volume of a ball in the
Stiefel or Grassmann manifold is asymptotically equal to the normalized volume
of a ball in its embedding sphere as the dimension grows to infinity. Then,
bounds on the kissing radius are derived alongside corresponding bounds on the
density. Unlike for spherical codes or codes in flat spaces, the kissing radius of
a Grassmann or Stiefel code cannot be exactly determined from its minimum
distance. It is nonetheless possible to derive bounds on density as functions
of the minimum distance. Stiefel and Grassmann codes have larger density than
their image spherical codes when dimensions tend to infinity. Finally, the
bounds on density lead to refinements of the standard Hamming bounds for
Stiefel and Grassmann codes.
Comment: Two-column version (24 pages, 6 figures, 4 tables). To appear in IEEE
Transactions on Information Theor
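The chordal distance mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common convention d(U, V) = ||U U^H - V V^H||_F / sqrt(2) for two p-dimensional subspaces with orthonormal bases U and V; the paper may normalize differently. The function names are hypothetical and chosen for illustration only. Under this convention every subspace maps to a projection matrix of Frobenius norm sqrt(p), which is one way to see the manifold as a subset of a Euclidean hypersphere.

```python
import itertools

import numpy as np


def orthonormal_basis(a):
    """Return a matrix whose columns form an orthonormal basis of span(a)."""
    q, _ = np.linalg.qr(a)  # reduced QR: q has the same shape as a
    return q


def projection_embedding(u):
    """Map an orthonormal basis u (n x p) to its projection matrix u u^H."""
    return u @ u.conj().T


def chordal_distance(u, v):
    """Chordal distance between the subspaces spanned by orthonormal u and v.

    Assumed convention: ||u u^H - v v^H||_F / sqrt(2).
    """
    diff = projection_embedding(u) - projection_embedding(v)
    return np.linalg.norm(diff, "fro") / np.sqrt(2.0)


def min_chordal_distance(code):
    """Minimum pairwise chordal distance of a list of orthonormal bases."""
    return min(chordal_distance(u, v)
               for u, v in itertools.combinations(code, 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, size = 4, 2, 8  # hypothetical toy parameters
    code = [
        orthonormal_basis(rng.standard_normal((n, p))
                          + 1j * rng.standard_normal((n, p)))
        for _ in range(size)
    ]
    print("minimum distance:", min_chordal_distance(code))
    # Every codeword lies on a sphere of radius sqrt(p) in matrix space.
    print("embedding norm:", np.linalg.norm(projection_embedding(code[0])))
```

The distance depends only on the subspace, not on the chosen basis, so multiplying a basis by any p x p unitary matrix leaves it unchanged. Computing the density itself would additionally require the ball volumes and kissing radius discussed in the abstract, which this sketch does not attempt.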
Optimal Embeddings of Distance Regular Graphs into Euclidean Spaces
In this paper we give a lower bound for the least distortion embedding of a
distance regular graph into Euclidean space. We use the lower bound for finding
the least distortion for Hamming graphs, Johnson graphs, and all strongly
regular graphs. Our technique involves semidefinite programming and exploiting
the algebraic structure of the optimization problem so that the question of
finding a lower bound of the least distortion is reduced to an analytic
question about orthogonal polynomials.
Comment: 10 pages, (v3) some corrections, accepted in Journal of Combinatorial
Theory, Series
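To make the notion of distortion concrete, the following sketch measures the distortion of one particular embedding: the n-dimensional binary Hamming cube mapped to R^n by using each vertex's 0/1 coordinates directly. Distortion is taken here as the largest expansion times the largest contraction over all vertex pairs, a standard definition that the abstract does not spell out. The function names are hypothetical. The sketch evaluates a given embedding only; it does not search for the least-distortion embedding that the paper's bound addresses.

```python
import itertools
import math


def hamming_distance(x, y):
    """Graph distance in the Hamming cube: number of differing coordinates."""
    return sum(a != b for a, b in zip(x, y))


def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distortion(points, graph_dist, embed):
    """Distortion of `embed`: max expansion times max contraction."""
    expansion = 0.0
    contraction = 0.0
    for x, y in itertools.combinations(points, 2):
        d_graph = graph_dist(x, y)
        d_embed = euclidean_distance(embed(x), embed(y))
        expansion = max(expansion, d_embed / d_graph)
        contraction = max(contraction, d_graph / d_embed)
    return expansion * contraction


def cube_distortion(n):
    """Distortion of the identity embedding of the binary n-cube into R^n."""
    vertices = list(itertools.product((0, 1), repeat=n))
    return distortion(vertices, hamming_distance, lambda v: v)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, cube_distortion(n))
```

For this embedding, two vertices at Hamming distance h are at Euclidean distance sqrt(h), so adjacent vertices are preserved exactly while antipodal vertices are contracted by a factor of sqrt(n), giving distortion sqrt(n).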
Sphere packing bounds in the Grassmann and Stiefel manifolds
Applying the Riemannian geometric machinery of volume estimates in terms of
curvature, bounds for the minimal distance of packings/codes in the Grassmann
and Stiefel manifolds are derived and analyzed. In the context of
space-time block codes, this leads to a lower bound on the minimal distance
that increases monotonically with the block length. This advocates large
block lengths for the code design.
Comment: Replaced with final version, 11 page
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding, in a large
database, the data items whose distances to a query item are the smallest.
Various methods have been developed to address this problem, and much recent
effort has been devoted to approximate search. In this paper, we present a
survey of one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality-sensitive hashing. We divide hashing
algorithms into two main categories: locality-sensitive hashing, which designs hash
functions without exploring the data distribution, and learning to hash, which
learns hash functions according to the data distribution. We review them from
various aspects, including hash function design, distance measure, and search
scheme in the hash coding space.
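As an illustration of the first category (hash functions designed without looking at the data), here is a minimal sketch of random-hyperplane hashing, one well-known locality-sensitive scheme for angular similarity. It is not taken from the survey; the function names, code length, and toy data are assumptions made for the example. Each bit records on which side of a random hyperplane a vector falls, and search ranks database items by Hamming distance between codes.

```python
import numpy as np


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals; no training data is used."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))


def hash_codes(data, hyperplanes):
    """Binary codes: one bit per hyperplane, set when the projection is >= 0."""
    return (data @ hyperplanes.T >= 0).astype(np.uint8)


def hamming_search(query_code, db_codes, k):
    """Return indices and Hamming distances of the k closest database codes."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    database = rng.standard_normal((200, 16))  # hypothetical toy data
    planes = make_hyperplanes(dim=16, n_bits=64)
    db_codes = hash_codes(database, planes)

    # Query with a slightly perturbed copy of item 7.
    query = database[7] + 0.01 * rng.standard_normal(16)
    idx, dists = hamming_search(hash_codes(query[None, :], planes)[0],
                                db_codes, k=5)
    print(idx, dists)
```

Because only the sign of each projection is kept, the codes are invariant to positive rescaling of the input vectors. A learning-to-hash method would instead fit the projections to the data distribution.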