733 research outputs found
A directed isoperimetric inequality with application to Bregman near neighbor lower bounds
Bregman divergences are a class of divergences parametrized by a
convex function and include well known distance functions like
and the Kullback-Leibler divergence. There has been extensive
research on algorithms for problems like clustering and near neighbor search
with respect to Bregman divergences, in all cases, the algorithms depend not
just on the data size and dimensionality , but also on a structure
constant that depends solely on and can grow without bound
independently.
In this paper, we provide the first evidence that this dependence on
might be intrinsic. We focus on the problem of approximate near neighbor search
for Bregman divergences. We show that under the cell probe model, any
non-adaptive data structure (like locality-sensitive hashing) for
-approximate near-neighbor search that admits probes must use space
. In contrast, for LSH under the best
bound is .
Our new tool is a directed variant of the standard boolean noise operator. We
show that a generalization of the Bonami-Beckner hypercontractivity inequality
exists "in expectation" or upon restriction to certain subsets of the Hamming
cube, and that this is sufficient to prove the desired isoperimetric inequality
that we use in our data structure lower bound.
We also present a structural result reducing the Hamming cube to a Bregman
cube. This structure allows us to obtain lower bounds for problems under
Bregman divergences from their analog. In particular, we get a
(weaker) lower bound for approximate near neighbor search of the form
for an -query non-adaptive data structure,
and new cell probe lower bounds for a number of other near neighbor questions
in Bregman space.Comment: 27 page
Lower Bounds for Oblivious Near-Neighbor Search
We prove an lower bound on the dynamic
cell-probe complexity of statistically
approximate-near-neighbor search () over the -dimensional
Hamming cube. For the natural setting of , our result
implies an lower bound, which is a quadratic
improvement over the highest (non-oblivious) cell-probe lower bound for
. This is the first super-logarithmic
lower bound for against general (non black-box) data structures.
We also show that any oblivious data structure for
decomposable search problems (like ) can be obliviously dynamized
with overhead in update and query time, strengthening a classic
result of Bentley and Saxe (Algorithmica, 1980).Comment: 28 page
Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors
[See the paper for the full abstract.]
We show tight upper and lower bounds for time-space trade-offs for the
-Approximate Near Neighbor Search problem. For the -dimensional Euclidean
space and -point datasets, we develop a data structure with space and query time for
every such that: \begin{equation} c^2 \sqrt{\rho_q} +
(c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}. \end{equation}
This is the first data structure that achieves sublinear query time and
near-linear space for every approximation factor , improving upon
[Kapralov, PODS 2015]. The data structure is a culmination of a long line of
work on the problem for all space regimes; it builds on Spherical
Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and
data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni,
Razenshteyn, STOC 2015].
Our matching lower bounds are of two types: conditional and unconditional.
First, we prove tightness of the whole above trade-off in a restricted model of
computation, which captures all known hashing-based approaches. We then show
unconditional cell-probe lower bounds for one and two probes that match the
above trade-off for , improving upon the best known lower bounds
from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first
space lower bound (for any static data structure) for two probes which is not
polynomially smaller than the one-probe bound. To show the result for two
probes, we establish and exploit a connection to locally-decodable codes.Comment: 62 pages, 5 figures; a merger of arXiv:1511.07527 [cs.DS] and
arXiv:1605.02701 [cs.DS], which subsumes both of the preprints. New version
contains more elaborated proofs and fixed some typo
Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors
We show tight lower bounds for the entire trade-off between space and query
time for the Approximate Near Neighbor search problem. Our lower bounds hold in
a restricted model of computation, which captures all hashing-based approaches.
In articular, our lower bound matches the upper bound recently shown in
[Laarhoven 2015] for the random instance on a Euclidean sphere (which we show
in fact extends to the entire space using the techniques from
[Andoni, Razenshteyn 2015]).
We also show tight, unconditional cell-probe lower bounds for one and two
probes, improving upon the best known bounds from [Panigrahy, Talwar, Wieder
2010]. In particular, this is the first space lower bound (for any static data
structure) for two probes which is not polynomially smaller than for one probe.
To show the result for two probes, we establish and exploit a connection to
locally-decodable codes.Comment: 47 pages, 2 figures; v2: substantially revised introduction, lots of
small corrections; subsumed by arXiv:1608.03580 [cs.DS] (along with
arXiv:1511.07527 [cs.DS]
Indexability, concentration, and VC theory
Degrading performance of indexing schemes for exact similarity search in high
dimensions has long since been linked to histograms of distributions of
distances and other 1-Lipschitz functions getting concentrated. We discuss this
observation in the framework of the phenomenon of concentration of measure on
the structures of high dimension and the Vapnik-Chervonenkis theory of
statistical learning.Comment: 17 pages, final submission to J. Discrete Algorithms (an expanded,
improved and corrected version of the SISAP'2010 invited paper, this e-print,
v3
Probabilistic Polynomials and Hamming Nearest Neighbors
We show how to compute any symmetric Boolean function on variables over
any field (as well as the integers) with a probabilistic polynomial of degree
and error at most . The degree
dependence on and is optimal, matching a lower bound of Razborov
(1987) and Smolensky (1987) for the MAJORITY function. The proof is
constructive: a low-degree polynomial can be efficiently sampled from the
distribution.
This polynomial construction is combined with other algebraic ideas to give
the first subquadratic time algorithm for computing a (worst-case) batch of
Hamming distances in superlogarithmic dimensions, exactly. To illustrate, let
. Suppose we are given a database
of vectors in and a collection of query vectors
in the same dimension. For all , we wish to compute a
with minimum Hamming distance from . We solve this problem in randomized time. Hence, the problem is in "truly subquadratic"
time for dimensions, and in subquadratic time for . We apply the algorithm to computing pairs with maximum
inner product, closest pair in for vectors with bounded integer
entries, and pairs with maximum Jaccard coefficients.Comment: 16 pages. To appear in 56th Annual IEEE Symposium on Foundations of
Computer Science (FOCS 2015
Simple Average-Case Lower Bounds for Approximate Near-Neighbor from Isoperimetric Inequalities
We prove an Omega(d/log(sw/nd)) lower bound for the average-case cell-probe complexity of deterministic or Las Vegas randomized algorithms solving approximate near-neighbor (ANN) problem in ddimensional Hamming space in the cell-probe model with w-bit cells, using a table of size s. This lower bound matches the highest known worst-case cell-probe lower bounds for any static data structure problems.
This average-case cell-probe lower bound is proved in a general framework which relates the cell-probe complexity of ANN to isoperimetric inequalities in the underlying metric space. A tighter connection between ANN lower bounds and isoperimetric inequalities is established by a stronger richness lemma proved by cell-sampling techniques
- …