Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time
We derandomize G. Valiant's [J.ACM 62(2015) Art.13] subquadratic-time algorithm for finding outlier correlations in binary data. Our derandomized algorithm gives deterministic subquadratic scaling essentially for the same parameter range as Valiant's randomized algorithm, but the precise constants we save over quadratic scaling are more modest. Our main technical tool for derandomization is an explicit family of correlation amplifiers built via a family of zigzag-product expanders in Reingold, Vadhan, and Wigderson [Ann. of Math 155(2002), 157-187]. We say that a function $f:\{-1,1\}^d \to \{-1,1\}^D$ is a correlation amplifier with threshold $0 \le \tau \le 1$, error $\gamma \ge 1$, and strength $p$ an even positive integer if for all pairs of vectors $x, y \in \{-1,1\}^d$ it holds that (i) $|\langle x,y\rangle| < \tau d$ implies $|\langle f(x),f(y)\rangle| \le (\tau\gamma)^p D$; and (ii) $|\langle x,y\rangle| \ge \tau d$ implies $\left(\frac{\langle x,y\rangle}{\gamma d}\right)^p D \le \langle f(x),f(y)\rangle \le \left(\frac{\gamma\langle x,y\rangle}{d}\right)^p D$.
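For intuition about the correlation-amplifier definition, note that the $p$-fold tensor power $x \mapsto x^{\otimes p}$ is an exact correlation amplifier with error $\gamma = 1$, because $\langle x^{\otimes p}, y^{\otimes p}\rangle = \langle x,y\rangle^p$. Its output dimension $D = d^p$ is what makes it impractical, and the paper's explicit zigzag-product construction is not reproduced here. The following Python sketch (all function names are hypothetical) builds the tensor power and checks conditions (i) and (ii) for a single pair, using exact rational arithmetic so that boundary cases are not decided by rounding error.

```python
import itertools
from fractions import Fraction
from typing import Sequence, Tuple


def tensor_power(x: Sequence[int], p: int) -> Tuple[int, ...]:
    """Hypothetical helper: p-fold Kronecker power of a +/-1 vector.

    Coordinate (i_1, ..., i_p) is x[i_1] * ... * x[i_p], so the output is a
    +/-1 vector of length D = d**p.
    """
    out = []
    for idx in itertools.product(range(len(x)), repeat=p):
        prod = 1
        for i in idx:
            prod *= x[i]
        out.append(prod)
    return tuple(out)


def inner(u: Sequence[int], v: Sequence[int]) -> int:
    """Integer inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_amplifier_bounds(x, y, fx, fy, tau, gamma, p) -> bool:
    """Hypothetical checker for conditions (i) and (ii) on one pair (x, y).

    fx, fy are the images f(x), f(y); tau, gamma are converted to exact
    fractions so the inequalities are evaluated without floating-point error.
    """
    d, D = len(x), len(fx)
    tau, gamma = Fraction(tau), Fraction(gamma)
    ip, fip = inner(x, y), inner(fx, fy)
    if abs(ip) < tau * d:
        # Condition (i): weakly correlated inputs stay weakly correlated.
        return abs(fip) <= (tau * gamma) ** p * D
    # Condition (ii): strong correlations are raised to the p-th power,
    # up to a multiplicative error controlled by gamma.
    lower = (Fraction(ip) / (gamma * d)) ** p * D
    upper = (gamma * Fraction(ip) / d) ** p * D
    return lower <= fip <= upper
```

For example, checking every pair in $\{-1,1\}^3$ with $p = 2$, $\tau = 1/2$, $\gamma = 1$ confirms that the bounds hold for the tensor square, while the identity map fails condition (i) on the pair $(1,1,1)$, $(1,1,-1)$.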
Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors
We show tight lower bounds for the entire trade-off between space and query
time for the Approximate Near Neighbor search problem. Our lower bounds hold in
a restricted model of computation, which captures all hashing-based approaches.
In particular, our lower bound matches the upper bound recently shown in
[Laarhoven 2015] for the random instance on a Euclidean sphere (which we show
in fact extends to the entire space using the techniques from
[Andoni, Razenshteyn 2015]).
We also show tight, unconditional cell-probe lower bounds for one and two
probes, improving upon the best known bounds from [Panigrahy, Talwar, Wieder
2010]. In particular, this is the first space lower bound (for any static data
structure) for two probes which is not polynomially smaller than the one-probe bound.
To show the result for two probes, we establish and exploit a connection to
locally-decodable codes.
Comment: 47 pages, 2 figures; v2: substantially revised introduction, lots of
small corrections; subsumed by arXiv:1608.03580 [cs.DS] (along with
arXiv:1511.07527 [cs.DS]).
Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time
We derandomize Valiant’s (J ACM 62, Article 13, 2015) subquadratic-time algorithm for finding outlier correlations in binary data. This demonstrates that it is possible to perform a deterministic subquadratic-time similarity join of high dimensionality. Our derandomized algorithm gives deterministic subquadratic scaling essentially for the same parameter range as Valiant’s randomized algorithm, but the precise constants we save over quadratic scaling are more modest. Our main technical tool for derandomization is an explicit family of correlation amplifiers built via a family of zigzag-product expanders by Reingold et al. (Ann Math 155(1):157–187, 2002). We say that a function $f:\{-1,1\}^d \to \{-1,1\}^D$ is a correlation amplifier with threshold $0 \le \tau \le 1$, error $\gamma \ge 1$, and strength $p$ an even positive integer if for all pairs of vectors $x, y \in \{-1,1\}^d$ it holds that (i) $|\langle x,y\rangle| < \tau d$ implies $|\langle f(x),f(y)\rangle| \le (\tau\gamma)^p D$; and (ii) $|\langle x,y\rangle| \ge \tau d$ implies $\left(\frac{\langle x,y\rangle}{\gamma d}\right)^p D \le \langle f(x),f(y)\rangle \le \left(\frac{\gamma\langle x,y\rangle}{d}\right)^p D$.
Peer reviewed.
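The quadratic scaling that the abstract measures its savings against is the naive all-pairs similarity join: compare every pair of vectors and report those whose inner product is large in absolute value. A minimal Python sketch, assuming a single hypothetical threshold parameter `rho` (report pairs with $|\langle x_i, x_j\rangle| \ge \rho d$). It is neither Valiant's algorithm nor the derandomized one, only the $O(n^2 d)$ reference point that both improve on.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def outlier_pairs_bruteforce(
    vectors: Sequence[Sequence[int]], rho: float
) -> List[Tuple[int, int, int]]:
    """Hypothetical baseline: all pairs (i, j, <x_i, x_j>) with i < j and
    |<x_i, x_j>| >= rho * d, for +/-1 vectors of common dimension d.

    Every pair is compared exactly once, so the running time is O(n^2 d).
    """
    if not vectors:
        return []
    d = len(vectors[0])
    found = []
    for i, j in combinations(range(len(vectors)), 2):
        ip = sum(a * b for a, b in zip(vectors[i], vectors[j]))
        if abs(ip) >= rho * d:
            found.append((i, j, ip))
    return found
```

The subquadratic algorithms avoid the explicit double loop; this version is useful mainly as a correctness oracle on small inputs.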
Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors
[See the paper for the full abstract.]
We show tight upper and lower bounds for time-space trade-offs for the
$c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean
space and $n$-point datasets, we develop a data structure with space
$n^{1 + \rho_u + o(1)} + O(dn)$ and query time $n^{\rho_q + o(1)} + d n^{o(1)}$ for
every $\rho_u, \rho_q \geq 0$ such that: \begin{equation} c^2 \sqrt{\rho_q} +
(c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}. \end{equation}
This is the first data structure that achieves sublinear query time and
near-linear space for every approximation factor $c > 1$, improving upon
[Kapralov, PODS 2015]. The data structure is a culmination of a long line of
work on the problem for all space regimes; it builds on Spherical
Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and
data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni,
Razenshteyn, STOC 2015].
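Solving the trade-off equation for $\rho_q$ makes the curve easy to tabulate. The following minimal Python sketch assumes nothing beyond the equation itself, and its function names are hypothetical, not taken from the paper. Setting $\rho_u = 0$ (near-linear space) gives $\rho_q = (2c^2-1)/c^4$, setting $\rho_q = 0$ gives $\rho_u = (2c^2-1)/(c^2-1)^2$, and the balanced point $\rho_q = \rho_u$ is $1/(2c^2-1)$.

```python
import math


def rho_q_from_rho_u(c: float, rho_u: float) -> float:
    """Solve c^2*sqrt(rho_q) + (c^2 - 1)*sqrt(rho_u) = sqrt(2c^2 - 1) for rho_q."""
    if c <= 1:
        raise ValueError("approximation factor c must be greater than 1")
    if rho_u < 0:
        raise ValueError("rho_u must be non-negative")
    c2 = c * c
    root = math.sqrt(2 * c2 - 1) - (c2 - 1) * math.sqrt(rho_u)
    # Past the endpoint rho_u = (2c^2-1)/(c^2-1)^2 no non-negative rho_q exists;
    # a small tolerance absorbs floating-point error at the endpoint itself.
    if root < -1e-12:
        raise ValueError("rho_u is beyond the rho_q = 0 endpoint of the curve")
    return (max(root, 0.0) / c2) ** 2


def space_endpoint(c: float) -> float:
    """rho_u at which the query exponent rho_q reaches 0."""
    c2 = c * c
    return (2 * c2 - 1) / (c2 - 1) ** 2


if __name__ == "__main__":
    # Tabulate a few points of the curve for c = 2.
    c = 2.0
    for rho_u in (0.0, 1 / 7, 0.5, space_endpoint(c)):
        print(f"rho_u = {rho_u:.4f} -> rho_q = {rho_q_from_rho_u(c, rho_u):.4f}")
```

For $c = 2$ the curve runs from $(\rho_u, \rho_q) = (0, 7/16)$ through the balanced point $(1/7, 1/7)$ to $(7/9, 0)$.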
Our matching lower bounds are of two types: conditional and unconditional.
First, we prove tightness of the whole above trade-off in a restricted model of
computation, which captures all known hashing-based approaches. We then show
unconditional cell-probe lower bounds for one and two probes that match the
above trade-off for $\rho_q = 0$, improving upon the best known lower bounds
from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first
space lower bound (for any static data structure) for two probes which is not
polynomially smaller than the one-probe bound. To show the result for two
probes, we establish and exploit a connection to locally-decodable codes.
Comment: 62 pages, 5 figures; a merger of arXiv:1511.07527 [cs.DS] and
arXiv:1605.02701 [cs.DS], which subsumes both of the preprints. The new version
contains more detailed proofs and fixes some typos.