Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors
We show tight lower bounds for the entire trade-off between space and query
time for the Approximate Near Neighbor search problem. Our lower bounds hold in
a restricted model of computation, which captures all hashing-based approaches.
In particular, our lower bound matches the upper bound recently shown in
[Laarhoven 2015] for the random instance on a Euclidean sphere (which we show
in fact extends to the entire space using the techniques from
[Andoni, Razenshteyn 2015]).
We also show tight, unconditional cell-probe lower bounds for one and two
probes, improving upon the best known bounds from [Panigrahy, Talwar, Wieder
2010]. In particular, this is the first space lower bound (for any static data
structure) for two probes which is not polynomially smaller than for one probe.
To show the result for two probes, we establish and exploit a connection to
locally-decodable codes.
Comment: 47 pages, 2 figures; v2: substantially revised introduction, lots of
small corrections; subsumed by arXiv:1608.03580 [cs.DS] (along with
arXiv:1511.07527 [cs.DS])
Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors
[See the paper for the full abstract.]
We show tight upper and lower bounds for time-space trade-offs for the
$c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean
space and $n$-point datasets, we develop a data structure with space
$n^{1 + \rho_u + o(1)} + O(dn)$ and query time $n^{\rho_q + o(1)} + d n^{o(1)}$
for every $\rho_u, \rho_q \geq 0$ such that: \begin{equation} c^2 \sqrt{\rho_q} +
(c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}. \end{equation}
This is the first data structure that achieves sublinear query time and
near-linear space for every approximation factor $c > 1$, improving upon
[Kapralov, PODS 2015]. The data structure is a culmination of a long line of
work on the problem for all space regimes; it builds on Spherical
Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and
data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni,
Razenshteyn, STOC 2015].
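To make the trade-off curve concrete, here is a small worked sketch (not code from the paper) that solves the equation above for $\rho_q$, reading $\rho_u$ as the space exponent and $\rho_q$ as the query-time exponent. The function name `rho_q_from_rho_u` and the choice to clamp the result at 0 once the space budget is large enough are illustrative assumptions.

```python
import math


def rho_q_from_rho_u(c: float, rho_u: float) -> float:
    """Solve c^2*sqrt(rho_q) + (c^2 - 1)*sqrt(rho_u) = sqrt(2c^2 - 1) for rho_q.

    c is the approximation factor (> 1); rho_u >= 0 is the space exponent.
    Once rho_u is large enough that the left side already exceeds the right,
    no non-negative rho_q solves the equation, so we clamp the result to 0.
    """
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if rho_u < 0:
        raise ValueError("rho_u must be non-negative")
    c2 = c * c
    # sqrt(rho_q) = (sqrt(2c^2 - 1) - (c^2 - 1) * sqrt(rho_u)) / c^2
    root = (math.sqrt(2 * c2 - 1) - (c2 - 1) * math.sqrt(rho_u)) / c2
    return max(root, 0.0) ** 2
```

For $c = 2$, setting $\rho_u = 0$ gives $\rho_q = (2c^2 - 1)/c^4 = 7/16$, and the balanced point $\rho_u = \rho_q$ is $1/(2c^2 - 1) = 1/7$; both follow directly from the equation.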
Our matching lower bounds are of two types: conditional and unconditional.
First, we prove tightness of the whole above trade-off in a restricted model of
computation, which captures all known hashing-based approaches. We then show
unconditional cell-probe lower bounds for one and two probes that match the
above trade-off for $\rho_q = 0$, improving upon the best known lower bounds
from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first
space lower bound (for any static data structure) for two probes which is not
polynomially smaller than the one-probe bound. To show the result for two
probes, we establish and exploit a connection to locally-decodable codes.
Comment: 62 pages, 5 figures; a merger of arXiv:1511.07527 [cs.DS] and
arXiv:1605.02701 [cs.DS], which subsumes both of the preprints. The new version
contains more detailed proofs and fixes some typos.
Finding the Median (Obliviously) with Bounded Space
We prove that any oblivious algorithm using space $S$ to find the median of a
list of $n$ integers from $\{1, \ldots, 2n\}$ requires time $\Omega(n \log\log_S n)$.
This bound also applies to the problem of determining whether the median
is odd or even. It is nearly optimal since Chan, following Munro and Raman, has
shown that there is a (randomized) selection algorithm using only $s$
registers, each of which can store an input value or $O(\log n)$-bit counter,
that makes only $O(\log\log_s n)$ passes over the input. The bound also implies
a size lower bound for read-once branching programs computing the low-order bit
of the median and implies the analog of $P \neq NP \cap coNP$ for
length-$o(n \log\log n)$ oblivious branching programs.
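For contrast with the lower bound, the sketch below shows one simple way to select the median with a constant number of counters and several sequential read-only passes: binary search over the value range, counting in each pass how many inputs are at most the midpoint. This is not Chan's algorithm (it needs about $\log_2$ of the range size in passes, more than the selection algorithm of Chan mentioned above) and it makes no claim to be oblivious in the paper's sense. The function name, the lower-median convention, and the assumption that every input lies in `[lo, hi]` are illustrative.

```python
from typing import Sequence, Tuple


def lower_median_by_passes(data: Sequence[int], lo: int, hi: int) -> Tuple[int, int]:
    """Return (lower median, number of passes), assuming lo <= x <= hi for all x.

    Only O(1) working registers are used (lo, hi, mid, count); each probe of
    the binary search is one sequential pass over the read-only input.
    """
    n = len(data)
    k = (n + 1) // 2          # rank of the lower median (1-indexed)
    passes = 0
    while lo < hi:            # invariant: the median lies in [lo, hi]
        mid = (lo + hi) // 2
        passes += 1
        count = 0
        for x in data:        # one pass: count inputs that are <= mid
            if x <= mid:
                count += 1
        if count >= k:
            hi = mid          # the median is at most mid
        else:
            lo = mid + 1      # the median is larger than mid
    return lo, passes
```

Once the median is known, its parity (the decision problem in the abstract) is just `median % 2`.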
Convex Hull of Points Lying on Lines in o(n log n) Time after Preprocessing
Motivated by the desire to cope with data imprecision, we study methods for
taking advantage of preliminary information about point sets in order to speed
up the computation of certain structures associated with them.
In particular, we study the following problem: given a set L of n lines in
the plane, we wish to preprocess L such that later, upon receiving a set P of n
points, each of which lies on a distinct line of L, we can construct the convex
hull of P efficiently. We show that in quadratic time and space it is possible
to construct a data structure on L that enables us to compute the convex hull
of any such point set P in O(n alpha(n) log* n) expected time. If we further
assume that the points are "oblivious" with respect to the data structure, the
running time improves to O(n alpha(n)). The analysis applies almost verbatim
when L is a set of line-segments, and yields similar asymptotic bounds. We
present several extensions, including a trade-off between space and query time
and an output-sensitive algorithm. We also study the "dual problem" where we
show how to efficiently compute the (<= k)-level of n lines in the plane, each
of which lies on a distinct point (given in advance).
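For orientation, the "standard" convex hull problem that the abstract contrasts with needs Theta(n log n) time in the algebraic computation tree model. The sketch below is that baseline, Andrew's monotone chain, applied to points placed on given lines; it uses none of the paper's preprocessing. The (slope, intercept) line representation, which excludes vertical lines, and the helper names are assumptions for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float]  # (slope, intercept) of y = slope * x + intercept


def points_on_lines(lines: List[Line], xs: List[float]) -> List[Point]:
    """Place the i-th point on the i-th line at x-coordinate xs[i]."""
    return [(x, a * x + b) for (a, b), x in zip(lines, xs)]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive means o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; the initial sort makes it O(n log n).

    Returns hull vertices in counter-clockwise order, dropping collinear points.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:                       # left-to-right sweep builds the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):             # right-to-left sweep builds the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # endpoints are shared, so drop duplicates
```

For example, with lines `[(-1, 0), (0, 0), (1, 0), (0, 2), (-1, 2)]` and x-coordinates `[0, 2, 2, 0, 1]`, the points form a square plus its center, and the hull is the four corners `[(0, 0), (2, 0), (2, 2), (0, 2)]`.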
We complement our results with Omega(n log n) lower bounds under the algebraic
computation tree model for several related problems, including sorting a set of
points (according to, say, their x-order), each of which lies on a given line
known in advance. Therefore, the convex hull problem under our setting is
easier than sorting, contrary to the "standard" convex hull and sorting
problems, both of which require Theta(n log n) steps in the worst
case (under the algebraic computation tree model).
Comment: 26 pages, 5 figures, 1 appendix; a preliminary version appeared at
SoCG 201
Data Structure Lower Bounds for Document Indexing Problems
We study data structure problems related to document indexing and pattern
matching queries. Our main contribution is to show that the pointer machine
model of computation can be extremely useful in proving high and unconditional
lower bounds that cannot be obtained in any other known model of computation
with the current techniques. Often our lower bounds match the known space-query
time trade-off curve, and in fact, for all the problems considered, there is a
very good match between our lower bounds and the known upper
bounds, at least for some choice of input parameters. The problems that we
consider are set intersection queries (both the reporting variant and the
semi-group counting variant), indexing a set of documents for two-pattern
queries, forbidden-pattern queries, or queries with wildcards, and
indexing an input set of gapped patterns (or two-patterns) to find those
matching a document given at query time.
Comment: Full version of the conference version that appeared at ICALP 2016,
25 pages
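To make the first problem in this list concrete, the sketch below answers set intersection queries at one end of the space-query trade-off: linear space, with each query answered by merging two sorted lists. A two-pattern query fits the same mold if one set holds the IDs of documents containing the first pattern and the other holds those containing the second. The class name, the weight dictionary (with ordinary integer addition standing in for the semigroup operation), and the default weight of 1 are assumptions for illustration, not the paper's data structures.

```python
from typing import Dict, List, Sequence


class SetIntersectionIndex:
    """Stores each set as a sorted, duplicate-free list (linear total space).

    A query on (i, j) is answered by a linear merge in O(|S_i| + |S_j|) time.
    """

    def __init__(self, sets: Sequence[Sequence[int]]) -> None:
        self.sorted_sets: List[List[int]] = [sorted(set(s)) for s in sets]

    def report(self, i: int, j: int) -> List[int]:
        """Reporting variant: list every element of S_i intersected with S_j."""
        a, b = self.sorted_sets[i], self.sorted_sets[j]
        out: List[int] = []
        p = q = 0
        while p < len(a) and q < len(b):
            if a[p] == b[q]:
                out.append(a[p])
                p += 1
                q += 1
            elif a[p] < b[q]:
                p += 1
            else:
                q += 1
        return out

    def weighted_count(self, i: int, j: int, weight: Dict[int, int]) -> int:
        """Counting variant: sum of weights over the intersection (missing weights = 1)."""
        return sum(weight.get(x, 1) for x in self.report(i, j))
```

At the other extreme, precomputing the answer for every pair $(i, j)$ makes counting queries constant-time but stores one answer per pair; the lower bounds in the abstract concern the trade-off between such extremes.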
Deterministic Time-Space Tradeoffs for k-SUM
Given a set of numbers, the $k$-SUM problem asks for a subset of $k$ numbers
that sums to zero. When the numbers are integers, the time and space complexity
of $k$-SUM is generally studied in the word-RAM model; when the numbers are
reals, the complexity is studied in the real-RAM model, and space is measured
by the number of reals held in memory at any point.
We present a time- and space-efficient deterministic self-reduction for the
$k$-SUM problem which holds for both models and has many interesting
consequences. To illustrate:
* $3$-SUM is in deterministic time $O(n^2 \lg\lg n / \lg n)$ and space
$O(\sqrt{n \lg n / \lg\lg n})$. In general, any polylogarithmic-time
improvement over quadratic time for $3$-SUM can be converted into an algorithm
with an identical time improvement but low space complexity as well.
* $3$-SUM is in deterministic time $O(n^2)$ and space $O(\sqrt{n})$,
derandomizing an algorithm of Wang.
* A popular conjecture states that 3-SUM requires $n^{2-o(1)}$ time on the
word-RAM. We show that the 3-SUM Conjecture is in fact equivalent to the
(seemingly weaker) conjecture that every $O(n^{0.51})$-space algorithm for
$3$-SUM requires at least $n^{2-o(1)}$ time on the word-RAM.
* For $k \geq 4$, $k$-SUM is in deterministic $O(n^{k-2+2/k})$ time and
$O(\sqrt{n})$ space.
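As a reference point for the bounds listed above, the classic deterministic 3-SUM algorithm below runs in $O(n^2)$ time by sorting and then sweeping two pointers for each fixed element. It is the standard textbook baseline, not the paper's self-reduction, and the variant assumed here asks for three entries at distinct positions. Note that it keeps a sorted copy of the input, i.e. linear extra space, which is the kind of space usage the low-space results above are designed to reduce.

```python
from typing import List, Optional, Tuple


def three_sum(nums: List[int]) -> Optional[Tuple[int, int, int]]:
    """Classic O(n^2)-time 3-SUM: sort, then scan with two pointers.

    Returns one triple of values from distinct positions that sums to zero,
    or None if there is none. Besides the sorted copy, it keeps O(1) words.
    """
    a = sorted(nums)                 # linear extra space for the sorted copy
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1        # search the suffix for a pair summing to -a[i]
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return a[i], a[lo], a[hi]
            if s < 0:
                lo += 1              # sum too small: move to a larger value
            else:
                hi -= 1              # sum too large: move to a smaller value
    return None
```

For instance, `three_sum([3, -1, -2, 5])` finds `(-2, -1, 3)`.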
Element Distinctness, Frequency Moments, and Sliding Windows
We derive new time-space tradeoff lower bounds and algorithms for exactly
computing statistics of input data, including frequency moments, element
distinctness, and order statistics, that are simple to calculate for sorted
data. We develop a randomized algorithm for the element distinctness problem
whose time $T$ and space $S$ satisfy $T \in O(n^{3/2}/S^{1/2})$, smaller than
previous lower bounds for comparison-based algorithms, showing that element
distinctness is strictly easier than sorting for randomized branching programs.
This algorithm is based on a new time- and space-efficient algorithm for finding
all collisions of a function $f$ from a finite set to itself that are reachable
by iterating $f$ from a given set of starting points. We further show that our
element distinctness algorithm can be extended at only a polylogarithmic factor
cost to solve the element distinctness problem over sliding windows, where the
task is to take an input of length $2n-1$ and produce an output for each window
of length $n$, giving $n$ outputs in total. In contrast, we show a time-space
tradeoff lower bound of $T \in \Omega(n^2/S)$ for randomized branching programs to
compute the number of distinct elements over sliding windows. The same lower
bound holds for computing the low-order bit of $F_0$ and computing any frequency
moment $F_k$, $k \neq 1$. This shows that those frequency moments and the decision
problem $F_0 \bmod 2$ are strictly harder than element distinctness. We complement
this lower bound with a $T \in O(n^2/S)$ comparison-based deterministic RAM
algorithm for exactly computing $F_k$ over sliding windows, nearly matching both
our lower bound for the sliding-window version and the comparison-based lower
bounds for the single-window version. We further exhibit a quantum algorithm
for $F_0$ over sliding windows with $T \in O(n^{3/2}/S^{1/2})$. Finally, we consider
the computation of order statistics over sliding windows.
Comment: arXiv admin note: substantial text overlap with arXiv:1212.437
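Collision finding connects to element distinctness: if the input is a list `a` of $n$ values from $\{0, \ldots, n-1\}$, then $f(i) = a[i]$ maps that set to itself, and a collision $f(i) = f(j)$ with $i \neq j$ is exactly a repeated element. The sketch below shows only the basic single-starting-point primitive, Floyd's cycle finding in $O(1)$ space, not the paper's algorithm for finding all collisions reachable from many starting points; the restricted value range and the function name are assumptions for illustration.

```python
from typing import Callable, Optional, Tuple


def find_collision(f: Callable[[int], int], x0: int) -> Optional[Tuple[int, int]]:
    """Floyd's cycle finding with O(1) stored values.

    Iterating f from x0 traces a 'rho': a tail that enters a cycle. If the
    tail is non-empty, its last element and the cycle element preceding the
    cycle entry are distinct inputs with the same image. Returns that pair,
    or None when x0 itself lies on the cycle.
    """
    # Phase 1: tortoise takes 1 step, hare takes 2, until they meet on the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    # Phase 2: restart the tortoise at x0; moving both 1 step at a time, they
    # first meet at the cycle entry. Their predecessors form the collision.
    tortoise = x0
    if tortoise == hare:
        return None  # x0 is on the cycle, so no collision is reached this way
    while True:
        prev_t, prev_h = tortoise, hare
        tortoise, hare = f(tortoise), f(hare)
        if tortoise == hare:
            return prev_t, prev_h
```

With `a = [1, 2, 3, 4, 2, 5]`, `find_collision(lambda i: a[i], 0)` returns `(1, 4)`, and indeed `a[1] == a[4] == 2`.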