2,442 research outputs found

    Marked Ancestor Problems (Preliminary Version)

    Get PDF
    Consider a rooted tree whose nodes can be marked or unmarked. Given a node, we want to find its nearest marked ancestor. This generalises the well-known predecessor problem, where the tree is a path. We show tight upper and lower bounds for this problem. The lower bounds are proved in the cell probe model, the upper bounds run on a unit-cost RAM. As easy corollaries we prove (often optimal) lower bounds on a number of problems. These include planar range searching, including the existential or emptiness problem, priority search trees, static tree union-find, and several problems from dynamic computational geometry, including intersection problems, proximity problems, and ray shooting. Our upper bounds improve a number of algorithms from various fields, including dynamic dictionary matching and coloured ancestor problems

    Dynamic Integer Sets with Optimal Rank, Select, and Predecessor Search

    Full text link
    We present a data structure representing a dynamic set S of w-bit integers on a w-bit word RAM. With |S|=n and w > log n and space O(n), we support the following standard operations in O(log n / log w) time: - insert(x) sets S = S + {x}. - delete(x) sets S = S - {x}. - predecessor(x) returns max{y in S | y= x}. - rank(x) returns #{y in S | y< x}. - select(i) returns y in S with rank(y)=i, if any. Our O(log n/log w) bound is optimal for dynamic rank and select, matching a lower bound of Fredman and Saks [STOC'89]. When the word length is large, our time bound is also optimal for dynamic predecessor, matching a static lower bound of Beame and Fich [STOC'99] whenever log n/log w=O(log w/loglog w). Technically, the most interesting aspect of our data structure is that it supports all the above operations in constant time for sets of size n=w^{O(1)}. This resolves a main open problem of Ajtai, Komlos, and Fredman [FOCS'83]. Ajtai et al. presented such a data structure in Yao's abstract cell-probe model with w-bit cells/words, but pointed out that the functions used could not be implemented. As a partial solution to the problem, Fredman and Willard [STOC'90] introduced a fusion node that could handle queries in constant time, but used polynomial time on the updates. We call our small set data structure a dynamic fusion node as it does both queries and updates in constant time.Comment: Presented with different formatting in Proceedings of the 55nd IEEE Symposium on Foundations of Computer Science (FOCS), 2014, pp. 166--175. The new version fixes a bug in one of the bounds stated for predecessor search, pointed out to me by Djamal Belazzougu

    Towards Tight Lower Bounds for Range Reporting on the RAM

    Full text link
    In the orthogonal range reporting problem, we are to preprocess a set of nn points with integer coordinates on a U×UU \times U grid. The goal is to support reporting all kk points inside an axis-aligned query rectangle. This is one of the most fundamental data structure problems in databases and computational geometry. Despite the importance of the problem its complexity remains unresolved in the word-RAM. On the upper bound side, three best tradeoffs exists: (1.) Query time O(lglgn+k)O(\lg \lg n + k) with O(nlgεn)O(nlg^{\varepsilon}n) words of space for any constant ε>0\varepsilon>0. (2.) Query time O((1+k)lglgn)O((1 + k) \lg \lg n) with O(nlglgn)O(n \lg \lg n) words of space. (3.) Query time O((1+k)lgεn)O((1+k)\lg^{\varepsilon} n) with optimal O(n)O(n) words of space. However, the only known query time lower bound is Ω(loglogn+k)\Omega(\log \log n +k), even for linear space data structures. All three current best upper bound tradeoffs are derived by reducing range reporting to a ball-inheritance problem. Ball-inheritance is a problem that essentially encapsulates all previous attempts at solving range reporting in the word-RAM. In this paper we make progress towards closing the gap between the upper and lower bounds for range reporting by proving cell probe lower bounds for ball-inheritance. Our lower bounds are tight for a large range of parameters, excluding any further progress for range reporting using the ball-inheritance reduction

    Linear-Space Data Structures for Range Mode Query in Arrays

    Full text link
    A mode of a multiset SS is an element aSa \in S of maximum multiplicity; that is, aa occurs at least as frequently as any other element in SS. Given a list A[1:n]A[1:n] of nn items, we consider the problem of constructing a data structure that efficiently answers range mode queries on AA. Each query consists of an input pair of indices (i,j)(i, j) for which a mode of A[i:j]A[i:j] must be returned. We present an O(n22ϵ)O(n^{2-2\epsilon})-space static data structure that supports range mode queries in O(nϵ)O(n^\epsilon) time in the worst case, for any fixed ϵ[0,1/2]\epsilon \in [0,1/2]. When ϵ=1/2\epsilon = 1/2, this corresponds to the first linear-space data structure to guarantee O(n)O(\sqrt{n}) query time. We then describe three additional linear-space data structures that provide O(k)O(k), O(m)O(m), and O(ji)O(|j-i|) query time, respectively, where kk denotes the number of distinct elements in AA and mm denotes the frequency of the mode of AA. Finally, we examine generalizing our data structures to higher dimensions.Comment: 13 pages, 2 figure

    Cell-Probe Bounds for Online Edit Distance and Other Pattern Matching Problems

    Full text link
    We give cell-probe bounds for the computation of edit distance, Hamming distance, convolution and longest common subsequence in a stream. In this model, a fixed string of nn symbols is given and one δ\delta-bit symbol arrives at a time in a stream. After each symbol arrives, the distance between the fixed string and a suffix of most recent symbols of the stream is reported. The cell-probe model is perhaps the strongest model of computation for showing data structure lower bounds, subsuming in particular the popular word-RAM model. * We first give an Ω((δlogn)/(w+loglogn))\Omega((\delta \log n)/(w+\log\log n)) lower bound for the time to give each output for both online Hamming distance and convolution, where ww is the word size. This bound relies on a new encoding scheme and for the first time holds even when ww is as small as a single bit. * We then consider the online edit distance and longest common subsequence problems in the bit-probe model (w=1w=1) with a constant sized input alphabet. We give a lower bound of Ω(logn/(loglogn)3/2)\Omega(\sqrt{\log n}/(\log\log n)^{3/2}) which applies for both problems. This second set of results relies both on our new encoding scheme as well as a carefully constructed hard distribution. * Finally, for the online edit distance problem we show that there is an O((logn)2/w)O((\log n)^2/w) upper bound in the cell-probe model. This bound gives a contrast to our new lower bound and also establishes an exponential gap between the known cell-probe and RAM model complexities.Comment: 32 pages, 4 figure

    Random input helps searching predecessors

    Get PDF
    A data structure problem consists of the finite sets: D of data, Q of queries, A of query answers, associated with a function f: D x Q → A. The data structure of file X is "static" ("dynamic") if we "do not" ("do") require quick updates as X changes. An important goal is to compactly encode a file X ϵ D, such that for each query y ϵ Q, function f (X, y) requires the minimum time to compute an answer in A. This goal is trivial if the size of D is large, since for each query y ϵ Q, it was shown that f(X,y) requires O(1) time for the most important queries in the literature. Hence, this goal becomes interesting to study as a trade off between the "storage space" and the "query time", both measured as functions of the file size n = \X\. The ideal solution would be to use linear O(n) = O(\X\) space, while retaining a constant O(1) query time. However, if f (X, y) computes the static predecessor search (find largest x ϵ X: x ≤ y), then Ajtai [Ajt88] proved a negative result. By using just n0(1) = [IX]0(1) data space, then it is not possible to evaluate f(X,y) in O(1) time Ay ϵ Q. The proof exhibited a bad distribution of data D, such that Ey∗ ϵ Q (a "difficult" query y∗), that f(X,y∗) requires ω(1) time. Essentially [Ajt88] is an existential result, resolving the worst case scenario. But, [Ajt88] left open the question: do we typically, that is, with high probability (w.h.p.)1 encounter such "difficult" queries y ϵ Q, when assuming reasonable distributions with respect to (w.r.t.) queries and data? Below we make reasonable assumptions w.r.t. the distribution of the queries y ϵ Q, as well as w.r.t. the distribution of data X ϵ D. In two interesting scenarios studied in the literature, we resolve the typical (w.h.p.) query time
    corecore