99,478 research outputs found

    Independent Range Sampling, Revisited Again

    Get PDF
    We revisit the range sampling problem: the input is a set of points where each point is associated with a real-valued weight. The goal is to store them in a structure such that given a query range and an integer k, we can extract k independent random samples from the points inside the query range, where the probability of sampling a point is proportional to its weight. This line of work was initiated in 2014 by Hu, Qiao, and Tao and it was later followed up by Afshani and Wei. The first line of work mostly studied unweighted but dynamic version of the problem in one dimension whereas the second result considered the static weighted problem in one dimension as well as the unweighted problem in 3D for halfspace queries. We offer three main results and some interesting insights that were missed by the previous work: We show that it is possible to build efficient data structures for range sampling queries if we allow the query time to hold in expectation (the first result), or obtain efficient worst-case query bounds by allowing the sampling probability to be approximately proportional to the weight (the second result). The third result is a conditional lower bound that shows essentially one of the previous two concessions is needed. For instance, for the 3D range sampling queries, the first two results give efficient data structures with near-linear space and polylogarithmic query time whereas the lower bound shows with near-linear space the worst-case query time must be close to n^{2/3}, ignoring polylogarithmic factors. Up to our knowledge, this is the first such major gap between the expected and worst-case query time of a range searching problem

    Linear-Space Data Structures for Range Mode Query in Arrays

    Full text link
    A mode of a multiset SS is an element aSa \in S of maximum multiplicity; that is, aa occurs at least as frequently as any other element in SS. Given a list A[1:n]A[1:n] of nn items, we consider the problem of constructing a data structure that efficiently answers range mode queries on AA. Each query consists of an input pair of indices (i,j)(i, j) for which a mode of A[i:j]A[i:j] must be returned. We present an O(n22ϵ)O(n^{2-2\epsilon})-space static data structure that supports range mode queries in O(nϵ)O(n^\epsilon) time in the worst case, for any fixed ϵ[0,1/2]\epsilon \in [0,1/2]. When ϵ=1/2\epsilon = 1/2, this corresponds to the first linear-space data structure to guarantee O(n)O(\sqrt{n}) query time. We then describe three additional linear-space data structures that provide O(k)O(k), O(m)O(m), and O(ji)O(|j-i|) query time, respectively, where kk denotes the number of distinct elements in AA and mm denotes the frequency of the mode of AA. Finally, we examine generalizing our data structures to higher dimensions.Comment: 13 pages, 2 figure

    The Skip Quadtree: A Simple Dynamic Data Structure for Multidimensional Data

    Full text link
    We present a new multi-dimensional data structure, which we call the skip quadtree (for point data in R^2) or the skip octree (for point data in R^d, with constant d>2). Our data structure combines the best features of two well-known data structures, in that it has the well-defined "box"-shaped regions of region quadtrees and the logarithmic-height search and update hierarchical structure of skip lists. Indeed, the bottom level of our structure is exactly a region quadtree (or octree for higher dimensional data). We describe efficient algorithms for inserting and deleting points in a skip quadtree, as well as fast methods for performing point location and approximate range queries.Comment: 12 pages, 3 figures. A preliminary version of this paper appeared in the 21st ACM Symp. Comp. Geom., Pisa, 2005, pp. 296-30

    Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

    Full text link
    This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g.~records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure of is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree can be constructed in worst case time O(logn)O(\log n) per input symbol (as opposed to amortized O(logn)O(\log n) time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves O(logn)O(\log n) worst case time per input symbol. Searching for a pattern of length mm in the resulting suffix tree takes O(min(mlogΣ,m+logn)+tocc)O(\min(m\log |\Sigma|, m + \log n) + tocc) time, where tocctocc is the number of occurrences of the pattern. The paper also describes more applications and show how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance

    Dynamic Ordered Sets with Exponential Search Trees

    Full text link
    We introduce exponential search trees as a novel technique for converting static polynomial space search structures for ordered sets into fully-dynamic linear space data structures. This leads to an optimal bound of O(sqrt(log n/loglog n)) for searching and updating a dynamic set of n integer keys in linear space. Here searching an integer y means finding the maximum key in the set which is smaller than or equal to y. This problem is equivalent to the standard text book problem of maintaining an ordered set (see, e.g., Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms, 2nd ed., MIT Press, 2001). The best previous deterministic linear space bound was O(log n/loglog n) due Fredman and Willard from STOC 1990. No better deterministic search bound was known using polynomial space. We also get the following worst-case linear space trade-offs between the number n, the word length w, and the maximal key U < 2^w: O(min{loglog n+log n/log w, (loglog n)(loglog U)/(logloglog U)}). These trade-offs are, however, not likely to be optimal. Our results are generalized to finger searching and string searching, providing optimal results for both in terms of n.Comment: Revision corrects some typoes and state things better for applications in subsequent paper

    I/O-Efficient Dynamic Planar Range Skyline Queries

    Get PDF
    We present the first fully dynamic worst case I/O-efficient data structures that support planar orthogonal \textit{3-sided range skyline reporting queries} in \bigO (\log_{2B^\epsilon} n + \frac{t}{B^{1-\epsilon}}) I/Os and updates in \bigO (\log_{2B^\epsilon} n) I/Os, using \bigO (\frac{n}{B^{1-\epsilon}}) blocks of space, for nn input planar points, tt reported points, and parameter 0ϵ10 \leq \epsilon \leq 1. We obtain the result by extending Sundar's priority queues with attrition to support the operations \textsc{DeleteMin} and \textsc{CatenateAndAttrite} in \bigO (1) worst case I/Os, and in \bigO(1/B) amortized I/Os given that a constant number of blocks is already loaded in main memory. Finally, we show that any pointer-based static data structure that supports \textit{dominated maxima reporting queries}, namely the difficult special case of 4-sided skyline queries, in \bigO(\log^{\bigO(1)}n +t) worst case time must occupy Ω(nlognloglogn)\Omega(n \frac{\log n}{\log \log n}) space, by adapting a similar lower bounding argument for planar 4-sided range reporting queries.Comment: Submitted to SODA 201

    Crossing the Logarithmic Barrier for Dynamic Boolean Data Structure Lower Bounds

    Full text link
    This paper proves the first super-logarithmic lower bounds on the cell probe complexity of dynamic boolean (a.k.a. decision) data structure problems, a long-standing milestone in data structure lower bounds. We introduce a new method for proving dynamic cell probe lower bounds and use it to prove a Ω~(log1.5n)\tilde{\Omega}(\log^{1.5} n) lower bound on the operational time of a wide range of boolean data structure problems, most notably, on the query time of dynamic range counting over F2\mathbb{F}_2 ([Pat07]). Proving an ω(lgn)\omega(\lg n) lower bound for this problem was explicitly posed as one of five important open problems in the late Mihai P\v{a}tra\c{s}cu's obituary [Tho13]. This result also implies the first ω(lgn)\omega(\lg n) lower bound for the classical 2D range counting problem, one of the most fundamental data structure problems in computational geometry and spatial databases. We derive similar lower bounds for boolean versions of dynamic polynomial evaluation and 2D rectangle stabbing, and for the (non-boolean) problems of range selection and range median. Our technical centerpiece is a new way of "weakly" simulating dynamic data structures using efficient one-way communication protocols with small advantage over random guessing. This simulation involves a surprising excursion to low-degree (Chebychev) polynomials which may be of independent interest, and offers an entirely new algorithmic angle on the "cell sampling" method of Panigrahy et al. [PTW10]
    corecore