4 research outputs found

    Dynamic Range Majority Data Structures

    Full text link
    Given a set PP of coloured points on the real line, we study the problem of answering range α\alpha-majority (or "heavy hitter") queries on PP. More specifically, for a query range QQ, we want to return each colour that is assigned to more than an α\alpha-fraction of the points contained in QQ. We present a new data structure for answering range α\alpha-majority queries on a dynamic set of points, where α(0,1)\alpha \in (0,1). Our data structure uses O(n) space, supports queries in O((lgn)/α)O((\lg n) / \alpha) time, and updates in O((lgn)/α)O((\lg n) / \alpha) amortized time. If the coordinates of the points are integers, then the query time can be improved to O(lgn/(αlglgn)+(lg(1/α))/α))O(\lg n / (\alpha \lg \lg n) + (\lg(1/\alpha))/\alpha)). For constant values of α\alpha, this improved query time matches an existing lower bound, for any data structure with polylogarithmic update time. We also generalize our data structure to handle sets of points in d-dimensions, for d2d \ge 2, as well as dynamic arrays, in which each entry is a colour.Comment: 16 pages, Preliminary version appeared in ISAAC 201

    Algorithms and Data Structures for Geometric Intersection Query Problems

    Get PDF
    University of Minnesota Ph.D. dissertation. September 2017. Major: Computer Science. Advisor: Ravi Janardan. 1 computer file (PDF); xi, 126 pages.The focus of this thesis is the topic of geometric intersection queries (GIQ) which has been very well studied by the computational geometry community and the database community. In a GIQ problem, the user is not interested in the entire input geometric dataset, but only in a small subset of it and requests an informative summary of that small subset of data. Formally, the goal is to preprocess a set A of n geometric objects into a data structure so that given a query geometric object q, a certain aggregation function can be applied efficiently on the objects of A intersecting q. The classical aggregation functions studied in the literature are reporting or counting the objects of A intersecting q. In many applications, the same set A is queried several times, in which case one would like to answer a query faster by preprocessing A into a data structure. The goal is to organize the data into a data structure which occupies a small amount of space and yet responds to any user query in real-time. In this thesis the study of the GIQ problems was conducted from the point-of-view of a computational geometry researcher. Given a model of computation and a GIQ problem, what are the best possible upper bounds (resp., lower bounds) on the space and the query time that can be achieved by a data structure? Also, what is the relative hardness of various GIQ problems and aggregate functions. Here relative hardness means that given two GIQ problems A and B (or, two aggregate functions f(A, q) and g(A, q)), which of them can be answered faster by a computer (assuming data structures for both of them occupy asymptotically the same amount of space)? This thesis presents results which increase our understanding of the above questions. For many GIQ problems, data structures with optimal (or near-optimal) space and query time bounds have been achieved. The geometric settings studied are primarily orthogonal range searching where the input is points and the query is an axes-aligned rectangle, and the dual setting of rectangle stabbing where the input is a set of axes-aligned rectangles and the query is a point. The aggregation functions studied are primarily reporting, top-k, and approximate counting. Most of the data structures are built for the internal memory model (word-RAM or pointer machine model), but in some settings they are generic enough to be efficient in the I/O-model as well

    Space-Efficient Data Structures in the Word-RAM and Bitprobe Models

    Get PDF
    This thesis studies data structures in the word-RAM and bitprobe models, with an emphasis on space efficiency. In the word-RAM model of computation the space cost of a data structure is measured in terms of the number of w-bit words stored in memory, and the cost of answering a query is measured in terms of the number of read, write, and arithmetic operations that must be performed. In the bitprobe model, like the word-RAM model, the space cost is measured in terms of the number of bits stored in memory, but the query cost is measured solely in terms of the number of bit accesses, or probes, that are performed. First, we examine the problem of succinctly representing a partially ordered set, or poset, in the word-RAM model with word size Theta(lg n) bits. A succinct representation of a combinatorial object is one that occupies space matching the information theoretic lower bound to within lower order terms. We show how to represent a poset on n vertices using a data structure that occupies n^2/4 + o(n^2) bits, and can answer precedence (i.e., less-than) queries in constant time. Since the transitive closure of a directed acyclic graph is a poset, this implies that we can support reachability queries on an arbitrary directed graph in the same space bound. As far as we are aware, this is the first representation of an arbitrary directed graph that supports reachability queries in constant time, and stores less than n choose 2 bits. We also consider several additional query operations. Second, we examine the problem of supporting range queries on strings of n characters (or, equivalently, arrays of n elements) in the word-RAM model with word size Theta(lg n) bits. We focus on the specific problem of answering range majority queries: i.e., given a range, report the character that is the majority among those in the range, if one exists. We show that these queries can be supported in constant time using a linear space (in words) data structure. We generalize this result in several directions, considering various frequency thresholds, geometric variants of the problem, and dynamism. These results are in stark contrast to recent work on the similar range mode problem, in which the query operation asks for the mode (i.e., most frequent) character in a given range. The current best data structures for the range mode problem take soft-Oh(n^(1/2)) time per query for linear space data structures. Third, we examine the deterministic membership (or dictionary) problem in the bitprobe model. This problem asks us to store a set of n elements drawn from a universe [1,u] such that membership queries can be always answered in t bit probes. We present several new fully explicit results for this problem, in particular for the case when n = 2, answering an open problem posed by Radhakrishnan, Shah, and Shannigrahi [ESA 2010]. We also present a general strategy for the membership problem that can be used to solve many related fundamental problems, such as rank, counting, and emptiness queries. Finally, we conclude with a list of open problems and avenues for future work
    corecore