6 research outputs found

    Nearly Optimal Static Las Vegas Succinct Dictionary

    Given a set $S$ of $n$ (distinct) keys from a key space $[U]$, each associated with a value from $\Sigma$, the \emph{static dictionary} problem asks to preprocess these (key, value) pairs into a data structure, supporting value-retrieval queries: for any given $x\in[U]$, $\mathtt{valRet}(x)$ must return the value associated with $x$ if $x\in S$, or return $\bot$ if $x\notin S$. The special case where $|\Sigma|=1$ is called the \emph{membership} problem. The "textbook" solution is to use a hash table, which occupies linear space and answers each query in constant time. On the other hand, the minimum possible space to encode all (key, value) pairs is only $\mathtt{OPT}:=\lceil\lg_2\binom{U}{n}+n\lg_2|\Sigma|\rceil$ bits, which could be much less. In this paper, we design a randomized dictionary data structure using $\mathtt{OPT}+\mathrm{poly}\lg n+O(\lg\lg\lg\lg\lg U)$ bits of space, and it has \emph{expected constant} query time, assuming the query algorithm can access an external lookup table of size $n^{0.001}$. The lookup table depends only on $U$, $n$ and $|\Sigma|$, and not on the input. Previously, even for membership queries and $U\leq n^{O(1)}$, the best known data structure with constant query time requires $\mathtt{OPT}+n/\mathrm{poly}\lg n$ bits of space (Pagh [Pag01] and Pătrașcu [Pat08]); the best known using $\mathtt{OPT}+n^{0.999}$ space has query time $O(\lg n)$; the only known non-trivial data structure with $\mathtt{OPT}+n^{0.001}$ space has $O(\lg n)$ query time and requires a lookup table of size $\geq n^{2.99}$ (!). Our new data structure answers open questions by Pătrașcu and Thorup [Pat08, Tho13]. We also present a scheme that compresses a sequence $X\in\Sigma^n$ to its zeroth-order (empirical) entropy up to $|\Sigma|\cdot\mathrm{poly}\lg n$ extra bits, supporting decoding of each $X_i$ in $O(\lg|\Sigma|)$ expected time. Comment: preliminary version appeared in STOC'20
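
    To make the interface concrete, here is a minimal Python sketch of the value-retrieval API described above, implemented as the "textbook" linear-space hash table rather than the paper's succinct $\mathtt{OPT}+\mathrm{poly}\lg n$-bit construction; the class and method names are illustrative only.

        class StaticDictionary:
            """Textbook linear-space baseline for the static dictionary problem."""

            def __init__(self, pairs):
                # Preprocess the (key, value) pairs; keys come from the key space [U].
                self._table = dict(pairs)

            def val_ret(self, x):
                # Return the value associated with x if x is in S, or None
                # (standing in for the bottom symbol) if x is not in S.
                return self._table.get(x)

        # Membership is the special case |Sigma| = 1.
        d = StaticDictionary([(3, 'a'), (17, 'b'), (42, 'c')])
        assert d.val_ret(17) == 'b'
        assert d.val_ret(5) is None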

    A Dynamic Space-Efficient Filter with Constant Time Operations

    A dynamic dictionary is a data structure that maintains sets of cardinality at most $n$ from a given universe and supports insertions, deletions, and membership queries. A filter approximates membership queries with a one-sided error that occurs with probability at most $\epsilon$. The goal is to obtain dynamic filters that are space-efficient (the space is $1+o(1)$ times the information-theoretic lower bound) and support all operations in constant time with high probability. One approach to designing filters is to reduce to the retrieval problem. When the size of the universe is polynomial in $n$, this approach yields a space-efficient dynamic filter as long as the error parameter $\epsilon$ satisfies $\log(1/\epsilon) = \omega(\log\log n)$. For the case that $\log(1/\epsilon) = O(\log\log n)$, we present the first space-efficient dynamic filter with constant-time operations in the worst case (whp). In contrast, the space-efficient dynamic filter of Pagh et al. [Anna Pagh et al., 2005] supports insertions and deletions in amortized expected constant time. Our approach employs the classic reduction of Carter et al. [Carter et al., 1978] on a new type of dictionary construction that supports random multisets.
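
    The toy Python sketch below illustrates the flavor of the Carter et al.-style reduction mentioned above: keys are hashed into a universe of size roughly $n/\epsilon$ and that multiset is stored exactly, so a membership query errs only on a hash collision (one-sided, probability about $\epsilon$). A plain Python dict stands in for the paper's space-efficient multiset dictionary, and all names are illustrative.

        import math
        import random

        class HashedMultisetFilter:
            # Toy filter: store the multiset of hashed keys exactly; a query is a
            # false positive only when it collides with a stored key under h.
            def __init__(self, n, eps):
                self.m = max(1, math.ceil(n / eps))   # reduced universe size
                self.seed = random.getrandbits(64)
                self.counts = {}                      # multiset of hashed keys

            def _h(self, x):
                return hash((self.seed, x)) % self.m

            def insert(self, x):
                self.counts[self._h(x)] = self.counts.get(self._h(x), 0) + 1

            def delete(self, x):
                hx = self._h(x)
                if self.counts.get(hx, 0) > 0:
                    self.counts[hx] -= 1
                    if self.counts[hx] == 0:
                        del self.counts[hx]

            def query(self, x):
                return self.counts.get(self._h(x), 0) > 0

        f = HashedMultisetFilter(n=1000, eps=0.01)
        f.insert("key")
        assert f.query("key")   # no false negatives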

    Range Avoidance for Low-Depth Circuits and Connections to Pseudorandomness

    In the range avoidance problem, the input is a multi-output Boolean circuit with more outputs than inputs, and the goal is to find a string outside its range (which is guaranteed to exist). We show that well-known explicit construction questions, such as finding binary linear codes achieving the Gilbert-Varshamov bound or list-decoding capacity, and constructing rigid matrices, reduce to the range avoidance problem for log-depth circuits, and by a further recent reduction [Ren, Santhanam, and Wang, FOCS 2022] to $\mathsf{NC}^0_4$ circuits, where each output depends on at most 4 input bits. On the algorithmic side, we show that range avoidance for $\mathsf{NC}^0_2$ circuits can be solved in polynomial time. We identify a general condition relating to correlation with low-degree parities that implies that any almost pairwise independent set has some string that avoids the range of every circuit in the class. We apply this to $\mathsf{NC}^0$ circuits, and to small-width CNF/DNF and general De Morgan formulae (via a connection to approximate degree), yielding non-trivial small hitting sets for range avoidance in these cases.
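
    As a point of reference, the range avoidance problem itself can be stated in a few lines of Python; the brute-force solver below is exponential and only meant to pin down the task (the example circuit and its parameters are made up for illustration, not taken from the paper).

        from itertools import product

        def avoid_bruteforce(circuit, n_in, n_out):
            # Given a map C : {0,1}^n_in -> {0,1}^n_out with n_out > n_in,
            # return a string outside its range (one must exist by counting).
            assert n_out > n_in
            image = {circuit(x) for x in product((0, 1), repeat=n_in)}
            for y in product((0, 1), repeat=n_out):
                if y not in image:
                    return y

        # Example: copy the input and append its parity; every output bit depends
        # on few input bits, in the spirit of the low-depth circuits above.
        def parity_extend(x):
            return x + (sum(x) % 2,)

        print(avoid_bruteforce(parity_extend, 3, 4))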

    Dynamic "Succincter"

    Augmented B-trees (aB-trees) are a broad class of data structures. The seminal work "succincter" by Pătrașcu showed that any aB-tree can be stored using only two bits of redundancy, while supporting queries to the tree in time proportional to its depth. It has been a versatile building block for constructing succinct data structures, including rank/select data structures, dictionaries, locally decodable arithmetic coding, storing balanced parentheses, etc. In this paper, we show how to "dynamize" an aB-tree. Our main result is the design of dynamic aB-trees (daB-trees) with branching factor two using only three bits of redundancy (with the help of lookup tables that are of negligible size in applications), while supporting updates and queries in time polynomial in its depth. As an application, we present a dynamic rank/select data structure for $n$-bit arrays, also known as a dynamic fully indexable dictionary (FID). It supports updates and queries in $O(\log n/\log\log n)$ time, and when the array has $m$ ones, the data structure occupies $\log\binom{n}{m} + O(n/2^{\log^{0.199} n})$ bits. Note that the update and query times are optimal even without space constraints, due to a lower bound by Fredman and Saks. Prior to our work, no dynamic FID with near-optimal update and query times and redundancy $o(n/\log n)$ was known. We further show that a dynamic sequence supporting insertions, deletions and rank/select queries can be maintained in (optimal) $O(\log n/\log\log n)$ time and with $O(n\cdot\mathrm{poly}\log\log n/\log^2 n)$ bits of redundancy. Comment: 33 pages, 1 figure; in FOCS 202
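
    For contrast with the succinct daB-tree construction, here is a plain (non-succinct) dynamic rank/select baseline for a bit array in Python, built on a Fenwick tree: it supports updates and queries in $O(\log n)$ time but stores the array explicitly, so its redundancy is linear rather than $O(n/2^{\log^{0.199} n})$. The names are illustrative only.

        class DynamicBitRankSelect:
            def __init__(self, n):
                self.n = n
                self.bits = [0] * n          # the explicit bit array
                self.tree = [0] * (n + 1)    # Fenwick tree of prefix counts

            def _add(self, i, delta):
                i += 1
                while i <= self.n:
                    self.tree[i] += delta
                    i += i & (-i)

            def update(self, i, b):
                # Set position i to bit b (0 or 1).
                if self.bits[i] != b:
                    self._add(i, 1 if b else -1)
                    self.bits[i] = b

            def rank1(self, i):
                # Number of ones among positions [0, i).
                s = 0
                while i > 0:
                    s += self.tree[i]
                    i -= i & (-i)
                return s

            def select1(self, k):
                # Smallest position p with rank1(p + 1) >= k, i.e. the k-th one
                # (1-indexed); assumes the array contains at least k ones.
                lo, hi = 0, self.n - 1
                while lo < hi:
                    mid = (lo + hi) // 2
                    if self.rank1(mid + 1) >= k:
                        hi = mid
                    else:
                        lo = mid + 1
                return lo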

    Tight Cell-Probe Lower Bounds for Dynamic Succinct Dictionaries

    A dictionary data structure maintains a set of at most $n$ keys from the universe $[U]$ under key insertions and deletions, such that given a query $x\in[U]$, it returns whether $x$ is in the set. Some variants also store values associated with the keys, such that given a query $x$, the value associated with $x$ is returned when $x$ is in the set. This fundamental data structure problem has been studied for six decades, since the introduction of hash tables in 1953. A hash table occupies $O(n\log U)$ bits of space with constant time per operation in expectation. There has been a vast literature on improving its time and space usage. The state-of-the-art dictionary by Bender, Farach-Colton, Kuszmaul, Kuszmaul and Liu [BFCK+22] has space consumption close to the information-theoretic optimum, using a total of $\log\binom{U}{n}+O(n\log^{(k)} n)$ bits, while supporting all operations in $O(k)$ time, for any parameter $k \leq \log^* n$. The term $O(\log^{(k)} n) = O(\underbrace{\log\cdots\log}_{k} n)$ is referred to as the wasted bits per key. In this paper, we prove a matching cell-probe lower bound: for $U=n^{1+\Theta(1)}$, any dictionary with $O(\log^{(k)} n)$ wasted bits per key must have expected operational time $\Omega(k)$, in the cell-probe model with word size $w=\Theta(\log U)$. Furthermore, if a dictionary stores values of $\Theta(\log U)$ bits, we show that regardless of the query time, it must have $\Omega(k)$ expected update time. It is worth noting that this is the first cell-probe lower bound on the trade-off between space and update time for general data structures. Comment: 35 pages
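
    To make the quantities in this trade-off concrete, the short Python snippet below computes the iterated logarithm $\log^{(k)} n$ that measures wasted bits per key, together with a helper (defined here for illustration, not taken from the paper) that converts a total space bound into wasted bits per key relative to $\lg\binom{U}{n}$.

        import math

        def iterated_log2(n, k):
            # log^{(k)} n: the binary logarithm applied k times.
            for _ in range(k):
                n = math.log2(max(n, 2))
            return n

        def wasted_bits_per_key(total_bits, U, n):
            # Space above the information-theoretic minimum lg binom(U, n), per key.
            opt = (math.lgamma(U + 1) - math.lgamma(n + 1)
                   - math.lgamma(U - n + 1)) / math.log(2)
            return (total_bits - opt) / n

        # With n = 2**20 keys, the BFCK+22 trade-off pays O(log^{(k)} n) wasted bits
        # per key for O(k)-time operations; the lower bound above says this is tight.
        n = 2 ** 20
        print([round(iterated_log2(n, k), 2) for k in range(1, 5)])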

    Dynamic Dictionary with Subconstant Wasted Bits per Key

    Dictionaries have been one of the central questions in data structures. A dictionary data structure maintains a set of key-value pairs under insertions and deletions such that, given a query key, the data structure efficiently returns its value. The state-of-the-art dictionaries [Bender, Farach-Colton, Kuszmaul, Kuszmaul, Liu 2022] store $n$ key-value pairs with only $O(n\log^{(k)} n)$ bits of redundancy, and support all operations in $O(k)$ time, for $k \leq \log^* n$. This was recently shown to be optimal [Li, Liang, Yu, Zhou 2023b]. In this paper, we study the regime where the number of redundant bits is $R=o(n)$, and show that when $R$ is at least $n/\mathrm{poly}\log n$, all operations can be supported in $O(\log^* n + \log(n/R))$ time, matching the lower bound in this regime [Li, Liang, Yu, Zhou 2023b]. We present two data structures, depending on which range $R$ falls in. The data structure for $R < n/\log^{0.1} n$ utilizes a generalization of adapters studied in [Berger, Kuszmaul, Polak, Tidor, Wein 2022] and [Li, Liang, Yu, Zhou 2023a]. The data structure for $R \geq n/\log^{0.1} n$ is based on recursively hashing into buckets of logarithmic size. Comment: 46 pages; SODA 202
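
    A toy Python sketch of the high-level idea behind the second construction above (hashing keys into buckets of logarithmic expected size): real space savings would come from representing each small bucket near-optimally and recursing, which this stand-in does not attempt, and all names here are illustrative.

        import math
        import random

        class BucketedDictionary:
            def __init__(self, n):
                # Roughly n / log n buckets, so each holds O(log n) keys in expectation.
                self.num_buckets = max(1, n // max(1, int(math.log2(max(n, 2)))))
                self.seed = random.getrandbits(64)
                self.buckets = [dict() for _ in range(self.num_buckets)]

            def _bucket(self, key):
                return self.buckets[hash((self.seed, key)) % self.num_buckets]

            def insert(self, key, value):
                self._bucket(key)[key] = value

            def delete(self, key):
                self._bucket(key).pop(key, None)

            def query(self, key):
                return self._bucket(key).get(key)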