
    Nearly Optimal Static Las Vegas Succinct Dictionary

    Given a set $S$ of $n$ (distinct) keys from key space $[U]$, each associated with a value from $\Sigma$, the \emph{static dictionary} problem asks to preprocess these (key, value) pairs into a data structure, supporting value-retrieval queries: for any given $x\in [U]$, $\mathtt{valRet}(x)$ must return the value associated with $x$ if $x\in S$, or return $\bot$ if $x\notin S$. The special case where $|\Sigma|=1$ is called the \emph{membership} problem. The "textbook" solution is to use a hash table, which occupies linear space and answers each query in constant time. On the other hand, the minimum possible space to encode all (key, value) pairs is only $\mathtt{OPT}:= \lceil\lg_2\binom{U}{n}+n\lg_2|\Sigma|\rceil$ bits, which could be much less. In this paper, we design a randomized dictionary data structure using $\mathtt{OPT}+\mathrm{poly}\lg n+O(\lg\lg\lg\lg\lg U)$ bits of space, and it has \emph{expected constant} query time, assuming the query algorithm can access an external lookup table of size $n^{0.001}$. The lookup table depends only on $U$, $n$ and $|\Sigma|$, and not the input. Previously, even for membership queries and $U\leq n^{O(1)}$, the best known data structure with constant query time requires $\mathtt{OPT}+n/\mathrm{poly}\lg n$ bits of space (Pagh [Pag01] and P\v{a}tra\c{s}cu [Pat08]); the best-known using $\mathtt{OPT}+n^{0.999}$ space has query time $O(\lg n)$; the only known non-trivial data structure with $\mathtt{OPT}+n^{0.001}$ space has $O(\lg n)$ query time and requires a lookup table of size $\geq n^{2.99}$ (!). Our new data structure answers open questions by P\v{a}tra\c{s}cu and Thorup [Pat08,Tho13].
    We also present a scheme that compresses a sequence $X\in\Sigma^n$ to its zeroth order (empirical) entropy up to $|\Sigma|\cdot\mathrm{poly}\lg n$ extra bits, supporting decoding each $X_i$ in $O(\lg |\Sigma|)$ expected time. Comment: preliminary version appeared in STOC'2
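
To make the gap between the information-theoretic bound and a plain table concrete, the following is a minimal sketch that evaluates the OPT formula from the abstract exactly with integer arithmetic. The function names `opt_bits` and `naive_table_bits` are illustrative assumptions, and the "naive" layout (one fixed-width key and value per entry) is a simplification chosen for comparison, not a structure from the paper.

```python
from math import comb


def opt_bits(U: int, n: int, sigma: int) -> int:
    """OPT = ceil(lg C(U, n) + n * lg |Sigma|), computed exactly.

    The two logarithms are merged into ceil(lg(C(U, n) * sigma**n)),
    and ceil(lg c) for an integer c >= 1 equals (c - 1).bit_length().
    """
    count = comb(U, n) * sigma ** n  # number of distinct dictionaries
    return (count - 1).bit_length()


def naive_table_bits(U: int, n: int, sigma: int) -> int:
    """Assumed baseline: n entries, each a fixed-width key plus value."""
    key_bits = (U - 1).bit_length()
    value_bits = (sigma - 1).bit_length()
    return n * (key_bits + value_bits)


if __name__ == "__main__":
    U, n, sigma = 2 ** 20, 1000, 1  # membership case: |Sigma| = 1
    print("OPT bits:  ", opt_bits(U, n, sigma))
    print("naive bits:", naive_table_bits(U, n, sigma))
```

The sketch only evaluates the space bound; it does not build the data structure described in the abstract.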

    Compressed String Dictionary Search with Edit Distance One

    In this paper we present different solutions for the problem of indexing a dictionary of strings in compressed space. Given a pattern (Formula presented.), the index has to report all the strings in the dictionary having edit distance at most one with (Formula presented.). Our first solution is able to solve queries in (almost optimal) (Formula presented.) time where (Formula presented.) is the number of strings in the dictionary having edit distance at most one with (Formula presented.). The space complexity of this solution is bounded in terms of the (Formula presented.)th order entropy of the indexed dictionary. A second solution further improves this space complexity at the cost of increasing the query time. Finally, we propose randomized solutions (Monte Carlo and Las Vegas) which achieve simultaneously the time complexity of the first solution and the space complexity of the second one.

    A Dynamic Space-Efficient Filter with Constant Time Operations

    A dynamic dictionary is a data structure that maintains sets of cardinality at most n from a given universe and supports insertions, deletions, and membership queries. A filter approximates membership queries with a one-sided error that occurs with probability at most ε. The goal is to obtain dynamic filters that are space-efficient (the space is 1+o(1) times the information-theoretic lower bound) and support all operations in constant time with high probability. One approach to designing filters is to reduce to the retrieval problem. When the size of the universe is polynomial in n, this approach yields a space-efficient dynamic filter as long as the error parameter ε satisfies log(1/ε) = ω(log log n). For the case that log(1/ε) = O(log log n), we present the first space-efficient dynamic filter with constant time operations in the worst case (whp). In contrast, the space-efficient dynamic filter of Pagh et al. [Anna Pagh et al., 2005] supports insertions and deletions in amortized expected constant time. Our approach employs the classic reduction of Carter et al. [Carter et al., 1978] on a new type of dictionary construction that supports random multisets.

    Dynamic "Succincter"

    Augmented B-trees (aB-trees) are a broad class of data structures. The seminal work "succincter" by Patrascu showed that any aB-tree can be stored using only two bits of redundancy, while supporting queries to the tree in time proportional to its depth. It has been a versatile building block for constructing succinct data structures, including rank/select data structures, dictionaries, locally decodable arithmetic coding, storing balanced parentheses, etc. In this paper, we show how to "dynamize" an aB-tree. Our main result is the design of dynamic aB-trees (daB-trees) with branching factor two using only three bits of redundancy (with the help of lookup tables that are of negligible size in applications), while supporting updates and queries in time polynomial in its depth. As an application, we present a dynamic rank/select data structure for $n$-bit arrays, also known as a dynamic fully indexable dictionary (FID). It supports updates and queries in $O(\log n/\log\log n)$ time, and when the array has $m$ ones, the data structure occupies $\log\binom{n}{m} + O(n/2^{\log^{0.199}n})$ bits. Note that the update and query times are optimal even without space constraints due to a lower bound by Fredman and Saks. Prior to our work, no dynamic FID with near-optimal update and query times and redundancy $o(n/\log n)$ was known. We further show that a dynamic sequence supporting insertions, deletions and rank/select queries can be maintained in (optimal) $O(\log n/\log\log n)$ time and with $O(n \cdot \text{poly}\log\log n/\log^2 n)$ bits of redundancy. Comment: 33 pages, 1 figure; in FOCS 202

    Tight Cell-Probe Lower Bounds for Dynamic Succinct Dictionaries

    A dictionary data structure maintains a set of at most $n$ keys from the universe $[U]$ under key insertions and deletions, such that given a query $x \in [U]$, it returns whether $x$ is in the set. Some variants also store values associated to the keys such that given a query $x$, the value associated to $x$ is returned when $x$ is in the set. This fundamental data structure problem has been studied for six decades since the introduction of hash tables in 1953. A hash table occupies $O(n\log U)$ bits of space with constant time per operation in expectation. There has been a vast literature on improving its time and space usage. The state-of-the-art dictionary by Bender, Farach-Colton, Kuszmaul, Kuszmaul and Liu [BFCK+22] has space consumption close to the information-theoretic optimum, using a total of $\log\binom{U}{n}+O(n\log^{(k)} n)$ bits, while supporting all operations in $O(k)$ time, for any parameter $k \leq \log^* n$. The term $O(\log^{(k)} n) = O(\underbrace{\log\cdots\log}_k n)$ is referred to as the wasted bits per key. In this paper, we prove a matching cell-probe lower bound: For $U=n^{1+\Theta(1)}$, any dictionary with $O(\log^{(k)} n)$ wasted bits per key must have expected operational time $\Omega(k)$, in the cell-probe model with word-size $w=\Theta(\log U)$. Furthermore, if a dictionary stores values of $\Theta(\log U)$ bits, we show that regardless of the query time, it must have $\Omega(k)$ expected update time. It is worth noting that this is the first cell-probe lower bound on the trade-off between space and update time for general data structures. Comment: 35 pages
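
As a reading aid for the notation in the abstract above, here is a small sketch of the iterated logarithm $\log^{(k)} n$ and of $\log^* n$. The choice of base 2 and the function names are assumptions made for illustration; the abstract does not fix the base.

```python
import math


def iterated_log(n: float, k: int) -> float:
    """log^(k) n: apply log2 to n exactly k times (base 2 assumed)."""
    x = float(n)
    for _ in range(k):
        x = math.log2(x)  # raises ValueError once x drops to 0 or below
    return x


def log_star(n: float) -> int:
    """log* n: how many times log2 must be applied until the value is <= 1."""
    count = 0
    x = float(n)
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


if __name__ == "__main__":
    n = 65536
    for k in range(1, log_star(n) + 1):
        print(f"k={k}: wasted bits per key scale like {iterated_log(n, k)}")
```

Each extra unit of operation time $k$ shrinks the wasted bits per key by one more application of the logarithm, which is the trade-off the lower bound shows to be tight.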

    Block trees

    Let string S[1..n] be parsed into z phrases by the Lempel-Ziv algorithm. The corresponding compression algorithm encodes S in O(z) space, but it does not support random access to S. We introduce a data structure, the block tree, that represents S in O(z log(n/z)) space and extracts any symbol of S in time O(log(n/z)), among other space-time tradeoffs. The structure also supports other queries that are useful for building compressed data structures on top of S. Further, block trees can be built in linear time and in a scalable manner. Our experiments show that block trees offer relevant space-time tradeoffs compared to other compressed string representations for highly repetitive strings. (C) 2020 Elsevier Inc. All rights reserved. Peer reviewed

    Dynamic Dictionary with Subconstant Wasted Bits per Key

    Dictionaries have been one of the central questions in data structures. A dictionary data structure maintains a set of key-value pairs under insertions and deletions such that given a query key, the data structure efficiently returns its value. The state-of-the-art dictionaries [Bender, Farach-Colton, Kuszmaul, Kuszmaul, Liu 2022] store $n$ key-value pairs with only $O(n \log^{(k)} n)$ bits of redundancy, and support all operations in $O(k)$ time, for $k \leq \log^* n$. It was recently shown to be optimal [Li, Liang, Yu, Zhou 2023b]. In this paper, we study the regime where the number of redundant bits is $R=o(n)$, and show that when $R$ is at least $n/\text{poly}\log n$, all operations can be supported in $O(\log^* n + \log (n/R))$ time, matching the lower bound in this regime [Li, Liang, Yu, Zhou 2023b]. We present two data structures based on which range $R$ is in. The data structure for $R<n/\log^{0.1} n$ utilizes a generalization of adapters studied in [Berger, Kuszmaul, Polak, Tidor, Wein 2022] and [Li, Liang, Yu, Zhou 2023a]. The data structure for $R \geq n/\log^{0.1} n$ is based on recursively hashing into buckets with logarithmic sizes. Comment: 46 pages; SODA 202

    Range Avoidance for Low-Depth Circuits and Connections to Pseudorandomness

    In the range avoidance problem, the input is a multi-output Boolean circuit with more outputs than inputs, and the goal is to find a string outside its range (which is guaranteed to exist). We show that well-known explicit construction questions such as finding binary linear codes achieving the Gilbert-Varshamov bound or list-decoding capacity, and constructing rigid matrices, reduce to the range avoidance problem of log-depth circuits, and by a further recent reduction [Ren, Santhanam, and Wang, FOCS 2022] to NC^0_4 circuits where each output depends on at most 4 input bits. On the algorithmic side, we show that range avoidance for NC^0_2 circuits can be solved in polynomial time. We identify a general condition relating to correlation with low-degree parities that implies that any almost pairwise independent set has some string that avoids the range of every circuit in the class. We apply this to NC^0 circuits, and to small width CNF/DNF and general De Morgan formulae (via a connection to approximate-degree), yielding non-trivial small hitting sets for range avoidance in these cases.
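
The problem statement above can be illustrated with a brute-force sketch. It enumerates the whole range, so it takes time exponential in the input length and is not one of the algorithms from the paper; the toy circuit and the names `avoid_range` and `toy_circuit` are assumptions made for this example.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Bits = Tuple[int, ...]


def avoid_range(circuit: Callable[[Bits], Sequence[int]], n: int, m: int) -> Bits:
    """Return an m-bit string that circuit never outputs on any n-bit input.

    Since m > n, the range has at most 2**n < 2**m elements, so by
    pigeonhole at least one m-bit string is missed.
    """
    if m <= n:
        raise ValueError("range avoidance needs more outputs than inputs")
    image = {tuple(circuit(x)) for x in product((0, 1), repeat=n)}
    for y in product((0, 1), repeat=m):
        if y not in image:
            return y
    raise AssertionError("unreachable when m > n")


def toy_circuit(x: Bits) -> Bits:
    """Hypothetical 2-input, 3-output circuit: copies x and appends the XOR."""
    return (x[0], x[1], x[0] ^ x[1])


if __name__ == "__main__":
    print(avoid_range(toy_circuit, n=2, m=3))  # prints (0, 0, 1)
```

In the toy circuit each output bit depends on at most two input bits, which matches the locality restriction of the NC^0 classes mentioned in the abstract.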

    A Survey of Satisfiability Modulo Theory

    Satisfiability modulo theory (SMT) consists in testing the satisfiability of first-order formulas over linear integer or real arithmetic, or other theories. In this survey, we explain the combination of propositional satisfiability and decision procedures for conjunctions known as DPLL(T), and the alternative "natural domain" approaches. We also cover quantifiers, Craig interpolants, polynomial arithmetic, and how SMT solvers are used in automated software analysis. Comment: Computer Algebra in Scientific Computing, Sep 2016, Bucharest, Romania. 201

    Search Engine Optimization: Best Practices for Google

    The internet is a major delivery system of hotel reservations. Approximately 25% of all reservations made at a hotel come directly through the hotel’s website (Douglas, 2012). Another 11% of total reservations are booked online through online travel agent websites, or OTAs, such as Priceline.com or Expedia.com (Douglas, 2012). These additional reservations booked through the OTAs come at a cost to the hotel, however. Typical commissions for OTAs are approximately 25% of a total booking (Sanders, 2012). In 2010, it was estimated that the commissions associated with these OTA bookings cost hoteliers 2.5 billion dollars (Douglas, 2012). Because of the increased cost associated with reservations that come through the OTAs, and the increased competition from competitors' websites, hoteliers must take steps to ensure that their property can easily be found within search engines.