    Dynamic Data Structures for Document Collections and Graphs

    In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 1989], answering rank queries presents a bottleneck in compressed dynamic indexing. In this paper we show how this lower bound can be circumvented using our new framework. We demonstrate that the gap between static and dynamic variants of the indexing problem can be almost closed. Our method is based on a novel framework for adding dynamism to static compressed data structures. Our framework also applies more generally to dynamizing other problems. We show, for example, how our framework can be applied to develop compressed representations of dynamic graphs and binary relations.

    Online Sorting via Searching and Selection

    In this paper, we present a framework based on a simple data structure and parameterized algorithms for the problems of finding items in an unsorted list of linearly ordered items based on their rank (selection) or value (search). As a side effect of answering these online selection and search queries, we progressively sort the list. Our algorithms are based on Hoare's Quickselect, and are parameterized based on the pivot selection method. For example, if we choose the pivot as the last item in a subinterval, our framework yields algorithms that will answer q <= n unique selection and/or search queries in a total of O(n log q) average time. After q = Ω(n) queries the list is sorted. Each repeated selection query takes constant time, and each repeated search query takes O(log n) time. The two query types can be interleaved freely. By plugging different pivot selection methods into our framework, these results can, for example, become randomized expected time or deterministic worst-case time. Our methods are easy to implement, and we show they perform well in practice.
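
    The idea of answering selection queries with Quickselect while remembering which positions are already final can be sketched as follows. This is a minimal illustration, not the paper's data structure: the class name `OnlineSelector` and its methods are hypothetical, the pivot is the last item of the subinterval (the example rule mentioned in the abstract), and settled positions are kept in a sorted Python list, so a repeated query costs a binary search here rather than the constant time stated above.

```python
import bisect


class OnlineSelector:
    """Hypothetical sketch: selection queries that progressively sort the list."""

    def __init__(self, items):
        self.a = list(items)
        # Sorted indices whose elements already sit in their final sorted
        # position. -1 and len(a) are sentinels bounding the whole list.
        self.pivots = [-1, len(self.a)]

    def _partition(self, lo, hi):
        """Lomuto partition of a[lo..hi] around its last item; returns the pivot's final index."""
        a = self.a
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        return store

    def select(self, k):
        """Return the item of rank k (0-based) in sorted order."""
        if not 0 <= k < len(self.a):
            raise IndexError("rank out of range")
        pos = bisect.bisect_left(self.pivots, k)
        if self.pivots[pos] == k:
            # Position k was settled by an earlier query.
            return self.a[k]
        # Restrict the search to the unsorted gap between neighbouring pivots.
        lo = self.pivots[pos - 1] + 1
        hi = self.pivots[pos] - 1
        while True:
            p = self._partition(lo, hi)
            bisect.insort(self.pivots, p)  # remember the newly settled position
            if p == k:
                return self.a[k]
            if k < p:
                hi = p - 1
            else:
                lo = p + 1
```

    Each query only partitions the gap between the two nearest settled positions, so work done by earlier queries is never repeated, and once every rank has been queried the underlying list is fully sorted.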

    Succinct Representations of Dynamic Strings

    The rank and select operations over a string of length n from an alphabet of size $\sigma$ have been used widely in the design of succinct data structures. In many applications, the string itself needs to be maintained dynamically, allowing characters of the string to be inserted and deleted. Under the word RAM model with word size $w=\Omega(\lg n)$, we design a succinct representation of dynamic strings using $nH_0 + o(n)\lg\sigma + O(w)$ bits to support rank, select, insert and delete in $O(\frac{\lg n}{\lg\lg n}(\frac{\lg \sigma}{\lg\lg n}+1))$ time. When the alphabet size is small, i.e. when $\sigma = O(\mathrm{polylog}(n))$, including the case in which the string is a bit vector, these operations are supported in $O(\frac{\lg n}{\lg\lg n})$ time. Our data structures are more efficient than previous results on the same problem, and we have applied them to improve results on the design and construction of space-efficient text indexes.
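
    The leading term of the space bound, n times H_0, uses the empirical zero-order entropy of the string. As a small illustration, assuming the standard definition H_0 = sum over symbols c of (n_c/n) log2(n/n_c), where n_c is the number of occurrences of c, it can be computed as below. The function name `zero_order_entropy` is hypothetical and is not taken from the paper.

```python
import math
from collections import Counter


def zero_order_entropy(s):
    """Empirical zero-order entropy H_0 of s, in bits per symbol (standard definition)."""
    n = len(s)
    if n == 0:
        return 0.0
    counts = Counter(s)
    # Each distinct symbol with count c contributes (c/n) * log2(n/c).
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

    For a string over a single symbol this gives 0 bits per symbol, and for a string with two equally frequent symbols it gives 1 bit per symbol, so n times H_0 is never more than n lg σ.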

    Dynamic Rank/Select Dictionaries with Applications to XML Indexing

    We consider a central problem in text indexing: Given a text T over an alphabet Σ, construct a compressed data structure answering the queries char(i), rank_s(i), and select_s(i) for a symbol s ∈ Σ. Many data structures consider these queries for static text T [GGV03, FM01, SG06, GMR06]. We consider the dynamic version of the problem, where we are allowed to insert and delete symbols at arbitrary positions of T. This problem is a key challenge in compressed text indexing and has direct application to dynamic XML indexing structures that answer subpath queries [FLMM05]. We build on the results of [RRR02, GMR06] and give the best known query bounds for the dynamic version of this problem, supporting arbitrary insertions and deletions of symbols in T. Specifically, with an amortized update time of O((1/ε) n^ε), we suggest how to support rank_s(i), select_s(i), and char(i) queries in O((1/ε) log log n) time, for any ε < 1. The best previous query times for this problem were O(log n log |Σ|), given by [MN06]. Our bounds are competitive with state-of-the-art static structures [GMR06]. Some applicable lower bounds for the partial sums problem [PD06] show that our update/query tradeoff is also nearly optimal. In addition, our space bound is competitive with the corresponding static structures. For the special case of bitvectors (i.e., |Σ| = 2), we also show the best tradeoffs for query/update time, improving upon the results of [MN06, HSS03, RRR02]. Finally, our focus on fast query/slower update is well-suited for a query-intensive XML indexing environment. Using the XBW transform [FLMM05], we also present a dynamic data structure that succinctly maintains an ordered labeled tree T and supports a powerful set of queries on T.

    Compressed Data Structures for Dynamic Sequences

    We consider the problem of storing a dynamic string $S$ over an alphabet $\Sigma=\{\,1,\ldots,\sigma\,\}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries: $\mathrm{access}(i,S)$ returns the $i$-th symbol in $S$, $\mathrm{rank}_a(i,S)$ counts how many times a symbol $a$ occurs among the first $i$ positions in $S$, and $\mathrm{select}_a(i,S)$ finds the position where a symbol $a$ occurs for the $i$-th time. We present the first fully-dynamic data structure for arbitrarily large alphabets that achieves optimal query times for all three operations and supports updates with worst-case time guarantees. Ours is also the first fully-dynamic data structure that needs only $nH_k+o(n\log\sigma)$ bits, where $H_k$ is the $k$-th order entropy and $n$ is the string length. Moreover our representation supports extraction of a substring $S[i..i+\ell]$ in optimal $O(\log n/\log\log n + \ell/\log_{\sigma}n)$ time.
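
    To make the semantics of the three queries and the two updates concrete, here is a deliberately naive reference sketch. It is an assumption-laden illustration only: the class name `NaiveDynamicSequence` is hypothetical, positions are 1-based as in the abstract, the string is stored uncompressed in a Python list, and every operation takes time linear in the string length, so it achieves none of the space or time bounds claimed above.

```python
class NaiveDynamicSequence:
    """Hypothetical uncompressed reference for access/rank/select with updates."""

    def __init__(self, symbols=()):
        self.s = list(symbols)

    def _check(self, i):
        if not 1 <= i <= len(self.s):
            raise IndexError("position out of range")

    def access(self, i):
        """Return the i-th symbol (1-based)."""
        self._check(i)
        return self.s[i - 1]

    def rank(self, a, i):
        """Count occurrences of symbol a among the first i positions."""
        return self.s[:i].count(a)

    def select(self, a, i):
        """Return the 1-based position of the i-th occurrence of a, or None if absent."""
        seen = 0
        for pos, sym in enumerate(self.s, start=1):
            if sym == a:
                seen += 1
                if seen == i:
                    return pos
        return None

    def insert(self, i, a):
        """Insert symbol a so that it becomes the i-th symbol."""
        self.s.insert(i - 1, a)

    def delete(self, i):
        """Delete the i-th symbol."""
        self._check(i)
        del self.s[i - 1]
```

    A sketch like this is mainly useful as a test oracle: a compressed dynamic structure should return the same answers on the same sequence of operations.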

    Dynamic dictionary matching and compressed suffix trees

    Recent breakthroughs in compressed indexing data structures have reduced the space for indexing a text (or a collection of texts) of length n from O(n log n) bits to O(n) bits, while allowing very efficient pattern matching. Yet the compressed nature of such indices also makes them difficult to update dynamically. This paper presents the first O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T| log^2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. Our new suffix tree representation serves as a core part of a compact solution for the dynamic dictionary matching problem, i.e., providing an O(d)-bit data structure for a dynamic collection of patterns of total length d that can support the dictionary matching query efficiently. When compared with the O(d log d)-bit suffix tree based solution of Amir et al., the compact solution increases the query time by roughly a factor of log d only. In the study of the above results, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution.

    Dynamic Dictionary with Subconstant Wasted Bits per Key

    Dictionaries have been one of the central questions in data structures. A dictionary data structure maintains a set of key-value pairs under insertions and deletions such that given a query key, the data structure efficiently returns its value. The state-of-the-art dictionaries [Bender, Farach-Colton, Kuszmaul, Kuszmaul, Liu 2022] store $n$ key-value pairs with only $O(n \log^{(k)} n)$ bits of redundancy, and support all operations in $O(k)$ time, for $k \leq \log^* n$. It was recently shown to be optimal [Li, Liang, Yu, Zhou 2023b]. In this paper, we study the regime where the number of redundant bits is $R=o(n)$, and show that when $R$ is at least $n/\text{poly}\log n$, all operations can be supported in $O(\log^* n + \log (n/R))$ time, matching the lower bound in this regime [Li, Liang, Yu, Zhou 2023b]. We present two data structures based on which range $R$ is in. The data structure for $R<n/\log^{0.1} n$ utilizes a generalization of adapters studied in [Berger, Kuszmaul, Polak, Tidor, Wein 2022] and [Li, Liang, Yu, Zhou 2023a]. The data structure for $R \geq n/\log^{0.1} n$ is based on recursively hashing into buckets with logarithmic sizes. Comment: 46 pages; SODA 202