74 research outputs found
Dynamic Data Structures for Document Collections and Graphs
In the dynamic indexing problem, we must maintain a changing collection of
text documents so that we can efficiently support insertions, deletions, and
pattern matching queries. We are especially interested in developing efficient
data structures that store and query the documents in compressed form. All
previous compressed solutions to this problem rely on answering rank and select
queries on a dynamic sequence of symbols. Because of the lower bound in
[Fredman and Saks, 1989], answering rank queries presents a bottleneck in
compressed dynamic indexing. In this paper we show how this lower bound can be
circumvented using our new framework. We demonstrate that the gap between
static and dynamic variants of the indexing problem can be almost closed. Our
method is based on a novel framework for adding dynamism to static compressed
data structures. Our framework also applies more generally to dynamizing other
problems. We show, for example, how our framework can be applied to develop
compressed representations of dynamic graphs and binary relations.
Online Sorting via Searching and Selection
In this paper, we present a framework based on a simple data structure and
parameterized algorithms for the problems of finding items in an unsorted list
of linearly ordered items based on their rank (selection) or value (search). As
a side-effect of answering these online selection and search queries, we
progressively sort the list. Our algorithms are based on Hoare's Quickselect,
and are parameterized based on the pivot selection method.
For example, if we choose the pivot as the last item in a subinterval, our
framework yields algorithms that will answer q<=n unique selection and/or
search queries in a total of O(n log q) average time. After q=\Omega(n) queries
the list is sorted. Each repeated selection query takes constant time, and each
repeated search query takes O(log n) time. The two query types can be
interleaved freely. By plugging different pivot selection methods into our
framework, these results can, for example, become randomized expected time or
deterministic worst-case time. Our methods are easy to implement, and we show
they perform well in practice.
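The idea in the abstract above, answering selection queries with Quickselect while remembering pivots so the list gets progressively sorted, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class name `OnlineSelector`, the sorted list of pivot positions, and the random pivot rule are all assumptions made for the example. In particular, a repeated query here costs a binary search over the recorded pivots rather than the constant time stated in the abstract.

```python
import bisect
import random


class OnlineSelector:
    """Hypothetical sketch: selection queries that sort the list as a side effect.

    Every position that has served as a pivot already holds its final
    sorted value, so it is recorded and never moved again.
    """

    def __init__(self, items):
        self.a = list(items)
        # Sorted pivot positions; -1 and len(a) are sentinels.
        self.pivots = [-1, len(self.a)]

    def _partition(self, lo, hi):
        """Lomuto partition of a[lo..hi] around a random pivot.

        Returns the pivot's final index.
        """
        a = self.a
        r = random.randint(lo, hi)
        a[r], a[hi] = a[hi], a[r]
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        return store

    def select(self, k):
        """Return the item of rank k (0-based) in sorted order."""
        while True:
            # First recorded pivot position that is >= k.
            j = bisect.bisect_left(self.pivots, k)
            if self.pivots[j] == k:
                return self.a[k]  # position k is already final
            # Otherwise k lies strictly between two recorded pivots.
            lo, hi = self.pivots[j - 1] + 1, self.pivots[j] - 1
            p = self._partition(lo, hi)
            bisect.insort(self.pivots, p)


sel = OnlineSelector([5, 2, 9, 1, 7, 3])
median_low = sel.select(2)   # third-smallest item
smallest = sel.select(0)
```

Once every rank has been queried, every position has been a pivot and `sel.a` is fully sorted. Keeping the pivots in a Python list makes each `insort` linear in the number of pivots, which is acceptable for a sketch but not for the bounds quoted in the abstract.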
Succinct Representations of Dynamic Strings
The rank and select operations over a string of length n from an alphabet of
size \sigma have been used widely in the design of succinct data structures.
In many applications, the string itself needs to be maintained dynamically,
allowing characters of the string to be inserted and deleted. Under the word
RAM model with word size , we design a succinct representation
of dynamic strings using bits to support rank,
select, insert and delete in time. When the alphabet size is small, i.e. when \sigma = O(\polylog
(n)), including the case in which the string is a bit vector, these operations
are supported in time. Our data structures are more
efficient than previous results on the same problem, and we have applied them
to improve results on the design and construction of space-efficient text
indexes.
Dynamic Rank/Select Dictionaries with Applications to XML Indexing
We consider a central problem in text indexing: Given a text T over an alphabet \Sigma, construct a compressed data structure answering the queries char(i), rank_s(i), and select_s(i) for a symbol s \in \Sigma. Many data structures consider these queries for static text T [GGV03, FM01, SG06, GMR06]. We consider the dynamic version of the problem, where we are allowed to insert and delete symbols at arbitrary positions of T. This problem is a key challenge in compressed text indexing and has direct application to dynamic XML indexing structures that answer subpath queries [FLMM05]. We build on the results of [RRR02, GMR06] and give the best known query bounds for the dynamic version of this problem, supporting arbitrary insertions and deletions of symbols in T. Specifically, with an amortized update time of O((1/\epsilon) n^\epsilon), we suggest how to support rank_s(i), select_s(i), and char(i) queries in O((1/\epsilon) log log n) time, for any \epsilon < 1. The best previous query times for this problem were O(log n log |\Sigma|), given by [MN06]. Our bounds are competitive with state-of-the-art static structures [GMR06]. Some applicable lower bounds for the partial sums problem [PD06] show that our update/query tradeoff is also nearly optimal. In addition, our space bound is competitive with the corresponding static structures. For the special case of bitvectors (i.e., |\Sigma| = 2), we also show the best tradeoffs for query/update time, improving upon the results of [MN06, HSS03, RRR02]. Finally, our focus on fast query/slower update is well-suited for a query-intensive XML indexing environment. Using the XBW transform [FLMM05], we also present a dynamic data structure that succinctly maintains an ordered labeled tree T and supports a powerful set of queries on T.
Compressed Data Structures for Dynamic Sequences
We consider the problem of storing a dynamic string over an alphabet
in compressed form. Our representation
supports insertions and deletions of symbols and answers three fundamental
queries: returns the -th symbol in ,
counts how many times a symbol occurs among the
first positions in , and finds the position
where a symbol occurs for the -th time. We present the first
fully-dynamic data structure for arbitrarily large alphabets that achieves
optimal query times for all three operations and supports updates with
worst-case time guarantees. Ours is also the first fully-dynamic data structure
that needs only bits, where is the -th order
entropy and is the string length. Moreover, our representation supports
extraction of a substring in optimal time.
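To pin down what the three queries and the two updates in the abstract above mean, here is a naive, uncompressed reference in Python. It is only a semantic sketch under stated assumptions: the class name `NaiveDynamicString`, the method names, and the indexing convention (0-based positions, 1-based occurrence counts) are choices made for this example and are not taken from the paper. Every operation takes linear time, so it says nothing about the compressed structure or its time bounds.

```python
class NaiveDynamicString:
    """Hypothetical linear-time reference for a dynamic sequence of symbols."""

    def __init__(self, symbols=""):
        self.s = list(symbols)

    def access(self, i):
        """Return the symbol at position i (0-based)."""
        return self.s[i]

    def rank(self, c, i):
        """Count occurrences of symbol c among the first i positions."""
        return self.s[:i].count(c)

    def select(self, c, j):
        """Return the position of the j-th occurrence of c (j >= 1), or -1."""
        seen = 0
        for pos, sym in enumerate(self.s):
            if sym == c:
                seen += 1
                if seen == j:
                    return pos
        return -1

    def insert(self, i, c):
        """Insert symbol c so that it ends up at position i."""
        self.s.insert(i, c)

    def delete(self, i):
        """Delete the symbol at position i."""
        del self.s[i]


text = NaiveDynamicString("abracadabra")
count_a = text.rank("a", 4)      # occurrences of "a" in "abra"
third_a = text.select("a", 3)    # position of the third "a"
text.insert(0, "a")              # every later position shifts by one
```

A reference like this is mainly useful as a test oracle: a compressed implementation can be checked against it on random sequences of updates and queries.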
Dynamic dictionary matching and compressed suffix trees
Recent breakthroughs in compressed indexing data structures have reduced the space for indexing a text (or a collection of texts) of length n from O(n log n) bits to O(n) bits, while allowing very efficient pattern matching. Yet the compressed nature of such indices also makes them difficult to update dynamically. This paper presents the first O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T| log^2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. Our new suffix tree representation serves as a core part of a compact solution for the dynamic dictionary matching problem, i.e., providing an O(d)-bit data structure for a dynamic collection of patterns of total length d that can support the dictionary matching query efficiently. When compared with the O(d log d)-bit suffix-tree-based solution of Amir et al., the compact solution increases the query time by roughly a factor of log d only. In the study of the above results, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution.
Dynamic Dictionary with Subconstant Wasted Bits per Key
Dictionaries are among the central problems in data structures. A
dictionary data structure maintains a set of key-value pairs under insertions
and deletions such that given a query key, the data structure efficiently
returns its value. The state-of-the-art dictionaries [Bender, Farach-Colton,
Kuszmaul, Kuszmaul, Liu 2022] store key-value pairs with only bits of redundancy, and support all operations in time,
for . It was recently shown to be optimal [Li, Liang, Yu, Zhou
2023b].
In this paper, we study the regime where the redundant bits is , and
show that when is at least , all operations can be
supported in time, matching the lower bound in this
regime [Li, Liang, Yu, Zhou 2023b]. We present two data structures based on
which range is in. The data structure for utilizes a
generalization of adapters studied in [Berger, Kuszmaul, Polak, Tidor, Wein
2022] and [Li, Liang, Yu, Zhou 2023a]. The data structure for is based on recursively hashing into buckets with logarithmic
sizes.
Comment: 46 pages; SODA 202