74 research outputs found

Dynamic Data Structures for Document Collections and Graphs

Author: Munro J. Ian
Nekrich Yakov
Vitter Jeffrey Scott
Publication date: 19/03/2015

In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 1989], answering rank queries presents a bottleneck in compressed dynamic indexing. In this paper we show how this lower bound can be circumvented using our new framework. We demonstrate that the gap between static and dynamic variants of the indexing problem can be almost closed. Our method is based on a novel framework for adding dynamism to static compressed data structures. Our framework also applies more generally to dynamizing other problems. We show, for example, how our framework can be applied to develop compressed representations of dynamic graphs and binary relations.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Online Sorting via Searching and Selection

Author: Gupta Ankur
Kispert Anna
Sorenson Jonathan P.
Publication date: 01/01/2009

In this paper, we present a framework based on a simple data structure and parameterized algorithms for the problems of finding items in an unsorted list of linearly ordered items based on their rank (selection) or value (search). As a side-effect of answering these online selection and search queries, we progressively sort the list. Our algorithms are based on Hoare's Quickselect, and are parameterized based on the pivot selection method. For example, if we choose the pivot as the last item in a subinterval, our framework yields algorithms that will answer $q \le n$ unique selection and/or search queries in a total of $O(n \log q)$ average time. After $q=\Omega(n)$ queries the list is sorted. Each repeated selection query takes constant time, and each repeated search query takes $O(\log n)$ time. The two query types can be interleaved freely. By plugging different pivot selection methods into our framework, these results can, for example, become randomized expected time or deterministic worst-case time. Our methods are easy to implement, and we show they perform well in practice.
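The idea of answering selection queries while progressively sorting the list can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class name `OnlineSelector`, the `fixed` bitmap and the sorted `pivots` list are assumptions made for the example, and only selection (not search by value) is covered. The pivot is the last item of the subinterval, as in the abstract's example.

```python
import bisect


class OnlineSelector:
    """Hypothetical sketch: Quickselect that remembers every pivot it has placed."""

    def __init__(self, items):
        self.a = list(items)
        self.fixed = [False] * len(self.a)  # fixed[i]: a[i] already sits at its sorted position
        self.pivots = []                    # sorted list of all fixed positions

    def _partition(self, lo, hi):
        """Lomuto partition of a[lo..hi] around the last item; returns the pivot's final index."""
        a = self.a
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        return store

    def select(self, k):
        """Return the item of rank k (0-based); a[k] is in sorted position afterwards."""
        if not 0 <= k < len(self.a):
            raise IndexError("rank out of range")
        if self.fixed[k]:
            return self.a[k]  # repeated query: answered without any partitioning
        # Restrict the work to the unsorted interval between neighbouring fixed positions.
        j = bisect.bisect_left(self.pivots, k)
        lo = self.pivots[j - 1] + 1 if j > 0 else 0
        hi = self.pivots[j] - 1 if j < len(self.pivots) else len(self.a) - 1
        while True:
            p = self._partition(lo, hi)
            self.fixed[p] = True
            bisect.insort(self.pivots, p)
            if p == k:
                return self.a[k]
            if k < p:
                hi = p - 1
            else:
                lo = p + 1
```

Once every rank has been queried, `self.a` is fully sorted. Note that `bisect.insort` on a Python list costs linear time per insertion, so this sketch shows the bookkeeping but does not by itself achieve the running times stated in the abstract.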

arXiv.org e-Print Archive

CiteSeerX

Digital Commons @ Butler University

Succinct Representations of Dynamic Strings

Author: He Meng
Munro J. Ian
Publication date: 01/01/2010

The rank and select operations over a string of length $n$ from an alphabet of size $\sigma$ have been used widely in the design of succinct data structures. In many applications, the string itself need be maintained dynamically, allowing characters of the string to be inserted and deleted. Under the word RAM model with word size $w=\Omega(\lg n)$, we design a succinct representation of dynamic strings using $nH_0 + o(n)\lg\sigma + O(w)$ bits to support rank, select, insert and delete in $O(\frac{\lg n}{\lg\lg n}(\frac{\lg \sigma}{\lg\lg n}+1))$ time. When the alphabet size is small, i.e. when $\sigma = O(\mathrm{polylog}(n))$, including the case in which the string is a bit vector, these operations are supported in $O(\frac{\lg n}{\lg\lg n})$ time. Our data structures are more efficient than previous results on the same problem, and we have applied them to improve results on the design and construction of space-efficient text indexes.
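The small-alphabet bound follows from the general bound by a short substitution. The step is spelled out here as a reading aid and is not quoted from the paper:

```latex
\sigma = O(\mathrm{polylog}(n))
\;\Longrightarrow\; \lg\sigma = O(\lg\lg n)
\;\Longrightarrow\; \frac{\lg\sigma}{\lg\lg n} + 1 = O(1)
\;\Longrightarrow\;
O\!\left(\frac{\lg n}{\lg\lg n}\left(\frac{\lg\sigma}{\lg\lg n}+1\right)\right)
= O\!\left(\frac{\lg n}{\lg\lg n}\right).
```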

arXiv.org e-Print Archive

CiteSeerX

Dynamic Rank/Select Dictionaries with Applications to XML Indexing

Author: Gupta Ankur
Hon Wing-Kai
Shah Rahul
Vitter Jeffrey S.
Publication venue: 'Purdue University (bepress)'
Publication date: 11/07/2006

We consider a central problem in text indexing: Given a text $T$ over an alphabet $\Sigma$, construct a compressed data structure answering the queries $\mathrm{char}(i)$, $\mathrm{rank}_s(i)$, and $\mathrm{select}_s(i)$ for a symbol $s \in \Sigma$. Many data structures consider these queries for static text $T$ [GGV03, FM01, SG06, GMR06]. We consider the dynamic version of the problem, where we are allowed to insert and delete symbols at arbitrary positions of $T$. This problem is a key challenge in compressed text indexing and has direct application to dynamic XML indexing structures that answer subpath queries [FLMM05]. We build on the results of [RRR02, GMR06] and give the best known query bounds for the dynamic version of this problem, supporting arbitrary insertions and deletions of symbols in $T$. Specifically, with an amortized update time of $O((1/\epsilon)n^{\epsilon})$, we suggest how to support $\mathrm{rank}_s(i)$, $\mathrm{select}_s(i)$, and $\mathrm{char}(i)$ queries in $O((1/\epsilon)\log\log n)$ time, for any $\epsilon < 1$. The best previous query times for this problem were $O(\log n \log|\Sigma|)$, given by [MN06]. Our bounds are competitive with state-of-the-art static structures [GMR06]. Some applicable lower bounds for the partial sums problem [PD06] show that our update/query tradeoff is also nearly optimal. In addition, our space bound is competitive with the corresponding static structures. For the special case of bitvectors (i.e., $|\Sigma| = 2$), we also show the best tradeoffs for query/update time, improving upon the results of [MN06, HSS03, RRR02]. Finally, our focus on fast query/slower update is well-suited for a query-intensive XML indexing environment. Using the XBW transform [FLMM05], we also present a dynamic data structure that succinctly maintains an ordered labeled tree $T$ and supports a powerful set of queries on $T$.
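To make the update/query tradeoff concrete, the stated bounds (amortized update time $O((1/\epsilon)n^{\epsilon})$, query time $O((1/\epsilon)\log\log n)$) can be instantiated for one value of the parameter. The choice $\epsilon = 1/2$ is purely illustrative and is not singled out by the abstract:

```latex
\epsilon = \tfrac12:\qquad
\text{update} = O\!\left(\tfrac{1}{\epsilon}\, n^{\epsilon}\right) = O(2\sqrt{n}) = O(\sqrt{n}) \text{ amortized},\qquad
\text{query} = O\!\left(\tfrac{1}{\epsilon}\log\log n\right) = O(\log\log n).
```

Smaller values of $\epsilon$ make updates cheaper and queries slower by the same $1/\epsilon$ factor.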

CiteSeerX

Purdue E-Pubs

Compressed Data Structures for Dynamic Sequences

Author: A. Gupta
D. Belazzougui
G. Manzini
G. Navarro
H.-L. Chan
J. Barbay
J. Jansson
L. Arge
M. He
R. Grossi
S. Lee
S. Lee
V. Mäkinen
W.-K. Hon
W.-K. Hon
Publication date: 24/07/2015

We consider the problem of storing a dynamic string $S$ over an alphabet $\Sigma=\{\,1,\ldots,\sigma\,\}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries: $\mathrm{access}(i,S)$ returns the $i$-th symbol in $S$, $\mathrm{rank}_a(i,S)$ counts how many times a symbol $a$ occurs among the first $i$ positions in $S$, and $\mathrm{select}_a(i,S)$ finds the position where a symbol $a$ occurs for the $i$-th time. We present the first fully-dynamic data structure for arbitrarily large alphabets that achieves optimal query times for all three operations and supports updates with worst-case time guarantees. Ours is also the first fully-dynamic data structure that needs only $nH_k+o(n\log\sigma)$ bits, where $H_k$ is the $k$-th order entropy and $n$ is the string length. Moreover our representation supports extraction of a substring $S[i..i+\ell]$ in optimal $O(\log n/\log\log n + \ell/\log_{\sigma}n)$ time.
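The semantics of the three queries and the two updates can be pinned down with a naive, uncompressed reference model. This is only a specification-style sketch with linear-time operations, not the compressed structure from the paper. The function names, the use of a Python list for the string, and the 1-based positions are assumptions made for the example.

```python
def access(i, s):
    """Return the i-th symbol of s (positions are assumed to be 1-based)."""
    return s[i - 1]


def rank(a, i, s):
    """Count how many times symbol a occurs among the first i positions of s."""
    return s[:i].count(a)


def select(a, i, s):
    """Return the 1-based position of the i-th occurrence of a, or None if there are fewer."""
    seen = 0
    for pos, sym in enumerate(s, start=1):
        if sym == a:
            seen += 1
            if seen == i:
                return pos
    return None


def insert(a, i, s):
    """Insert symbol a so that it becomes the i-th symbol of s (s is a list)."""
    s.insert(i - 1, a)


def delete(i, s):
    """Delete the i-th symbol of s (s is a list)."""
    del s[i - 1]
```

Rank and select are inverse to each other in the sense that `rank(a, select(a, i, s), s) == i` whenever the `i`-th occurrence of `a` exists.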

arXiv.org e-Print Archive

CiteSeerX

Crossref

Dynamic dictionary matching and compressed suffix trees

Author: Chan HL
Hon WK
Lam TW
Sadakane K
Publication venue: Society for Industrial and Applied Mathematics.
Publication date: 01/01/2005

Recent breakthroughs in compressed indexing data structures have reduced the space for indexing a text (or a collection of texts) of length n from O(n log n) bits to O(n) bits, while allowing very efficient pattern matching. Yet the compressed nature of such indices also makes them difficult to update dynamically. This paper presents the first O(n)-bit representation of a suffix tree for a dynamic collection of texts whose total length is n, which supports insertion and deletion of a text T in O(|T| log^2 n) time, as well as all suffix tree traversal operations, including forward and backward suffix links. This work can be regarded as a generalization of the compressed representation of static texts. Our new suffix tree representation serves as a core part of a compact solution for the dynamic dictionary matching problem, i.e., providing an O(d)-bit data structure for a dynamic collection of patterns of total length d that can support the dictionary matching query efficiently. When compared with the O(d log d)-bit suffix tree based solution of Amir et al., the compact solution increases the query time by roughly a factor of log d only. In the study of the above results, we also derive the first O(n)-bit representation for maintaining n pairs of balanced parentheses in O(log n/log log n) time per operation, matching the time complexity of the previous O(n log n)-bit solution.
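The dynamic dictionary matching problem can be illustrated with a naive reference model: patterns are inserted into and deleted from a dictionary, and a query reports every occurrence of every stored pattern in a given text. The class below is a hypothetical, uncompressed sketch that only fixes the interface. It uses a plain set and repeated string search, and it has nothing in common with the suffix-tree-based solution described in the abstract.

```python
class NaiveDictionaryMatcher:
    """Hypothetical reference model of dynamic dictionary matching (not space-efficient)."""

    def __init__(self):
        self._patterns = set()

    def insert(self, pattern):
        """Add a non-empty pattern to the dictionary."""
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self._patterns.add(pattern)

    def delete(self, pattern):
        """Remove a pattern; removing an absent pattern is a no-op."""
        self._patterns.discard(pattern)

    def match(self, text):
        """Return sorted (offset, pattern) pairs for all occurrences of stored patterns in text."""
        hits = []
        for pattern in self._patterns:
            start = text.find(pattern)
            while start != -1:
                hits.append((start, pattern))
                start = text.find(pattern, start + 1)  # allow overlapping occurrences
        return sorted(hits)
```

A query costs time proportional to the number of patterns times the text length in this sketch, which is exactly the kind of cost a dedicated matching structure is meant to avoid.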

National Tsing Hua University Institutional Repository

HKU Scholars Hub

Dynamic Dictionary with Subconstant Wasted Bits per Key

Author: Li Tianxiao
Liang Jingxun
Yu Huacheng
Zhou Renfei
Publication date: 31/10/2023

Dictionaries have been one of the central questions in data structures. A dictionary data structure maintains a set of key-value pairs under insertions and deletions such that given a query key, the data structure efficiently returns its value. The state-of-the-art dictionaries [Bender, Farach-Colton, Kuszmaul, Kuszmaul, Liu 2022] store $n$ key-value pairs with only $O(n \log^{(k)} n)$ bits of redundancy, and support all operations in $O(k)$ time, for $k \leq \log^* n$. It was recently shown to be optimal [Li, Liang, Yu, Zhou 2023b]. In this paper, we study the regime where the number of redundant bits is $R=o(n)$, and show that when $R$ is at least $n/\text{poly}\log n$, all operations can be supported in $O(\log^* n + \log (n/R))$ time, matching the lower bound in this regime [Li, Liang, Yu, Zhou 2023b]. We present two data structures, depending on which range $R$ is in. The data structure for $R<n/\log^{0.1} n$ utilizes a generalization of adapters studied in [Berger, Kuszmaul, Polak, Tidor, Wein 2022] and [Li, Liang, Yu, Zhou 2023a]. The data structure for $R \geq n/\log^{0.1} n$ is based on recursively hashing into buckets with logarithmic sizes.
Comment: 46 pages; SODA 202
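The notation $\log^{(k)} n$ (the logarithm iterated $k$ times) and $\log^* n$ (the number of iterations needed to bring $n$ down to at most 1) can be made concrete with a few lines of code. The helper names are hypothetical, and base 2 is an assumption; the base does not affect the asymptotic statements.

```python
import math


def iterated_log(n, k):
    """Return log^{(k)} n: the base-2 logarithm applied k times to n."""
    x = float(n)
    for _ in range(k):
        x = math.log2(x)  # raises ValueError once the running value is not positive
    return x


def log_star(n):
    """Return log* n: how many times log2 must be applied until the value is at most 1."""
    x = float(n)
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count
```

For example, with $n = 2^{16}$ the redundancy term $n \log^{(k)} n$ shrinks from $16n$ at $k=1$ to $4n$ at $k=2$ and $2n$ at $k=3$ (ignoring constant factors), while the operation time $O(k)$ grows with $k$.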

arXiv.org e-Print Archive