240 research outputs found

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Author: Bille Philip
Cording Patrick Hagge
Gørtz Inge Li
Skjoldjensen Frederik Rye
Vildhøj Hjalte Wedel
Vind Søren
Publication date: 01/01/2016

Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.

arXiv.org e-Print Archive

Online Research Database In Technology

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Author: Bille Philip
Cording Patrick Hagge
Skjoldjensen Frederik Rye
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th International Symposium on Algorithms and Computation (ISAAC 2016)
Publication date: 01/01/2016

Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.

Dagstuhl Research Online Publication Server

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Author: Bille Philip
Christiansen Anders Roy
Cording Patrick Hagge
Gørtz Inge Li
Skjoldjensen Frederik Rye
Vildhøj Hjalte Wedel
Vind Søren
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Online Research Database In Technology

Matching and Compression of Strings with Automata and Word Packing

Author: Skjoldjensen Frederik Rye
Publication venue: DTU Compute
Publication date: 01/01/2017

Online Research Database In Technology

Compressed and Practical Data Structures for Strings

Author: Christiansen Anders Roy
Publication venue: DTU Compute
Publication date: 01/01/2018

Online Research Database In Technology

A Faster Implementation of Online Run-Length Burrows-Wheeler Transform

Author: A Bowe
D Belazzougui
G Navarro
J Sirén
J Ziv
JI Munro
MA Bender
V Mäkinen
W Hon
Publication date: 14/10/2017

Run-length encoding Burrows-Wheeler Transformed strings, resulting in Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive strings. We propose a new algorithm for online RLBWT working in run-compressed space, which runs in $O(n \lg r)$ time and uses $O(r \lg n)$ bits of space, where $n$ is the length of the input string $S$ received so far and $r$ is the number of runs in the BWT of the reversed $S$. We improve the state-of-the-art algorithm for online RLBWT in terms of empirical construction time. Adopting the dynamic list for maintaining a total order, we can replace rank queries in a dynamic wavelet tree on a run-length compressed string by the direct comparison of labels in a dynamic list. The empirical results for various benchmarks show the efficiency of our algorithm, especially for highly repetitive strings.

Comment: In Proc. IWOCA201
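To make the quantity r concrete, here is a small offline Python sketch that computes a BWT by sorting rotations and then run-length encodes it. It is only a textbook baseline, not the online algorithm from this abstract: it uses space proportional to the whole input, and it transforms the string as given instead of its reverse. The helper names `bwt` and `run_length_encode` are invented for this example, and the sentinel `$` is assumed to be absent from the text and smaller than every text character.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorted rotations (quadratic, for illustration)."""
    s = text + sentinel  # assumption: sentinel is not in text and sorts first
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """Collapse each maximal run of equal characters into (char, run_length)."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


transformed = bwt("banana")
runs = run_length_encode(transformed)
r = len(runs)  # number of runs in the BWT
print(transformed, runs, r)
```

On highly repetitive inputs r tends to be far smaller than n, which is why bounds stated in terms of r are attractive.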

arXiv.org e-Print Archive

Crossref

Random Access in Persistent Strings and Segment Selection

Author: Bille Philip
Gørtz Inge Li
Publication date: 11/02/2021

We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where edges are labeled by an edit operation (inserting, deleting, or replacing a character) and a node represents the string obtained by applying the sequence of edit operations on the path from the root to the node. The goal is to compactly represent the entire collection while supporting fast random access to any part of a string in the collection. This problem captures natural scenarios such as representing the past history of an edited document or representing highly-repetitive collections. Given a tree with $n$ nodes, we show how to represent the corresponding collection in $O(n)$ space and $O(\log n / \log \log n)$ query time. This improves the previous time-space trade-offs for the problem. Additionally, we show a lower bound proving that the query time is optimal for any solution using near-linear space.

To achieve our bounds for random access in persistent strings we show how to reduce the problem to the following natural geometric selection problem on line segments. Consider a set of horizontal line segments in the plane. Given parameters $i$ and $j$, a segment selection query returns the $j$th smallest segment (the segment with the $j$th smallest $y$-coordinate) among the segments crossing the vertical line through $x$-coordinate $i$. The segment selection problem is to preprocess a set of horizontal line segments into a compact data structure that supports fast segment selection queries. We present a solution that uses $O(n)$ space and supports segment selection queries in $O(\log n / \log \log n)$ time, where $n$ is the number of segments. Furthermore, we prove that this query time is also optimal for any solution using near-linear space.

Comment: Extended abstract at ISAAC 202

arXiv.org e-Print Archive

Online Research Database In Technology

Algorithms and Data Structures for Strings, Points and Integers:or, Points about Strings and Strings about Points

Author: Vind Søren Juhl
Publication venue: Technical University of Denmark
Publication date: 01/01/2015

Online Research Database In Technology

Storage and Retrieval of Highly Repetitive Sequence Collections

Author: Mäkinen Veli
Navarro Gonzalo
Sirén Jouni
Välimäki Niko
Publication date: 01/01/2009

Peer reviewed

CiteSeerX

Helsingin yliopiston digitaalinen arkisto

Update Query Time Trade-Off for Dynamic Suffix Arrays

Author: Amir Amihood
Boneh Itai
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st International Symposium on Algorithms and Computation (ISAAC 2020)
Publication date: 01/01/2020

The Suffix Array SA(S) of a string S[1 ... n] is an array containing all the suffixes of S sorted in lexicographic order. The suffix array is one of the best-known indexing data structures, and it functions as a key tool in many string algorithms. In this paper, we present a data structure for maintaining the Suffix Array of a dynamic string. For every $0 \leq \varepsilon \leq 1$, our data structure reports SA[i] in $\tilde{O}(n^{\varepsilon})$ time and handles text modification in $\tilde{O}(n^{1-\varepsilon})$ time. Additionally, our data structure enables the same query time for reporting iSA[i], with iSA being the Inverse Suffix Array of S[1 ... n]. Our data structure can be used to construct sub-linear dynamic variants of static string algorithms or data structures that are based on the Suffix Array and the Inverse Suffix Array.

Comment: 19 pages, 3 figures
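For readers unfamiliar with the two arrays named in this abstract, the following static Python sketch builds both by plain sorting. It is not the dynamic structure described above and would need a full rebuild after every text modification. The function names are invented for this example, and indices are 0-based here whereas the abstract writes S[1 ... n].

```python
def suffix_array(s):
    """Start positions of all suffixes of s, sorted lexicographically (naive)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def inverse_suffix_array(sa):
    """isa[i] is the rank of the suffix starting at i, so sa[isa[i]] == i."""
    isa = [0] * len(sa)
    for rank, start in enumerate(sa):
        isa[start] = rank
    return isa


sa = suffix_array("banana")
isa = inverse_suffix_array(sa)
print(sa, isa)
```

Comparing whole suffix slices makes this construction quadratic or worse in the worst case, so it is useful only as a reference point for checking faster implementations.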

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server