
    Space-Efficient Re-Pair Compression

    Re-Pair is an effective grammar-based compression scheme achieving strong compression rates in practice. Let $n$, $\sigma$, and $d$ be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original paper, the authors show how to compute the Re-Pair grammar in expected linear time and $5n + 4\sigma^2 + 4d + \sqrt{n}$ words of working space on top of the text. In this work, we propose two algorithms improving on the space of their original solution. Our model assumes a memory word of $\lceil\log_2 n\rceil$ bits and a rewritable input text composed of $n$ such words. Our first algorithm runs in expected $\mathcal{O}(n/\epsilon)$ time and uses $(1+\epsilon)n + \sqrt{n}$ words of space on top of the text for any parameter $0 < \epsilon \leq 1$ chosen in advance. Our second algorithm runs in expected $\mathcal{O}(n\log n)$ time and improves the space to $n + \sqrt{n}$ words.
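
    To make the scheme concrete, here is a minimal, illustrative Re-Pair sketch in Python: it repeatedly replaces the most frequent adjacent pair with a fresh nonterminal and records the corresponding rule. It is deliberately naive (quadratic worst-case time, linear extra space) and is not the space-efficient construction proposed above; all names are hypothetical.

```python
from collections import Counter

def repair(text):
    """Toy Re-Pair: repeatedly replace the most frequent adjacent pair
    with a fresh nonterminal until every pair occurs at most once.
    Naive sketch only: it rebuilds the pair counts from scratch in
    each round and keeps a full copy of the sequence."""
    seq = list(text)            # working sequence of terminals/nonterminals
    rules = {}                  # nonterminal -> (left symbol, right symbol)
    next_id = 0                 # fresh nonterminal ids

    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        nt = ("NT", next_id)
        next_id += 1
        rules[nt] = pair
        # replace non-overlapping occurrences left to right
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

# Example: repair("abracadabra") returns the final sequence and the rule set.
```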

    Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

    We consider the problem of encoding a string of length $n$ from an integer alphabet of size $\sigma$ so that access and substring equality queries (that is, determining the equality of any two substrings) can be answered efficiently. Any uniquely-decodable encoding supporting access must take $n\log\sigma + \Theta(\log(n\log\sigma))$ bits. We describe a new data structure matching this lower bound when $\sigma \leq n^{O(1)}$ while supporting both queries in optimal $O(1)$ time. Furthermore, we show that the string can be overwritten in place with this structure. The redundancy of $\Theta(\log n)$ bits and the constant query time break, by an exponential margin, a lower bound that is known to hold in the read-only model. Using our new string representation, we obtain the first in-place subquadratic (indeed, even sublinear in some cases) algorithms for several string-processing problems in the restore model: the input string is rewritable and must be restored before the computation terminates. In particular, we describe the first in-place subquadratic Monte Carlo solutions to the sparse suffix sorting, sparse LCP array construction, and suffix selection problems. With the sole exception of suffix selection, our algorithms are also the first running in sublinear time for small enough sets of input suffixes. Combining these solutions, we obtain the first sublinear-time Monte Carlo algorithm for building the sparse suffix tree in compact space. We also show how to derandomize our algorithms using small space. This leads to the first Las Vegas in-place algorithm computing the full LCP array in $O(n\log n)$ time and to the first Las Vegas in-place algorithms solving the sparse suffix sorting and sparse LCP array construction problems in $O(n^{1.5}\sqrt{\log\sigma})$ time. Running times of these Las Vegas algorithms hold in the worst case with high probability.
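
    As a rough illustration of the query being supported (not the succinct encoding described above), the following Python sketch answers substring-equality queries in $O(1)$ time Monte Carlo style using Karp-Rabin prefix fingerprints; it keeps a full fingerprint array, so its redundancy is far above the lower bound discussed in the abstract. Class and method names are hypothetical.

```python
import random

class SubstringEquality:
    """Monte Carlo substring-equality queries via Karp-Rabin prefix
    fingerprints: equal(i, j, l) reports whether s[i:i+l] == s[j:j+l]
    with a small, tunable error probability."""

    def __init__(self, s):
        self.mod = (1 << 61) - 1                 # large Mersenne prime modulus
        self.base = random.randrange(2, self.mod - 1)
        n = len(s)
        self.h = [0] * (n + 1)                   # h[i] = fingerprint of s[:i]
        self.pw = [1] * (n + 1)                  # pw[i] = base**i mod mod
        for i, c in enumerate(s):
            self.h[i + 1] = (self.h[i] * self.base + ord(c)) % self.mod
            self.pw[i + 1] = (self.pw[i] * self.base) % self.mod

    def _fp(self, i, l):
        # fingerprint of the length-l substring starting at position i
        return (self.h[i + l] - self.h[i] * self.pw[l]) % self.mod

    def equal(self, i, j, l):
        """True iff s[i:i+l] == s[j:j+l], correct with high probability."""
        return self._fp(i, l) == self._fp(j, l)

# Example: SubstringEquality("mississippi").equal(1, 4, 4) compares "issi" with "issi".
```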

    Radix Sorting With No Extra Space

    It is well known that $n$ integers in the range $[1, n^c]$ can be sorted in $O(n)$ time in the RAM model using radix sorting. More generally, integers in any range $[1, U]$ can be sorted in $O(n\sqrt{\log\log n})$ time. However, these algorithms use $O(n)$ words of extra memory. Is this necessary? We present a simple, stable, integer sorting algorithm for words of size $O(\log n)$, which works in $O(n)$ time and uses only $O(1)$ words of extra memory on a RAM model. This is the integer sorting case most useful in practice. We extend this result with the same bounds to the case when the keys are read-only, which is of theoretical interest. Another interesting question is the case of arbitrary $c$. Here we present a black-box transformation from any RAM sorting algorithm to a sorting algorithm which uses only $O(1)$ extra space and has the same running time. This settles the complexity of in-place sorting in terms of the complexity of sorting.
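
    For reference, this is the textbook stable LSD radix sort the abstract alludes to: it sorts word-sized integers in linear time but allocates a linear number of words for its buckets, which is exactly the extra memory the paper's in-place algorithm avoids. The function name and default parameters are illustrative.

```python
def radix_sort(a, word_bits=32, radix_bits=8):
    """Stable LSD radix sort of non-negative integers fitting in
    word_bits bits, processing radix_bits bits per pass.
    Uses O(n) extra memory for the buckets."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, word_bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for x in a:                            # distribute by current digit
            buckets[(x >> shift) & mask].append(x)
        a = [x for b in buckets for x in b]    # stable concatenation
    return a

# Example: radix_sort([170, 45, 75, 90, 802, 24, 2, 66]) returns the list in sorted order.
```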

    Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast

    Sparse suffix sorting is the problem of sorting $b = o(n)$ suffixes of a string of length $n$. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for applications in text indexing, the existing algorithms have not been employed by practitioners. Arguably this is because there are no simple, direct, and efficient algorithms for sparse suffix array construction. We provide two new algorithms for constructing the sparse suffix and LCP arrays that are simultaneously simple, direct, small, and fast. In particular, our algorithms are: simple in the sense that they can be implemented using only basic data structures; direct in the sense that the output arrays are not a byproduct of constructing the sparse suffix tree or an LCE data structure; fast in the sense that they run in $\mathcal{O}(n\log b)$ time, in the worst case, or in $\mathcal{O}(n)$ time, when the total number of suffixes with an LCP value greater than $2^{\lfloor \log \frac{n}{b} \rfloor + 1} - 1$ is in $\mathcal{O}(b/\log b)$, matching the time of optimal yet much more complicated algorithms [Gawrychowski and Kociumaka, SODA 2017; Birenzwige et al., SODA 2020]; and small in the sense that they can be implemented using only $8b + o(b)$ machine words. We also show that our second algorithm can be trivially amended to work in $\mathcal{O}(n)$ time for any uniformly random string. Our algorithms are non-trivial space-efficient adaptations of the Monte Carlo algorithm by I et al. for constructing the sparse suffix tree in $\mathcal{O}(n\log b)$ time [STACS 2014].
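
    A minimal sketch of the problem being solved (not the paper's algorithm): sort the $b$ sampled positions by direct suffix comparison and compute the LCP values between lexicographically adjacent sampled suffixes. This naive approach takes $\mathcal{O}(nb\log b)$ time in the worst case, which is what the fast algorithms above improve on. Function names are hypothetical.

```python
def sparse_suffix_array(text, positions):
    """Naive sparse suffix array: sort only the requested suffix start
    positions by direct lexicographic comparison of the suffixes."""
    return sorted(positions, key=lambda i: text[i:])

def sparse_lcp_array(text, ssa):
    """LCP values between lexicographically adjacent sampled suffixes;
    the first entry is 0 by convention."""
    def lcp(i, j):
        l = 0
        while i + l < len(text) and j + l < len(text) and text[i + l] == text[j + l]:
            l += 1
        return l
    return [0] + [lcp(ssa[k - 1], ssa[k]) for k in range(1, len(ssa))]

# Example: for text = "banana" and positions = [1, 3, 5],
# sparse_suffix_array gives [5, 3, 1] and sparse_lcp_array gives [0, 1, 3].
```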

    Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast

    Sparse suffix sorting is the problem of sorting $b = o(n)$ suffixes of a string of length $n$. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for applications in text indexing, the existing algorithms have not been employed by practitioners. Arguably this is because there are no simple, direct, and efficient algorithms for sparse suffix array construction. We provide two new algorithms for constructing the sparse suffix and LCP arrays that are simultaneously simple, direct, small, and fast. In particular, our algorithms are: simple in the sense that they can be implemented using only basic data structures; direct in the sense that the output arrays are not a byproduct of constructing the sparse suffix tree or an LCE data structure; fast in the sense that they run in $\mathcal{O}(n\log b)$ time, in the worst case, or in $\mathcal{O}(n)$ time, when the total number of suffixes with an LCP value greater than $2^{\lfloor \log \frac{n}{b} \rfloor + 1} - 1$ is in $\mathcal{O}(b/\log b)$, matching the time of the optimal yet much more complicated algorithms [Gawrychowski and Kociumaka, SODA 2017; Birenzwige et al., SODA 2020]; and small in the sense that they can be implemented using only $8b + o(b)$ machine words. Our algorithms are simplified, yet non-trivial, space-efficient adaptations of the Monte Carlo algorithm by I et al. for constructing the sparse suffix tree in $\mathcal{O}(n\log b)$ time [STACS 2014]. We also provide proof-of-concept experiments to justify our claims on simplicity and efficiency.
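
    The Monte Carlo idea referenced here (I et al., STACS 2014) can be sketched as follows, under the assumption that Karp-Rabin fingerprints are used for longest common extensions: compare two sampled suffixes by binary-searching their longest common extension and then comparing a single character. This illustration uses a linear number of words for the fingerprint tables, unlike the $8b + o(b)$-word constructions described above; names are hypothetical.

```python
from functools import cmp_to_key
import random

def sparse_sort_with_lce(text, positions):
    """Sort sampled suffix positions by comparing suffixes through
    fingerprint-based longest common extensions (Monte Carlo)."""
    mod = (1 << 61) - 1
    base = random.randrange(2, mod - 1)
    n = len(text)
    h = [0] * (n + 1)                  # prefix fingerprints
    pw = [1] * (n + 1)                 # powers of the base
    for i, c in enumerate(text):
        h[i + 1] = (h[i] * base + ord(c)) % mod
        pw[i + 1] = (pw[i] * base) % mod

    def fp(i, l):                      # fingerprint of text[i:i+l]
        return (h[i + l] - h[i] * pw[l]) % mod

    def lce(i, j):                     # longest common extension, w.h.p.
        lo, hi = 0, min(n - i, n - j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if fp(i, mid) == fp(j, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo

    def cmp(i, j):
        l = lce(i, j)
        if i + l == n:
            return -1                  # suffix i is a proper prefix of suffix j
        if j + l == n:
            return 1
        return -1 if text[i + l] < text[j + l] else 1

    return sorted(positions, key=cmp_to_key(cmp))
```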

    Partial Sums on the Ultra-Wide Word RAM

    We consider the classic partial sums problem on the ultra-wide word RAM model of computation. This model extends the classic $w$-bit word RAM model with special ultrawords of length $w^2$ bits that support standard arithmetic and boolean operations, and scattered memory access operations that can access $w$ (non-contiguous) locations in memory. The ultra-wide word RAM model captures (and idealizes) modern vector processor architectures. Our main result is a new in-place data structure for the partial sums problem that only stores a constant number of ultrawords in addition to the input and supports operations in doubly logarithmic time. This matches the best known time bounds for the problem (among polynomial space data structures) while improving the space from superlinear to a constant number of ultrawords. Our results are based on a simple and elegant in-place word RAM data structure, known as the Fenwick tree. Our main technical contribution is a new, efficient, parallel ultra-wide word RAM implementation of the Fenwick tree, which is likely of independent interest.
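
    For context, here is the standard word-RAM Fenwick tree the abstract builds on: an implicit binary-indexed layout over $n$ words supporting point updates and prefix sums in $O(\log n)$ time. The sketch below is only the sequential baseline, not the parallel ultra-wide word RAM implementation contributed by the paper.

```python
class FenwickTree:
    """Classic Fenwick tree (binary indexed tree): point updates and
    prefix-sum queries in O(log n) time using n + 1 words."""

    def __init__(self, n):
        self.tree = [0] * (n + 1)      # 1-indexed implicit tree

    def add(self, i, delta):
        """Add delta to position i (1-indexed)."""
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & (-i)              # move to the next node covering i

    def prefix_sum(self, i):
        """Return the sum of positions 1..i."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)              # drop the lowest set bit
        return s

# Example: ft = FenwickTree(8); ft.add(3, 5); ft.prefix_sum(4) returns 5.
```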

    Solving Geometric Problems in Space-Conscious Models

    When dealing with massive data sets, standard algorithms may easily "run out of memory". In this thesis, we design efficient algorithms in space-conscious models. In particular, in-place algorithms, multi-pass algorithms, read-only algorithms, and stream-sort algorithms are studied, and the focus is on fundamental geometric problems, such as 2D convex hulls, 3D convex hulls, Voronoi diagrams and nearest neighbor queries, Klee's measure problem, and low-dimensional linear programming. In-place algorithms only use O(1) extra space besides the input array. We present a data structure for 2D nearest neighbor queries and algorithms for Klee's measure problem in this model. Algorithms in the multi-pass model only make read-only sequential access to the input, and use sublinear working space and a small (usually constant) number of passes over the input. We present algorithms and lower bounds for many problems, including low-dimensional linear programming and convex hulls, in this model. Algorithms in the read-only model only make read-only random access to the input array, and use sublinear working space. We present algorithms for Klee's measure problem and 2D convex hulls in this model. Algorithms in the stream-sort model use sorting as a primitive operation. Each pass can either sort the data or make sequential access to the data. As in the multi-pass model, these algorithms can only use sublinear working space and a small (usually constant) number of passes over the data. We present algorithms for constructing convex hulls and polygon triangulation in this model.
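
    As a toy illustration of one of the problems mentioned, the sketch below solves the one-dimensional version of Klee's measure problem (the total length covered by a union of intervals) with a sort and a single sweep. It is shown only to fix ideas: it uses linear working space and ignores the in-place and multi-pass constraints that the thesis actually addresses.

```python
def klee_measure_1d(intervals):
    """Total length covered by a union of closed intervals (l, r),
    computed by sorting on the left endpoint and sweeping once."""
    covered, cur_l, cur_r = 0, None, None
    for l, r in sorted(intervals):
        if cur_r is None or l > cur_r:      # disjoint from the current run
            if cur_r is not None:
                covered += cur_r - cur_l
            cur_l, cur_r = l, r
        else:                               # overlapping: extend the current run
            cur_r = max(cur_r, r)
    if cur_r is not None:
        covered += cur_r - cur_l
    return covered

# Example: klee_measure_1d([(1, 3), (2, 5), (7, 8)]) returns 5.
```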