Queries on LZ-Bounded Encodings
We describe a data structure that stores a string in space similar to
that of its Lempel-Ziv encoding and efficiently supports access, rank and
select queries. These queries are fundamental for implementing succinct and
compressed data structures, such as compressed trees and graphs. We show that
our data structure can be built in a scalable manner and is both small and fast
in practice compared to other data structures supporting such queries.
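
For readers unfamiliar with the three queries, the following is a minimal reference sketch of their semantics on an uncompressed Python string. It is not the data structure from the abstract, which answers the same queries in space close to the Lempel-Ziv encoding. The function names and the indexing conventions (0-based positions, rank over an exclusive prefix, 1-based occurrence count for select) are assumptions chosen for illustration; conventions vary between papers.

```python
def access(s, i):
    """Return the character at 0-based position i."""
    return s[i]


def rank(s, c, i):
    """Return the number of occurrences of c in the prefix s[0:i]."""
    return s[:i].count(c)


def select(s, c, j):
    """Return the 0-based position of the j-th occurrence of c (j >= 1),
    or -1 if c occurs fewer than j times."""
    pos = -1
    for _ in range(j):
        pos = s.find(c, pos + 1)
        if pos == -1:
            return -1
    return pos
```

With these conventions rank and select are inverse in the sense that `rank(s, c, select(s, c, j) + 1) == j` whenever the j-th occurrence exists.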
Computing LZ77 in Run-Compressed Space
In this paper, we show that the LZ77 factorization of a text T ∈ Σ^n
can be computed in O(R log n) bits of working space and O(n log R) time, R
being the number of runs in the Burrows-Wheeler transform of T reversed. For
extremely repetitive inputs, the working space can be as low as O(log n) bits:
exponentially smaller than the text itself. As a direct consequence of our
result, we show that a class of repetition-aware self-indexes based on a
combination of run-length encoded BWT and LZ77 can be built in asymptotically
optimal O(R + z) words of working space, z being the size of the LZ77 parsing.
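
To make the two quantities z and R concrete, here is a naive sketch that computes a greedy LZ77 parse and counts the runs in the BWT of a string. It uses quadratic or worse time and linear space, so it is only a definition-level illustration and not the run-compressed algorithm of the abstract. The function names, the phrase format (a literal character or a `(source, length)` pair), and the `"\0"` sentinel are assumptions; LZ77 variants differ in details such as whether a phrase carries a trailing character.

```python
def lz77_factorize(text):
    """Greedy LZ77 parse. Each phrase is either a single new character or a
    pair (source, length) copying the longest match that starts at an
    earlier position (the copy may overlap the phrase itself)."""
    factors = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            factors.append(text[i])
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def bwt_runs(text, sentinel="\0"):
    """Count maximal runs of equal characters in the BWT of text + sentinel,
    built naively by sorting all rotations."""
    t = text + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    last = "".join(rotation[-1] for rotation in rotations)
    return sum(1 for k in range(len(last)) if k == 0 or last[k] != last[k - 1])
```

Under these assumptions, z is `len(lz77_factorize(T))` and R is `bwt_runs(T[::-1])`, the run count for the reversed text.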
Hard Instances of the Constrained Discrete Logarithm Problem
The discrete logarithm problem (DLP) generalizes to the constrained DLP,
where the secret exponent belongs to a set known to the attacker. The
complexity of generic algorithms for solving the constrained DLP depends on the
choice of the set. Motivated by cryptographic applications, we study sets with
succinct representation for which the constrained DLP is hard. We draw on
earlier results due to Erdős et al. and Schnorr, develop geometric tools such
as generalized Menelaus' theorem for proving lower bounds on the complexity of
the constrained DLP, and construct sets with succinct representation with
provable non-trivial lower bounds.
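
As a rough illustration of why the choice of the set matters, the sketch below contrasts two solvers in the multiplicative group modulo a prime p. The first enumerates an arbitrary candidate set and costs about one exponentiation per candidate. The second handles the special case where the set is an interval and uses baby-step giant-step, which needs roughly the square root of the interval width in group operations. Neither is a construction from the abstract; the function names, the group, and the interval example are assumptions made for illustration.

```python
from math import isqrt


def constrained_dlp_bruteforce(g, h, p, candidates):
    """Return the first x in candidates with g^x = h (mod p), else None."""
    for x in candidates:
        if pow(g, x, p) == h:
            return x
    return None


def interval_dlp_bsgs(g, h, p, lo, hi):
    """Return some x in [lo, hi) with g^x = h (mod p), else None.
    Baby-step giant-step; assumes p is prime and g is not divisible by p."""
    m = isqrt(hi - lo) + 1
    # Baby steps: remember the smallest j in [0, m) for each value g^j.
    table = {}
    cur = 1
    for j in range(m):
        table.setdefault(cur, j)
        cur = cur * g % p
    g_inv = pow(g, p - 2, p)           # inverse of g via Fermat (p prime)
    gamma = h * pow(g_inv, lo, p) % p  # h * g^(-lo)
    step = pow(g_inv, m, p)            # g^(-m)
    # Giant steps: at step i, gamma = h * g^(-lo - i*m).
    for i in range(m):
        if gamma in table:
            x = lo + i * m + table[gamma]
            if x < hi:
                return x
        gamma = gamma * step % p
    return None
```

An interval is therefore an example of a succinctly described set for which the constrained problem is comparatively easy for generic algorithms.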
Small space and streaming pattern matching with k edits
In this work, we revisit the fundamental and well-studied problem of
approximate pattern matching under edit distance. Given an integer k, a
pattern, and a text, the task is to find substrings of the text that are
within edit distance k from the pattern. Our main
result is a streaming algorithm that solves the problem in
space and amortised time per character of the text, providing
answers correct with high probability. (Hereafter, hides a
factor.) This answers a decade-old question: since the
discovery of a -space streaming algorithm for pattern
matching under Hamming distance by Porat and Porat [FOCS 2009], the existence
of an analogous result for edit distance remained open. Up to this work, no
-space algorithm was known even in the simpler
semi-streaming model, where the text comes as a stream but the pattern is available for
read-only access. In this model, we give a deterministic algorithm that
achieves slightly better complexity.
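
The problem itself can be illustrated with the classical dynamic-programming solution, sketched below. It reads the text one character at a time and reports every end position of a substring within edit distance k of the pattern, but it keeps a column whose length is linear in the pattern, so it is a baseline only and not the small-space algorithm described above. The function name and the choice to report 0-based end positions are assumptions.

```python
def approx_match_end_positions(pattern, text, k):
    """Return all 0-based positions j such that some substring of text
    ending at j is within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and a substring of the
    # text ending at the current position; col[0] stays 0 because a match
    # may start anywhere.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                col[i] + 1,                         # text character c is extra
                col[i - 1] + 1,                     # pattern character is missing
                prev_diag + (pattern[i - 1] != c),  # match or substitution
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

For example, with pattern `"abc"` and text `"xxabcxx"`, the exact occurrence ends at position 4, and allowing one edit also admits the end positions 3 and 5.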
In order to develop the fully streaming algorithm, we introduce a new edit
distance sketch parametrised by integers . For any string of length at
most , the sketch is of size and it can be computed with an
-space streaming algorithm. Given the sketches of two strings,
in time we can compute their edit distance or certify that it
is larger than k. This result improves upon -size sketches of
Belazzougui and Zhu [FOCS 2016] and very recent -size sketches
of Jin, Nelson, and Wu [STACS 2021].
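
The interface "compute the edit distance or certify that it exceeds k" can be illustrated on plain strings with a banded dynamic program, sketched below. It works on the full strings rather than on sketches, so it shows only the thresholded output and not the sketching technique. The function name and the use of `None` as the "larger than k" certificate are assumptions. For readability each row is stored as a full-length list, although only cells within k of the diagonal are evaluated; a tuned version would store just the band.

```python
def bounded_edit_distance(a, b, k):
    """Return the edit distance of a and b if it is at most k, else None."""
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None  # the length difference alone forces more than k edits
    inf = k + 1      # any value above k is treated as "too large"
    prev = [j if j <= k else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo, hi = max(0, i - k), min(m, i + k)
        if lo == 0:
            cur[0] = i
        for j in range(max(1, lo), hi + 1):
            cur[j] = min(
                prev[j] + 1,                          # delete a[i-1]
                cur[j - 1] + 1,                       # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]), # match or substitution
                inf,                                  # cap values above k
            )
        prev = cur
    return prev[m] if prev[m] <= k else None
```

For instance, `bounded_edit_distance("kitten", "sitting", 3)` returns 3, while the same call with k = 2 returns `None`.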