528 research outputs found
Finger Search in Grammar-Compressed Strings
Grammar-based compression, where one replaces a long string by a small
context-free grammar that generates the string, is a simple and powerful
paradigm that captures many popular compression schemes. Given a grammar, the
random access problem is to compactly represent the grammar while supporting
random access, that is, given a position in the original uncompressed string
report the character at that position. In this paper we study the random access
problem with the finger search property, that is, the time for a random access
query should depend on the distance between a specified index , called the
\emph{finger}, and the query index . We consider both a static variant,
where we first place a finger and subsequently access indices near the finger
efficiently, and a dynamic variant where also moving the finger such that the
time depends on the distance moved is supported.
Let be the size the grammar, and let be the size of the string. For
the static variant we give a linear space representation that supports placing
the finger in time and subsequently accessing in time,
where is the distance between the finger and the accessed index. For the
dynamic variant we give a linear space representation that supports placing the
finger in time and accessing and moving the finger in time. Compared to the best linear space solution to random
access, we improve a query bound to for the static
variant and to for the dynamic variant, while
maintaining linear space. As an application of our results we obtain an
improved solution to the longest common extension problem in grammar compressed
strings. To obtain our results, we introduce several new techniques of
independent interest, including a novel van Emde Boas style decomposition of
grammars
Bicriteria data compression
The advent of massive datasets (and the consequent design of high-performing
distributed storage systems) have reignited the interest of the scientific and
engineering community towards the design of lossless data compressors which
achieve effective compression ratio and very efficient decompression speed.
Lempel-Ziv's LZ77 algorithm is the de facto choice in this scenario because of
its decompression speed and its flexibility in trading decompression speed
versus compressed-space efficiency. Each of the existing implementations offers
a trade-off between space occupancy and decompression speed, so software
engineers have to content themselves by picking the one which comes closer to
the requirements of the application in their hands. Starting from these
premises, and for the first time in the literature, we address in this paper
the problem of trading optimally, and in a principled way, the consumption of
these two resources by introducing the Bicriteria LZ77-Parsing problem, which
formalizes in a principled way what data-compressors have traditionally
approached by means of heuristics. The goal is to determine an LZ77 parsing
which minimizes the space occupancy in bits of the compressed file, provided
that the decompression time is bounded by a fixed amount (or vice-versa). This
way, the software engineer can set its space (or time) requirements and then
derive the LZ77 parsing which optimizes the decompression speed (or the space
occupancy, respectively). We solve this problem efficiently in O(n log^2 n)
time and optimal linear space within a small, additive approximation, by
proving and deploying some specific structural properties of the weighted graph
derived from the possible LZ77-parsings of the input file. The preliminary set
of experiments shows that our novel proposal dominates all the highly
engineered competitors, hence offering a win-win situation in theory&practice
- …