93 research outputs found
Approximation of grammar-based compression via recompression
In this paper we present a simple linear-time algorithm constructing a
context-free grammar of size O(g log(N/g)) for the input string, where N is the
size of the input string and g the size of the optimal grammar generating this
string. The algorithm works for arbitrary size alphabets, but the running time
is linear assuming that the alphabet \Sigma of the input string can be
identified with numbers from {1, ..., N^c} for some constant c. Otherwise,
additional cost of O(n log|\Sigma|) is needed.
Algorithms with such approximation guarantees and running time are known, the
novelty of this paper is a particular simplicity of the algorithm as well as
the analysis of the algorithm, which uses a general technique of recompression
recently introduced by the author. Furthermore, contrary to the previous
results, this work does not use the LZ representation of the input string in
the construction, nor in the analysis.Comment: 22 pages, some many small improvements, to be submited to a journa
A really simple approximation of smallest grammar
In this paper we present a really simple linear-time algorithm constructing a
context-free grammar of size O(g log (N/g)) for the input string, where N is
the size of the input string and g the size of the optimal grammar generating
this string. The algorithm works for arbitrary size alphabets, but the running
time is linear assuming that the alphabet Sigma of the input string can be
identified with numbers from 1,ldots, N^c for some constant c. Algorithms with
such an approximation guarantee and running time are known, however all of them
were non-trivial and their analyses were involved. The here presented algorithm
computes the LZ77 factorisation and transforms it in phases to a grammar. In
each phase it maintains an LZ77-like factorisation of the word with at most l
factors as well as additional O(l) letters, where l was the size of the
original LZ77 factorisation. In one phase in a greedy way (by a left-to-right
sweep and a help of the factorisation) we choose a set of pairs of consecutive
letters to be replaced with new symbols, i.e. nonterminals of the constructed
grammar. We choose at least 2/3 of the letters in the word and there are O(l)
many different pairs among them. Hence there are O(log N) phases, each of them
introduces O(l) nonterminals to a grammar. A more precise analysis yields a
bound O(l log(N/l)). As l \leq g, this yields the desired bound O(g log(N/g)).Comment: Accepted for CPM 201
Approximation of smallest linear tree grammar
A simple linear-time algorithm for constructing a linear context-free tree grammar of size O(r^2.g.log(n)) for a given input tree T of size n is presented, where g is the size of a minimal linear context-free tree grammar for T, and r is the maximal rank of symbols in T (which is a constant in many applications). This is the first example of a grammar-based tree compression algorithm with an approximation ratio polynomial in g. The analysis of the algorithm uses an extension of the recompression technique (used in the context of grammar-based string compression) from strings to trees
Context unification is in PSPACE
Contexts are terms with one `hole', i.e. a place in which we can substitute
an argument. In context unification we are given an equation over terms with
variables representing contexts and ask about the satisfiability of this
equation. Context unification is a natural subvariant of second-order
unification, which is undecidable, and a generalization of word equations,
which are decidable, at the same time. It is the unique problem between those
two whose decidability is uncertain (for already almost two decades). In this
paper we show that the context unification is in PSPACE. The result holds under
a (usual) assumption that the first-order signature is finite.
This result is obtained by an extension of the recompression technique,
recently developed by the author and used in particular to obtain a new PSPACE
algorithm for satisfiability of word equations, to context unification. The
recompression is based on performing simple compression rules (replacing pairs
of neighbouring function symbols), which are (conceptually) applied on the
solution of the context equation and modifying the equation in a way so that
such compression steps can be in fact performed directly on the equation,
without the knowledge of the actual solution.Comment: 27 pages, submitted, small notation changes and small improvements
over the previous tex
Longest Common Extensions with Recompression
Given two positions i and j in a string T of length N, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at i and j. A compressed LCE data structure stores T in a compressed form while supporting fast LCE queries. In this article we show that the recompression technique is a powerful tool for compressed LCE data structures. We present a new compressed LCE data structure of size O(z lg (N/z)) that supports LCE queries in O(lg N) time, where z is the size of Lempel-Ziv 77 factorization without self-reference of T. Given T as an uncompressed form, we show how to build our data structure in O(N) time and space. Given T as a grammar compressed form, i.e., a straight-line program of size n generating T, we show how to build our data structure in O(n lg (N/n)) time and O(n + z lg (N/z)) space. Our algorithms are deterministic and always return correct answers
One-variable word equations in linear time
In this paper we consider word equations with one variable (and arbitrary
many appearances of it). A recent technique of recompression, which is
applicable to general word equations, is shown to be suitable also in this
case. While in general case it is non-deterministic, it determinises in case of
one variable and the obtained running time is O(n + #_X log n), where #_X is
the number of appearances of the variable in the equation. This matches the
previously-best algorithm due to D\k{a}browski and Plandowski. Then, using a
couple of heuristics as well as more detailed time analysis the running time is
lowered to O(n) in RAM model. Unfortunately no new properties of solutions are
shown.Comment: submitted to a journal, general overhaul over the previous versio
- …