541 research outputs found
Approximation of grammar-based compression via recompression
In this paper we present a simple linear-time algorithm constructing a
context-free grammar of size O(g log(N/g)) for the input string, where N is the
size of the input string and g the size of the optimal grammar generating this
string. The algorithm works for arbitrary size alphabets, but the running time
is linear assuming that the alphabet \Sigma of the input string can be
identified with numbers from {1, ..., N^c} for some constant c. Otherwise,
additional cost of O(n log|\Sigma|) is needed.
Algorithms with such approximation guarantees and running time are known, the
novelty of this paper is a particular simplicity of the algorithm as well as
the analysis of the algorithm, which uses a general technique of recompression
recently introduced by the author. Furthermore, contrary to the previous
results, this work does not use the LZ representation of the input string in
the construction, nor in the analysis.Comment: 22 pages, some many small improvements, to be submited to a journa
Compressed Membership for NFA (DFA) with Compressed Labels is in NP (P)
In this paper, a compressed membership problem for finite automata, both
deterministic and non-deterministic, with compressed transition labels is
studied. The compression is represented by straight-line programs (SLPs), i.e.
context-free grammars generating exactly one string. A novel technique of
dealing with SLPs is introduced: the SLPs are recompressed, so that substrings
of the input text are encoded in SLPs labelling the transitions of the NFA
(DFA) in the same way, as in the SLP representing the input text. To this end,
the SLPs are locally decompressed and then recompressed in a uniform way.
Furthermore, such recompression induces only small changes in the automaton, in
particular, the size of the automaton remains polynomial.
Using this technique it is shown that the compressed membership for NFA with
compressed labels is in NP, thus confirming the conjecture of Plandowski and
Rytter and extending the partial result of Lohrey and Mathissen; as it is
already known, that this problem is NP-hard, we settle its exact computational
complexity. Moreover, the same technique applied to the compressed membership
for DFA with compressed labels yields that this problem is in P; for this
problem, only trivial upper-bound PSPACE was known
A really simple approximation of smallest grammar
In this paper we present a really simple linear-time algorithm constructing a
context-free grammar of size O(g log (N/g)) for the input string, where N is
the size of the input string and g the size of the optimal grammar generating
this string. The algorithm works for arbitrary size alphabets, but the running
time is linear assuming that the alphabet Sigma of the input string can be
identified with numbers from 1,ldots, N^c for some constant c. Algorithms with
such an approximation guarantee and running time are known, however all of them
were non-trivial and their analyses were involved. The here presented algorithm
computes the LZ77 factorisation and transforms it in phases to a grammar. In
each phase it maintains an LZ77-like factorisation of the word with at most l
factors as well as additional O(l) letters, where l was the size of the
original LZ77 factorisation. In one phase in a greedy way (by a left-to-right
sweep and a help of the factorisation) we choose a set of pairs of consecutive
letters to be replaced with new symbols, i.e. nonterminals of the constructed
grammar. We choose at least 2/3 of the letters in the word and there are O(l)
many different pairs among them. Hence there are O(log N) phases, each of them
introduces O(l) nonterminals to a grammar. A more precise analysis yields a
bound O(l log(N/l)). As l \leq g, this yields the desired bound O(g log(N/g)).Comment: Accepted for CPM 201
High-frequency asymptotic compression of dense BEM matrices for general geometries without ray tracing
Wave propagation and scattering problems in acoustics are often solved with
boundary element methods. They lead to a discretization matrix that is
typically dense and large: its size and condition number grow with increasing
frequency. Yet, high frequency scattering problems are intrinsically local in
nature, which is well represented by highly localized rays bouncing around.
Asymptotic methods can be used to reduce the size of the linear system, even
making it frequency independent, by explicitly extracting the oscillatory
properties from the solution using ray tracing or analogous techniques.
However, ray tracing becomes expensive or even intractable in the presence of
(multiple) scattering obstacles with complicated geometries. In this paper, we
start from the same discretization that constructs the fully resolved large and
dense matrix, and achieve asymptotic compression by explicitly localizing the
Green's function instead. This results in a large but sparse matrix, with a
faster associated matrix-vector product and, as numerical experiments indicate,
a much improved condition number. Though an appropriate localisation of the
Green's function also depends on asymptotic information unavailable for general
geometries, we can construct it adaptively in a frequency sweep from small to
large frequencies in a way which automatically takes into account a general
incident wave. We show that the approach is robust with respect to non-convex,
multiple and even near-trapping domains, though the compression rate is clearly
lower in the latter case. Furthermore, in spite of its asymptotic nature, the
method is robust with respect to low-order discretizations such as piecewise
constants, linears or cubics, commonly used in applications. On the other hand,
we do not decrease the total number of degrees of freedom compared to a
conventional classical discretization. The combination of the ...Comment: 24 pages, 13 figure
Efficient LZ78 factorization of grammar compressed text
We present an efficient algorithm for computing the LZ78 factorization of a
text, where the text is represented as a straight line program (SLP), which is
a context free grammar in the Chomsky normal form that generates a single
string. Given an SLP of size representing a text of length , our
algorithm computes the LZ78 factorization of in time
and space, where is the number of resulting LZ78 factors.
We also show how to improve the algorithm so that the term in the
time and space complexities becomes either , where is the length of the
longest LZ78 factor, or where is a quantity
which depends on the amount of redundancy that the SLP captures with respect to
substrings of of a certain length. Since where
is the alphabet size, the latter is asymptotically at least as fast as
a linear time algorithm which runs on the uncompressed string when is
constant, and can be more efficient when the text is compressible, i.e. when
and are small.Comment: SPIRE 201
Linear Compressed Pattern Matching for Polynomial Rewriting (Extended Abstract)
This paper is an extended abstract of an analysis of term rewriting where the
terms in the rewrite rules as well as the term to be rewritten are compressed
by a singleton tree grammar (STG). This form of compression is more general
than node sharing or representing terms as dags since also partial trees
(contexts) can be shared in the compression. In the first part efficient but
complex algorithms for detecting applicability of a rewrite rule under
STG-compression are constructed and analyzed. The second part applies these
results to term rewriting sequences.
The main result for submatching is that finding a redex of a left-linear rule
can be performed in polynomial time under STG-compression.
The main implications for rewriting and (single-position or parallel)
rewriting steps are: (i) under STG-compression, n rewriting steps can be
performed in nondeterministic polynomial time. (ii) under STG-compression and
for left-linear rewrite rules a sequence of n rewriting steps can be performed
in polynomial time, and (iii) for compressed rewrite rules where the left hand
sides are either DAG-compressed or ground and STG-compressed, and an
STG-compressed target term, n rewriting steps can be performed in polynomial
time.Comment: In Proceedings TERMGRAPH 2013, arXiv:1302.599
Longest Common Extensions with Recompression
Given two positions i and j in a string T of length N, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at i and j. A compressed LCE data structure stores T in a compressed form while supporting fast LCE queries. In this article we show that the recompression technique is a powerful tool for compressed LCE data structures. We present a new compressed LCE data structure of size O(z lg (N/z)) that supports LCE queries in O(lg N) time, where z is the size of Lempel-Ziv 77 factorization without self-reference of T. Given T as an uncompressed form, we show how to build our data structure in O(N) time and space. Given T as a grammar compressed form, i.e., a straight-line program of size n generating T, we show how to build our data structure in O(n lg (N/n)) time and O(n + z lg (N/z)) space. Our algorithms are deterministic and always return correct answers
- …