Search CORE

93 research outputs found

Approximation of grammar-based compression via recompression

Author: Jeż Artur
Publication venue
Publication date: 01/01/2013
Field of study

In this paper we present a simple linear-time algorithm constructing a context-free grammar of size O(g log(N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear assuming that the alphabet \Sigma of the input string can be identified with numbers from {1, ..., N^c} for some constant c. Otherwise, additional cost of O(n log|\Sigma|) is needed. Algorithms with such approximation guarantees and running time are known, the novelty of this paper is a particular simplicity of the algorithm as well as the analysis of the algorithm, which uses a general technique of recompression recently introduced by the author. Furthermore, contrary to the previous results, this work does not use the LZ representation of the input string in the construction, nor in the analysis.Comment: 22 pages, some many small improvements, to be submited to a journa

arXiv.org e-Print Archive

MPG.PuRe

A really simple approximation of smallest grammar

Author: A. Jeż
F. Rubin
H. Sakamoto
J. Kärkkäinen
M. Charikar
M. Lohrey
W. Rytter
Publication venue
Publication date: 01/01/2014
Field of study

In this paper we present a really simple linear-time algorithm constructing a context-free grammar of size O(g log (N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear assuming that the alphabet Sigma of the input string can be identified with numbers from 1,ldots, N^c for some constant c. Algorithms with such an approximation guarantee and running time are known, however all of them were non-trivial and their analyses were involved. The here presented algorithm computes the LZ77 factorisation and transforms it in phases to a grammar. In each phase it maintains an LZ77-like factorisation of the word with at most l factors as well as additional O(l) letters, where l was the size of the original LZ77 factorisation. In one phase in a greedy way (by a left-to-right sweep and a help of the factorisation) we choose a set of pairs of consecutive letters to be replaced with new symbols, i.e. nonterminals of the constructed grammar. We choose at least 2/3 of the letters in the word and there are O(l) many different pairs among them. Hence there are O(log N) phases, each of them introduces O(l) nonterminals to a grammar. A more precise analysis yields a bound O(l log(N/l)). As l \leq g, this yields the desired bound O(g log(N/g)).Comment: Accepted for CPM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

MPG.PuRe

Approximation of smallest linear tree grammar

Author: Jez Artur
Lohrey Markus
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014)
Publication date: 01/01/2014
Field of study

A simple linear-time algorithm for constructing a linear context-free tree grammar of size O(r^2.g.log(n)) for a given input tree T of size n is presented, where g is the size of a minimal linear context-free tree grammar for T, and r is the maximal rank of symbols in T (which is a constant in many applications). This is the first example of a grammar-based tree compression algorithm with an approximation ratio polynomial in g. The analysis of the algorithm uses an extension of the recompression technique (used in the context of grammar-based string compression) from strings to trees

Dagstuhl Research Online Publication Server

MPG.PuRe

Context unification is in PSPACE

Author: A. Gascón
A. Jeż
H. Comon
H. Comon
J. Levy
J. Levy
J. Levy
M. Schmidt-Schauß
M. Schmidt-Schauß
M. Schmidt-Schauß
M. Schmidt-Schauß
W. Plandowski
W. Plandowski
Publication venue
Publication date: 08/11/2013
Field of study

Contexts are terms with one `hole', i.e. a place in which we can substitute an argument. In context unification we are given an equation over terms with variables representing contexts and ask about the satisfiability of this equation. Context unification is a natural subvariant of second-order unification, which is undecidable, and a generalization of word equations, which are decidable, at the same time. It is the unique problem between those two whose decidability is uncertain (for already almost two decades). In this paper we show that the context unification is in PSPACE. The result holds under a (usual) assumption that the first-order signature is finite. This result is obtained by an extension of the recompression technique, recently developed by the author and used in particular to obtain a new PSPACE algorithm for satisfiability of word equations, to context unification. The recompression is based on performing simple compression rules (replacing pairs of neighbouring function symbols), which are (conceptually) applied on the solution of the context equation and modifying the equation in a way so that such compression steps can be in fact performed directly on the equation, without the knowledge of the actual solution.Comment: 27 pages, submitted, small notation changes and small improvements over the previous tex

arXiv.org e-Print Archive

CiteSeerX

Crossref

MPG.PuRe

Longest Common Extensions with Recompression

Author: I Tomohiro
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 20/11/2016
Field of study

Given two positions i and j in a string T of length N, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at i and j. A compressed LCE data structure stores T in a compressed form while supporting fast LCE queries. In this article we show that the recompression technique is a powerful tool for compressed LCE data structures. We present a new compressed LCE data structure of size O(z lg (N/z)) that supports LCE queries in O(lg N) time, where z is the size of Lempel-Ziv 77 factorization without self-reference of T. Given T as an uncompressed form, we show how to build our data structure in O(N) time and space. Given T as a grammar compressed form, i.e., a straight-line program of size n generating T, we show how to build our data structure in O(n lg (N/n)) time and O(n + z lg (N/z)) space. Our algorithms are deterministic and always return correct answers

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

One-variable word equations in linear time

Author: A. Jeż
A. Jeż
A. Jeż
G.S. Makanin
H. Sakamoto
J. Kärkkäinen
K. Mehlhorn
M. Laine
M. Lohrey
O. Berkman
R. Dąbrowski
S.E. Obono
T. Kasai
W. Plandowski
W. Plandowski
Publication venue
Publication date: 01/01/2013
Field of study

In this paper we consider word equations with one variable (and arbitrary many appearances of it). A recent technique of recompression, which is applicable to general word equations, is shown to be suitable also in this case. While in general case it is non-deterministic, it determinises in case of one variable and the obtained running time is O(n + #_X log n), where #_X is the number of appearances of the variable in the equation. This matches the previously-best algorithm due to D\k{a}browski and Plandowski. Then, using a couple of heuristics as well as more detailed time analysis the running time is lowered to O(n) in RAM model. Unfortunately no new properties of solutions are shown.Comment: submitted to a journal, general overhaul over the previous versio

arXiv.org e-Print Archive

CiteSeerX

Crossref

Springer - Publisher Connector

MPG.PuRe