Grammar compression with probabilistic context-free grammar
We propose a new approach to universal lossless text compression, based on
grammar compression. In the literature, a target string $w$ has been compressed
as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = \{w\}$.
Such a grammar is often called a \emph{straight-line program} (SLP). In this
paper, we consider a probabilistic grammar $G$ that generates $w$, but not
necessarily as a unique element of $L(G)$. In order to recover the original
text unambiguously, we keep both the grammar and the derivation tree of $w$
from the start symbol in $G$, in compressed form. We show some simple
evidence that our proposal is indeed more efficient than SLPs for certain
texts, both from theoretical and practical points of view.

Comment: 11 pages, 3 figures, accepted for poster presentation at DCC 202
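To make the SLP notion concrete, here is a minimal sketch of a straight-line program: a context-free grammar in Chomsky normal form in which every nonterminal has exactly one production, so the grammar derives exactly one string. The grammar, the function name, and the example string are illustrative assumptions, not the paper's construction.

```python
# Sketch of a straight-line program (SLP): a CFG in Chomsky normal form
# whose language is a single string w. Every nonterminal has exactly one
# rule, so the derivation (and hence the derived string) is unique.
# The example grammar and names are illustrative, not from the paper.

def expand(symbol, rules):
    """Recursively expand a symbol into the unique string it derives."""
    rhs = rules.get(symbol)
    if rhs is None:          # terminal symbol: expands to itself
        return symbol
    return "".join(expand(s, rules) for s in rhs)

# An SLP deriving w = "abab"; note |L(G)| = 1.
rules = {
    "S": ("X", "X"),   # S -> X X
    "X": ("A", "B"),   # X -> A B
    "A": ("a",),       # A -> a   (terminal rule)
    "B": ("b",),       # B -> b
}

print(expand("S", rules))  # -> abab
```

Because each nonterminal is expanded only through its single rule, an SLP of $n$ rules can represent strings whose length is exponential in $n$, which is the source of its compression power; the probabilistic grammars proposed in the paper relax the single-derivation requirement and instead store the derivation tree alongside the grammar.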