Search CORE

2 research outputs found

Online Grammar Compression for Frequent Pattern Discovery

Author: Fukunaga Shouhei
Sakamoto Hiroshi
Takabatake Yoshimasa
Tomohiro I
Publication venue
Publication date: 30/08/2016
Field of study

Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG deriving the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated patterns and removing them. This process is just a frequent pattern discovery by grammatical inference. While we can get any frequent pattern in linear time using a preprocessed string, a huge working space is required for longer patterns, and the whole string must be loaded into the memory preliminarily. We propose an online algorithm approximating this problem within a compressed space. The main contribution is an improvement of the previously best known approximation ratio

\Omega(\frac{1}{\lg^2m})

\Omega(\frac{1}{\lg^*N\lg m})

where

m

is the length of an optimal pattern in a string of length

N

and

\lg^*

is the iteration of the logarithm base

2

. For a sufficiently large

N

\lg^*N

is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and achieves a significant improvement in memory consumption compared to the offline algorithm.Comment: 14 page