We present an efficient algorithm for computing the LZ78 factorization of a
text, where the text is represented as a straight-line program (SLP), that is,
a context-free grammar in Chomsky normal form that generates a single
string. Given an SLP of size n representing a text S of length N, our
algorithm computes the LZ78 factorization of S in O(n√N + m log N) time
and O(n√N + m) space, where m is the number of resulting LZ78 factors.
We also show how to improve the algorithm so that the n√N term in the
time and space complexities becomes either nL, where L is the length of the
longest LZ78 factor, or (N − α), where α ≥ 0 is a quantity
which depends on the amount of redundancy that the SLP captures with respect to
substrings of S of a certain length. Since m = O(N / log_σ N), where
σ is the alphabet size, the latter is asymptotically at least as fast as
a linear-time algorithm which runs on the uncompressed string when σ is
constant, and can be more efficient when the text is compressible, i.e., when
m and n are small.

Comment: SPIRE 201
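
For orientation, the following is a minimal sketch of the uncompressed baseline that the abstract compares against, not the algorithm presented in the paper: it fully expands the SLP into S and then computes the LZ78 factorization with a dictionary-based trie. The names `expand_slp` and `lz78_factorize`, and the encoding of the grammar as a Python dict, are assumptions made for illustration only.

```python
def expand_slp(rules, start):
    """Expand an SLP into the single string it generates.

    Assumed encoding: `rules` maps each variable either to a single
    character (terminal rule X -> a) or to a pair of variables
    (binary rule X -> Y Z), as in Chomsky normal form.
    An explicit stack is used so that deep grammars do not hit
    Python's recursion limit.
    """
    out = []
    stack = [start]
    while stack:
        rhs = rules[stack.pop()]
        if isinstance(rhs, str):
            out.append(rhs)
        else:
            left, right = rhs
            stack.append(right)  # pushed first, so it is expanded second
            stack.append(left)
    return "".join(out)


def lz78_factorize(text):
    """Return the LZ78 factors of `text` as a list of strings.

    Each factor is the longest previous factor that matches at the
    current position, extended by one character. The final factor may
    repeat an earlier one if the text ends before it can be extended.
    """
    trie = [{}]      # trie[0] is the root, i.e. the empty factor
    factors = []
    node = 0         # current trie node while matching a factor
    start = 0        # start position of the factor being built
    for i, ch in enumerate(text):
        child = trie[node].get(ch)
        if child is None:
            # No longer match exists: close the factor and record it.
            trie[node][ch] = len(trie)
            trie.append({})
            factors.append(text[start:i + 1])
            node = 0
            start = i + 1
        else:
            node = child
    if start < len(text):
        factors.append(text[start:])  # trailing, possibly repeated, factor
    return factors


# A small SLP of size n = 6 whose start variable X6 generates "abaababa".
rules = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),
    "X4": ("X3", "X1"),
    "X5": ("X4", "X3"),
    "X6": ("X5", "X4"),
}
text = expand_slp(rules, "X6")
factors = lz78_factorize(text)
```

This baseline needs time and space proportional to N because it materializes S, which is the cost the compressed-domain algorithm aims to avoid when n and m are small.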