16 research outputs found
On Hilberg's Law and Its Links with Guiraud's Law
Hilberg (1990) supposed that finite-order excess entropy of a random human
text is proportional to the square root of the text length. Assuming that
Hilberg's hypothesis is true, we derive Guiraud's law, which states that the
number of word types in a text is greater than proportional to the square root
of the text length. Our derivation is based on some mathematical conjecture in
coding theory and on several experiments suggesting that words can be defined
approximately as the nonterminals of the shortest context-free grammar for the
text. Such operational definition of words can be applied even to texts
deprived of spaces, which do not allow for Mandelbrot's ``intermittent
silence'' explanation of Zipf's and Guiraud's laws. In contrast to
Mandelbrot's, our model assumes some probabilistic long-memory effects in human
narration and might be capable of explaining Menzerath's law.Comment: To appear in Journal of Quantitative Linguistic