4,107 research outputs found
Practical experiments with regular approximation of context-free languages
Several methods are discussed that construct a finite automaton given a
context-free grammar, including both methods that lead to subsets and those
that lead to supersets of the original context-free language. Some of these
methods of regular approximation are new, and some others are presented here in
a more refined form with respect to existing literature. Practical experiments
with the different methods of regular approximation are performed for
spoken-language input: hypotheses from a speech recognizer are filtered through
a finite automaton.Comment: 28 pages. To appear in Computational Linguistics 26(1), March 200
On Hilberg's Law and Its Links with Guiraud's Law
Hilberg (1990) supposed that finite-order excess entropy of a random human
text is proportional to the square root of the text length. Assuming that
Hilberg's hypothesis is true, we derive Guiraud's law, which states that the
number of word types in a text is greater than proportional to the square root
of the text length. Our derivation is based on some mathematical conjecture in
coding theory and on several experiments suggesting that words can be defined
approximately as the nonterminals of the shortest context-free grammar for the
text. Such operational definition of words can be applied even to texts
deprived of spaces, which do not allow for Mandelbrot's ``intermittent
silence'' explanation of Zipf's and Guiraud's laws. In contrast to
Mandelbrot's, our model assumes some probabilistic long-memory effects in human
narration and might be capable of explaining Menzerath's law.Comment: To appear in Journal of Quantitative Linguistic
Finite Automata for the Sub- and Superword Closure of CFLs: Descriptional and Computational Complexity
We answer two open questions by (Gruber, Holzer, Kutrib, 2009) on the
state-complexity of representing sub- or superword closures of context-free
grammars (CFGs): (1) We prove a (tight) upper bound of on
the size of nondeterministic finite automata (NFAs) representing the subword
closure of a CFG of size . (2) We present a family of CFGs for which the
minimal deterministic finite automata representing their subword closure
matches the upper-bound of following from (1).
Furthermore, we prove that the inequivalence problem for NFAs representing sub-
or superword-closed languages is only NP-complete as opposed to PSPACE-complete
for general NFAs. Finally, we extend our results into an approximation method
to attack inequivalence problems for CFGs
- …