
    On Hilberg's Law and Its Links with Guiraud's Law

    Hilberg (1990) supposed that the finite-order excess entropy of a random human text is proportional to the square root of the text length. Assuming that Hilberg's hypothesis is true, we derive Guiraud's law, which states that the number of word types in a text is greater than proportional to the square root of the text length. Our derivation is based on a mathematical conjecture in coding theory and on several experiments suggesting that words can be defined approximately as the nonterminals of the shortest context-free grammar for the text. Such an operational definition of words can be applied even to texts deprived of spaces, which do not allow for Mandelbrot's "intermittent silence" explanation of Zipf's and Guiraud's laws. In contrast to Mandelbrot's, our model assumes some probabilistic long-memory effects in human narration and might be capable of explaining Menzerath's law. Comment: To appear in Journal of Quantitative Linguistics

    Practical experiments with regular approximation of context-free languages

    Several methods are discussed that construct a finite automaton from a given context-free grammar, including both methods that lead to subsets and those that lead to supersets of the original context-free language. Some of these methods of regular approximation are new, and others are presented here in a more refined form than in the existing literature. Practical experiments with the different methods of regular approximation are performed for spoken-language input: hypotheses from a speech recognizer are filtered through a finite automaton. Comment: 28 pages. To appear in Computational Linguistics 26(1), March 200

    Toward a balanced grammatical description

    The writer of a grammatical description attempts to accomplish many goals in one complex document. Some of these goals seem to conflict with one another, thus causing tension, discouragement and paralysis for many descriptive linguists. For example, all grammar writers want their work to speak clearly to general linguists and to specialists in their language area tradition. Yet a grammar that addresses universal issues may not be detailed enough for specialists, while a highly detailed description written in a specialized areal framework may be incomprehensible to those outside of a particular tradition. In the present chapter, I describe four tensions that grammar writers often face, and provide concrete suggestions on how to balance these tensions effectively and creatively. These tensions are: • Comprehensiveness vs. usefulness. • Technical accuracy vs. understandability. • Universality vs. specificity. • A ‘form-driven’ vs. a ‘function-driven’ approach. By drawing attention to these potential conflicts, I hope to help free junior linguists from the unrealistic expectation that their work must fully accomplish all of the ideals that motivate the complex task of describing the grammar of a language. The goal of a description grammar is to produce an esthetically pleasing, intellectually stimulating, and genuinely informative piece of work. National Foreign Language Resource Center