5 research outputs found
Precise n-gram Probabilities from Stochastic Context-free Grammars
We present an algorithm for computing n-gram probabilities from stochastic
context-free grammars, a procedure that can alleviate some of the standard
problems associated with n-grams (estimation from sparse data, lack of
linguistic structure, among others). The method operates via the computation of
substring expectations, which in turn is accomplished by solving systems of
linear equations derived from the grammar. We discuss efficient implementation
of the algorithm and report our practical experience with it.
Comment: 12 pages, to appear in ACL-9
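To illustrate the idea of obtaining expectations by solving linear equations derived from a grammar, here is a minimal sketch for the simplest case (n = 1): the expected number of occurrences of one terminal in a string generated from each nonterminal. The toy grammar, the `GRAMMAR` layout, and the helper names `solve` and `expected_count` are assumptions made for illustration; this is not the paper's algorithm for general n-grams.

```python
from fractions import Fraction

# Hypothetical toy SCFG: nonterminal -> list of (probability, right-hand side).
GRAMMAR = {
    "S": [(Fraction(1, 2), ["a", "S"]), (Fraction(1, 2), ["b"])],
}

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def expected_count(grammar, terminal):
    """Expected occurrences of `terminal` in a string derived from each nonterminal.

    For every nonterminal X:
        c(X) = sum over rules X -> rhs of p * (direct hits + sum of c(Y) for Y in rhs)
    which rearranges to the linear system (I - A) c = d.
    """
    nts = sorted(grammar)
    idx = {x: i for i, x in enumerate(nts)}
    n = len(nts)
    A = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    d = [Fraction(0)] * n
    for x, rules in grammar.items():
        for p, rhs in rules:
            for sym in rhs:
                if sym in grammar:          # nonterminal: contributes to the matrix
                    A[idx[x]][idx[sym]] -= p
                elif sym == terminal:       # matching terminal: contributes to d
                    d[idx[x]] += p
    return dict(zip(nts, solve(A, d)))
```

For this toy grammar, `expected_count(GRAMMAR, "a")["S"]` satisfies c = 1/2 (1 + c), so the expected number of `a` symbols per generated string is 1.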
An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
We describe an extension of Earley's parser for stochastic context-free
grammars that computes the following quantities given a stochastic context-free
grammar and an input string: a) probabilities of successive prefixes being
generated by the grammar; b) probabilities of substrings being generated by the
nonterminals, including the entire string being generated by the grammar; c)
most likely (Viterbi) parse of the string; d) posterior expected number of
applications of each grammar production, as required for reestimating rule
probabilities. (a) and (b) are computed incrementally in a single left-to-right
pass over the input. Our algorithm compares favorably to standard bottom-up
parsing methods for SCFGs in that it works efficiently on sparse grammars by
making use of Earley's top-down control structure. It can process any
context-free rule format without conversion to some normal form, and combines
computations for (a) through (d) in a single algorithm. Finally, the algorithm
has simple extensions for processing partially bracketed inputs, and for
finding partial parses and their likelihoods on ungrammatical inputs.
Comment: 45 pages. Slightly shortened version to appear in Computational
Linguistics 2
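For context on quantity (b), the sketch below computes substring (inside) probabilities with the standard bottom-up, CYK-style inside algorithm, the kind of method the abstract compares against. It is not the Earley-based algorithm described above: it requires Chomsky normal form and does not compute prefix probabilities. The toy grammar and the names `BINARY`, `LEXICAL`, and `inside` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky normal form.
BINARY = {("S", "S", "S"): 0.4}   # (parent, left child, right child) -> probability
LEXICAL = {("S", "a"): 0.6}       # (parent, terminal) -> probability

def inside(words):
    """Return beta[(i, j, X)] = P(X derives words[i:j]) for all spans."""
    n = len(words)
    beta = defaultdict(float)
    # Spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (x, term), p in LEXICAL.items():
            if term == w:
                beta[i, i + 1, x] += p
    # Longer spans combine two adjacent shorter spans via binary rules.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (x, y, z), p in BINARY.items():
                    beta[i, j, x] += p * beta[i, k, y] * beta[k, j, z]
    return beta
```

With this grammar, `inside(["a", "a", "a"])[0, 3, "S"]` sums over the two possible bracketings of the three-word string.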
Analyzing and Improving Statistical Language Models for Speech Recognition
In many current speech recognizers, a statistical language model is used to
indicate how likely it is that a certain word will be spoken next, given the
words recognized so far. How can statistical language models be improved so
that more complex speech recognition tasks can be tackled? Since the knowledge
of the weaknesses of any theory often makes improving the theory easier, the
central idea of this thesis is to analyze the weaknesses of existing
statistical language models in order to subsequently improve them. To that end,
we formally define a weakness of a statistical language model in terms of the
logarithm of the total probability, LTP, a term closely related to the standard
perplexity measure used to evaluate statistical language models. We apply our
definition of a weakness to a frequently used statistical language model,
called a bi-pos model. This results, for example, in a new modeling of unknown
words which improves the performance of the model by 14% to 21%. Moreover, one
of the identified weaknesses has prompted the development of our generalized
N-pos language model, which is also outlined in this thesis. It can incorporate
linguistic knowledge even if it extends over many words, which is not
feasible in a traditional N-pos model. This leads to a discussion of
what knowledge should be added to statistical language models in general, and we
give criteria for selecting potentially useful knowledge. These results show
the usefulness of both our definition of a weakness and of performing an
analysis of weaknesses of statistical language models in general.
Comment: 140 pages, postscript, approx 500KB, if problems with delivery, mail
to [email protected]
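The link between a log total probability and perplexity can be sketched as follows. The thesis's exact definition of LTP is not given in this abstract, so the code assumes it is the natural logarithm of the product of the model's per-word conditional probabilities on a test text; the function names are hypothetical.

```python
import math

def log_total_probability(word_probs):
    """Sum of log conditional probabilities the model assigned to each test word."""
    return sum(math.log(p) for p in word_probs)

def perplexity(word_probs):
    """Standard perplexity: exp of the negative average log probability per word."""
    return math.exp(-log_total_probability(word_probs) / len(word_probs))
```

Under this assumption, a model that assigns probability 0.25 to every word of a test text has perplexity 4 (up to floating-point error), and raising the log total probability lowers perplexity.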
Computation of Probabilities for an Island-Driven Parser
The authors describe an effort to adapt island-driven parsers to handle stochastic context-free grammars. These grammars could be used as language models (LMs) by a language processor (LP) to compute the probability of a linguistic interpretation. As different islands may compete for growth, it is important to compute the probability that an LM generates a sentence containing islands and gaps between them. Algorithms for computing these probabilities are introduced. The complexity of these algorithms is analyzed both from theoretical and practical points of view. It is shown that the computation of probabilities in the presence of gaps of unknown length requires the impractical solution of a nonlinear system of equations, whereas the computation of probabilities for cases with gaps containing a known number of unknown words has polynomial time complexity and is practically feasible. The use of the results obtained in automatic speech understanding systems is discussed.