7,710 research outputs found
An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
We describe an extension of Earley's parser for stochastic context-free
grammars that computes the following quantities given a stochastic context-free
grammar and an input string: a) probabilities of successive prefixes being
generated by the grammar; b) probabilities of substrings being generated by the
nonterminals, including the entire string being generated by the grammar; c)
most likely (Viterbi) parse of the string; d) posterior expected number of
applications of each grammar production, as required for reestimating rule
probabilities. (a) and (b) are computed incrementally in a single left-to-right
pass over the input. Our algorithm compares favorably to standard bottom-up
parsing methods for SCFGs in that it works efficiently on sparse grammars by
making use of Earley's top-down control structure. It can process any
context-free rule format without conversion to some normal form, and combines
computations for (a) through (d) in a single algorithm. Finally, the algorithm
has simple extensions for processing partially bracketed inputs, and for
finding partial parses and their likelihoods on ungrammatical inputs.Comment: 45 pages. Slightly shortened version to appear in Computational
Linguistics 2
A Variant of Earley Parsing
The Earley algorithm is a widely used parsing method in natural language
processing applications. We introduce a variant of Earley parsing that is based
on a ``delayed'' recognition of constituents. This allows us to start the
recognition of a constituent only in cases in which all of its subconstituents
have been found within the input string. This is particularly advantageous in
several cases in which partial analysis of a constituent cannot be completed
and in general in all cases of productions sharing some suffix of their
right-hand sides (even for different left-hand side nonterminals). Although the
two algorithms result in the same asymptotic time and space complexity, from a
practical perspective our algorithm improves the time and space requirements of
the original method, as shown by reported experimental results.Comment: 12 pages, 1 Postscript figure, uses psfig.tex and llncs.st
Probabilistic Parsing Strategies
We present new results on the relation between purely symbolic context-free
parsing strategies and their probabilistic counter-parts. Such parsing
strategies are seen as constructions of push-down devices from grammars. We
show that preservation of probability distribution is possible under two
conditions, viz. the correct-prefix property and the property of strong
predictiveness. These results generalize existing results in the literature
that were obtained by considering parsing strategies in isolation. From our
general results we also derive negative results on so-called generalized LR
parsing.Comment: 36 pages, 1 figur
Three New Probabilistic Models for Dependency Parsing: An Exploration
After presenting a novel O(n^3) parsing algorithm for dependency grammar, we
develop three contrasting ways to stochasticize it. We propose (a) a lexical
affinity model where words struggle to modify each other, (b) a sense tagging
model where words fluctuate randomly in their selectional preferences, and (c)
a generative model where the speaker fleshes out each word's syntactic and
conceptual structure without regard to the implications for the hearer. We also
give preliminary empirical results from evaluating the three models' parsing
performance on annotated Wall Street Journal training text (derived from the
Penn Treebank). In these results, the generative (i.e., top-down) model
performs significantly better than the others, and does about equally well at
assigning part-of-speech tags.Comment: 6 pages, LaTeX 2.09 packaged with 4 .eps files, also uses colap.sty
and acl.bs
An Efficient Implementation of the Head-Corner Parser
This paper describes an efficient and robust implementation of a
bi-directional, head-driven parser for constraint-based grammars. This parser
is developed for the OVIS system: a Dutch spoken dialogue system in which
information about public transport can be obtained by telephone.
After a review of the motivation for head-driven parsing strategies, and
head-corner parsing in particular, a non-deterministic version of the
head-corner parser is presented. A memoization technique is applied to obtain a
fast parser. A goal-weakening technique is introduced which greatly improves
average case efficiency, both in terms of speed and space requirements.
I argue in favor of such a memoization strategy with goal-weakening in
comparison with ordinary chart-parsers because such a strategy can be applied
selectively and therefore enormously reduces the space requirements of the
parser, while no practical loss in time-efficiency is observed. On the
contrary, experiments are described in which head-corner and left-corner
parsers implemented with selective memoization and goal weakening outperform
`standard' chart parsers. The experiments include the grammar of the OVIS
system and the Alvey NL Tools grammar.
Head-corner parsing is a mix of bottom-up and top-down processing. Certain
approaches towards robust parsing require purely bottom-up processing.
Therefore, it seems that head-corner parsing is unsuitable for such robust
parsing techniques. However, it is shown how underspecification (which arises
very naturally in a logic programming environment) can be used in the
head-corner parser to allow such robust parsing techniques. A particular robust
parsing model is described which is implemented in OVIS.Comment: 31 pages, uses cl.st
Measuring efficiency in high-accuracy, broad-coverage statistical parsing
Very little attention has been paid to the comparison of efficiency between
high accuracy statistical parsers. This paper proposes one machine-independent
metric that is general enough to allow comparisons across very different
parsing architectures. This metric, which we call ``events considered'',
measures the number of ``events'', however they are defined for a particular
parser, for which a probability must be calculated, in order to find the parse.
It is applicable to single-pass or multi-stage parsers. We discuss the
advantages of the metric, and demonstrate its usefulness by using it to compare
two parsers which differ in several fundamental ways.Comment: 8 pages, 4 figures, 2 table
- …