Probabilistic Parsing Strategies
We present new results on the relation between purely symbolic context-free
parsing strategies and their probabilistic counterparts. Such parsing
strategies are seen as constructions of push-down devices from grammars. We
show that preservation of probability distribution is possible under two
conditions, viz. the correct-prefix property and the property of strong
predictiveness. These results generalize existing results in the literature
that were obtained by considering parsing strategies in isolation. From our
general results we also derive negative results on so-called generalized LR
parsing.
Comment: 36 pages, 1 figure
Efficient Tabular LR Parsing
We give a new treatment of tabular LR parsing, which is an alternative to
Tomita's generalized LR algorithm. The advantage is twofold. Firstly, our
treatment is conceptually more attractive because it uses simpler concepts,
such as grammar transformations and standard tabulation techniques also known as
chart parsing. Secondly, the static and dynamic complexity of parsing, both in
space and time, is significantly reduced.
Comment: 8 pages, uses aclap.st
An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
We describe an extension of Earley's parser for stochastic context-free
grammars that computes the following quantities given a stochastic context-free
grammar and an input string: a) probabilities of successive prefixes being
generated by the grammar; b) probabilities of substrings being generated by the
nonterminals, including the entire string being generated by the grammar; c)
most likely (Viterbi) parse of the string; d) posterior expected number of
applications of each grammar production, as required for reestimating rule
probabilities. (a) and (b) are computed incrementally in a single left-to-right
pass over the input. Our algorithm compares favorably to standard bottom-up
parsing methods for SCFGs in that it works efficiently on sparse grammars by
making use of Earley's top-down control structure. It can process any
context-free rule format without conversion to some normal form, and combines
computations for (a) through (d) in a single algorithm. Finally, the algorithm
has simple extensions for processing partially bracketed inputs, and for
finding partial parses and their likelihoods on ungrammatical inputs.
Comment: 45 pages. Slightly shortened version to appear in Computational
Linguistics 2
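
Quantity (b) above, the probability that a nonterminal generates a given substring, can be illustrated with a minimal sketch. Note that this is not the Earley-based algorithm of the abstract: it is the standard bottom-up (CYK-style) inside computation, which assumes the grammar has already been converted to Chomsky normal form. The function name, the dictionary encoding of rules, and the toy grammar are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Return the probability that `start` generates `words`.

    Assumes a stochastic CFG in Chomsky normal form (an assumption of
    this sketch, not of the abstract above):
      lexical: dict mapping (A, word) -> probability of rule A -> word
      binary:  dict mapping (A, B, C) -> probability of rule A -> B C
    """
    n = len(words)
    # chart[(i, j)][A] = probability that A generates words[i:j]
    chart = defaultdict(lambda: defaultdict(float))

    # Spans of length 1: apply lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][a] += p

    # Longer spans: sum over every split point and every binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = chart[(i, k)].get(b, 0.0)
                    right = chart[(k, j)].get(c, 0.0)
                    if left and right:
                        chart[(i, j)][a] += p * left * right

    return chart[(0, n)].get(start, 0.0)


# Hypothetical toy grammar: S -> S S (0.4) | a (0.6)
TOY_LEXICAL = {("S", "a"): 0.6}
TOY_BINARY = {("S", "S", "S"): 0.4}

if __name__ == "__main__":
    print(inside_probability(["a", "a"], TOY_LEXICAL, TOY_BINARY))
    print(inside_probability(["a", "a", "a"], TOY_LEXICAL, TOY_BINARY))
```

For the toy grammar, the string "a a" has a single derivation, while "a a a" has two, and the chart sums over both. Unlike the algorithm described in the abstract, this sketch loops over every rule at every span regardless of top-down context, and it does not compute prefix probabilities.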
LHIP: Extended DCGs for Configurable Robust Parsing
We present LHIP, a system for incremental grammar development using an
extended DCG formalism. The system uses a robust island-based parsing method
controlled by user-defined performance thresholds.
Comment: 10 pages, in Proc. Coling9
Efficient Algorithms for Parsing the DOP Model
Excellent results have been reported for Data-Oriented Parsing (DOP) of
natural language texts (Bod, 1993). Unfortunately, existing algorithms are both
computationally intensive and difficult to implement. Previous algorithms are
expensive due to two factors: the exponential number of rules that must be
generated and the use of a Monte Carlo parsing algorithm. In this paper we
solve the first problem by a novel reduction of the DOP model to a small,
equivalent probabilistic context-free grammar. We solve the second problem by a
novel deterministic parsing strategy that maximizes the expected number of
correct constituents, rather than the probability of a correct parse tree.
Using the optimizations, experiments yield a 97% crossing brackets rate and 88%
zero crossing brackets rate. This differs significantly from the results
reported by Bod, and is comparable to results from a duplication of Pereira and
Schabes's (1992) experiment on the same data. We show that Bod's results are at
least partially due to an extremely fortuitous choice of test data, and
partially due to using cleaner data than other researchers.
Comment: 10 pages