Practical experiments with regular approximation of context-free languages
Several methods are discussed that construct a finite automaton given a
context-free grammar, including both methods that lead to subsets and those
that lead to supersets of the original context-free language. Some of these
methods of regular approximation are new, and some others are presented here in
a more refined form with respect to existing literature. Practical experiments
with the different methods of regular approximation are performed for
spoken-language input: hypotheses from a speech recognizer are filtered through
a finite automaton.
Comment: 28 pages. To appear in Computational Linguistics 26(1), March 200
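The subset/superset idea can be illustrated on a toy grammar (a hypothetical example, not one of the paper's methods): the self-embedding grammar S → a S b | a b generates aⁿbⁿ, which no finite automaton recognizes exactly, but the regular language a+b+ is a superset that can still act as a cheap filter.

```python
import re

# Toy context-free language: { a^n b^n : n >= 1 } (not regular).
def in_cfl(s):
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

# A regular superset approximation: a+ b+ accepts every a^n b^n plus
# extra strings such as "aab".  (Illustrative only; the paper's methods
# construct the approximating automaton from the grammar itself.)
superset = re.compile(r"a+b+\Z")

for s in ["ab", "aabb", "aab", "ba"]:
    print(s, in_cfl(s), bool(superset.match(s)))
```

Used as a filter, the superset never rejects a grammatical hypothesis ("ab", "aabb") but discards clearly impossible ones ("ba"), at the cost of letting some ungrammatical strings ("aab") through.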
An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
We describe an extension of Earley's parser for stochastic context-free
grammars that computes the following quantities given a stochastic context-free
grammar and an input string: a) probabilities of successive prefixes being
generated by the grammar; b) probabilities of substrings being generated by the
nonterminals, including the entire string being generated by the grammar; c)
most likely (Viterbi) parse of the string; d) posterior expected number of
applications of each grammar production, as required for reestimating rule
probabilities. (a) and (b) are computed incrementally in a single left-to-right
pass over the input. Our algorithm compares favorably to standard bottom-up
parsing methods for SCFGs in that it works efficiently on sparse grammars by
making use of Earley's top-down control structure. It can process any
context-free rule format without conversion to some normal form, and combines
computations for (a) through (d) in a single algorithm. Finally, the algorithm
has simple extensions for processing partially bracketed inputs, and for
finding partial parses and their likelihoods on ungrammatical inputs.
Comment: 45 pages. Slightly shortened version to appear in Computational Linguistics 2
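Quantity (a), the prefix probability, can be made concrete on a toy right-linear SCFG (a hypothetical example, computed here by brute-force enumeration rather than the paper's incremental Earley method):

```python
# Toy stochastic CFG:
#   S -> a S   with probability 0.6
#   S -> b     with probability 0.4
# Its strings are a^n b, each with derivation probability 0.6^n * 0.4.
# The prefix probability of "a" is sum over n >= 1, which equals 0.6.

def prefix_probability(prefix, max_len=200):
    # Sum the probabilities of complete strings that begin with `prefix`.
    # (The truncation error at max_len=200 is on the order of 0.6^200.)
    total = 0.0
    for n in range(max_len):
        word = "a" * n + "b"
        if word.startswith(prefix):
            total += 0.6 ** n * 0.4
    return total

print(prefix_probability("a"))   # ~0.6
print(prefix_probability("aa"))  # ~0.36
print(prefix_probability("b"))   # ~0.4
```

The paper's contribution is computing exactly these sums, including the infinite tails over left-recursive and unit productions, in a single left-to-right pass instead of by enumeration.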
An Efficient Distribution of Labor in a Two Stage Robust Interpretation Process
Although Minimum Distance Parsing (MDP) offers a theoretically attractive
solution to the problem of extragrammaticality, it is often computationally
infeasible in large scale practical applications. In this paper we present an
alternative approach where the labor is distributed between a more restrictive
partial parser and a repair module. Though two-stage approaches have grown in
popularity in recent years because of their efficiency, they have done so at
the cost of requiring hand-coded repair heuristics. In contrast, our two-stage
approach does not require any hand-coded knowledge sources dedicated to repair,
thus making it possible to achieve a similar run-time advantage over MDP
without losing the quality of domain independence.
Comment: 9 pages, 1 Postscript figure, uses aclap.sty and psfig.tex. In Proceedings of EMNLP 199
Automatic Extraction of Subcategorization from Corpora
We describe a novel technique and implemented system for constructing a
subcategorization dictionary from textual corpora. Each dictionary entry
encodes the relative frequency of occurrence of a comprehensive set of
subcategorization classes for English. An initial experiment, on a sample of 14
verbs which exhibit multiple complementation patterns, demonstrates that the
technique achieves accuracy comparable to previous approaches, which are all
limited to a highly restricted set of subcategorization classes. We also
demonstrate that a subcategorization dictionary built with the system improves
the accuracy of a parser by an appreciable amount.
Comment: 8 pages; requires aclap.sty. To appear in ANLP-9
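A dictionary entry of relative frequencies over subcategorization frames can be sketched minimally (the frame labels and mini-corpus below are invented for illustration; the actual system uses a comprehensive classification of English subcategorization classes):

```python
from collections import Counter

# Hypothetical (verb, observed frame) pairs, as a shallow parser or
# pattern extractor might emit them from a corpus.
observations = [
    ("give", "NP_NP"), ("give", "NP_PP"), ("give", "NP_PP"),
    ("believe", "SCOMP"), ("believe", "NP"),
]

def subcat_entry(verb, data):
    # Relative frequency of each subcategorization frame for one verb.
    counts = Counter(frame for v, frame in data if v == verb)
    total = sum(counts.values())
    return {frame: c / total for frame, c in counts.items()}

entry = subcat_entry("give", observations)
print(entry)  # NP_PP twice as frequent as NP_NP for "give"
```

The real system's work lies in reliably recognizing the frames in raw text and filtering noisy observations, not in the counting itself.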
Probabilistic Parsing Strategies
We present new results on the relation between purely symbolic context-free
parsing strategies and their probabilistic counterparts. Such parsing
strategies are seen as constructions of push-down devices from grammars. We
show that preservation of probability distribution is possible under two
conditions, viz. the correct-prefix property and the property of strong
predictiveness. These results generalize existing results in the literature
that were obtained by considering parsing strategies in isolation. From our
general results we also derive negative results on so-called generalized LR
parsing.
Comment: 36 pages, 1 figure
Happy-GLL: modular, reusable and complete top-down parsers for parameterized nonterminals
Parser generators and parser combinator libraries are the most popular tools
for producing parsers. Parser combinators use the host language to provide
reusable components in the form of higher-order functions with parsers as
parameters. Very few parser generators support this kind of reuse through
abstraction and even fewer generate parsers that are as modular and reusable as
the parts of the grammar for which they are produced. This paper presents a
strategy for generating modular, reusable and complete top-down parsers from
syntax descriptions with parameterized nonterminals, based on the FUN-GLL
variant of the GLL algorithm.
The strategy is discussed and demonstrated as a novel back-end for the Happy
parser generator. Happy grammars can contain `parameterized nonterminals' in
which parameters abstract over grammar symbols, granting an abstraction
mechanism to define reusable grammar operators. However, the existing Happy
back-ends do not deliver on the full potential of parameterized nonterminals,
as these cannot be reused across grammars. Moreover, the
parser generation process may fail to terminate or may result in exponentially
large parsers generated in an exponential amount of time.
The GLL back-end presented in this paper implements parameterized
nonterminals successfully by generating higher-order functions that resemble
parser combinators, inheriting all the advantages of top-down parsing. The
back-end is capable of generating parsers for the full class of context-free
grammars, generates parsers in linear time and generates parsers that find all
derivations of the input string. To our knowledge, the presented GLL back-end
makes Happy the first parser generator that combines all these features.
This paper describes the translation procedure of the GLL back-end and
compares it to the LALR and GLR back-ends of Happy in several experiments.
Comment: 15 pages
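The idea of a parameterized nonterminal as a reusable grammar operator resembles a higher-order parser combinator. A minimal sketch in Python (not the generated Haskell, and not FUN-GLL: there is no descriptor sharing or GSS, so it is not complete for ambiguous or left-recursive grammars) where a parser maps an input and a start position to the set of end positions it can reach:

```python
def term(t):
    # Match the literal token t.
    return lambda s, i: {i + len(t)} if s.startswith(t, i) else set()

def seq(p, q):
    # Run p, then q from every position p can reach.
    return lambda s, i: {k for j in p(s, i) for k in q(s, j)}

def alt(p, q):
    # Union of both alternatives (keeps all parses, GLL-style).
    return lambda s, i: p(s, i) | q(s, i)

def empty(s, i):
    return {i}

# A "parameterized nonterminal" as a grammar operator, reusable for
# any p and sep: possibly empty lists of p separated by sep.
def sep_by(p, sep):
    def rest(s, i):
        return alt(seq(sep, seq(p, rest)), empty)(s, i)
    return alt(seq(p, rest), empty)

digit = term("1")               # stand-in for a real number parser
csv = sep_by(digit, term(","))  # instantiate the operator

def accepts(p, s):
    return len(s) in p(s, 0)

print(accepts(csv, "1,1,1"))  # True
print(accepts(csv, "1,,1"))   # False
```

In the paper's setting, Happy's `parameterized nonterminals' compile to higher-order functions of exactly this shape, while the FUN-GLL machinery restores completeness and the linear-time generation guarantees.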