Tabular Parsing
This is a tutorial on tabular parsing, on the basis of tabulation of
nondeterministic push-down automata. Discussed are Earley's algorithm, the
Cocke-Kasami-Younger algorithm, tabular LR parsing, the construction of parse
trees, and further issues. Comment: 21 pages, 14 figures
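The Cocke-Kasami-Younger algorithm named above can be illustrated with a minimal recognizer. This is a sketch under stated assumptions, not code from the tutorial. It assumes a grammar in Chomsky normal form, encoded as two hypothetical Python dicts: `unary` for rules A -> a and `binary` for rules A -> B C. The chart is filled bottom-up by span length, so the running time is cubic in the sentence length for a fixed grammar.

```python
def cyk_recognize(words, unary, binary, start="S"):
    """CYK recognition for a grammar in Chomsky normal form (hypothetical encoding).

    unary:  maps a terminal a to the set of nonterminals A with A -> a
    binary: maps a pair (B, C) to the set of nonterminals A with A -> B C
    """
    n = len(words)
    if n == 0:
        return False  # a CNF grammar without S -> epsilon derives no empty string
    # chart[i][j] = set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(unary.get(w, ()))
    for span in range(2, n + 1):          # shorter spans are filled first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # every split point of words[i:j]
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        chart[i][j] |= binary.get((b, c), set())
    return start in chart[0][n]


unary = {"she": {"NP"}, "fish": {"NP"}, "eats": {"V"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
print(cyk_recognize("she eats fish".split(), unary, binary))  # prints True
```

The toy grammar is S -> NP VP, VP -> V NP, NP -> she | fish, V -> eats. Recovering parse trees would additionally require storing back-pointers (the split point and the pair of children) in each chart cell.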
Probabilistic Parsing Strategies
We present new results on the relation between purely symbolic context-free
parsing strategies and their probabilistic counter-parts. Such parsing
strategies are seen as constructions of push-down devices from grammars. We
show that preservation of probability distribution is possible under two
conditions, viz. the correct-prefix property and the property of strong
predictiveness. These results generalize existing results in the literature
that were obtained by considering parsing strategies in isolation. From our
general results we also derive negative results on so-called generalized LR
parsing. Comment: 36 pages, 1 figure
A Variant of Earley Parsing
The Earley algorithm is a widely used parsing method in natural language
processing applications. We introduce a variant of Earley parsing that is based
on a "delayed" recognition of constituents. This allows us to start the
recognition of a constituent only in cases in which all of its subconstituents
have been found within the input string. This is particularly advantageous in
several cases in which partial analysis of a constituent cannot be completed,
and, in general, in all cases of productions sharing some suffix of their
right-hand sides (even for different left-hand side nonterminals). Although the
two algorithms result in the same asymptotic time and space complexity, from a
practical perspective our algorithm improves the time and space requirements of
the original method, as shown by the reported experimental results. Comment: 12 pages, 1 Postscript figure, uses psfig.tex and llncs.sty
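For comparison with the variant described above, here is a sketch of the standard Earley recognizer. It is not the delayed-recognition variant proposed in the paper. The grammar encoding is a hypothetical assumption: a dict from each nonterminal to a list of right-hand-side tuples, where any symbol that is not a key is treated as a terminal. The sketch also assumes the grammar has no empty (epsilon) productions. That assumption guarantees every completed item began at an earlier position, which keeps the completer simple.

```python
def earley_recognize(words, grammar, start):
    """Standard Earley recognizer; assumes the grammar has no epsilon productions.

    An item (lhs, rhs, dot, origin) stands for lhs -> rhs[:dot] . rhs[dot:]
    whose recognition started at input position origin.
    """
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new = []
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                   # predictor
                    new = [(sym, r, 0, i) for r in grammar[sym]]
                elif i < n and words[i] == sym:      # scanner
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                    # completer
                # Without epsilon rules origin < i, so chart[origin] is final.
                new = [(l2, r2, d2 + 1, o2)
                       for (l2, r2, d2, o2) in chart[origin]
                       if d2 < len(r2) and r2[d2] == lhs]
            for item in new:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[n])


grammar = {
    "S": [("NP", "VP")],
    "NP": [("NP", "and", "NP"), ("she",), ("fish",)],  # left-recursive rule
    "VP": [("eats", "NP")],
}
print(earley_recognize("she eats fish".split(), grammar, "S"))  # prints True
```

The sketch starts predicting a constituent as soon as its left context allows it. The variant in the abstract instead postpones that step until all subconstituents have been found, which this code does not attempt.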
Efficient Tabular LR Parsing
We give a new treatment of tabular LR parsing, which is an alternative to
Tomita's generalized LR algorithm. The advantage is twofold. Firstly, our
treatment is conceptually more attractive because it uses simpler concepts,
such as grammar transformations and standard tabulation techniques, also known as
chart parsing. Secondly, the static and dynamic complexity of parsing, both in
space and time, is significantly reduced. Comment: 8 pages, uses aclap.sty
Splittability of bilexical context-free grammars is undecidable
Bilexical context-free grammars (2-LCFGs) have proved to be accurate models for statistical natural language parsing. Existing dynamic programming algorithms used to parse sentences under these models have a running time of O(|w|^4), where w is the input string. A 2-LCFG is splittable if the left arguments of a lexical head are always independent of the right arguments, and vice versa. When a 2-LCFG is splittable, parsing time can be asymptotically improved to O(|w|^3). Testing this property is therefore of central interest to parsing efficiency. In this article, however, we show the negative result that splittability of 2-LCFGs is undecidable. Publisher PDF. Peer reviewed.
Computation of distances for regular and context-free probabilistic languages
Several mathematical distances between probabilistic languages have been investigated in the literature, motivated by applications in language modeling, computational biology, syntactic pattern matching and machine learning. In most cases, only pairs of probabilistic regular languages were considered. In this paper we extend the previous results to pairs of languages generated by a probabilistic context-free grammar and a probabilistic finite automaton. Postprint. Peer reviewed.
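The abstract does not give a construction. For the simpler, previously studied case of two probabilistic finite automata (PFAs), the Euclidean (L2) distance can be computed exactly. The coemission, the sum over all strings w of p(w)q(w), comes from a product automaton and a single linear system. The squared distance then follows as d_2(p, q)^2 = Σ p(w)^2 - 2 Σ p(w)q(w) + Σ q(w)^2. The sketch below is a standard technique, not the paper's PCFG/PFA method. It assumes a hypothetical PFA encoding: per-symbol transition matrices, an initial vector, and a final-probability vector, with p(w) = init · M[w1] ··· M[wn] · final. It also assumes that I - Σ_a M_a^p ⊗ M_a^q is invertible.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical PFA encoding: (trans, init, final), where trans maps each symbol
# to a square matrix and p(w) = init . M[w1] ... M[wn] . final.

def kron_vec(u, v):
    """Kronecker product of two vectors."""
    return [a * b for a in u for b in v]

def kron_mat(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A assumed nonsingular)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def coemission(p, q):
    """Sum over all strings w of p(w) * q(w), computed on the product automaton."""
    (tp, ip, fp), (tq, iq, fq) = p, q
    size = len(ip) * len(iq)
    # total = sum over shared symbols a of (M_a^p kron M_a^q)
    total = [[Fraction(0)] * size for _ in range(size)]
    for a in tp.keys() & tq.keys():
        K = kron_mat(tp[a], tq[a])
        for i in range(size):
            for j in range(size):
                total[i][j] += K[i][j]
    # Geometric series: solve (I - total) x = final_p kron final_q, then dot with init
    i_minus = [[(1 if i == j else 0) - total[i][j] for j in range(size)]
               for i in range(size)]
    x = solve(i_minus, kron_vec(fp, fq))
    return sum(u * v for u, v in zip(kron_vec(ip, iq), x))

def l2_distance(p, q):
    """Euclidean distance between the string distributions of two PFAs."""
    return sqrt(coemission(p, p) - 2 * coemission(p, q) + coemission(q, q))


half, third = Fraction(1, 2), Fraction(1, 3)
p = ({"a": [[half]]}, [Fraction(1)], [half])         # p(a^n) = (1/2)^(n+1)
q = ({"a": [[2 * third]]}, [Fraction(1)], [third])   # q(a^n) = (1/3)(2/3)^n
print(l2_distance(p, q))
```

In this one-state example, the three sums are 1/3, 1/4 and 1/5, so d_2(p, q) = sqrt(1/30). Exact `Fraction` arithmetic ensures the squared distance of a PFA to itself is exactly zero before the square root is taken.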
Parsing with CYK over Distributed Representations
Syntactic parsing is a key task in natural language processing. This task has
been dominated by symbolic, grammar-based parsers. Neural networks, with their
distributed representations, are challenging these methods. In this article we
show that existing symbolic parsing algorithms can cross the border and be
entirely formulated over distributed representations. To this end we introduce
a version of the traditional Cocke-Younger-Kasami (CYK) algorithm, called
D-CYK, which is entirely defined over distributed representations. Our D-CYK
uses matrix multiplication on real number matrices of size independent of the
length of the input string. These operations are compatible with traditional
neural networks. Experiments show that our D-CYK approximates the original CYK
algorithm. By showing that CYK can be entirely performed on distributed
representations, we open the way to the definition of recurrent layers of
CYK-informed neural networks. Comment: The algorithm has been greatly improved. Experiments have been
redesigned.