An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
We describe an extension of Earley's parser for stochastic context-free
grammars that computes the following quantities given a stochastic context-free
grammar and an input string: a) probabilities of successive prefixes being
generated by the grammar; b) probabilities of substrings being generated by the
nonterminals, including the entire string being generated by the grammar; c)
most likely (Viterbi) parse of the string; d) posterior expected number of
applications of each grammar production, as required for reestimating rule
probabilities. (a) and (b) are computed incrementally in a single left-to-right
pass over the input. Our algorithm compares favorably to standard bottom-up
parsing methods for SCFGs in that it works efficiently on sparse grammars by
making use of Earley's top-down control structure. It can process any
context-free rule format without conversion to some normal form, and combines
computations for (a) through (d) in a single algorithm. Finally, the algorithm
has simple extensions for processing partially bracketed inputs, and for
finding partial parses and their likelihoods on ungrammatical inputs.
Comment: 45 pages. Slightly shortened version to appear in Computational
Linguistics 2
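Quantity (b) above, the probability that a nonterminal generates a given substring, is the inside probability of that nonterminal over the span. For comparison, the sketch below computes inside probabilities with the bottom-up CKY inside algorithm for a toy grammar in Chomsky normal form. This is a swapped-in, simpler technique and not the authors' Earley-based algorithm: it needs the normal-form conversion their method avoids, and it does not compute prefix probabilities. The grammar, `inside_probabilities`, and `sentence_probability` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form; each nonterminal's
# rule probabilities sum to 1.
BINARY = {  # (parent, left child, right child) -> probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
LEXICAL = {  # (parent, word) -> probability
    ("NP", "she"): 0.5,
    ("NP", "fish"): 0.5,
    ("V", "eats"): 1.0,
}


def inside_probabilities(words):
    """Map (i, j, A) to P(A derives words[i:j]) via the CKY inside algorithm."""
    n = len(words)
    inside = defaultdict(float)
    # Spans of length 1 are covered by lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                inside[(i, i + 1, a)] += p
    # Longer spans: sum over split points k and binary rules A -> B C.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    left = inside.get((i, k, b), 0.0)
                    right = inside.get((k, j, c), 0.0)
                    if left > 0.0 and right > 0.0:
                        inside[(i, j, a)] += p * left * right
    return inside


def sentence_probability(words, start="S"):
    """Total probability that the start symbol generates the whole string."""
    return inside_probabilities(words).get((0, len(words), start), 0.0)


print(sentence_probability(["she", "eats", "fish"]))  # 0.25
```

For "she eats fish" the table holds 0.5 for VP over "eats fish" and 0.25 for S over the whole string. The bottom-up loop visits every span and every binary rule regardless of top-down context, which is roughly the work on sparse grammars that the abstract says Earley's top-down control structure helps avoid.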
PonyGE2: Grammatical Evolution in Python
Grammatical Evolution (GE) is a population-based evolutionary algorithm,
where a formal grammar is used in the genotype to phenotype mapping process.
PonyGE2 is an open source implementation of GE in Python, developed at UCD's
Natural Computing Research and Applications group. It is intended as an
advertisement and a starting-point for those new to GE, a reference for
students and researchers, a rapid-prototyping medium for our own experiments,
and a Python workout. In addition to the characteristic genotype to phenotype
mapping of GE, PonyGE2 provides a search algorithm engine. A number of sample
problems and tutorials on how to use and adapt PonyGE2 have been developed.
Comment: 8 pages, 4 figures, submitted to the 2017 GECCO Workshop on
Evolutionary Computation Software Systems (EvoSoft)
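In standard GE, the genotype to phenotype mapping described above is a leftmost derivation driven by integer codons: each codon, taken modulo the number of alternatives for the current nonterminal, selects the production to apply, and the genome is reread ("wrapped") if it runs out. A minimal sketch of that standard mapping follows. It is not PonyGE2's code or API; the grammar, `map_genotype`, and the `max_wraps` default are hypothetical.

```python
# Hypothetical BNF-style grammar: each nonterminal maps to its list of
# alternatives, and each alternative is a list of symbols.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["y"]],
}


def map_genotype(genome, start="<expr>", max_wraps=2):
    """Map integer codons to a phenotype string, or return None if mapping fails."""
    pending = [start]  # symbols still to process, leftmost first
    output = []
    used = 0
    limit = len(genome) * (max_wraps + 1)  # codon budget including wraps
    while pending:
        symbol = pending.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)  # terminal: copy it to the phenotype
            continue
        choices = GRAMMAR[symbol]
        if len(choices) == 1:
            # Only one alternative: this sketch expands it without using a codon.
            pending = list(choices[0]) + pending
            continue
        if used >= limit:
            return None  # codons exhausted even after wrapping
        codon = genome[used % len(genome)]  # wrap around the genome
        used += 1
        pending = list(choices[codon % len(choices)]) + pending  # mod rule
    return " ".join(output)


print(map_genotype([0, 1, 0, 1, 1]))  # x * x
print(map_genotype([0]))              # None
```

With the genome [0, 1, 0, 1, 1], the final <var> is chosen by rereading the first codon, which shows wrapping at work; the genome [0] keeps selecting the recursive alternative and fails once the codon budget is spent.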
Lexicalization and Grammar Development
In this paper we present a fully lexicalized grammar formalism as a
particularly attractive framework for the specification of natural language
grammars. We discuss in detail Feature-based, Lexicalized Tree Adjoining
Grammars (FB-LTAGs), a representative of the class of lexicalized grammars. We
illustrate the advantages of lexicalized grammars in various contexts of
natural language processing, ranging from wide-coverage grammar development to
parsing and machine translation. We also present a method for compact and
efficient representation of lexicalized trees.
Comment: ps file. English with German abstract. 10 pages
A Processing Model for Free Word Order Languages
Like many verb-final languages, German displays considerable word-order
freedom: there is no syntactic constraint on the ordering of the nominal
arguments of a verb, as long as the verb remains in final position. This effect
is referred to as "scrambling", and is interpreted in transformational
frameworks as leftward movement of the arguments. Furthermore, arguments from
an embedded clause may move out of their clause; this effect is referred to as
"long-distance scrambling". While scrambling has recently received
considerable attention in the syntactic literature, the status of long-distance
scrambling has only rarely been addressed. The reason for this is the
problematic status of the data: not only is long-distance scrambling highly
dependent on pragmatic context, it is also strongly subject to degradation due
to processing constraints. As in the case of center-embedding, it is not
immediately clear whether to assume that the observed unacceptability of highly
complex sentences is due to grammatical restrictions, or whether we should
assume that the competence grammar does not place any restrictions on
scrambling (and that, therefore, all such sentences are in fact grammatical),
and the unacceptability of some (or most) of the grammatically possible word
orders is due to processing limitations. In this paper, we will argue for the
second view by presenting a processing model for German.
Comment: 23 pages, uuencoded compressed ps file. In Perspectives on
Sentence Processing, C. Clifton, Jr., L. Frazier and K. Rayner, editors.
Lawrence Erlbaum Associates, 199
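The constraint stated at the start of this abstract, free ordering of a verb's nominal arguments with the verb fixed in final position, gives a clause with n arguments n! candidate orders. A minimal sketch that enumerates them for a German subordinate clause, assuming (as the competence view argued for here would) that every permutation is syntactically licensed, and ignoring long-distance scrambling; the function name and example clause are illustrative and not taken from the paper.

```python
from itertools import permutations


def verb_final_orders(arguments, verb):
    """List every ordering of the nominal arguments, with the verb kept last."""
    return [list(order) + [verb] for order in permutations(arguments)]


# "dass der Mann dem Kind das Buch gab" = "that the man gave the child the book"
clause = verb_final_orders(["der Mann", "dem Kind", "das Buch"], "gab")
for words in clause:
    print("dass " + " ".join(words))
print(len(clause))  # 6, i.e. 3! orders
```

Which of these six orders are judged acceptable is the question the paper attributes to pragmatic context and processing limitations rather than to the competence grammar.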
An Alternative Conception of Tree-Adjoining Derivation
The precise formulation of derivation for tree-adjoining grammars has
important ramifications for a wide variety of uses of the formalism, from
syntactic analysis to semantic interpretation and statistical language
modeling. We argue that the definition of tree-adjoining derivation must be
reformulated in order to manifest the proper linguistic dependencies in
derivations. The particular proposal is both precisely characterizable through
a definition of TAG derivations as equivalence classes of ordered derivation
trees, and computationally operational, by virtue of a compilation to linear
indexed grammars together with an efficient algorithm for recognition and
parsing according to the compiled grammar.
Comment: 33 pages
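Derivations in tree-adjoining grammar record applications of two operations, substitution and adjunction. In adjunction, an auxiliary tree whose root and distinguished foot node share a label is spliced in at a matching node of another tree, and the subtree previously at that node moves under the foot. A minimal sketch of this standard operation follows; it is not the reformulated derivation or the compilation to linear indexed grammars proposed here. The `Node` encoding, the path-based addressing, and the example trees are hypothetical, and substitution and adjoining constraints are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False  # True only for the foot node of an auxiliary tree


def copy_tree(node: Node) -> Node:
    return Node(node.label, [copy_tree(c) for c in node.children], node.foot)


def plug_foot(aux: Node, subtree: Node) -> Node:
    """Copy aux, putting subtree in place of its foot node."""
    if aux.foot:
        if aux.label != subtree.label:
            raise ValueError("foot label must match the adjunction site")
        return subtree
    return Node(aux.label, [plug_foot(c, subtree) for c in aux.children])


def adjoin(tree: Node, path: List[int], aux: Node) -> Node:
    """Return a new tree with aux adjoined at the node reached via child indices in path."""
    if not path:
        if tree.label != aux.label:
            raise ValueError("aux root label must match the adjunction site")
        return plug_foot(aux, copy_tree(tree))
    kids = [adjoin(c, path[1:], aux) if i == path[0] else copy_tree(c)
            for i, c in enumerate(tree.children)]
    return Node(tree.label, kids)


def leaves(node: Node) -> List[str]:
    if not node.children:
        return [node.label]
    return [w for c in node.children for w in leaves(c)]


# Hypothetical elementary trees: S(NP(John), VP(V(sleeps))) and VP(VP*, ADV(soundly)).
initial = Node("S", [Node("NP", [Node("John")]),
                     Node("VP", [Node("V", [Node("sleeps")])])])
auxiliary = Node("VP", [Node("VP", foot=True), Node("ADV", [Node("soundly")])])

derived = adjoin(initial, [1], auxiliary)
print(" ".join(leaves(derived)))  # John sleeps soundly
```

The input trees are copied rather than mutated, so the same elementary trees can be reused across derivation steps.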