Geometric representations for minimalist grammars
We reformulate minimalist grammars as partial functions on term algebras for
strings and trees. Using filler/role bindings and tensor product
representations, we construct homomorphisms for these data structures into
geometric vector spaces. We prove that the structure-building functions as well
as simple processors for minimalist languages can be realized by piecewise
linear operators in representation space. We also propose harmony, i.e. the
distance of an intermediate processing step from the final well-formed state in
representation space, as a measure of processing complexity. Finally, we
illustrate our findings by means of two particular arithmetic and fractal
representations.
Comment: 43 pages, 4 figures
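As a rough illustration of the filler/role bindings mentioned in the abstract above, the following minimal sketch builds a tensor product representation of a two-leaf tree and recovers a filler by unbinding. The vectors, the role names `left` and `right`, and the helper functions are hypothetical choices made for this example; they are not the arithmetic or fractal representations constructed in the paper.

```python
# Minimal sketch of a tensor product representation (assumed toy encoding,
# not the paper's construction). Fillers and roles are plain vectors;
# a binding is their outer product, and a structure is a sum of bindings.

def outer(filler, role):
    """Bind a filler to a role: outer product filler (x) role."""
    return [[f * r for r in role] for f in filler]

def add(a, b):
    """Superimpose two bindings by element-wise matrix addition."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def unbind(structure, role):
    """Recover the filler bound to `role` (exact when roles are orthonormal)."""
    return [sum(s * r for s, r in zip(row, role)) for row in structure]

# Hypothetical one-hot fillers (lexical items) and orthonormal roles (tree positions).
fillers = {"the": [1.0, 0.0, 0.0], "cat": [0.0, 1.0, 0.0]}
roles = {"left": [1.0, 0.0], "right": [0.0, 1.0]}

# Representation of the tree [the cat]: the/left + cat/right.
tree = add(outer(fillers["the"], roles["left"]),
           outer(fillers["cat"], roles["right"]))

left_filler = unbind(tree, roles["left"])
right_filler = unbind(tree, roles["right"])
```

Because the roles in this sketch are orthonormal, unbinding is a linear map that returns each filler exactly, which gives some intuition for how structure-building steps could be expressed as (piecewise) linear operators on such a space.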
A Case Study of the Convergence of Mildly Context-Sensitive Formalisms for Natural Language Syntax: from Minimalist Grammars to Multiple Context-Free Grammars
Submitted as an INRIA Futurs research report - SIGNES project.
The present work is set in the field of natural language syntactic parsing. We present the concept of "mildly context-sensitive" grammar formalisms, which are full-fledged and efficient for syntactic parsing. We summarize the definitions of a number of these formalisms and the relations among them and, most importantly, survey the known equivalences. The conversion of Edward Stabler's Minimalist Grammars into Multiple Context-Free Grammars (MCFG) is presented in particular detail, along with a study of the complexity of this procedure and of its implications for parsing. This report is an adaptation of the French Master's thesis of the same name, from Bordeaux 1 University, June 2006.
Observations on Strict Derivational Minimalism
Michaelis J. Observations on Strict Derivational Minimalism. Electronic Notes in Theoretical Computer Science. 2004;53:192-209
Observations on Strict Derivational Minimalism
Deviating from the definition originally presented in [12], Stabler [13] introduced, inspired by some recent proposals in terms of a minimalist approach to transformational syntax, a (revised) type of minimalist grammar (MG) as well as a certain type of strict minimalist grammar (SMG). These two types can be shown to determine the same class of derivable string languages.
Wide-coverage statistical parsing with minimalist grammars
Syntactic parsing is the process of automatically assigning a structure to a string
of words, and is arguably a necessary prerequisite for obtaining a detailed and precise
representation of sentence meaning. For many NLP tasks, it is sufficient to use
parsers based on simple context-free grammars. However, for tasks in which precision
on certain relatively rare but semantically crucial constructions (such as unbounded
wh-movements for open domain question answering) is important, more expressive
grammatical frameworks still have an important role to play.
One grammatical framework which has been conspicuously absent from journals
and conferences on Natural Language Processing (NLP), despite continuing to dominate
much of theoretical syntax, is Minimalism, the latest incarnation of the Transformational
Grammar (TG) approach to linguistic theory developed very extensively
by Noam Chomsky and many others since the early 1950s. Until now, all parsers
using genuine transformational movement operations have had only narrow coverage
by modern standards, owing to the lack of any wide-coverage TG grammars or treebanks
on which to train statistical models. The received wisdom within NLP is that
TG is too complex and insufficiently formalised to be applied to realistic parsing tasks.
This situation is unfortunate, as it is arguably the most extensively developed syntactic
theory across the greatest number of languages, many of which are otherwise
under-resourced, and yet the vast majority of its insights never find their way into NLP
systems. Conversely, the process of constructing large grammar fragments can have
a salutary impact on the theory itself, forcing choices between competing analyses of
the same construction, and exposing incompatibilities between analyses of different
constructions, along with areas of over- and undergeneration which may otherwise go
unnoticed.
This dissertation builds on research into computational Minimalism pioneered by
Ed Stabler and others since the late 1990s to present the first ever wide-coverage Minimalist
Grammar (MG) parser, along with some promising initial experimental results.
A wide-coverage parser must of course be equipped with a wide-coverage grammar,
and this dissertation will therefore also present the first ever wide-coverage MG, which
has analyses with a high level of cross-linguistic descriptive adequacy for a great many
English constructions, many of which are taken or adapted from proposals in the mainstream
Minimalist literature. The grammar is very deep, in the sense that it describes
many long-range dependencies which even most other expressive wide-coverage grammars
ignore. At the same time, it has also been engineered to be highly constrained,
with continuous computational testing being applied to minimize both under- and over-generation.
Natural language is highly ambiguous, both locally and globally, and even with a
very strong formal grammar, there may still be a great many possible structures for a
given sentence and its substrings. The standard approach to resolving such ambiguity
is to equip the parser with a probability model allowing it to disregard certain unlikely
search paths, thereby increasing both its efficiency and accuracy. The most successful
parsing models are those extracted in a supervised fashion from labelled data in the
form of a corpus of syntactic trees, known as a treebank. Constructing such a treebank
from scratch for a different formalism is extremely time-consuming and expensive,
however, and so the standard approach is to map the trees in an existing treebank into
trees of the target formalism. Minimalist trees are considerably more complex than
those of other formalisms, however, containing many more null heads and movement
operations, making this conversion process far from trivial. This dissertation will describe
a method which has so far been used to convert 56% of the Penn Treebank trees
into MG trees. Although still under development, the resulting MGbank corpus has
already been used to train a statistical A* MG parser, described here, which has an
expected asymptotic time complexity of O(n^3); this is much better than even the most
optimistic worst-case analysis for the formalism.