805 research outputs found
An Alternative Conception of Tree-Adjoining Derivation
The precise formulation of derivation for tree-adjoining grammars has
important ramifications for a wide variety of uses of the formalism, from
syntactic analysis to semantic interpretation and statistical language
modeling. We argue that the definition of tree-adjoining derivation must be
reformulated in order to manifest the proper linguistic dependencies in
derivations. The particular proposal is both precisely characterizable through
a definition of TAG derivations as equivalence classes of ordered derivation
trees, and computationally operational, by virtue of a compilation to linear
indexed grammars together with an efficient algorithm for recognition and
parsing according to the compiled grammar.Comment: 33 page
Principles and Implementation of Deductive Parsing
We present a system for generating parsers based directly on the metaphor of
parsing as deduction. Parsing algorithms can be represented directly as
deduction systems, and a single deduction engine can interpret such deduction
systems so as to implement the corresponding parser. The method generalizes
easily to parsers for augmented phrase structure formalisms, such as
definite-clause grammars and other logic grammar formalisms, and has been used
for rapid prototyping of parsing algorithms for a variety of formalisms
including variants of tree-adjoining grammars, categorial grammars, and
lexicalized context-free grammars.Comment: 69 pages, includes full Prolog cod
Evaluation of LTAG parsing with supertag compaction
One of the biggest concerns that has been raised over the feasibility of using large-scale LTAGs in NLP is the amount of redundancy within a grammarĀæs elementary tree set. This has led to various proposals on how best to represent grammars in a way that makes them compact and easily maintained (Vijay-Shanker and Schabes, 1992; Becker, 1993; Becker, 1994; Evans, Gazdar and Weir, 1995; Candito, 1996). Unfortunately, while this work can help to make the storage of grammars more efficient, it does nothing to prevent the problem reappearing when the grammar is processed by a parser and the complete set of trees is reproduced. In this paper we are concerned with an approach that addresses this problem of computational redundancy in the trees, and evaluate its effectiveness
Incremental Parser Generation for Tree Adjoining Grammars
This paper describes the incremental generation of parse tables for the
LR-type parsing of Tree Adjoining Languages (TALs). The algorithm presented
handles modifications to the input grammar by updating the parser generated so
far. In this paper, a lazy generation of LR-type parsers for TALs is defined in
which parse tables are created by need while parsing. We then describe an
incremental parser generator for TALs which responds to modification of the
input grammar by updating parse tables built so far.Comment: 12 pages, 12 Postscript figures, uses fullname.st
Interaction Grammars
Interaction Grammar (IG) is a grammatical formalism based on the notion of
polarity. Polarities express the resource sensitivity of natural languages by
modelling the distinction between saturated and unsaturated syntactic
structures. Syntactic composition is represented as a chemical reaction guided
by the saturation of polarities. It is expressed in a model-theoretic framework
where grammars are constraint systems using the notion of tree description and
parsing appears as a process of building tree description models satisfying
criteria of saturation and minimality
Global Thresholding and Multiple Pass Parsing
We present a variation on classic beam thresholding techniques that is up to
an order of magnitude faster than the traditional method, at the same
performance level. We also present a new thresholding technique, global
thresholding, which, combined with the new beam thresholding, gives an
additional factor of two improvement, and a novel technique, multiple pass
parsing, that can be combined with the others to yield yet another 50%
improvement. We use a new search algorithm to simultaneously optimize the
thresholding parameters of the various algorithms.Comment: Fixed latex errors; fixed minor errors in published versio
Data-Oriented Language Processing. An Overview
During the last few years, a new approach to language processing has started
to emerge, which has become known under various labels such as "data-oriented
parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den
Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak
1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine &
Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This
approach, which we will call "data-oriented processing" or "DOP", embodies the
assumption that human language perception and production works with
representations of concrete past language experiences, rather than with
abstract linguistic rules. The models that instantiate this approach therefore
maintain large corpora of linguistic representations of previously occurring
utterances. When processing a new input utterance, analyses of this utterance
are constructed by combining fragments from the corpus; the
occurrence-frequencies of the fragments are used to estimate which analysis is
the most probable one.
In this paper we give an in-depth discussion of a data-oriented processing
model which employs a corpus of labelled phrase-structure trees. Then we review
some other models that instantiate the DOP approach. Many of these models also
employ labelled phrase-structure trees, but use different criteria for
extracting fragments from the corpus or employ different disambiguation
strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine &
Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their
corpus annotations (van den Berg et al. 1994; Bod et al., 1996a/b; Bonnema
1996; Kaplan 1996; Tugwell 1995).Comment: 34 pages, Postscrip
- ā¦