3,446 research outputs found
Multiple Context-Free Tree Grammars: Lexicalization and Characterization
Multiple (simple) context-free tree grammars are investigated, where "simple"
means "linear and nondeleting". Every multiple context-free tree grammar that
is finitely ambiguous can be lexicalized; i.e., it can be transformed into an
equivalent one (generating the same tree language) in which each rule of the
grammar contains a lexical symbol. Due to this transformation, the rank of the
nonterminals increases at most by 1, and the multiplicity (or fan-out) of the
grammar increases at most by the maximal rank of the lexical symbols; in
particular, the multiplicity does not increase when all lexical symbols have
rank 0. Multiple context-free tree grammars have the same tree generating power
as multi-component tree adjoining grammars (provided the latter can use a
root-marker). Moreover, every multi-component tree adjoining grammar that is
finitely ambiguous can be lexicalized. Multiple context-free tree grammars have
the same string generating power as multiple context-free (string) grammars and
polynomial time parsing algorithms. A tree language can be generated by a
multiple context-free tree grammar if and only if it is the image of a regular
tree language under a deterministic finite-copying macro tree transducer.
Multiple context-free tree grammars can be used as a synchronous translation
device.Comment: 78 pages, 13 figure
Colored operads, series on colored operads, and combinatorial generating systems
We introduce bud generating systems, which are used for combinatorial
generation. They specify sets of various kinds of combinatorial objects, called
languages. They can emulate context-free grammars, regular tree grammars, and
synchronous grammars, allowing us to work with all these generating systems in
a unified way. The theory of bud generating systems uses colored operads.
Indeed, an object is generated by a bud generating system if it satisfies a
certain equation in a colored operad. To compute the generating series of the
languages of bud generating systems, we introduce formal power series on
colored operads and several operations on these. Series on colored operads are
crucial to express the languages specified by bud generating systems and allow
us to enumerate combinatorial objects with respect to some statistics. Some
examples of bud generating systems are constructed; in particular to specify
some sorts of balanced trees and to obtain recursive formulas enumerating
these.Comment: 48 page
An Alternative Conception of Tree-Adjoining Derivation
The precise formulation of derivation for tree-adjoining grammars has
important ramifications for a wide variety of uses of the formalism, from
syntactic analysis to semantic interpretation and statistical language
modeling. We argue that the definition of tree-adjoining derivation must be
reformulated in order to manifest the proper linguistic dependencies in
derivations. The particular proposal is both precisely characterizable through
a definition of TAG derivations as equivalence classes of ordered derivation
trees, and computationally operational, by virtue of a compilation to linear
indexed grammars together with an efficient algorithm for recognition and
parsing according to the compiled grammar.Comment: 33 page
Lexicalization and Grammar Development
In this paper we present a fully lexicalized grammar formalism as a
particularly attractive framework for the specification of natural language
grammars. We discuss in detail Feature-based, Lexicalized Tree Adjoining
Grammars (FB-LTAGs), a representative of the class of lexicalized grammars. We
illustrate the advantages of lexicalized grammars in various contexts of
natural language processing, ranging from wide-coverage grammar development to
parsing and machine translation. We also present a method for compact and
efficient representation of lexicalized trees.Comment: ps file. English w/ German abstract. 10 page
Separating Dependency from Constituency in a Tree Rewriting System
In this paper we present a new tree-rewriting formalism called Link-Sharing
Tree Adjoining Grammar (LSTAG) which is a variant of synchronous TAGs. Using
LSTAG we define an approach towards coordination where linguistic dependency is
distinguished from the notion of constituency. Such an approach towards
coordination that explicitly distinguishes dependencies from constituency gives
a better formal understanding of its representation when compared to previous
approaches that use tree-rewriting systems which conflate the two issues.Comment: 7 pages, 6 Postscript figures, uses fullname.st
Restricting the Weak-Generative Capacity of Synchronous Tree-Adjoining Grammars
The formalism of synchronous tree-adjoining grammars, a variant of standard
tree-adjoining grammars (TAG), was intended to allow the use of TAGs for
language transduction in addition to language specification. In previous work,
the definition of the transduction relation defined by a synchronous TAG was
given by appeal to an iterative rewriting process. The rewriting definition of
derivation is problematic in that it greatly extends the expressivity of the
formalism and makes the design of parsing algorithms difficult if not
impossible. We introduce a simple, natural definition of synchronous
tree-adjoining derivation, based on isomorphisms between standard
tree-adjoining derivations, that avoids the expressivity and implementability
problems of the original rewriting definition. The decrease in expressivity,
which would otherwise make the method unusable, is offset by the incorporation
of an alternative definition of standard tree-adjoining derivation, previously
proposed for completely separate reasons, thereby making it practical to
entertain using the natural definition of synchronous derivation. Nonetheless,
some remaining problematic cases call for yet more flexibility in the
definition; the isomorphism requirement may have to be relaxed. It remains for
future research to tune the exact requirements on the allowable mappings.Comment: 21 pages, uses lingmacros.sty, psfig.sty, fullname.sty; minor
typographical changes onl
Synchronous Context-Free Grammars and Optimal Linear Parsing Strategies
Synchronous Context-Free Grammars (SCFGs), also known as syntax-directed
translation schemata, are unlike context-free grammars in that they do not have
a binary normal form. In general, parsing with SCFGs takes space and time
polynomial in the length of the input strings, but with the degree of the
polynomial depending on the permutations of the SCFG rules. We consider linear
parsing strategies, which add one nonterminal at a time. We show that for a
given input permutation, the problems of finding the linear parsing strategy
with the minimum space and time complexity are both NP-hard
- …