Analytic aspects of the shuffle product
There exist very lucid explanations of the combinatorial origins of rational
and algebraic functions, in particular with respect to regular and context-free
languages. In the search to understand how to extend these natural
correspondences, we find that the shuffle product models many key aspects of
D-finite generating functions, a class which contains the algebraic functions. We consider
several different takes on the shuffle product, shuffle closure, and shuffle
grammars, and give explicit generating function consequences. In the process,
we define a grammar class that models D-finite generating functions.
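
For readers unfamiliar with the operation named in this abstract, the following is a minimal sketch of the standard shuffle product of two words: all interleavings that preserve the letter order within each word. It is an illustration only and is not taken from the paper; the function name `shuffle` is hypothetical, and results are returned with multiplicity.

```python
from math import comb


def shuffle(u: str, v: str) -> list[str]:
    """Return every interleaving of u and v (with multiplicity).

    The relative order of the letters of u, and of the letters of v,
    is preserved in each result.
    """
    if not u:
        return [v]
    if not v:
        return [u]
    # The first letter of the result comes either from u or from v.
    take_u = [u[0] + w for w in shuffle(u[1:], v)]
    take_v = [v[0] + w for w in shuffle(u, v[1:])]
    return take_u + take_v


words = shuffle("ab", "cd")
# Counted with multiplicity, there are C(len(u) + len(v), len(u)) interleavings.
assert len(words) == comb(4, 2)
```

Counted with multiplicity, two words of lengths m and n always have C(m+n, m) interleavings, which is the kind of counting fact that links shuffles to generating functions.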
Introducing the Concept of Activation and Blocking of Rules in the General Framework for Regulated Rewriting in Sequential Grammars
We introduce new possibilities to control the application of rules based on
the preceding application of rules, which can be defined for a general model of sequential
grammars, and we show some similarities to other control mechanisms such as graph-controlled
grammars and matrix grammars with and without applicability checking, as well as
grammars with random context conditions and ordered grammars. Using both activation and
blocking of rules, in the string and in the multiset case we can show computational
completeness of context-free grammars equipped with the control mechanism of activation
and blocking of rules even when using only two nonterminal symbols.
Data-Oriented Language Processing. An Overview
During the last few years, a new approach to language processing has started
to emerge, which has become known under various labels such as "data-oriented
parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den
Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak
1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine &
Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This
approach, which we will call "data-oriented processing" or "DOP", embodies the
assumption that human language perception and production works with
representations of concrete past language experiences, rather than with
abstract linguistic rules. The models that instantiate this approach therefore
maintain large corpora of linguistic representations of previously occurring
utterances. When processing a new input utterance, analyses of this utterance
are constructed by combining fragments from the corpus; the
occurrence-frequencies of the fragments are used to estimate which analysis is
the most probable one.
In this paper we give an in-depth discussion of a data-oriented processing
model which employs a corpus of labelled phrase-structure trees. Then we review
some other models that instantiate the DOP approach. Many of these models also
employ labelled phrase-structure trees, but use different criteria for
extracting fragments from the corpus or employ different disambiguation
strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine &
Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their
corpus annotations (van den Berg et al. 1994; Bod et al., 1996a/b; Bonnema
1996; Kaplan 1996; Tugwell 1995).
Comment: 34 pages, Postscrip