94 research outputs found
Data-Oriented Language Processing. An Overview
During the last few years, a new approach to language processing has started
to emerge, which has become known under various labels such as "data-oriented
parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den
Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak
1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine &
Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This
approach, which we will call "data-oriented processing" or "DOP", embodies the
assumption that human language perception and production work with
representations of concrete past language experiences, rather than with
abstract linguistic rules. The models that instantiate this approach therefore
maintain large corpora of linguistic representations of previously occurring
utterances. When processing a new input utterance, analyses of this utterance
are constructed by combining fragments from the corpus; the
occurrence-frequencies of the fragments are used to estimate which analysis is
the most probable one.
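
The frequency-based estimate described above can be illustrated with a minimal sketch. It assumes the simplest scheme, which the abstract itself does not spell out: a fragment's probability is its relative frequency among corpus fragments with the same root label, a derivation's probability is the product of its fragments' probabilities, and an analysis's probability is the sum over its derivations. The fragment strings, counts, and function names below are hypothetical.

```python
from collections import Counter
from math import prod

# Hypothetical toy counts: (root label, fragment) -> occurrences in the corpus.
fragment_counts = Counter({
    ("S", "S(NP VP)"): 2,
    ("S", "S(NP VP(V NP))"): 2,
    ("NP", "NP(she)"): 3,
    ("NP", "NP(it)"): 1,
})


def fragment_probability(root, fragment, counts):
    """Relative frequency of a fragment among all fragments sharing its root."""
    total = sum(c for (r, _), c in counts.items() if r == root)
    return counts[(root, fragment)] / total


def derivation_probability(derivation, counts):
    """Product of the probabilities of the fragments combined in one derivation."""
    return prod(fragment_probability(r, f, counts) for r, f in derivation)


def analysis_probability(derivations, counts):
    """Sum over all derivations that yield the same analysis (assumed scheme)."""
    return sum(derivation_probability(d, counts) for d in derivations)


# One derivation that combines an S fragment with an NP fragment.
example = [("S", "S(NP VP)"), ("NP", "NP(she)")]
p = derivation_probability(example, fragment_counts)
```

Under these toy counts the S fragment has relative frequency 2/4 and the NP fragment 3/4, so the derivation above gets their product.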
In this paper we give an in-depth discussion of a data-oriented processing
model which employs a corpus of labelled phrase-structure trees. Then we review
some other models that instantiate the DOP approach. Many of these models also
employ labelled phrase-structure trees, but use different criteria for
extracting fragments from the corpus or employ different disambiguation
strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine &
Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their
corpus annotations (van den Berg et al. 1994; Bod et al. 1996a/b; Bonnema
1996; Kaplan 1996; Tugwell 1995).
Comment: 34 pages, Postscript
From left-regular to Greibach normal form grammars
Each context-free grammar can be transformed into a context-free grammar in Greibach normal form, that is, a context-free grammar in which each right-hand side of a production begins with a terminal symbol and the remainder of the right-hand side consists of nonterminal symbols. In this short paper we show that for a left-regular grammar G we can obtain a right-regular grammar G’ (which is by definition in Greibach normal form) that left-to-right covers G (in this case left parses of G’ can be mapped by a homomorphism to right parses of G). Moreover, it is possible to obtain a context-free grammar G” in Greibach normal form that right covers the left-regular grammar G (in this case right parses of G” are mapped to right parses of G).
General Grammars: Normal Forms with Applications
This thesis deals with the topic of unrestricted grammars, normal forms, and their applications. It focuses on context-sensitive grammars as a special case. Based on an analysis of this class, an algorithm was designed that uses the principles of the Cocke-Younger-Kasami algorithm to decide whether an input string is a sentence of the language defined by a context-sensitive grammar. The final application, which implements this algorithm, works with context-sensitive grammars in the Penttonen normal form.
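
For orientation, here is a minimal sketch of the classical Cocke-Younger-Kasami recognizer that the thesis takes as its starting point. It handles only context-free grammars in Chomsky normal form; it is not the thesis's extension to context-sensitive grammars in Penttonen normal form, which the abstract does not describe. The grammar encoding and the function name are assumptions made for this example.

```python
def cyk_recognize(words, unary, binary, start="S"):
    """Classical CYK recognition for a grammar in Chomsky normal form.

    unary  -- list of (A, a) pairs for rules A -> a
    binary -- list of (A, B, C) triples for rules A -> B C
    """
    n = len(words)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][k] holds the nonterminals deriving words[i : i + k + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][0] = {a for a, t in unary if t == word}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for a, b, c in binary:
                    if b in left and c in right:
                        table[i][span - 1].add(a)
    return start in table[0][n - 1]


# Hypothetical toy grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
unary_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]
accepted = cyk_recognize(list("aabb"), unary_rules, binary_rules)
rejected = cyk_recognize(list("aab"), unary_rules, binary_rules)
```

The table is filled bottom-up by span length, so every shorter substring is resolved before any longer one that depends on it.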
Simple chain grammars and languages
A subclass of the LR(0)-grammars, the class of simple chain grammars, is introduced. Although there exist simple chain grammars which are not LL(k) for any k>0, this new class of grammars is very closely related to the LL(1) and simple LL(1) grammars. In fact, it can be shown that every simple chain grammar has an equivalent simple LL(1) grammar. Cover properties for simple chain grammars are investigated, and a deterministic pushdown transducer which acts as a right parser for simple chain grammars is presented.