310 research outputs found
An Efficient Implementation of the Head-Corner Parser
This paper describes an efficient and robust implementation of a
bi-directional, head-driven parser for constraint-based grammars. This parser
is developed for the OVIS system: a Dutch spoken dialogue system in which
information about public transport can be obtained by telephone.
After a review of the motivation for head-driven parsing strategies, and
head-corner parsing in particular, a non-deterministic version of the
head-corner parser is presented. A memoization technique is applied to obtain a
fast parser. A goal-weakening technique is introduced which greatly improves
average case efficiency, both in terms of speed and space requirements.
I argue in favor of such a memoization strategy with goal-weakening in
comparison with ordinary chart-parsers because such a strategy can be applied
selectively and therefore enormously reduces the space requirements of the
parser, while no practical loss in time-efficiency is observed. On the
contrary, experiments are described in which head-corner and left-corner
parsers implemented with selective memoization and goal weakening outperform
`standard' chart parsers. The experiments include the grammar of the OVIS
system and the Alvey NL Tools grammar.
Head-corner parsing is a mix of bottom-up and top-down processing. Certain
approaches towards robust parsing require purely bottom-up processing.
Therefore, it seems that head-corner parsing is unsuitable for such robust
parsing techniques. However, it is shown how underspecification (which arises
very naturally in a logic programming environment) can be used in the
head-corner parser to allow such robust parsing techniques. A particular robust
parsing model is described which is implemented in OVIS.Comment: 31 pages, uses cl.st
Data-Oriented Language Processing. An Overview
During the last few years, a new approach to language processing has started
to emerge, which has become known under various labels such as "data-oriented
parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den
Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak
1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine &
Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This
approach, which we will call "data-oriented processing" or "DOP", embodies the
assumption that human language perception and production works with
representations of concrete past language experiences, rather than with
abstract linguistic rules. The models that instantiate this approach therefore
maintain large corpora of linguistic representations of previously occurring
utterances. When processing a new input utterance, analyses of this utterance
are constructed by combining fragments from the corpus; the
occurrence-frequencies of the fragments are used to estimate which analysis is
the most probable one.
In this paper we give an in-depth discussion of a data-oriented processing
model which employs a corpus of labelled phrase-structure trees. Then we review
some other models that instantiate the DOP approach. Many of these models also
employ labelled phrase-structure trees, but use different criteria for
extracting fragments from the corpus or employ different disambiguation
strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine &
Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their
corpus annotations (van den Berg et al. 1994; Bod et al., 1996a/b; Bonnema
1996; Kaplan 1996; Tugwell 1995).Comment: 34 pages, Postscrip
Database of Sentences of Croatian Language
The aim of this paper is to investigate morphological and syntactical levels of sentences
of Croatian Language. Morphological grammatical level is represented by 16 categories
and 47 features with theirs value. Syntactical level is represented with constituent
structure trees of sentences. Database has 1001 grammatical sentences of Croatian
literature. Database of sentences is the result of theoretical research in computational
linguistics. This study extends the analysis of the linguistic data in anthropology in
Croatian. Access to database is through the Internet on the address: http://infoz.ffzg.
hr/tepes
A derivational model of discontinuous parsing
The notion of latent-variable probabilistic context-free derivation of syntactic structures is enhanced to allow heads and unrestricted discontinuities. The chosen formalization covers both constituency parsing and dependency parsing. By the new framework, one obtains a probability distribution over the space of all discontinuous parses. This lends itself to intrinsic evaluation in terms of cross-entropy. The derivational model is accompanied by an equivalent automaton model, which can be used for deterministic parsing.PostprintPeer reviewe
- …