3,531 research outputs found

    Constraint-Based Categorial Grammar

    Full text link
    We propose a generalization of Categorial Grammar in which lexical categories are defined by means of recursive constraints. In particular, the introduction of relational constraints allows one to capture the effects of (recursive) lexical rules in a computationally attractive manner. We illustrate the linguistic merits of the new approach by showing how it accounts for the syntax of Dutch cross-serial dependencies and the position and scope of adjuncts in such constructions. Delayed evaluation is used to process grammars containing recursive constraints.Comment: 8 pages, LaTe

    The syntactic processing of particles in Japanese spoken language

    Get PDF
    Particles fullfill several distinct central roles in the Japanese language. They can mark arguments as well as adjuncts, can be functional or have semantic funtions. There is, however, no straightforward matching from particles to functions, as, e.g., GA can mark the subject, the object or an adjunct of a sentence. Particles can cooccur. Verbal arguments that could be identified by particles can be eliminated in the Japanese sentence. And finally, in spoken language particles are often omitted. A proper treatment of particles is thus necessary to make an analysis of Japanese sentences possible. Our treatment is based on an empirical investigation of 800 dialogues. We set up a type hierarchy of particles motivated by their subcategorizational and modificational behaviour. This type hierarchy is part of the Japanese syntax in VERBMOBIL.Comment: 8 page

    Evaluation of an automatic f-structure annotation algorithm against the PARC 700 dependency bank

    Get PDF
    An automatic method for annotating the Penn-II Treebank (Marcus et al., 1994) with high-level Lexical Functional Grammar (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) f-structure representations is described in (Cahill et al., 2002; Cahill et al., 2004a; Cahill et al., 2004b; Oā€™Donovan et al., 2004). The annotation algorithm and the automatically-generated f-structures are the basis for the automatic acquisition of wide-coverage and robust probabilistic approximations of LFG grammars (Cahill et al., 2002; Cahill et al., 2004a) and for the induction of LFG semantic forms (Oā€™Donovan et al., 2004). The quality of the annotation algorithm and the f-structures it generates is, therefore, extremely important. To date, annotation quality has been measured in terms of precision and recall against the DCU 105. The annotation algorithm currently achieves an f-score of 96.57% for complete f-structures and 94.3% for preds-only f-structures. There are a number of problems with evaluating against a gold standard of this size, most notably that of overfitting. There is a risk of assuming that the gold standard is a complete and balanced representation of the linguistic phenomena in a language and basing design decisions on this. It is, therefore, preferable to evaluate against a more extensive, external standard. Although the DCU 105 is publicly available, 1 a larger well-established external standard can provide a more widely-recognised benchmark against which the quality of the f-structure annotation algorithm can be evaluated. For these reasons, we present an evaluation of the f-structure annotation algorithm of (Cahill et al., 2002; Cahill et al., 2004a; Cahill et al., 2004b; Oā€™Donovan et al., 2004) against the PARC 700 Dependency Bank (King et al., 2003). Evaluation against an external gold standard is a non-trivial task as linguistic analyses may differ systematically between the gold standard and the output to be evaluated as regards feature geometry and nomenclature. We present conversion software to automatically account for many (but not all) of the systematic differences. Currently, we achieve an f-score of 87.31% for the f-structures generated from the original Penn-II trees and an f-score of 81.79% for f-structures from parse trees produced by Charniakā€™s (2000) parser in our pipeline parsing architecture against the PARC 700

    Automatic F-Structure Annotation from the AP Treebank

    Get PDF
    We present a method for automatically annotating treebank resources with functional structures. The method defines systematic patterns of correspondence between partial PS configurations and functional structures. These are applied to PS rules extracted from treebanks. The set of techniques which we have developed constitute a methodology for corpus-guided grammar development. Despite the widespread belief that treebank representations are not very useful in grammar development, we show that systematic patterns of c-structure to f-structure correspondence can be simply and successfully stated over such rules. The method is partial in that it requires manual correction of the annotated grammar rules

    Morphological Productivity in the Lexicon

    Full text link
    In this paper we outline a lexical organization for Turkish that makes use of lexical rules for inflections, derivations, and lexical category changes to control the proliferation of lexical entries. Lexical rules handle changes in grammatical roles, enforce type constraints, and control the mapping of subcategorization frames in valency-changing operations. A lexical inheritance hierarchy facilitates the enforcement of type constraints. Semantic compositions in inflections and derivations are constrained by the properties of the terms and predicates. The design has been tested as part of a HPSG grammar for Turkish. In terms of performance, run-time execution of the rules seems to be a far better alternative than pre-compilation. The latter causes exponential growth in the lexicon due to intensive use of inflections and derivations in Turkish.Comment: 10 pages LaTeX, {lingmacros,avm,psfig}.sty, 1 figure, 1 bibtex fil

    Treebank-based acquisition of a Chinese lexical-functional grammar

    Get PDF
    Scaling wide-coverage, constraint-based grammars such as Lexical-Functional Grammars (LFG) (Kaplan and Bresnan, 1982; Bresnan, 2001) or Head-Driven Phrase Structure Grammars (HPSG) (Pollard and Sag, 1994) from fragments to naturally occurring unrestricted text is knowledge-intensive, time-consuming and (often prohibitively) expensive. A number of researchers have recently presented methods to automatically acquire wide-coverage, probabilistic constraint-based grammatical resources from treebanks (Cahill et al., 2002, Cahill et al., 2003; Cahill et al., 2004; Miyao et al., 2003; Miyao et al., 2004; Hockenmaier and Steedman, 2002; Hockenmaier, 2003), addressing the knowledge acquisition bottleneck in constraint-based grammar development. Research to date has concentrated on English and German. In this paper we report on an experiment to induce wide-coverage, probabilistic LFG grammatical and lexical resources for Chinese from the Penn Chinese Treebank (CTB) (Xue et al., 2002) based on an automatic f-structure annotation algorithm. Currently 96.751% of the CTB trees receive a single, covering and connected f-structure, 0.112% do not receive an f-structure due to feature clashes, while 3.137% are associated with multiple f-structure fragments. From the f-structure-annotated CTB we extract a total of 12975 lexical entries with 20 distinct subcategorisation frame types. Of these 3436 are verbal entries with a total of 11 different frame types. We extract a number of PCFG-based LFG approximations. Currently our best automatically induced grammars achieve an f-score of 81.57% against the trees in unseen articles 301-325; 86.06% f-score (all grammatical functions) and 73.98% (preds-only) against the dependencies derived from the f-structures automatically generated for the original trees in 301-325 and 82.79% (all grammatical functions) and 67.74% (preds-only) against the dependencies derived from the manually annotated gold-standard f-structures for 50 trees randomly selected from articles 301-325

    Morphology-Syntax interface for Turkish LFG

    Get PDF
    This paper investigates the use of sublexical units as a solution to handling the complex morphology with productive derivational processes, in the development of a lexical functional grammar for Turkish. Such sublexical units make it possible to expose the internal structure of words with multiple derivations to the grammar rules in a uniform manner. This in turn leads to more succinct and manageable rules. Further, the semantics of the derivations can also be systematically reflected in a compositional way by constructing PRED values on the fly. We illustrate how we use sublexical units for handling simple productive derivational morphology and more interesting cases such as causativization, etc., which change verb valency. Our priority is to handle several linguistic phenomena in order to observe the effects of our approach on both the c-structure and the f-structure representation, and grammar writing, leaving the coverage and evaluation issues aside for the moment

    Tactical Generation in a Free Constituent Order Language

    Full text link
    This paper describes tactical generation in Turkish, a free constituent order language, in which the order of the constituents may change according to the information structure of the sentences to be generated. In the absence of any information regarding the information structure of a sentence (i.e., topic, focus, background, etc.), the constituents of the sentence obey a default order, but the order is almost freely changeable, depending on the constraints of the text flow or discourse. We have used a recursively structured finite state machine for handling the changes in constituent order, implemented as a right-linear grammar backbone. Our implementation environment is the GenKit system, developed at Carnegie Mellon University--Center for Machine Translation. Morphological realization has been implemented using an external morphological analysis/generation component which performs concrete morpheme selection and handles morphographemic processes.Comment: gzipped, uuencoded postscript fil
    • ā€¦
    corecore