Search CORE

10,635 research outputs found

Evaluation Strategies for Computational Construction Grammars

Author: Beuls Katrien
Marques Tania
Publication venue
Publication date: 01/01/2016
Field of study

Edinburgh Research Explorer

Repository of the University of Namur

Treebank-based acquisition of wide-coverage, probabilistic LFG resources: project overview, results and evaluation

Author: Burke Michael
Cahill Aoife
O'Donovan Ruth
van Genabith Josef
Way Andy
Publication venue
Publication date: 01/01/2004
Field of study

This paper presents an overview of a project to acquire wide-coverage, probabilistic Lexical-Functional Grammar (LFG) resources from treebanks. Our approach is based on an automatic annotation algorithm that annotates “raw” treebank trees with LFG f-structure information approximating to basic predicate-argument/dependency structure. From the f-structure-annotated treebank we extract probabilistic unification grammar resources. We present the annotation algorithm, the extraction of lexical information and the acquisition of wide-coverage and robust PCFG-based LFG approximations including long-distance dependency resolution. We show how the methodology can be applied to multilingual, treebank-based unification grammar acquisition. Finally we show how simple (quasi-)logical forms can be derived automatically from the f-structures generated for the treebank trees

CiteSeerX

Irish Universities

DCU Online Research Access Service

Treebank-based acquisition of a Chinese lexical-functional grammar

Author: Bodomo Adams
Burke Michael
Cahill Aoife
Chan Rowena
Lam Olivia
O'Donovan Ruth
van Genabith Josef
Way Andy
Publication venue
Publication date: 01/01/2004
Field of study

Scaling wide-coverage, constraint-based grammars such as Lexical-Functional Grammars (LFG) (Kaplan and Bresnan, 1982; Bresnan, 2001) or Head-Driven Phrase Structure Grammars (HPSG) (Pollard and Sag, 1994) from fragments to naturally occurring unrestricted text is knowledge-intensive, time-consuming and (often prohibitively) expensive. A number of researchers have recently presented methods to automatically acquire wide-coverage, probabilistic constraint-based grammatical resources from treebanks (Cahill et al., 2002, Cahill et al., 2003; Cahill et al., 2004; Miyao et al., 2003; Miyao et al., 2004; Hockenmaier and Steedman, 2002; Hockenmaier, 2003), addressing the knowledge acquisition bottleneck in constraint-based grammar development. Research to date has concentrated on English and German. In this paper we report on an experiment to induce wide-coverage, probabilistic LFG grammatical and lexical resources for Chinese from the Penn Chinese Treebank (CTB) (Xue et al., 2002) based on an automatic f-structure annotation algorithm. Currently 96.751% of the CTB trees receive a single, covering and connected f-structure, 0.112% do not receive an f-structure due to feature clashes, while 3.137% are associated with multiple f-structure fragments. From the f-structure-annotated CTB we extract a total of 12975 lexical entries with 20 distinct subcategorisation frame types. Of these 3436 are verbal entries with a total of 11 different frame types. We extract a number of PCFG-based LFG approximations. Currently our best automatically induced grammars achieve an f-score of 81.57% against the trees in unseen articles 301-325; 86.06% f-score (all grammatical functions) and 73.98% (preds-only) against the dependencies derived from the f-structures automatically generated for the original trees in 301-325 and 82.79% (all grammatical functions) and 67.74% (preds-only) against the dependencies derived from the manually annotated gold-standard f-structures for 50 trees randomly selected from articles 301-325

Irish Universities

DCU Online Research Access Service

Probabilistic parsing

Author: Nederhof Mark Jan
Satta Giorgio
Publication venue: Springer
Publication date: 06/01/2011
Field of study

Postprin

St Andrews Research Repository

Probabilistic Parsing Strategies

Author: Nederhof Mark-Jan
Satta Giorgio
Publication venue
Publication date: 01/01/2002
Field of study

We present new results on the relation between purely symbolic context-free parsing strategies and their probabilistic counter-parts. Such parsing strategies are seen as constructions of push-down devices from grammars. We show that preservation of probability distribution is possible under two conditions, viz. the correct-prefix property and the property of strong predictiveness. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation. From our general results we also derive negative results on so-called generalized LR parsing.Comment: 36 pages, 1 figur

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Padova

Message-Passing Protocols for Real-World Parsing -- An Object-Oriented Model and its Preliminary Evaluation

Author: Broeker Norbert
Hahn Udo
Neuhaus Peter
Publication venue
Publication date: 01/01/1997
Field of study

We argue for a performance-based design of natural language grammars and their associated parsers in order to meet the constraints imposed by real-world NLP. Our approach incorporates declarative and procedural knowledge about language and language use within an object-oriented specification framework. We discuss several message-passing protocols for parsing and provide reasons for sacrificing completeness of the parse in favor of efficiency based on a preliminary empirical evaluation.Comment: 12 pages, uses epsfig.st

arXiv.org e-Print Archive

CiteSeerX

Efficient deep processing of japanese

Author: Bender Emily M.
Siegel Melanie
Publication venue
Publication date: 01/01/2002
Field of study

We present a broad coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages

arXiv.org e-Print Archive

Crossref

Publications at Bielefeld University

Hochschulschriftenserver - Universität Frankfurt am Main

Parsing as Reduction

Author: Fernández-González Daniel
Martins André F. T.
Publication venue
Publication date: 01/01/2015
Field of study

We reduce phrase-representation parsing to dependency parsing. Our reduction is grounded on a new intermediate representation, "head-ordered dependency trees", shown to be isomorphic to constituent trees. By encoding order information in the dependency labels, we show that any off-the-shelf, trainable dependency parser can be used to produce constituents. When this parser is non-projective, we can perform discontinuous parsing in a very natural manner. Despite the simplicity of our approach, experiments show that the resulting parsers are on par with strong baselines, such as the Berkeley parser for English and the best single system in the SPMRL-2014 shared task. Results are particularly striking for discontinuous parsing of German, where we surpass the current state of the art by a wide margin

arXiv.org e-Print Archive

Crossref

Parallel Distributed Grammar Engineering for Practical Applications

Author: Bender Emily M.
Callmeier Uli
Flickinger Dan
Oepen Stephan
Siegel Melanie
Publication venue
Publication date: 21/12/2011
Field of study

Based on a detailed case study of parallel grammar development distributed across two sites, we review some of the requirements for regression testing in grammar engineering, summarize our approach to systematic competence and performance profiling, and discuss our experience with grammar development for a commercial application. If possible, the workshop presentation will be organized around a software demonstration

Hochschulschriftenserver - Universität Frankfurt am Main

Automatic acquisition of Spanish LFG resources from the Cast3LB treebank

Author: Cahill Aoife
O'Donovan Ruth
van Genabith Josef
Way Andy
Publication venue: CSLI Publications
Publication date: 01/01/2005
Field of study

In this paper, we describe the automatic annotation of the Cast3LB Treebank with LFG f-structures for the subsequent extraction of Spanish probabilistic grammar and lexical resources. We adapt the approach and methodology of Cahill et al. (2004), O’Donovan et al. (2004) and elsewhere for English to Spanish and the Cast3LB treebank encoding. We report on the quality and coverage of the automatic f-structure annotation. Following the pipeline and integrated models of Cahill et al. (2004), we extract wide-coverage probabilistic LFG approximations and parse unseen Spanish text into f-structures. We also extend Bikel’s (2002) Multilingual Parse Engine to include a Spanish language module. Using the retrained Bikel parser in the pipeline model gives the best results against a manually constructed gold standard (73.20% predsonly f-score). We also extract Spanish lexical resources: 4090 semantic form types with 98 frame types. Subcategorised prepositions and particles are included in the frames

Irish Universities

DCU Online Research Access Service