Search CORE

323 research outputs found

Wide-coverage deep statistical parsing using automatic dependency structure annotation

Author: Abney Stephen
Andy Way
Aoife Cahill
Briscoe Edward
Chinchor Nancy
Johnson Mark
Josef van Genabith
Michael Burke
Ruth O'Donovan
Stefan Riezler
Xue Nianwen
Publication venue: 'MIT Press - Journals'
Publication date: 01/03/2008
Field of study

A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004;Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing “deep” hand-crafted wide-coverage with “shallow” treebank- and machine-learning based parsers at the level of dependencies, using simple and automatic methods to convert tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments in Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, widecoverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18%over the most recent results of 80.55%for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002)

Crossref

Irish Universities

DCU Online Research Access Service

Linking flat predicate argument structures

Author: Xu Feiyu
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/2004
Field of study

This report presents an approach to enriching flat and robust predicate argument structures with more fine-grained semantic information, extracted from underspecified semantic representations and encoded in Minimal Recursion Semantics (MRS). Such representations are provided by a hand-built HPSG grammar with a wide linguistic coverage. A specific semantic representation, called linked predicate argument structure (LPAS), has been worked out, which describes the explicit embedding relationships among predicate argument structures. LPAS can be used as a generic interface language for integrating semantic representations with different granularities. Some initial experiments have been conducted to convert MRS expressions into LPASs. A simple constraint solver is developed to resolve the underspecified dominance relations between the predicates and their arguments in MRS expressions. LPASs are useful for high-precision information extraction and question answering tasks because of their fine-grained semantic structures. In addition, I have attempted to extend the lexicon of the HPSG English Resource Grammar (ERG) exploiting WordNet and to disambiguate the readings of HPSG parsing with the help of a probabilistic parser, in order to process texts from application domains. Following the presented approach, the HPSG ERG grammar can be used for annotating some standard treebank, e.g., the Penn Treebank, with its fine-grained semantics. In this vein, I point out opportunities for a fruitful cooperation of the HPSG annotated Redwood Treebank and the Penn PropBank. In my current work, I exploit HPSG as an additional knowledge resource for the automatic learning of LPASs from dependency structures

Universaar

Acronym

Exploiting multi-word units in history-based probabilistic generation

Author: Cafferkey Conor
Cahill Aoife
Hogan Deirdre
van Genabith Josef
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2007
Field of study

We present a simple history-based model for sentence generation from LFG f-structures, which improves on the accuracy of previous models by breaking down PCFG independence assumptions so that more f-structure conditioning context is used in the prediction of grammar rule expansions. In addition, we present work on experiments with named entities and other multi-word units, showing a statistically significant improvement of generation accuracy. Tested on section 23 of the PennWall Street Journal Treebank, the techniques described in this paper improve BLEU scores from 66.52 to 68.82, and coverage from 98.18% to 99.96%

DCU Online Research Access Service

C-structures and f-structures for the British national corpus

Author: Foster Jennifer
Seddah Djamé
van Genabith Josef
Wagner Joachim
Publication venue: CSLI Publications
Publication date: 01/01/2007
Field of study

We describe how the British National Corpus (BNC), a one hundred million word balanced corpus of British English, was parsed into Lexical Functional Grammar (LFG) c-structures and f-structures, using a treebank-based parsing architecture. The parsing architecture uses a state-of-the-art statistical parser and reranker trained on the Penn Treebank to produce context-free phrase structure trees, and an annotation algorithm to automatically annotate these trees into LFG f-structures. We describe the pre-processing steps which were taken to accommodate the differences between the Penn Treebank and the BNC. Some of the issues encountered in applying the parsing architecture on such a large scale are discussed. The process of annotating a gold standard set of 1,000 parse trees is described. We present evaluation results obtained by evaluating the c-structures produced by the statistical parser against the c-structure gold standard. We also present the results obtained by evaluating the f-structures produced by the annotation algorithm against an automatically constructed f-structure gold standard. The c-structures achieve an f-score of 83.7% and the f-structures an f-score of 91.2%

DCU Online Research Access Service