
Evaluating two methods for Treebank grammar compaction

By A. Krotov, M. Hepple, R. Gaizauskas and Y. Wilks

Abstract

Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad coverage grammars. In the simplest case, rules can simply be ‘read off’ the parse-annotations of the corpus, producing either a simple or probabilistic context-free grammar. Such grammars, however, can be very large, presenting problems for the subsequent computational costs of parsing under the grammar.

In this paper, we explore ways by which a treebank grammar can be reduced in size or ‘compacted’, which involve the use of two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% to give a gain in recall, but some loss in precision.
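As a rough illustration of technique (i) only, the sketch below reads rules off a toy treebank and discards those seen fewer than a chosen number of times. It is not the authors' implementation: the nested-tuple tree encoding, the use of part-of-speech tags as terminals, and the names `read_off_rules`, `threshold_grammar` and `min_count` are all assumptions made for this example, and the rule-parsing method (ii) is not shown.

```python
from collections import Counter


def read_off_rules(tree):
    """Yield one (lhs, rhs) rule per internal node of a nested-tuple tree.

    Assumed encoding: a tree is (label, child, child, ...), where each child
    is either another tuple (a subtree) or a string (a terminal POS tag).
    """
    label, *children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from read_off_rules(child)


def threshold_grammar(trees, min_count):
    """Count every rule in the treebank and keep those seen >= min_count times."""
    counts = Counter(rule for tree in trees for rule in read_off_rules(tree))
    return {rule: n for rule, n in counts.items() if n >= min_count}


# Toy treebank (hypothetical data, two trees).
toy_treebank = [
    ("S", ("NP", "DT", "NN"), ("VP", "VBZ")),
    ("S", ("NP", "DT", "NN"), ("VP", "VBZ", ("NP", "NN"))),
]

# Five distinct rules are read off; only two occur at least twice.
grammar = threshold_grammar(toy_treebank, min_count=2)
```

With `min_count=2`, only `S -> NP VP` and `NP -> DT NN` survive in this toy data; the retained counts could then be renormalised per left-hand side to obtain rule probabilities for a probabilistic grammar.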

Publisher: Cambridge University Press
Year: 1999
OAI identifier: oai:eprints.whiterose.ac.uk:1631
