Optimisation of corpus-derived probabilistic grammars

By Anja Belz


This paper examines the usefulness of corpus-derived probabilistic grammars as a basis for the automatic construction of grammars optimised for a given parsing task. Initially, a probabilistic context-free grammar (PCFG) is derived by a straightforward derivation technique from the Wall Street Journal (WSJ) Corpus, and a baseline is established by testing the resulting grammar on four different parsing tasks. In the first optimisation step, different kinds of local structural context (LSC) are incorporated into the basic PCFG. Improved parsing results demonstrate the usefulness of the added structural context information. In the second optimisation step, LSC-PCFGs are optimised in terms of grammar size and performance for a given parsing task. Tests show that significant improvements can be achieved by the method proposed. The structure of this paper is as follows. Section 2 discusses the practic
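The first two steps in the abstract (deriving a PCFG from a treebank, then adding local structural context) can be illustrated with a minimal sketch. It assumes that the "straightforward derivation technique" is relative-frequency estimation over the productions read off each tree, and it uses parent-category annotation as one plausible kind of local structural context; the abstract confirms neither choice. The function names (`parse_tree`, `rules`, `estimate_pcfg`) and the two toy trees are hypothetical and are not taken from the WSJ Corpus.

```python
from collections import Counter

# Hypothetical toy treebank in bracketed notation (not WSJ data).
TOY_TREES = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (NN dogs)) (VP (VBD barked)))",
]


def parse_tree(s):
    """Parse a bracketed tree into nested (label, children) tuples.

    Leaves (words) are plain strings.
    """
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # consume ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def rules(tree, parent=None, annotate_parent=False):
    """Yield (lhs, rhs) productions for every internal node of the tree.

    With annotate_parent=True each non-root nonterminal is renamed
    "LABEL^PARENT", an assumed stand-in for local structural context.
    """
    label, children = tree
    lhs = f"{label}^{parent}" if annotate_parent and parent else label
    rhs = []
    for child in children:
        if isinstance(child, str):
            rhs.append(child)             # terminal symbol (word)
        else:
            child_label = child[0]
            rhs.append(f"{child_label}^{label}" if annotate_parent else child_label)
            yield from rules(child, label, annotate_parent)
    yield (lhs, tuple(rhs))


def estimate_pcfg(tree_strings, annotate_parent=False):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for s in tree_strings:
        for lhs, rhs in rules(parse_tree(s), annotate_parent=annotate_parent):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


if __name__ == "__main__":
    for (lhs, rhs), p in sorted(estimate_pcfg(TOY_TREES).items()):
        print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

Calling `estimate_pcfg(TOY_TREES, annotate_parent=True)` produces a grammar with more, finer-grained nonterminals (for example `NP^S` instead of `NP`), which shows why adding context tends to increase grammar size and why a later size-reduction step might be useful.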

Topics: Q100 Linguistics
Year: 2001
