Statistical generation: three methods compared and evaluated

By Anja Belz


Statistical NLG has largely meant n-gram modelling, which has the considerable advantages of lending robustness to NLG systems and of making automatic adaptation to new domains from raw corpora possible. On the downside, n-gram models are expensive to use as selection mechanisms and have a built-in bias towards shorter realisations. This paper looks at treebank-training of generators, an alternative method for building statistical models for NLG from raw corpora, and at two different ways of using treebank-trained models during generation. Results show that the treebank-trained generators achieve improvements similar to those of a 2-gram generator over a baseline of random selection. However, the treebank-trained generators achieve this at a much lower cost than the 2-gram generator, and without its strong preference for shorter realisations.
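The length bias mentioned in the abstract follows from how an n-gram model scores a string: the score is a product of conditional probabilities, one per token, and each factor is at most 1, so every extra token can only lower the total. The sketch below illustrates this with a toy 2-gram model using add-one smoothing. It is not the paper's generator; the corpus sentences, the smoothing choice, and the names `train_bigram` and `log_prob` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Invented toy corpus (an assumption for illustration, not data from the paper).
CORPUS = [
    "wind backing sw by evening",
    "wind veering nw by evening",
    "wind backing sw",
]


def train_bigram(sentences):
    """Count bigrams and their left contexts over sentences padded with <s>, </s>."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sent in sentences:
        words = sent.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Sum of add-one-smoothed bigram log-probabilities for one candidate string."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total += math.log(p)  # each term is <= 0, so more tokens means a lower sum
    return total


model = train_bigram(CORPUS)
short = "wind backing sw"
long_ = "wind backing sw by evening"
print(short, log_prob(short, model))
print(long_, log_prob(long_, model))
```

Both candidates occur in the toy corpus, yet the shorter one receives the higher score, so a selection mechanism that picks the highest-scoring realisation would prefer it.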

Topics: Q100 Linguistics
Year: 2005
OAI identifier: oai:eprints.brighton.ac.uk:3203

