GENEVAL: a proposal for shared-task evaluation in NLG

By Ehud Reiter and Anja Belz


We propose to organise a series of shared-task NLG events, in which participants are asked to build systems with similar input/output functionalities, and these systems are evaluated with a range of different evaluation techniques. The main purpose of these events is to allow us to compare different evaluation techniques by correlating the results of different evaluations on the systems entered in the event.

Topics: Q100 Linguistics
Publisher: Association for Computational Linguistics
Year: 2006
DOI identifier: 10.1145/1710000

