Better Summarization Evaluation with Word Embeddings for ROUGE
ROUGE is a widely adopted, automatic evaluation measure for text
summarization. While it has been shown to correlate well with human judgements,
it is biased towards surface lexical similarities. This makes it unsuitable for
the evaluation of abstractive summarization, or summaries with substantial
paraphrasing. We study the effectiveness of word embeddings to overcome this
disadvantage of ROUGE. Specifically, instead of measuring lexical overlaps,
word embeddings are used to compute the semantic similarity of the words used
in summaries. Our experimental results show that our proposal achieves
better correlations with human judgements when measured with the
Spearman and Kendall rank coefficients.
Comment: Pre-print - To appear in proceedings of the Conference on Empirical
Methods in Natural Language Processing (EMNLP
- …
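
The idea in the abstract, scoring word pairs by embedding similarity rather than by exact lexical match, can be illustrated with a minimal sketch. This is not the paper's formulation: the toy two-dimensional vectors, the function names, and the choice of a best-match unigram recall are all assumptions made for illustration, and a real setup would load pretrained embeddings instead.

```python
import math

# Toy 2-d embeddings, invented for illustration only (hypothetical values).
TOY_EMBEDDINGS = {
    "car": (1.0, 0.0),
    "automobile": (0.9, 0.1),
    "fast": (0.0, 1.0),
    "quick": (0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_similarity(w1, w2, embeddings):
    """Identical words score 1.0; unknown words score 0.0; otherwise the
    cosine similarity of their vectors, clipped at 0.0."""
    if w1 == w2:
        return 1.0
    if w1 not in embeddings or w2 not in embeddings:
        return 0.0
    return max(0.0, cosine(embeddings[w1], embeddings[w2]))


def lexical_recall(reference, candidate):
    """Surface-overlap unigram recall: fraction of reference tokens that
    appear verbatim in the candidate."""
    if not reference:
        return 0.0
    cand = set(candidate)
    return sum(1 for w in reference if w in cand) / len(reference)


def soft_recall(reference, candidate, embeddings):
    """Embedding-based unigram recall: each reference token is credited with
    its best similarity to any candidate token."""
    if not reference or not candidate:
        return 0.0
    total = sum(
        max(word_similarity(r, c, embeddings) for c in candidate)
        for r in reference
    )
    return total / len(reference)


if __name__ == "__main__":
    reference = ["fast", "car"]
    paraphrase = ["quick", "automobile"]
    print(lexical_recall(reference, paraphrase))
    print(soft_recall(reference, paraphrase, TOY_EMBEDDINGS))
```

With these toy vectors, a fully paraphrased candidate receives no credit from the surface-overlap score but a high score from the embedding-based one, which is the kind of gap the abstract attributes to ROUGE on abstractive summaries.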