Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization
The centroid-based model for extractive document summarization is a simple
and fast baseline that ranks sentences based on their similarity to a centroid
vector. In this paper, we apply this ranking to possible summaries instead of
sentences and use a simple greedy algorithm to find the best summary.
Furthermore, we show possibilities to scale up to larger input document
collections by selecting a small number of sentences from each document prior
to constructing the summary. Experiments were done on the DUC2004 dataset for
multi-document summarization. We observe a higher performance over the original
model, on par with more complex state-of-the-art methods.
Comment: EMNLP 2017 Workshop on New Frontiers in Summarization
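
The idea of scoring candidate summaries against the centroid, rather than scoring single sentences, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes raw term counts instead of whatever term weighting the authors use, a word budget as the length limit, and a stopping rule that halts when no remaining sentence raises the similarity. The names `bow`, `cosine`, and `greedy_summary` are hypothetical, and the pre-selection of sentences per document is not shown.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized string."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_summary(sentences, max_words=20):
    """Greedily grow a summary whose summed vector stays close to the centroid.

    Assumption: the centroid is the sum of all sentence count vectors, and
    the search stops when no sentence fits the budget or improves the score.
    """
    vectors = [bow(s) for s in sentences]
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)

    chosen = []              # indices of selected sentences, in pick order
    summary_vec = Counter()  # summed vector of the current summary
    best_score = 0.0
    words_used = 0
    while True:
        best_idx = None
        for i, vec in enumerate(vectors):
            if i in chosen:
                continue
            if words_used + sum(vec.values()) > max_words:
                continue  # sentence would exceed the word budget
            # Score the whole candidate summary, not the sentence alone.
            score = cosine(summary_vec + vec, centroid)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:
            break
        chosen.append(best_idx)
        summary_vec = summary_vec + vectors[best_idx]
        words_used += sum(vectors[best_idx].values())
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat",
        "stocks fell sharply today",
    ]
    print(greedy_summary(docs, max_words=6))
```

Because each step scores the summary as a whole, a sentence that repeats already covered terms adds little to the similarity, which is one plausible reason this variant could reduce redundancy compared with ranking sentences independently.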