Calculating the Upper Bounds for Multi-Document Summarization using Genetic Algorithms

Abstract

Over the last years, several Multi-Document Summarization (MDS) methods have been presented in Document Understanding Conference (DUC), workshops. Since DUC01, several methods have been presented in approximately 268 publications of the stateof-the-art, that have allowed the continuous improvement of MDS, however in most works the upper bounds were unknowns. Recently, some works have been focused to calculate the best sentence combinations of a set of documents and in previous works we have been calculated the significance for single-document summarization task in DUC01 and DUC02 datasets. However, for MDS task has not performed an analysis of significance to rank the best multi-document summarization methods. In this paper, we describe a Genetic Algorithm-based method for calculating the best sentence combinations of DUC01 and DUC02 datasets in MDS through a Meta-document representation. Moreover, we have calculated three heuristics mentioned in several works of state-of-the-art to rank the most recent MDS methods, through the calculus of upper bounds and lower bounds

    Similar works