2,982 research outputs found
Enumeration of Extractive Oracle Summaries
To analyze the limitations and the future directions of the extractive
summarization paradigm, this paper proposes an Integer Linear Programming (ILP)
formulation to obtain extractive oracle summaries in terms of ROUGE-N. We also
propose an algorithm that enumerates all of the oracle summaries for a set of
reference summaries to exploit F-measures that evaluate which system summaries
contain how many sentences that are extracted as an oracle summary. Our
experimental results obtained from Document Understanding Conference (DUC)
corpora demonstrated the following: (1) room still exists to improve the
performance of extractive summarization; (2) the F-measures derived from the
enumerated oracle summaries have significantly stronger correlations with human
judgment than those derived from single oracle summaries.Comment: 12 page
A reinforcement learning formulation to the complex question answering problem
International audienceWe use extractive multi-document summarization techniques to perform complex question answering and formulate it as a reinforcement learning problem. Given a set of complex questions, a list of relevant documents per question, and the corresponding human generated summaries (i.e. answers to the questions) as training data, the reinforcement learning module iteratively learns a number of feature weights in order to facilitate the automatic generation of summaries i.e. answers to previously unseen complex questions. A reward function is used to measure the similarities between the candidate (machine generated) summary sentences and the abstract summaries. In the training stage, the learner iteratively selects the important document sentences to be included in the candidate summary, analyzes the reward function and updates the related feature weights accordingly. The final weights are used to generate summaries as answers to unseen complex questions in the testing stage. Evaluation results show the effectiveness of our system. We also incorporate user interaction into the reinforcement learner to guide the candidate summary sentence selection process. Experiments reveal the positive impact of the user interaction component on the reinforcement learning framework
Calculating the Upper Bounds for Multi-Document Summarization using Genetic Algorithms
Over the last years, several Multi-Document Summarization (MDS) methods have been presented in Document Understanding Conference (DUC), workshops. Since DUC01, several methods have been presented in approximately 268 publications of the stateof-the-art, that have allowed the continuous improvement of MDS, however in most works the upper bounds were unknowns. Recently, some works have
been focused to calculate the best sentence combinations of a set of documents and in previous works we have been calculated the significance for single-document summarization task in DUC01 and DUC02 datasets. However, for MDS task has not performed an analysis of significance to rank the best
multi-document summarization methods. In this paper,
we describe a Genetic Algorithm-based method for
calculating the best sentence combinations of DUC01
and DUC02 datasets in MDS through a Meta-document
representation. Moreover, we have calculated three
heuristics mentioned in several works of state-of-the-art
to rank the most recent MDS methods, through the
calculus of upper bounds and lower bounds
- …