What comes next? Extractive summarization by next-sentence prediction
Existing approaches to automatic summarization assume that a length limit for
the summary is given, and view content selection as an optimization problem to
maximize informativeness and minimize redundancy within this budget. This
framework ignores the fact that human-written summaries have rich internal
structure which can be exploited to train a summarization system. We present
NEXTSUM, a novel approach to summarization based on a model that predicts the
next sentence to include in the summary using not only the source article, but
also the summary produced so far. We show that such a model successfully
captures summary-specific discourse moves, and leads to better content
selection performance, in addition to automatically predicting how long the
target summary should be. We perform experiments on the New York Times
Annotated Corpus of summaries, where NEXTSUM outperforms lead and content-model
summarization baselines by significant margins. We also show that the lengths
of summaries produced by our system correlate with the lengths of the
human-written gold standards.
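As a rough illustration of the decoding loop such a model implies, here is a minimal sketch: a greedy selector repeatedly asks a scorer for the best next sentence given the summary so far, and stops when a special end-of-summary option wins, so summary length emerges from the model's choices rather than from a fixed budget. The scorer below is a toy novelty heuristic standing in for the paper's learned model; `nextsum_decode`, `toy_score`, and the `<END>` sentinel are illustrative names, not NEXTSUM's actual code.

```python
from typing import Callable, List

END = "<END>"  # sentinel option: choosing it terminates the summary

def nextsum_decode(
    source_sents: List[str],
    score: Callable[[List[str], str], float],
    max_sents: int = 10,
) -> List[str]:
    """Greedily build a summary by predicting the next sentence.

    At each step the scorer sees the summary produced so far and each
    remaining candidate (plus END); picking END stops decoding, so the
    summary length is predicted rather than fixed in advance.
    """
    summary: List[str] = []
    candidates = list(source_sents)
    while candidates and len(summary) < max_sents:
        options = candidates + [END]
        best = max(options, key=lambda s: score(summary, s))
        if best == END:
            break
        summary.append(best)
        candidates.remove(best)
    return summary

def toy_score(summary: List[str], option: str) -> float:
    """Toy stand-in scorer: reward content that is novel relative to
    the summary so far; END gets a fixed score, so decoding stops once
    no candidate adds enough new material. NOT the paper's model."""
    if option == END:
        return 0.5  # stopping threshold for this toy heuristic
    seen = {w for s in summary for w in s.lower().split()}
    words = option.lower().split()
    return sum(w not in seen for w in words) / max(len(words), 1)

doc = [
    "The cabinet approved the budget.",
    "The budget passes parliament next week.",
    "Officials said the cabinet approved the budget.",
]
print(nextsum_decode(doc, toy_score))
```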
Reference and Document Aware Semantic Evaluation Methods for Korean Language Summarization
Text summarization is the process of generating a shorter form of a
source document while preserving its salient information. Existing work
on text summarization is generally evaluated using Recall-Oriented
Understudy for Gisting Evaluation (ROUGE) scores. However, because ROUGE
scores are computed from n-gram overlap, they do not reflect semantic
correspondences between generated and reference summaries. Because Korean
is an agglutinative language that combines various morphemes into a
single word expressing several meanings, ROUGE is not suitable for Korean
summarization. In this paper, we propose evaluation metrics that reflect
the semantic meanings of the reference summary and the original document:
the Reference and Document Aware Semantic Score (RDASS). We then propose
a method for improving the correlation
of the metrics with human judgment. Evaluation results show that the
correlation with human judgment is significantly higher for our evaluation
metrics than for ROUGE scores.
Comment: COLING 2020
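As a sketch of how a reference-and-document-aware score can be computed, assuming the metric averages the generated summary's embedding similarity to the reference and to the source document (the paper's exact encoder and weighting may differ, and the vectors below are placeholders, not real embeddings):

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rdass(pred: np.ndarray, ref: np.ndarray, doc: np.ndarray) -> float:
    """Average the generated summary's similarity to the reference and
    to the source document, so valid content missing from the single
    reference but supported by the document is not penalized."""
    return 0.5 * (cosine(pred, ref) + cosine(pred, doc))

# Toy usage with stand-in vectors; in practice each vector would be a
# sentence embedding of the generated summary, the reference summary,
# and the source document from a (Korean) sentence encoder.
p, r, d = np.random.rand(3, 768)
print(rdass(p, r, d))
```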