Abstractive Multi-Document Summarization via Phrase Selection and Merging
We propose an abstraction-based multi-document summarization framework that
can construct new sentences by exploring more fine-grained syntactic units than
sentences, namely, noun/verb phrases. Different from existing abstraction-based
approaches, our method first constructs a pool of concepts and facts
represented by phrases from the input documents. Then new sentences are
generated by selecting and merging informative phrases to maximize the salience
of phrases and meanwhile satisfy the sentence construction constraints. We
employ integer linear optimization for conducting phrase selection and merging
simultaneously in order to achieve the global optimal solution for a summary.
Experimental results on the benchmark data set TAC 2011 show that our framework
outperforms the state-of-the-art models under the automated pyramid evaluation
metric, and achieves reasonably good results on manual linguistic quality
evaluation. Comment: 11 pages, 1 figure, accepted as a full paper at ACL 201
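The core idea above can be illustrated with a minimal, self-contained sketch of the phrase-selection step. The phrase pool, salience scores, and word lengths below are invented for illustration; a real system would encode the selection and merging constraints as an integer linear program and hand it to a solver, whereas this sketch brute-forces the tiny search space:

```python
from itertools import combinations

# Toy phrase pool: (phrase, salience score, length in words).
# All values are illustrative, not from the paper's data.
phrases = [
    ("the earthquake struck", 3.0, 3),
    ("killing dozens", 2.5, 2),
    ("rescue teams arrived", 2.0, 3),
    ("weather was cold", 0.5, 3),
]

def best_selection(pool, max_words=8):
    """Pick the subset of phrases maximizing total salience under a
    length budget -- a brute-force stand-in for the paper's ILP."""
    best, best_score = (), -1.0
    for r in range(len(pool) + 1):
        for subset in combinations(pool, r):
            words = sum(p[2] for p in subset)
            score = sum(p[1] for p in subset)
            if words <= max_words and score > best_score:
                best, best_score = subset, score
    return [p[0] for p in best], best_score

selected, score = best_selection(phrases)
print(selected)  # the phrases chosen under the 8-word budget
```

With the budget of 8 words, the low-salience filler phrase is dropped; an ILP formulation reaches the same globally optimal selection without enumerating subsets.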
Better Summarization Evaluation with Word Embeddings for ROUGE
ROUGE is a widely adopted, automatic evaluation measure for text
summarization. While it has been shown to correlate well with human judgements,
it is biased towards surface lexical similarities. This makes it unsuitable for
the evaluation of abstractive summarization, or summaries with substantial
paraphrasing. We study the effectiveness of word embeddings to overcome this
disadvantage of ROUGE. Specifically, instead of measuring lexical overlaps,
word embeddings are used to compute the semantic similarity of the words used
in summaries. Our experimental results show that our proposal is able
to achieve better correlations with human judgements when measured with the
Spearman and Kendall rank coefficients. Comment: Pre-print - To appear in proceedings of the Conference on Empirical
Methods in Natural Language Processing (EMNLP
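The contrast with surface-level ROUGE can be sketched as follows. The 3-d word vectors are toy stand-ins for real pretrained embeddings, and the scoring function is a simplified ROUGE-1-style recall where each exact unigram match is replaced by the best cosine similarity:

```python
import math

# Toy 3-d word vectors -- illustrative stand-ins for pretrained
# word embeddings (e.g. word2vec); values are invented.
vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.0],
    "fast": [0.0, 1.0, 0.1],
    "quick": [0.05, 0.95, 0.1],
    "banana": [0.0, 0.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def soft_recall(reference, candidate):
    """ROUGE-1-style recall with exact unigram matching replaced by the
    best cosine similarity between embeddings."""
    total = sum(max(cosine(vectors[r], vectors[c]) for c in candidate)
                for r in reference)
    return total / len(reference)

ref = ["car", "fast"]
# A paraphrase with zero lexical overlap still scores high:
print(soft_recall(ref, ["automobile", "quick"]))
# An unrelated candidate scores low:
print(soft_recall(ref, ["banana", "banana"]))
```

Plain ROUGE-1 recall would give the paraphrase a score of zero, which is exactly the bias toward surface lexical similarity that the embedding-based variant avoids.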
Automation of summarization evaluation methods and their application to the summarization process
Summarization is the process of creating a more compact textual representation of a
document or a collection of documents. In view of the vast increase in electronically
available information sources in the last decade, filters such as automatically generated
summaries are becoming ever more important to facilitate the efficient acquisition
and use of required information. Different methods using natural language processing
(NLP) techniques are being used to this end. One of the shallowest approaches is the
clustering of available documents and the representation of the resulting clusters by
one of the documents; an example of this approach is the Google News website. It is
also possible to augment the clustering of documents with a summarization process,
which would result in a more balanced representation of the information in the cluster,
NewsBlaster being an example. However, while some systems are already available on
the web, summarization is still considered a difficult problem in the NLP community.
One of the major problems hampering the development of proficient summarization
systems is the evaluation of the (true) quality of system-generated summaries. This
is exemplified by the fact that the current state-of-the-art evaluation method to assess
the information content of summaries, the Pyramid evaluation scheme, is a manual
procedure.
In this light, this thesis has three main objectives.
1. The development of a fully automated evaluation method. The proposed scheme
is rooted in the ideas underlying the Pyramid evaluation scheme and makes use
of deep syntactic information and lexical semantics. Its performance improves
notably on previous automated evaluation methods.
2. The development of an automatic summarization system which draws on the
conceptual idea of the Pyramid evaluation scheme and the techniques developed
for the proposed evaluation system. The approach features an algorithm for
determining the pyramid and bases importance on the number of occurrences of
the pyramid's variable-sized contributors, as opposed to the word-based
methods exploited elsewhere.
3. The development of a text coherence component that can be used for obtaining
the best ordering of the sentences in a summary.
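The Pyramid scheme that objectives 1 and 2 build on can be sketched in a few lines. The summarization content units (SCUs) and their weights below are invented; in the real scheme an SCU's weight is the number of model (human) summaries expressing it, and a peer summary is scored against the best achievable weight for the same number of SCUs:

```python
# Toy Pyramid scoring. SCU weights equal the number of model (human)
# summaries containing that content unit; the SCUs here are invented.
scu_weights = {
    "earthquake struck city": 4,
    "dozens killed": 3,
    "rescue effort launched": 2,
    "schools closed": 1,
}

def pyramid_score(matched_scus, n_scus_in_summary):
    """Observed weight of the SCUs found in a peer summary, divided by
    the maximum weight achievable with the same number of SCUs."""
    observed = sum(scu_weights[s] for s in matched_scus)
    top = sorted(scu_weights.values(), reverse=True)[:n_scus_in_summary]
    return observed / sum(top)

# A summary expressing 2 SCUs, but not the two highest-weighted ones:
print(pyramid_score(["earthquake struck city", "rescue effort launched"], 2))
```

The manual bottleneck the thesis targets is the matching step: deciding which SCUs a peer summary expresses, which is where deep syntactic information and lexical semantics come in.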
SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization
We study unsupervised multi-document summarization evaluation metrics, which
require neither human-written reference summaries nor human annotations (e.g.
preferences, ratings, etc.). We propose SUPERT, which rates the quality of a
summary by measuring its semantic similarity with a pseudo reference summary,
i.e. selected salient sentences from the source documents, using contextualized
embeddings and soft token alignment techniques. Compared to the
state-of-the-art unsupervised evaluation metrics, SUPERT correlates better with
human ratings by 18-39%. Furthermore, we use SUPERT as rewards to guide a
neural-based reinforcement learning summarizer, yielding favorable performance
compared to the state-of-the-art unsupervised summarizers. All source code is
available at https://github.com/yg211/acl20-ref-free-eval. Comment: ACL 202
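The soft token alignment at the heart of SUPERT can be sketched as a BERTScore-style recall against the pseudo reference. The 2-d vectors below are toy stand-ins for contextualized token embeddings, and the greedy max-cosine matching is a simplification of the paper's alignment:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def supert_style_score(pseudo_ref, summary):
    """Soft token alignment: match each pseudo-reference token vector to
    its most similar summary token vector and average the similarities.
    Vectors are toy stand-ins for contextualized embeddings."""
    return sum(max(cosine(r, s) for s in summary)
               for r in pseudo_ref) / len(pseudo_ref)

# Pseudo reference: token embeddings of salient source sentences (toy 2-d).
pseudo_ref = [[1.0, 0.0], [0.0, 1.0]]
good_summary = [[0.9, 0.1], [0.1, 0.9]]   # semantically close tokens
bad_summary = [[-1.0, 0.2], [-0.8, 0.0]]  # dissimilar tokens

print(supert_style_score(pseudo_ref, good_summary))
print(supert_style_score(pseudo_ref, bad_summary))
```

Because the score needs no human reference, it can double as a reward signal, which is how the paper plugs it into a reinforcement-learning summarizer.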