Sentence Centrality Revisited for Unsupervised Summarization
Single document summarization has enjoyed renewed interest in recent years
thanks to the popularity of neural network models and the availability of
large-scale datasets. In this paper we develop an unsupervised approach arguing
that it is unrealistic to expect large-scale and high-quality training data to
be available or created for different types of summaries, domains, or
languages. We revisit a popular graph-based ranking algorithm and modify how
node (i.e., sentence) centrality is computed in two ways: (a) we employ BERT, a
state-of-the-art neural representation learning model, to better capture
sentential meaning, and (b) we build graphs with directed edges, arguing that the
contribution of any two nodes to their respective centrality is influenced by
their relative position in a document. Experimental results on three news
summarization datasets representative of different languages and writing styles
show that our approach outperforms strong baselines by a wide margin.
Comment: ACL 2019
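The position-aware centrality idea above can be sketched in a few lines: sentences are embedded, pairwise cosine similarities act as directed edges, and edges to following versus preceding sentences are weighted differently. This is a minimal sketch; the weighting values and the random embeddings are placeholders standing in for tuned hyperparameters and real BERT sentence vectors.

```python
import numpy as np

def directed_centrality(emb, lam_fwd=1.0, lam_bwd=0.3):
    """Score sentences by position-aware (directed) centrality.

    emb: (n, d) array of sentence embeddings.
    Similarities to following vs. preceding sentences are weighted
    differently (lam_fwd vs. lam_bwd) -- illustrative values only.
    """
    # row-normalize so the dot product is cosine similarity
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = norm @ norm.T
    n = len(emb)
    scores = np.zeros(n)
    for i in range(n):
        fwd = sim[i, i + 1:].sum()  # similarity to following sentences
        bwd = sim[i, :i].sum()      # similarity to preceding sentences
        scores[i] = lam_fwd * fwd + lam_bwd * bwd
    return scores

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))        # stand-in for 5 BERT sentence vectors
ranking = np.argsort(-directed_centrality(emb))
print(ranking)                       # sentence indices, most central first
```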
SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization
We study unsupervised multi-document summarization evaluation metrics, which
require neither human-written reference summaries nor human annotations (e.g.
preferences, ratings, etc.). We propose SUPERT, which rates the quality of a
summary by measuring its semantic similarity with a pseudo reference summary,
i.e. selected salient sentences from the source documents, using contextualized
embeddings and soft token alignment techniques. Compared to the
state-of-the-art unsupervised evaluation metrics, SUPERT correlates better with
human ratings by 18-39%. Furthermore, we use SUPERT as a reward to guide a
neural reinforcement learning summarizer, yielding favorable performance
compared to the state-of-the-art unsupervised summarizers. All source code is
available at https://github.com/yg211/acl20-ref-free-eval.
Comment: ACL 2020
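The pseudo-reference scoring idea can be sketched as a soft token alignment: each candidate-summary token embedding is matched to its most similar pseudo-reference token embedding, and the mean best-match cosine is the score. The `align_score` helper and random vectors below are illustrative assumptions; SUPERT's actual formulation uses contextualized embeddings of salient source sentences.

```python
import numpy as np

def align_score(cand_vecs, ref_vecs):
    """Soft token alignment: each candidate token matches its most
    similar pseudo-reference token; the score is the mean best match.
    A simplified stand-in, not the paper's exact formulation."""
    c = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    r = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    sim = c @ r.T                         # (n_cand, n_ref) cosine matrix
    return float(sim.max(axis=1).mean())  # best match per candidate token

rng = np.random.default_rng(1)
ref = rng.normal(size=(12, 16))    # pseudo-reference token embeddings
good = ref[:6] + 0.01 * rng.normal(size=(6, 16))  # near-copy of reference
bad = rng.normal(size=(6, 16))     # unrelated summary tokens
print(align_score(good, ref), align_score(bad, ref))
```

A faithful candidate scores close to 1.0, while an unrelated one scores markedly lower, which is the ranking behaviour the metric relies on.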
Self-Supervised and Controlled Multi-Document Opinion Summarization
We address the problem of unsupervised abstractive summarization of
collections of user-generated reviews with self-supervision and control. We
propose a self-supervised setup that considers an individual document as a
target summary for a set of similar documents. This setting makes training
simpler than previous approaches by relying only on standard log-likelihood
loss. We address the problem of hallucinations through the use of control
codes, which steer generation towards more coherent and relevant
summaries. Finally, we extend the Transformer architecture to allow for multiple
reviews as input. Our benchmarks on two datasets against graph-based and recent
neural abstractive unsupervised models show that our proposed method generates
summaries of superior quality and relevance. This is confirmed by our human
evaluation, which focuses explicitly on the faithfulness of generated summaries.
We also provide an ablation study showing the importance of the control
setup in reducing hallucinations and achieving high sentiment and topic
alignment between the summaries and the input reviews.
Comment: 18 pages including 5 pages of appendix
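The self-supervised setup described above (one review serves as the target summary for a set of similar reviews, with a control code prepended to steer generation) can be sketched as a data-preparation step. The control-token format and the neighbour map below are hypothetical illustrations, not taken from the paper.

```python
# Sketch of the self-supervised pairing: treat each review as the target
# "summary" for its nearest-neighbour reviews, prefixing a control code.
def make_pairs(reviews, neighbours, control="<topic=camera>"):
    """Build (source, target) training pairs for a seq2seq model.

    neighbours: dict mapping each review index to indices of similar
    reviews (a hypothetical precomputed neighbour map).
    """
    pairs = []
    for i, target in enumerate(reviews):
        inputs = [reviews[j] for j in neighbours[i]]
        # concatenate inputs with a separator and prepend the control code
        src = control + " " + " </s> ".join(inputs)
        pairs.append((src, target))
    return pairs

reviews = ["great lens", "sharp photos", "battery dies fast"]
neighbours = {0: [1], 1: [0], 2: [0, 1]}
pairs = make_pairs(reviews, neighbours)
print(pairs[2][0])
```

Training then reduces to standard log-likelihood on these pairs, which is the simplification the abstract highlights.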
Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders
Pre-trained sentence representations are crucial for identifying significant
sentences in unsupervised document extractive summarization. However, the
traditional two-step paradigm of pre-training and sentence ranking creates a
gap due to differing optimization objectives. To address this issue, we argue
that utilizing pre-trained embeddings derived from a process specifically
designed to optimize cohesive and distinctive sentence representations helps
rank significant sentences. To do so, we propose a novel graph pre-training
auto-encoder to obtain sentence embeddings by explicitly modelling
intra-sentential distinctive features and inter-sentential cohesive features
through sentence-word bipartite graphs. These pre-trained sentence
representations are then utilized in a graph-based ranking algorithm for
unsupervised summarization. Our method delivers strong performance for
unsupervised summarization frameworks by providing summary-worthy sentence
representations, surpassing heavyweight BERT- and RoBERTa-based sentence
representations in downstream tasks.
Comment: Accepted by the 2023 Conference on Empirical Methods in Natural
Language Processing (EMNLP 2023)
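The sentence-word bipartite graph underlying the pre-training step can be represented as a biadjacency matrix with sentences as one node set and words as the other. The toy sentences and the term-frequency edge weights below are illustrative assumptions, since the abstract does not specify the weighting scheme.

```python
# Build a sentence-word bipartite graph as a biadjacency matrix.
from collections import Counter

sentences = [
    "graphs encode sentence structure",
    "sentence embeddings help ranking",
    "ranking selects summary sentences",
]
vocab = sorted({w for s in sentences for w in s.split()})
word_idx = {w: i for i, w in enumerate(vocab)}

# rows = sentences, cols = words; entry = term frequency (a simple
# stand-in for the paper's edge weighting, which is not given here)
biadj = [[0] * len(vocab) for _ in sentences]
for si, s in enumerate(sentences):
    for w, c in Counter(s.split()).items():
        biadj[si][word_idx[w]] = c

print(len(vocab), biadj[0][word_idx["sentence"]])
```

A graph auto-encoder would then be trained to reconstruct this matrix, forcing sentence embeddings to capture which words tie sentences together (cohesion) and which set them apart (distinctiveness).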
Unsupervised Multi-document Summarization with Holistic Inference
Multi-document summarization aims to obtain core information from a
collection of documents written on the same topic. This paper proposes a new
holistic framework for unsupervised multi-document extractive summarization.
Our method combines a holistic beam-search inference procedure with a
holistic measure, the Subset Representative Index (SRI). SRI
balances the importance and diversity of a subset of sentences from the source
documents and can be computed in both unsupervised and adaptive manners. To
demonstrate the effectiveness of our method, we conduct extensive experiments
on both small and large-scale multi-document summarization datasets under both
unsupervised and adaptive settings. The proposed method outperforms strong
baselines by a significant margin, as indicated by the resulting ROUGE scores
and diversity measures. Our findings also suggest that diversity is essential
for improving multi-document summarization performance.
Comment: Findings of IJCNLP-AACL 2023
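An importance/diversity trade-off of the kind SRI balances can be sketched as a greedy subset selection that rewards important sentences while penalizing redundancy with sentences already chosen. The scoring rule and the `lam` weight below are an illustrative analogue, not the paper's SRI formula or its beam-search inference.

```python
import numpy as np

def select_subset(emb, importance, k=2, lam=0.5):
    """Greedily pick k sentence indices, trading importance against
    redundancy with already-selected sentences (illustrative only)."""
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = norm @ norm.T  # pairwise cosine similarities
    chosen = []
    while len(chosen) < k:
        best, best_score = None, -np.inf
        for i in range(len(emb)):
            if i in chosen:
                continue
            # penalize similarity to the closest already-chosen sentence
            redundancy = max((sim[i, j] for j in chosen), default=0.0)
            score = importance[i] - lam * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

rng = np.random.default_rng(2)
emb = rng.normal(size=(6, 8))   # stand-in sentence embeddings
importance = rng.random(6)      # stand-in centrality/importance scores
print(select_subset(emb, importance, k=3))
```

Beam search generalizes this greedy loop by keeping several candidate subsets alive at each step instead of one.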