634 research outputs found
Self-Supervised and Controlled Multi-Document Opinion Summarization
We address the problem of unsupervised abstractive summarization of
collections of user generated reviews with self-supervision and control. We
propose a self-supervised setup that considers an individual document as a
target summary for a set of similar documents. This setting makes training
simpler than previous approaches by relying only on standard log-likelihood
loss. We address the problem of hallucinations through the use of control
codes, to steer the generation towards more coherent and relevant
summaries.Finally, we extend the Transformer architecture to allow for multiple
reviews as input. Our benchmarks on two datasets against graph-based and recent
neural abstractive unsupervised models show that our proposed method generates
summaries with a superior quality and relevance.This is confirmed in our human
evaluation which focuses explicitly on the faithfulness of generated summaries
We also provide an ablation study, which shows the importance of the control
setup in controlling hallucinations and achieve high sentiment and topic
alignment of the summaries with the input reviews.Comment: 18 pages including 5 pages appendi
Improving Sequential Determinantal Point Processes for Supervised Video Summarization
It is now much easier than ever before to produce videos. While the
ubiquitous video data is a great source for information discovery and
extraction, the computational challenges are unparalleled. Automatically
summarizing the videos has become a substantial need for browsing, searching,
and indexing visual content. This paper is in the vein of supervised video
summarization using sequential determinantal point process (SeqDPP), which
models diversity by a probabilistic distribution. We improve this model in two
folds. In terms of learning, we propose a large-margin algorithm to address the
exposure bias problem in SeqDPP. In terms of modeling, we design a new
probabilistic distribution such that, when it is integrated into SeqDPP, the
resulting model accepts user input about the expected length of the summary.
Moreover, we also significantly extend a popular video summarization dataset by
1) more egocentric videos, 2) dense user annotations, and 3) a refined
evaluation scheme. We conduct extensive experiments on this dataset (about 60
hours of videos in total) and compare our approach to several competitive
baselines
Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders
Automatic chat summarization can help people quickly grasp important
information from numerous chat messages. Unlike conventional documents, chat
logs usually have fragmented and evolving topics. In addition, these logs
contain a quantity of elliptical and interrogative sentences, which make the
chat summarization highly context dependent. In this work, we propose a novel
unsupervised framework called RankAE to perform chat summarization without
employing manually labeled data. RankAE consists of a topic-oriented ranking
strategy that selects topic utterances according to centrality and diversity
simultaneously, as well as a denoising auto-encoder that is carefully designed
to generate succinct but context-informative summaries based on the selected
utterances. To evaluate the proposed method, we collect a large-scale dataset
of chat logs from a customer service environment and build an annotated set
only for model evaluation. Experimental results show that RankAE significantly
outperforms other unsupervised methods and is able to generate high-quality
summaries in terms of relevance and topic coverage.Comment: Accepted by AAAI 2021, 9 page
Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes
High-quality dialogue-summary paired data is expensive to produce and
domain-sensitive, making abstractive dialogue summarization a challenging task.
In this work, we propose the first unsupervised abstractive dialogue
summarization model for tete-a-tetes (SuTaT). Unlike standard text
summarization, a dialogue summarization method should consider the
multi-speaker scenario where the speakers have different roles, goals, and
language styles. In a tete-a-tete, such as a customer-agent conversation, SuTaT
aims to summarize for each speaker by modeling the customer utterances and the
agent utterances separately while retaining their correlations. SuTaT consists
of a conditional generative module and two unsupervised summarization modules.
The conditional generative module contains two encoders and two decoders in a
variational autoencoder framework where the dependencies between two latent
spaces are captured. With the same encoders and decoders, two unsupervised
summarization modules equipped with sentence-level self-attention mechanisms
generate summaries without using any annotations. Experimental results show
that SuTaT is superior on unsupervised dialogue summarization for both
automatic and human evaluations, and is capable of dialogue classification and
single-turn conversation generation
- …