Multi-document summarization based on atomic semantic events and their temporal relations
Automatic multi-document summarization (MDS) is the process of extracting the most important information, such as events and entities, from multiple natural language texts focused on the same topic. We extract all types of atomic semantic information and feed them to a topic model to experiment with their effects on a summary. We design a coherent summarization system by taking into account the relative positions of sentences in the original text.
Our generic MDS system has outperformed the best recent multi-document summarization system on DUC 2004 in terms of ROUGE-1 recall and F-measure. Our query-focused summarization system achieves results statistically similar to the state-of-the-art unsupervised system on the DUC 2007 query-focused MDS task in terms of the ROUGE-2 recall measure. Update summarization is a newer form of MDS in which novel yet salient sentences are chosen as summary sentences, based on the assumption that the user has already read a given set of documents. In this thesis, we present an event-based update summarization approach in which novelty is detected from the temporal ordering of events and saliency is ensured by event and entity distributions. To our knowledge, no other study has deeply investigated the effects of novelty information acquired from the temporal ordering of events (assuming that a sentence contains one or more events) in the domain of update MDS. Our update MDS system has outperformed the state-of-the-art update MDS system in terms of the ROUGE-2 and ROUGE-SU4 recall measures. Our MDS systems also generate quality summaries, which are manually evaluated based on popular evaluation criteria.
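The salience-then-coherence pipeline described above can be illustrated with a minimal sketch. This is hypothetical code, not the thesis's actual model: the tuple layout, the event-overlap scoring, and the example data are all illustrative assumptions.

```python
# Hypothetical sketch (scoring and data layout are illustrative, not the thesis model):
# rank sentences by how many salient atomic events they mention, take the top-k,
# then restore the sentences' relative source positions to keep the summary coherent.

def summarize(sentences, salient_events, k=3):
    """sentences: list of (doc_id, position, tokens); salient_events: set of event tokens."""
    def score(sent):
        return len(salient_events & set(sent[2]))

    top = sorted(sentences, key=score, reverse=True)[:k]  # salience ranking
    return sorted(top, key=lambda s: (s[0], s[1]))        # coherence: original order

sents = [(0, 0, ["quake", "hits", "city"]),
         (0, 1, ["weather", "was", "sunny"]),
         (1, 0, ["rescue", "teams", "arrive"]),
         (1, 1, ["quake", "damage", "reported"])]
summary = summarize(sents, {"quake", "rescue", "damage"}, k=2)
```

The final sort is the key design point: after selection, sentences are emitted in their original (document, position) order rather than by score, which is what the thesis means by position-aware coherence.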
Explicit diversification of event aspects for temporal summarization
During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but are semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent works in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of events. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the amount of redundant and off-topic snippets returned, while also increasing summary timeliness.
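The greedy aspect-coverage selection described above can be sketched in the style of xQuAD-like explicit diversification. This is a simplified illustration under stated assumptions: the relevance scores `rel`, coverage probabilities `cov`, and the interpolation weight `lam` are all invented here, and the article's actual framework and features differ in detail.

```python
# Sketch of xQuAD-style explicit aspect diversification, assuming per-snippet
# relevance scores rel[s] and aspect-coverage probabilities cov[(s, a)] are given.
from math import prod

def diversify(snippets, aspects, rel, cov, k, lam=0.5):
    selected = []
    while len(selected) < min(k, len(snippets)):
        def gain(s):
            # Reward covering aspects that the snippets picked so far still miss.
            novelty = sum(cov.get((s, a), 0.0) *
                          prod(1.0 - cov.get((t, a), 0.0) for t in selected)
                          for a in aspects) / len(aspects)
            return (1 - lam) * rel[s] + lam * novelty
        selected.append(max((s for s in snippets if s not in selected), key=gain))
    return selected

rel = {"s1": 0.9, "s2": 0.8, "s3": 0.5}
cov = {("s1", "rescue"): 0.9, ("s2", "rescue"): 0.8, ("s3", "damage"): 0.9}
picked = diversify(["s1", "s2", "s3"], ["rescue", "damage"], rel, cov, k=2)
```

Note how the second pick prefers the lower-relevance snippet that covers the still-missing "damage" aspect over the higher-relevance but aspect-redundant one, which is exactly the behavior novelty-based diversification fails to guarantee.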
A Comparison of Nuggets and Clusters for Evaluating Timeline Summaries
There is growing interest in systems that generate timeline summaries by filtering high-volume streams of documents to retain only those that are relevant to a particular event or topic. Continued advances in algorithms and techniques for this task depend on standardized and reproducible evaluation methodologies for comparing systems. However, timeline summary evaluation is still in its infancy, with competing methodologies currently being explored in international evaluation forums such as TREC. One area of active exploration is how to explicitly represent the units of information that should appear in a 'good' summary. Currently, there are two main approaches, one based on identifying nuggets in an external 'ground truth', and the other based on clustering system outputs. In this paper, by building test collections that have both nugget and cluster annotations, we are able to compare these two approaches. Specifically, we address questions related to evaluation effort, differences in the final evaluation products, and correlations between scores and rankings generated by both approaches. We summarize advantages and disadvantages of nuggets and clusters to offer recommendations for future system evaluation.
Noisy Submodular Maximization via Adaptive Sampling with Applications to Crowdsourced Image Collection Summarization
We address the problem of maximizing an unknown submodular function that can
only be accessed via noisy evaluations. Our work is motivated by the task of
summarizing content, e.g., image collections, by leveraging users' feedback in
form of clicks or ratings. For summarization tasks with the goal of maximizing
coverage and diversity, submodular set functions are a natural choice. When the
underlying submodular function is unknown, users' feedback can provide noisy
evaluations of the function that we seek to maximize. We provide a generic
algorithm -- \submM{} -- for maximizing an unknown submodular function under
cardinality constraints. This algorithm makes use of a novel exploration module
-- \blbox{} -- that proposes good elements based on adaptively sampling noisy
function evaluations. \blbox{} is able to accommodate different kinds of
observation models such as value queries and pairwise comparisons. We provide
PAC-style guarantees on the quality and sampling cost of the solution obtained
by \submM{}. We demonstrate the effectiveness of our approach in an
interactive, crowdsourced image collection summarization application. Comment: Extended version of AAAI'16 paper.
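The core idea of maximizing a submodular function through noisy evaluations can be sketched with a naive baseline: greedy selection where each marginal-gain query is averaged over repeated noisy samples. This is an illustration only; the paper's adaptive-sampling exploration module is more sophisticated and sample-efficient, and the coverage objective and noise model below are invented for the example.

```python
# Illustrative only: greedy cardinality-constrained maximization where each
# marginal-gain query is noisy, so we average repeated samples before comparing.
import random

def noisy_greedy(elements, noisy_gain, k, samples=30):
    """noisy_gain(e, S) returns a noisy estimate of f(S + {e}) - f(S)."""
    S = []
    for _ in range(k):
        def avg_gain(e):
            return sum(noisy_gain(e, S) for _ in range(samples)) / samples
        S.append(max((e for e in elements if e not in S), key=avg_gain))
    return S

# Toy coverage objective: f(S) = size of the union of the chosen sets,
# observed through Gaussian "user feedback" noise.
sets = {"a": {1, 2, 3}, "b": {4, 5}, "c": {5}}

def noisy_gain(e, S):
    covered = set().union(*(sets[x] for x in S)) if S else set()
    return len(sets[e] - covered) + random.gauss(0, 0.1)

random.seed(0)
picked = noisy_greedy(list(sets), noisy_gain, k=2)
```

Averaging over a fixed number of samples per query is wasteful when the gap between candidates is large; adaptively allocating samples to hard-to-distinguish candidates, as the paper's exploration module does, reduces the total query cost for the same PAC-style guarantee.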
Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization
In this paper, we present a novel unsupervised video summarization model that requires no manual annotation. The proposed model, termed Cycle-SUM, adopts a new cycle-consistent adversarial LSTM architecture that can effectively maximize the information preservation and compactness of the summary video. It consists of a frame selector and a cycle-consistent learning-based evaluator. The selector is a bidirectional LSTM network that learns video representations embedding the long-range relationships among video frames. The evaluator defines a learnable information-preserving metric between the original video and the summary video and "supervises" the selector to identify the most informative frames to form the summary video. In particular, the evaluator is composed of two generative adversarial networks (GANs), in which the forward GAN learns to reconstruct the original video from the summary video while the backward GAN learns to invert that process. The consistency between the outputs of this cycle learning is adopted as the information-preserving metric for video summarization. We demonstrate the close relation between mutual information maximization and this cycle learning procedure. Experiments on two video summarization benchmark datasets validate the state-of-the-art performance and superiority of the Cycle-SUM model over previous baselines. Comment: Accepted at AAAI 2019.
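The cycle-consistency intuition behind the evaluator can be shown with a toy numeric sketch. This is not the Cycle-SUM GAN architecture: instead of learned generators, a video is treated as 1-D frame features and the "backward" mapping is a nearest-summary-frame lookup, so the round-trip error stands in for the learned information-preserving metric.

```python
# Toy sketch of the cycle-consistency idea only (not the Cycle-SUM GANs):
# score information preservation by reconstructing each original frame from
# its nearest summary frame and measuring the round-trip (cycle) error.

def cycle_reconstruction_error(video, selected_idx):
    summary = [video[i] for i in selected_idx]
    # Backward "generator": map each original frame to the closest summary frame.
    recon = [min(summary, key=lambda s: abs(s - f)) for f in video]
    return sum((f - r) ** 2 for f, r in zip(video, recon)) / len(video)

video = [0.0, 0.1, 5.0, 5.2, 9.9]                             # three distinct shots
err_diverse   = cycle_reconstruction_error(video, [0, 2, 4])  # one frame per shot
err_redundant = cycle_reconstruction_error(video, [0, 1, 2])  # first shots only
```

A summary covering all three shots reconstructs the original almost perfectly, while a redundant summary leaves the last shot unrecoverable and incurs a large cycle error, which is why minimizing this round-trip loss pushes the selector toward informative, non-redundant frames.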