Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization
In this paper, we present a novel unsupervised video summarization model
that requires no manual annotation. The proposed model, termed Cycle-SUM,
adopts a new cycle-consistent adversarial LSTM architecture that
effectively maximizes the information preservation and compactness of the
summary video. It consists of a frame selector and a cycle-consistent
learning based evaluator. The selector is a bidirectional LSTM network
that learns video representations embedding the long-range relationships
among video frames. The evaluator defines a learnable
information-preserving metric between the original video and the summary
video and "supervises" the selector to identify the most informative
frames to form the summary video. In particular, the evaluator is composed
of two generative adversarial networks (GANs): the forward GAN learns to
reconstruct the original video from the summary video, while the backward
GAN learns to invert this process. The consistency between the outputs of
this cycle learning is adopted as the information-preserving metric for
video summarization. We demonstrate the close relation between mutual
information maximization and this cycle learning procedure. Experiments on
two video summarization benchmark datasets validate the state-of-the-art
performance and superiority of the Cycle-SUM model over previous
baselines.

Comment: Accepted at AAAI 201
Deep attentive video summarization with distribution consistency learning
This article studies supervised video summarization by formulating it as a sequence-to-sequence learning problem, in which the input and output are the sequence of original video frames and the sequence of their predicted importance scores, respectively. Two critical issues are addressed: short-term contextual attention insufficiency and distribution inconsistency. The former lies in the failure to capture short-term contextual attention information within the video sequence itself, since existing approaches focus largely on long-term encoder-decoder attention. The latter refers to the distributions of the predicted importance-score sequence and the ground-truth sequence being inconsistent, which may lead to a suboptimal solution. To mitigate the first issue, we incorporate a self-attention mechanism in the encoder to highlight the important keyframes in a short-term context. This mechanism, alongside the encoder-decoder attention, constitutes our deep attentive model for video summarization. For the second issue, we propose a distribution consistency learning method that employs a simple yet effective regularization loss term, which seeks a consistent distribution for the two sequences. Our final approach is dubbed Attentive and Distribution-consistent video Summarization (ADSum). Extensive experiments on benchmark datasets demonstrate the superiority of the proposed ADSum approach against state-of-the-art approaches.
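The distribution-consistency regularizer can be illustrated with a toy stand-in. The abstract does not specify the exact loss term, so the sketch below assumes one plausible instantiation: a KL divergence between the softmax-normalized predicted and ground-truth importance-score sequences.

```python
import numpy as np

def distribution_consistency_loss(pred, gt, eps=1e-8):
    """Toy regularizer (assumption, not the paper's exact term): KL
    divergence between softmax-normalized score sequences, pushing the
    predicted importance distribution toward the ground-truth one."""
    p = np.exp(pred - pred.max()); p /= p.sum()   # predicted distribution
    q = np.exp(gt - gt.max()); q /= q.sum()       # ground-truth distribution
    return float(np.sum(q * np.log((q + eps) / (p + eps))))

# Tiny usage example on hypothetical per-frame importance scores.
pred = np.array([0.2, 0.9, 0.1, 0.7])
gt = np.array([0.1, 0.8, 0.2, 0.9])
reg = distribution_consistency_loss(pred, gt)
```

In training, such a term would be added to the usual regression loss on the scores, so the model matches not only per-frame values but the overall shape of the importance distribution.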
Collaborative Summarization of Topic-Related Videos
Large collections of videos are grouped into clusters by a topic keyword,
such as Eiffel Tower or Surfing, with many important visual concepts
repeating across them. Such a topically close set of videos has mutual
influence: any one of them can be summarized by exploiting information
from the others in the set. We build on this intuition to develop a novel
approach to extract a summary that simultaneously captures both the
important particularities arising in the given video and the generalities
identified from the set of videos. The topic-related videos provide visual
context to identify the important parts of the video being summarized. We
achieve this by developing a collaborative sparse optimization method
which can be efficiently solved by a half-quadratic minimization
algorithm. Our work builds upon the idea of collaborative techniques from
information retrieval and natural language processing, which typically use
the attributes of other similar objects to predict the attributes of a
given object. Experiments on two challenging and diverse datasets
demonstrate the efficacy of our approach over state-of-the-art methods.

Comment: CVPR 201
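Half-quadratic minimization itself can be shown on a toy problem. The sketch below is not the paper's collaborative objective — only the generic alternating scheme it mentions, applied to an assumed smoothed-L1 sparse regression: the non-smooth penalty is replaced by a quadratic surrogate with auxiliary weights, and each iteration reduces to a reweighted ridge solve.

```python
import numpy as np

def half_quadratic_sparse(A, y, lam=0.1, eps=1e-6, iters=50):
    """Minimize ||y - A w||^2 + lam * sum_i sqrt(w_i^2 + eps) by
    half-quadratic updates (illustrative; the paper's collaborative
    sparse objective over video sets is richer than this toy).

    Each sqrt term is bounded by a quadratic at the current iterate,
    so the w-update is a closed-form weighted ridge regression."""
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        # Auxiliary weights from the half-quadratic expansion:
        # sqrt(w^2 + eps) <= w^2 / (2 s) + s / 2 at s = sqrt(w_prev^2 + eps).
        d = 1.0 / np.sqrt(w ** 2 + eps)
        # Closed-form minimizer of the surrogate objective.
        w = np.linalg.solve(A.T @ A + (lam / 2) * np.diag(d), A.T @ y)
    return w
```

Small entries get large weights `d` and are shrunk toward zero, which is how the alternation recovers sparse solutions without ever touching a non-smooth subproblem.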
Query-controllable Video Summarization
As video collections grow huge, exploring them efficiently, both within
and across videos, becomes challenging. Video summarization is one way to
tackle this issue. Traditional summarization approaches limit the
effectiveness of video exploration because they generate only one fixed
video summary for a given input video, independent of the user's
information need. In this work, we introduce a method which takes a
text-based query as input and generates a video summary corresponding to
it. We do so by modeling video summarization as a supervised learning
problem and proposing an end-to-end deep-learning-based method for
query-controllable video summarization that generates a query-dependent
video summary. Our proposed method consists of a video summary controller,
a video summary generator, and a video summary output module. To foster
research on query-controllable video summarization and to conduct our
experiments, we introduce a dataset that contains frame-based relevance
score labels. Our experimental results show that the text-based query
helps control the video summary and improves model performance. Our code
and dataset:
https://github.com/Jhhuangkay/Query-controllable-Video-Summarization.

Comment: This paper is accepted at the ACM International Conference on
Multimedia Retrieval (ICMR), 202
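As a minimal illustration of query-dependent selection — not the authors' learned controller/generator/output modules — one can score frames by similarity to a query embedding and keep the top-k. All names and the dot-product scoring below are assumptions for the sketch.

```python
import numpy as np

def query_summary(frame_feats, query_vec, k=3):
    """Toy query-controlled selector (illustrative assumption): score
    each frame by dot-product similarity to the query embedding and
    return the indices of the top-k frames in temporal order."""
    scores = frame_feats @ query_vec            # relevance per frame
    keep = np.sort(np.argsort(scores)[-k:])     # top-k, restored to time order
    return keep, scores

# Hypothetical features: 10 frames, 4-dim embeddings; frames 1, 4, 7
# are aligned with the query direction, the rest are orthogonal to it.
frame_feats = np.tile(np.array([0.0, 1.0, 0.0, 0.0]), (10, 1))
frame_feats[[1, 4, 7]] = np.array([2.0, 0.0, 0.0, 0.0])
query_vec = np.array([1.0, 0.0, 0.0, 0.0])
keep, scores = query_summary(frame_feats, query_vec, k=3)
```

Changing `query_vec` changes which frames survive, which is the essence of a query-dependent (rather than fixed) summary; the paper learns this mapping end to end from frame-based relevance labels.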