Query-Aware Sparse Coding for Multi-Video Summarization
Given the explosive growth of online videos, it is becoming increasingly
important to relieve users of the tedious work of browsing and managing the
video content they are interested in. Video summarization provides such a
technique by condensing one or multiple videos into a compact one. However, conventional
multi-video summarization methods often fail to produce satisfying results as
they ignore the user's search intent. To this end, this paper proposes a novel
query-aware approach by formulating the multi-video summarization in a sparse
coding framework, where the web images searched by the query are taken as the
important preference information to reveal the query intent. To provide a
user-friendly summarization, this paper also develops an event-keyframe
presentation structure to present keyframes in groups of specific events
related to the query by using an unsupervised multi-graph fusion method. We
release a new public dataset named MVS1K, which contains about 1,000 videos
from 10 queries and their video tags, manual annotations, and associated web
images. Extensive experiments on the MVS1K dataset validate that our approach
produces superior objective and subjective results compared with several
recently proposed approaches. Comment: 10 pages, 8 figures
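The sparse-coding formulation above can be sketched in a simplified form. The paper's full objective incorporates query-derived web images as preference information; the toy version below omits that weighting and only demonstrates the core idea, that frames whose coefficients best reconstruct the whole video are keyframe candidates. The function name and the ISTA-style solver are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def sparse_keyframes(X, k, lam=0.05, steps=300):
    """Toy keyframe selection via column-sparse self-reconstruction.

    Seeks W minimizing  ||X - W X||_F^2 + lam * sum_j ||W[:, j]||_2
    with proximal gradient (ISTA). Column j's norm measures how much
    frame j contributes to reconstructing the video, so the k frames
    with the largest column norms are returned as keyframes.
    X: (n_frames, feature_dim) array of frame features.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    # Lipschitz constant of the smooth term's gradient: 2 * ||X||_2^2
    L = 2.0 * np.linalg.norm(X, 2) ** 2 + 1e-8
    lr = 1.0 / L
    for _ in range(steps):
        grad = -2.0 * (X - W @ X) @ X.T          # gradient of the fit term
        W = W - lr * grad                         # gradient step
        # proximal step: group soft-thresholding on columns
        norms = np.linalg.norm(W, axis=0, keepdims=True)
        W = W * np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
    scores = np.linalg.norm(W, axis=0)
    return np.argsort(scores)[::-1][:k]

# Usage sketch: 15 frames drawn from 3 repeated "event" prototypes.
rng = np.random.default_rng(0)
protos = rng.normal(size=(3, 8))
X = np.repeat(protos, 5, axis=0)
keys = sparse_keyframes(X, k=3)
```

The group (column-wise) soft-thresholding is what makes whole frames drop out of the reconstruction, rather than individual coefficients, which is the standard motivation for sparse coding in summarization.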
Video Summarization with Attention-Based Encoder-Decoder Networks
This paper addresses the problem of supervised video summarization by
formulating it as a sequence-to-sequence learning problem, where the input is a
sequence of original video frames and the output is a keyshot sequence. Our key
idea is to learn a deep summarization network with an attention mechanism that
mimics the way humans select keyshots. To this end, we propose a novel
video summarization framework named Attentive encoder-decoder networks for
Video Summarization (AVS), in which the encoder uses a Bidirectional Long
Short-Term Memory (BiLSTM) to encode the contextual information among the input
video frames. As for the decoder, two attention-based LSTM networks are
explored by using additive and multiplicative objective functions,
respectively. Extensive experiments are conducted on two video summarization
benchmark datasets, i.e., SumMe and TVSum. The results demonstrate the
superiority of the proposed AVS-based approaches over the state-of-the-art
approaches, with remarkable improvements from 0.8% to 3% on the two
datasets, respectively. Comment: 9 pages, 7 figures
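The two attention variants mentioned above correspond to the standard additive (Bahdanau-style) and multiplicative (Luong-style) scoring functions. The minimal sketch below shows the two scores side by side; the weight matrices and the function names are illustrative assumptions, not the exact parameterization used in AVS.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(q, K, Wq, Wk, v):
    """Additive scoring: score_i = v^T tanh(Wq q + Wk k_i)."""
    scores = np.array([v @ np.tanh(Wq @ q + Wk @ k) for k in K])
    return softmax(scores)

def multiplicative_attention(q, K, W):
    """Multiplicative scoring: score_i = q^T W k_i."""
    scores = np.array([q @ W @ k for k in K])
    return softmax(scores)

# Usage sketch: attend over 6 encoder states of dimension 4.
rng = np.random.default_rng(1)
d = 4
q = rng.normal(size=d)            # decoder state
K = rng.normal(size=(6, d))       # BiLSTM encoder states
a = additive_attention(q, K, rng.normal(size=(d, d)),
                       rng.normal(size=(d, d)), rng.normal(size=d))
m = multiplicative_attention(q, K, rng.normal(size=(d, d)))
```

Either variant yields a normalized weighting over encoder states; in a summarizer those weights determine how much each input frame's context contributes when scoring a candidate keyshot.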
Query-Conditioned Three-Player Adversarial Network for Video Summarization
Video summarization plays an important role in video understanding by
selecting key frames/shots. Traditionally, it aims to find the most
representative and diverse contents in a video as short summaries. Recently, a
more generalized task, query-conditioned video summarization, has been
introduced, which takes user queries into consideration to learn more
user-oriented summaries. In this paper, we propose a query-conditioned
three-player generative adversarial network to tackle this challenge. The
generator learns the joint representation of the user query and the video
content, and the discriminator takes three pairs of query-conditioned summaries
as the input to discriminate the real summary from a generated and a random
one. A three-player loss is introduced for joint training of the generator and
the discriminator, which forces the generator to learn better summary results,
and avoids the generation of random trivial summaries. Experiments on a
recently proposed query-conditioned video summarization benchmark dataset show
the efficiency and efficacy of our proposed method. Comment: 13 pages, 3 figures, BMVC 201
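The three-player loss described above can be sketched as two scalar objectives: the discriminator pushes the real query-conditioned summary toward 1 and both the generated and the random summary toward 0, while the generator is rewarded only for fooling the discriminator on its own output. This is a minimal sketch of the loss structure only, assuming sigmoid discriminator outputs; the networks themselves and any weighting terms from the paper are omitted.

```python
import numpy as np

def three_player_d_loss(d_real, d_gen, d_rand, eps=1e-8):
    """Discriminator loss over three query-conditioned pairs:
    real summary -> 1, generated summary -> 0, random summary -> 0.
    Inputs are discriminator probabilities in (0, 1)."""
    return -(np.log(d_real + eps)
             + np.log(1.0 - d_gen + eps)
             + np.log(1.0 - d_rand + eps))

def three_player_g_loss(d_gen, eps=1e-8):
    """Generator loss: make the generated summary look real.
    The random pair appears only in the discriminator's loss, so the
    generator gains nothing from producing trivial random summaries."""
    return -np.log(d_gen + eps)

# Usage sketch: a discriminator that separates the three pairs well
# incurs a lower loss than an undecided one.
good = three_player_d_loss(0.9, 0.1, 0.1)
undecided = three_player_d_loss(0.5, 0.5, 0.5)
```

Including the random pair as an explicit negative is what distinguishes the three-player setup from a standard two-player GAN loss: the discriminator must reject not just generated summaries but also arbitrary ones.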