Query-Aware Sparse Coding for Multi-Video Summarization
Given the explosive growth of online videos, it is becoming increasingly
important to relieve users of the tedious work of browsing and managing the
video content they are interested in. Video summarization provides such a
technique by condensing one or multiple videos into a compact one. However, conventional
multi-video summarization methods often fail to produce satisfying results as
they ignore the user's search intent. To this end, this paper proposes a novel
query-aware approach by formulating the multi-video summarization in a sparse
coding framework, where the web images searched by the query are taken as the
important preference information to reveal the query intent. To provide a
user-friendly summarization, this paper also develops an event-keyframe
presentation structure to present keyframes in groups of specific events
related to the query by using an unsupervised multi-graph fusion method. We
release a new public dataset named MVS1K, which contains about 1,000 videos
from 10 queries and their video tags, manual annotations, and associated web
images. Extensive experiments on the MVS1K dataset validate that our approach
produces superior objective and subjective results compared with several
recently proposed approaches. Comment: 10 pages, 8 figures
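The sparse-coding formulation above can be sketched in a simplified form. The paper's full objective incorporates query-derived web images as preference information; the toy version below omits that weighting and only demonstrates the core idea, that frames whose coefficients best reconstruct the whole video are keyframe candidates. The function name and the ISTA-style solver are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def sparse_keyframes(X, k, lam=0.05, steps=300):
    """Toy keyframe selection via column-sparse self-reconstruction.

    Seeks W minimizing  ||X - W X||_F^2 + lam * sum_j ||W[:, j]||_2
    with proximal gradient (ISTA). Column j's norm measures how much
    frame j contributes to reconstructing the video, so the k frames
    with the largest column norms are returned as keyframes.
    X: (n_frames, feature_dim) array of frame features.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    # Lipschitz constant of the smooth term's gradient: 2 * ||X||_2^2
    L = 2.0 * np.linalg.norm(X, 2) ** 2 + 1e-8
    lr = 1.0 / L
    for _ in range(steps):
        grad = -2.0 * (X - W @ X) @ X.T          # gradient of the fit term
        W = W - lr * grad                         # gradient step
        # proximal step: group soft-thresholding on columns
        norms = np.linalg.norm(W, axis=0, keepdims=True)
        W = W * np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
    scores = np.linalg.norm(W, axis=0)
    return np.argsort(scores)[::-1][:k]

# Usage sketch: 15 frames drawn from 3 repeated "event" prototypes.
rng = np.random.default_rng(0)
protos = rng.normal(size=(3, 8))
X = np.repeat(protos, 5, axis=0)
keys = sparse_keyframes(X, k=3)
```

The group (column-wise) soft-thresholding is what makes whole frames drop out of the reconstruction, rather than individual coefficients, which is the standard motivation for sparse coding in summarization.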
Video Summarization with Attention-Based Encoder-Decoder Networks
This paper addresses the problem of supervised video summarization by
formulating it as a sequence-to-sequence learning problem, where the input is a
sequence of original video frames and the output is a keyshot sequence. Our key
idea is to learn a deep summarization network with an attention mechanism that
mimics the way humans select keyshots. To this end, we propose a novel
video summarization framework named Attentive encoder-decoder networks for
Video Summarization (AVS), in which the encoder uses a Bidirectional Long
Short-Term Memory (BiLSTM) to encode the contextual information among the input
video frames. As for the decoder, two attention-based LSTM networks are
explored by using additive and multiplicative objective functions,
respectively. Extensive experiments are conducted on two video summarization
benchmark datasets, i.e., SumMe and TVSum. The results demonstrate the
superiority of the proposed AVS-based approaches over the state-of-the-art
approaches, with remarkable improvements from 0.8% to 3% on the two
datasets, respectively. Comment: 9 pages, 7 figures
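The two attention variants mentioned above correspond to the standard additive (Bahdanau-style) and multiplicative (Luong-style) scoring functions. The minimal sketch below shows the two scores side by side; the weight matrices and the function names are illustrative assumptions, not the exact parameterization used in AVS.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(q, K, Wq, Wk, v):
    """Additive scoring: score_i = v^T tanh(Wq q + Wk k_i)."""
    scores = np.array([v @ np.tanh(Wq @ q + Wk @ k) for k in K])
    return softmax(scores)

def multiplicative_attention(q, K, W):
    """Multiplicative scoring: score_i = q^T W k_i."""
    scores = np.array([q @ W @ k for k in K])
    return softmax(scores)

# Usage sketch: attend over 6 encoder states of dimension 4.
rng = np.random.default_rng(1)
d = 4
q = rng.normal(size=d)            # decoder state
K = rng.normal(size=(6, d))       # BiLSTM encoder states
a = additive_attention(q, K, rng.normal(size=(d, d)),
                       rng.normal(size=(d, d)), rng.normal(size=d))
m = multiplicative_attention(q, K, rng.normal(size=(d, d)))
```

Either variant yields a normalized weighting over encoder states; in a summarizer those weights determine how much each input frame's context contributes when scoring a candidate keyshot.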
Query-Conditioned Three-Player Adversarial Network for Video Summarization
Video summarization plays an important role in video understanding by
selecting key frames/shots. Traditionally, it aims to find the most
representative and diverse contents in a video as short summaries. Recently, a
more generalized task, query-conditioned video summarization, has been
introduced, which takes user queries into consideration to learn more
user-oriented summaries. In this paper, we propose a query-conditioned
three-player generative adversarial network to tackle this challenge. The
generator learns the joint representation of the user query and the video
content, and the discriminator takes three pairs of query-conditioned summaries
as the input to discriminate the real summary from a generated and a random
one. A three-player loss is introduced for joint training of the generator and
the discriminator, which forces the generator to learn better summary results,
and avoids the generation of random trivial summaries. Experiments on a
recently proposed query-conditioned video summarization benchmark dataset show
the efficiency and efficacy of our proposed method. Comment: 13 pages, 3 figures, BMVC 201
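The three-player loss described above can be sketched as two scalar objectives: the discriminator pushes the real query-conditioned summary toward 1 and both the generated and the random summary toward 0, while the generator is rewarded only for fooling the discriminator on its own output. This is a minimal sketch of the loss structure only, assuming sigmoid discriminator outputs; the networks themselves and any weighting terms from the paper are omitted.

```python
import numpy as np

def three_player_d_loss(d_real, d_gen, d_rand, eps=1e-8):
    """Discriminator loss over three query-conditioned pairs:
    real summary -> 1, generated summary -> 0, random summary -> 0.
    Inputs are discriminator probabilities in (0, 1)."""
    return -(np.log(d_real + eps)
             + np.log(1.0 - d_gen + eps)
             + np.log(1.0 - d_rand + eps))

def three_player_g_loss(d_gen, eps=1e-8):
    """Generator loss: make the generated summary look real.
    The random pair appears only in the discriminator's loss, so the
    generator gains nothing from producing trivial random summaries."""
    return -np.log(d_gen + eps)

# Usage sketch: a discriminator that separates the three pairs well
# incurs a lower loss than an undecided one.
good = three_player_d_loss(0.9, 0.1, 0.1)
undecided = three_player_d_loss(0.5, 0.5, 0.5)
```

Including the random pair as an explicit negative is what distinguishes the three-player setup from a standard two-player GAN loss: the discriminator must reject not just generated summaries but also arbitrary ones.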