3 research outputs found

    Query-Aware Sparse Coding for Multi-Video Summarization

    Given the explosive growth of online videos, it is becoming increasingly important to relieve the tedious work of browsing and managing the video content of interest. Video summarization aims to provide such a technique by condensing one or multiple videos into a compact one. However, conventional multi-video summarization methods often fail to produce satisfying results because they ignore the user's search intent. To this end, this paper proposes a novel query-aware approach that formulates multi-video summarization in a sparse coding framework, where the web images retrieved by the query are taken as important preference information that reveals the query intent. To provide a user-friendly summarization, this paper also develops an event-keyframe presentation structure that presents keyframes in groups of specific events related to the query, using an unsupervised multi-graph fusion method. We release a new public dataset named MVS1K, which contains about 1,000 videos from 10 queries and their video tags, manual annotations, and associated web images. Extensive experiments on the MVS1K dataset validate that our approaches produce superior objective and subjective results compared with several recently proposed approaches. Comment: 10 pages, 8 figures

    Video Summarization with Attention-Based Encoder-Decoder Networks

    This paper addresses the problem of supervised video summarization by formulating it as a sequence-to-sequence learning problem, where the input is a sequence of original video frames and the output is a keyshot sequence. Our key idea is to learn a deep summarization network with an attention mechanism that mimics the way humans select keyshots. To this end, we propose a novel video summarization framework named Attentive encoder-decoder networks for Video Summarization (AVS), in which the encoder uses a Bidirectional Long Short-Term Memory (BiLSTM) to encode the contextual information among the input video frames. As for the decoder, two attention-based LSTM networks are explored by using additive and multiplicative objective functions, respectively. Extensive experiments are conducted on two video summarization benchmark datasets, i.e., SumMe and TVSum. The results demonstrate the superiority of the proposed AVS-based approaches over the state-of-the-art approaches, with remarkable improvements from 0.8% to 3% on the two datasets, respectively. Comment: 9 pages, 7 figures
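As a rough illustration of the two decoder variants, the sketch below contrasts additive and multiplicative attention scoring over a short sequence of encoder states. It assumes the abstract's "additive and multiplicative objective functions" correspond to the standard additive and multiplicative score forms; the abstract does not confirm this. The helper names, weight matrices, and toy vectors are hypothetical and are not taken from the AVS paper.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, v):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def additive_score(s, h, W_s, W_h, v):
    # Additive form (assumed): v^T tanh(W_s s + W_h h)
    pre = [a + b for a, b in zip(matvec(W_s, s), matvec(W_h, h))]
    return dot(v, [math.tanh(p) for p in pre])

def multiplicative_score(s, h, W):
    # Multiplicative form (assumed): s^T W h
    return dot(s, matvec(W, h))

def attend(s, encoder_states, score_fn):
    # Normalize the scores into attention weights, then build the
    # context vector as the weighted sum of the encoder states.
    weights = softmax([score_fn(s, h) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy data: three "frame" encodings and one decoder state (hypothetical).
identity = [[1.0, 0.0], [0.0, 1.0]]
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state = [1.0, 0.0]

add_w, add_ctx = attend(
    state, frames,
    lambda s, h: additive_score(s, h, identity, identity, [1.0, 1.0]))
mul_w, mul_ctx = attend(
    state, frames,
    lambda s, h: multiplicative_score(s, h, identity))
```

In a full model the matrices would be learned and the encoder states would come from the BiLSTM; here they are fixed so that the two scoring rules can be compared directly.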

    Query-Conditioned Three-Player Adversarial Network for Video Summarization

    Video summarization plays an important role in video understanding by selecting key frames/shots. Traditionally, it aims to find the most representative and diverse contents in a video as short summaries. Recently, a more generalized task, query-conditioned video summarization, has been introduced, which takes user queries into consideration to learn more user-oriented summaries. In this paper, we propose a query-conditioned three-player generative adversarial network to tackle this challenge. The generator learns the joint representation of the user query and the video content, and the discriminator takes three pairs of query-conditioned summaries as the input to discriminate the real summary from a generated and a random one. A three-player loss is introduced for joint training of the generator and the discriminator, which forces the generator to learn better summary results, and avoids the generation of random trivial summaries. Experiments on a recently proposed query-conditioned video summarization benchmark dataset show the efficiency and efficacy of our proposed method. Comment: 13 pages, 3 figures, BMVC 201
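The abstract does not state the form of the three-player loss. The sketch below is one plausible reading, assuming a binary cross-entropy objective in which the discriminator should score the real query-summary pair as 1 and both the generated and the random pairs as 0. The function names and the probabilities passed in are hypothetical, not the paper's formulation.

```python
import math

def bce(p, target):
    # Binary cross-entropy for one predicted probability,
    # clamped to avoid log(0).
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(p_real, p_generated, p_random):
    # Assumed objective: accept the real pair, reject the generated
    # pair and the random pair.
    return bce(p_real, 1.0) + bce(p_generated, 0.0) + bce(p_random, 0.0)

def generator_loss(p_generated):
    # Assumed objective: the generator wants its summary scored as real.
    return bce(p_generated, 1.0)

# A discriminator that separates the three pairs well incurs a lower
# loss than one that outputs 0.5 for every pair.
confident = discriminator_loss(0.9, 0.1, 0.1)
undecided = discriminator_loss(0.5, 0.5, 0.5)
```

Under this reading, the random pair acts as an explicit negative example, so the discriminator learns to reject trivial summaries as well as imperfect generated ones.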