4 research outputs found
AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization
This paper presents a new method for unsupervised video summarization. The proposed architecture embeds an Actor-Critic model into a Generative Adversarial Network and formulates the selection of important video fragments (that will be used to form the summary) as a sequence generation task. The Actor and the Critic take part in a game that incrementally leads to the selection of the video key-fragments, and their choices at each step of the game result in a set of rewards from the Discriminator. The designed training workflow allows the Actor and Critic to discover a space of actions and automatically learn a policy for key-fragment selection. Moreover, the introduced criterion for choosing the best model after the training ends, enables the automatic selection of proper values for parameters of the training process that are not learned from the data (such as the regularization factor σ). Experimental evaluation on two benchmark datasets (SumMe and TVSum) demonstrates that the proposed AC-SUM-GAN model performs consistently well and gives SoA results in comparison to unsupervised methods, that are also competitive with respect to supervised methods
Video Summarization Using Deep Neural Networks: A Survey
Video summarization technologies aim to create a concise and complete
synopsis by selecting the most informative parts of the video content. Several
approaches have been developed over the last couple of decades and the current
state of the art is represented by methods that rely on modern deep neural
network architectures. This work focuses on the recent advances in the area and
provides a comprehensive survey of the existing deep-learning-based methods for
generic video summarization. After presenting the motivation behind the
development of technologies for video summarization, we formulate the video
summarization task and discuss the main characteristics of a typical
deep-learning-based analysis pipeline. Then, we suggest a taxonomy of the
existing algorithms and provide a systematic review of the relevant literature
that shows the evolution of the deep-learning-based video summarization
technologies and leads to suggestions for future developments. We then report
on protocols for the objective evaluation of video summarization algorithms and
we compare the performance of several deep-learning-based approaches. Based on
the outcomes of these comparisons, as well as some documented considerations
about the suitability of evaluation protocols, we indicate potential future
research directions.Comment: Journal paper; Under revie
A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization
In this paper we present our work on improving the efficiency of adversarial training for unsupervised video summarization. Our starting point is the SUM-GAN model, which creates a representative summary based on the intuition that such a summary should make it possible to reconstruct a video that is indistinguishable from the original one. We build on a publicly available implementation of a variation of this model, that includes a linear compression layer to reduce the number of learned parameters and applies an incremental approach for training the different components of the architecture. After assessing the impact of these changes to the model’s performance, we propose a stepwise, label-based learning process to improve the training efficiency of the adversarial part of the model. Before evaluating our model’s efficiency, we perform a thorough study with respect to the used evaluation protocols and we examine the possible performance on two benchmarking datasets, namely SumMe and TVSum. Experimental evaluations and comparisons with the state of the art highlight the competitiveness of the proposed method. An ablation study indicates the benefit of each applied change on the model’s performance, and points out the advantageous role of the introduced stepwise, label-based training strategy on the learning efficiency of the adversarial part of the architecture