Compact keyframe-based video summaries are a popular way of generating
viewership on video sharing platforms. Yet, creating relevant and compelling
summaries for arbitrarily long videos with a small number of keyframes is a
challenging task. We propose a comprehensive keyframe-based summarization
framework combining deep convolutional neural networks and restricted Boltzmann
machines. An original co-regularization scheme is used to discover meaningful
subject-scene associations. The resulting multimodal representations are then
used to select highly relevant keyframes. A comprehensive user study is
conducted comparing our proposed method to a variety of schemes, including the
summarization currently in use by one of the most popular video sharing
websites. The results show that our method consistently outperforms the
baseline schemes for any given number of keyframes, both in terms of
attractiveness and informativeness. The lead is even more significant for
smaller summaries.

Comment: Video summarization, deep convolutional neural networks,
co-regularized restricted Boltzmann machine