Collaborative Summarization of Topic-Related Videos
Large collections of videos are grouped into clusters by a topic keyword,
such as Eiffel Tower or Surfing, with many important visual concepts repeating
across them. Such a topically close set of videos has a mutual influence that
can be exploited: any one video in the set can be summarized using information
from the others. We build on this intuition to develop a novel approach that
extracts a summary capturing both the important particularities arising in the
given video and the generalities identified
from the set of videos. The topic-related videos provide visual context to
identify the important parts of the video being summarized. We achieve this by
developing a collaborative sparse optimization method which can be efficiently
solved by a half-quadratic minimization algorithm. Our work builds upon the
idea of collaborative techniques from information retrieval and natural
language processing, which typically use the attributes of other similar
objects to predict the attribute of a given object. Experiments on two
challenging and diverse datasets demonstrate the efficacy of our approach
over state-of-the-art methods. Comment: CVPR 201
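The abstract does not spell out the collaborative sparse optimization, but a common formulation behind sparse representative selection (the family of objectives that half-quadratic or proximal methods solve) minimizes a self-reconstruction error with a row-sparsity penalty. The sketch below is a minimal proximal-gradient version of that generic objective, not the paper's algorithm; the function name and the lam parameter are assumptions.

```python
import numpy as np

def sparse_representative_scores(X, lam=0.5, n_iter=200):
    """Score columns of X (features x frames) as representatives by
    approximately minimizing  ||X - X C||_F^2 + lam * sum_i ||C[i, :]||_2
    with proximal gradient descent; row-sparse C means only a few
    columns are used to reconstruct all others."""
    n = X.shape[1]
    G = X.T @ X                                  # Gram matrix
    lr = 1.0 / (2.0 * np.linalg.eigvalsh(G)[-1] + 1e-12)  # 1/Lipschitz step
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = 2.0 * (G @ C - G)                 # gradient of ||X - XC||_F^2
        C = C - lr * grad
        # proximal step for the l2,1 penalty: row-wise soft thresholding
        norms = np.linalg.norm(C, axis=1, keepdims=True)
        C = C * np.maximum(0.0, 1.0 - lr * lam / (norms + 1e-12))
    return np.linalg.norm(C, axis=1)             # row energy = representativeness
```

Frames with the largest row energies would be kept as the summary; a larger lam yields a sparser (shorter) selection.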
Automatic Synchronization of Multi-User Photo Galleries
In this paper we address the issue of photo gallery synchronization, where
pictures related to the same event are collected by different users. Existing
solutions to this problem are usually based on unrealistic assumptions,
such as time consistency across photo galleries, and often rely heavily on
heuristics, limiting their applicability to real-world scenarios. We
propose a solution that achieves better generalization performance for the
synchronization task compared to the available literature. The method is
characterized by three stages: at first, deep convolutional neural network
features are used to assess the visual similarity among the photos; then, pairs
of similar photos are detected across different galleries and used to construct
a graph; eventually, a probabilistic graphical model is used to estimate the
temporal offset of each pair of galleries, by traversing the minimum spanning
tree extracted from this graph. The experimental evaluation is conducted on
four publicly available datasets covering different types of events,
demonstrating the strength of our proposed method. A thorough discussion of the
obtained results is provided for a critical assessment of the quality in
synchronization.Comment: ACCEPTED to IEEE Transactions on Multimedi
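The final stage described above, estimating each gallery's temporal offset by traversing a spanning tree of the similarity graph, can be sketched with a toy offset-propagation routine. This is a simplified stand-in for the paper's probabilistic-graphical-model stage: it builds a maximum-confidence spanning tree over pairwise offset estimates and accumulates offsets from a root gallery. All names and the (u, v, offset, confidence) edge format are assumptions.

```python
from collections import defaultdict

def gallery_time_offsets(n, edges, root=0):
    """edges: list of (u, v, offset, confidence), where `offset` is the
    estimated clock shift of gallery v relative to gallery u. Builds a
    maximum-confidence spanning tree (Kruskal on descending confidence)
    and traverses it from `root`, summing offsets so every reachable
    gallery gets an offset relative to the root."""
    parent = list(range(n))                      # union-find for Kruskal
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]        # path halving
            x = parent[x]
        return x
    tree = defaultdict(list)
    for u, v, off, _ in sorted(edges, key=lambda e: -e[3]):
        ru, rv = find(u), find(v)
        if ru != rv:                             # keep edge if it joins components
            parent[ru] = rv
            tree[u].append((v, off))
            tree[v].append((u, -off))            # reverse edge negates the offset
    offsets = {root: 0.0}                        # depth-first offset propagation
    stack = [root]
    while stack:
        u = stack.pop()
        for v, off in tree[u]:
            if v not in offsets:
                offsets[v] = offsets[u] + off
                stack.append(v)
    return offsets
```

A low-confidence edge that disagrees with the rest of the graph is simply never used, since the tree prefers high-confidence pairs.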
Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization
In this paper, we present a novel unsupervised video summarization model that
requires no manual annotation. The proposed model termed Cycle-SUM adopts a new
cycle-consistent adversarial LSTM architecture that effectively maximizes
the information preservation and compactness of the summary video. It consists of
a frame selector and a cycle-consistent learning based evaluator. The selector
is a bidirectional LSTM network that learns video representations embedding the
long-range relationships among video frames. The evaluator defines a learnable
information-preserving metric between the original and summary videos and
"supervises" the selector to identify the most informative frames to form the
summary video. In particular, the evaluator is composed of two generative
adversarial networks (GANs): the forward GAN learns to reconstruct the
original video from the summary video, while the backward GAN learns to invert
that process. The consistency between the outputs of this cycle learning is
adopted as the information-preserving metric for video summarization. We
demonstrate the close relation between mutual-information maximization and
this cycle-learning procedure. Experiments on two video summarization benchmark
datasets validate the state-of-the-art performance and superiority of the
Cycle-SUM model over previous baselines. Comment: Accepted at AAAI 201
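The cycle-consistency idea above, summarize, reconstruct, re-summarize, and penalize the mismatch, can be illustrated with a toy numeric loss. This sketch replaces the LSTM selector and the two GANs with plain functions and soft frame weighting; the names, the soft-selection scheme, and the mean-squared mismatch are illustrative assumptions, not the Cycle-SUM architecture.

```python
import numpy as np

def cycle_consistency_loss(frames, select_probs, forward_fn, backward_fn):
    """frames: (T, d) array of frame features; select_probs: length-T soft
    selection scores (the selector's output). The summary is a soft-weighted
    copy of the frames; `forward_fn` plays the forward generator (summary ->
    video) and `backward_fn` the backward one (video -> summary). The mean
    squared mismatch between the summary and its cycled version is the
    consistency loss."""
    summary = frames * select_probs[:, None]     # soft frame selection
    recon = forward_fn(summary)                  # summary -> reconstructed video
    cycled = backward_fn(recon)                  # reconstructed video -> summary
    return float(np.mean((summary - cycled) ** 2))
```

With perfect, mutually inverse generators the cycled summary equals the original summary and the loss is zero; any information lost in the summary-to-video-to-summary round trip shows up as a positive loss, which is what "supervises" the selector.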