173 research outputs found
Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach
Recent years have witnessed a resurgence of interest in video summarization.
However, one of the main obstacles to the research on video summarization is
the user subjectivity - users have various preferences over the summaries. The
subjectiveness causes at least two problems. First, no single video summarizer
fits all users unless it interacts with and adapts to the individual users.
Second, it is very challenging to evaluate the performance of a video
summarizer.
To tackle the first problem, we explore the recently proposed query-focused
video summarization which introduces user preferences in the form of text
queries about the video into the summarization process. We propose a memory
network parameterized sequential determinantal point process in order to attend
the user query onto different video frames and shots. To address the second
challenge, we contend that a good evaluation metric for video summarization
should focus on the semantic information that humans can perceive rather than
the visual features or temporal overlaps. To this end, we collect dense
per-video-shot concept annotations, compile a new dataset, and suggest an
efficient evaluation method defined upon the concept annotations. We conduct
extensive experiments contrasting our video summarizer to existing ones and
present detailed analyses about the dataset and the new evaluation method
Hierarchically-Attentive RNN for Album Summarization and Storytelling
We address the problem of end-to-end visual storytelling. Given a photo
album, our model first selects the most representative (summary) photos, and
then composes a natural language story for the album. For this task, we make
use of the Visual Storytelling dataset and a model composed of three
hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album
photos, select representative (summary) photos, and compose the story.
Automatic and human evaluations show our model achieves better performance on
selection, generation, and retrieval than baselines.Comment: To appear at EMNLP-2017 (7 pages
- …