We address the problem of highlight detection from a 360 degree video by
summarizing it both spatially and temporally. Given a long 360 degree video, we
spatially select visually pleasing normal field-of-view (NFOV) segments from
the unlimited field of view (FOV) of the 360 degree video, and temporally
summarize the video into a concise and informative highlight consisting of a
selected subset of subshots.
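To make this spatio-temporal formulation concrete, here is a minimal sketch, assuming scores for every (segment, candidate view) pair are already available; the function and array names are hypothetical, not from the paper. The spatial step keeps the best NFOV view per segment, and the temporal step keeps the top-k segments as the highlight.

```python
import numpy as np

def summarize(scores: np.ndarray, k: int):
    """Toy spatio-temporal summarization (illustrative only).

    scores: (num_segments, num_views) array, where scores[t, v] rates
            how well NFOV view v composes segment t (assumed given).
    k:      number of subshots kept in the temporal highlight.
    """
    best_views = scores.argmax(axis=1)            # spatial: best view per segment
    best_scores = scores.max(axis=1)
    keep = np.sort(np.argsort(best_scores)[-k:])  # temporal: top-k segments, in time order
    return keep, best_views[keep]

# Hypothetical usage: 100 segments, 36 candidate view centers.
segments, views = summarize(np.random.rand(100, 36), k=5)
```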
We propose a novel deep ranking model, the Composition View Score
(CVS) model, which produces a spherical composition score map per video
segment and determines which view is best suited for the highlight via a
sliding window kernel at inference (sketched below). To evaluate the proposed
framework, we perform
experiments on the Pano2Vid benchmark dataset and our newly collected 360
degree video highlight dataset from YouTube and Vimeo. Using
both quantitative summarization metrics and user studies via Amazon Mechanical
Turk, we demonstrate that our approach outperforms several state-of-the-art
highlight detection methods. We also show that our model is 16 times faster at
inference than AutoCam, one of the first summarization algorithms for
360 degree videos.
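As a rough illustration of the sliding window inference described above, the sketch below rates every candidate view center by averaging an equirectangular composition score map over an NFOV-sized window; longitude wraps around the sphere while latitude is clipped at the poles. This is a simplified stand-in under those assumptions, not the authors' implementation, and the score map here is a random placeholder for what the CVS network would predict.

```python
import numpy as np

def view_scores(score_map: np.ndarray, win_h: int, win_w: int) -> np.ndarray:
    """Average an equirectangular (H, W) score map over a win_h x win_w
    window centered at each pixel, as a crude proxy for scoring every
    candidate NFOV view center (illustrative only)."""
    H, W = score_map.shape
    out = np.empty((H, W))
    for i in range(H):
        top, bot = max(0, i - win_h // 2), min(H, i + win_h // 2 + 1)  # clip latitude
        for j in range(W):
            cols = np.arange(j - win_w // 2, j + win_w // 2 + 1) % W   # wrap longitude
            out[i, j] = score_map[top:bot][:, cols].mean()
    return out

# Pick the best view center on a random 32x64 map with a 9x17 window.
smap = np.random.rand(32, 64)
best_center = np.unravel_index(view_scores(smap, 9, 17).argmax(), smap.shape)
```

A real implementation would weight the window by the NFOV projection geometry rather than using a flat average, since equirectangular pixels cover unequal solid angles near the poles.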