A novel user-centered design for personalized video summarization
Several automatic video summarization systems have been proposed in the past. However, a generic video summary generated only from audio, visual and textual saliency will not satisfy every user. This paper proposes a novel system for generating semantically meaningful personalized video summaries that are tailored to the individual user's preferences over video semantics. Each video shot is represented by a semantic multinomial, which is a vector of posterior semantic concept probabilities. The proposed system assembles the summary from the top-ranked shots that are semantically relevant to the user's preferences, subject to the summary time span. The system is evaluated using both quantitative and subjective metrics, and the experimental results are encouraging.
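As a rough illustration of the selection step described above, the sketch below ranks shots by the similarity between their semantic multinomial and a user preference vector, then fills the summary time span greedily. The cosine similarity, the greedy fill, and the names `summarize`, `preference` and `time_span` are assumptions made for this example; the abstract does not state which relevance measure or selection rule the authors use.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def summarize(shots, preference, time_span):
    """Pick shots for a personalized summary (hypothetical helper).

    shots: list of (shot_id, duration_seconds, semantic_multinomial)
    preference: user preference vector over the same semantic concepts
    time_span: maximum total duration of the summary, in seconds
    """
    # Rank shots by relevance to the user's preferences (assumed: cosine similarity).
    ranked = sorted(shots, key=lambda s: cosine(s[2], preference), reverse=True)
    chosen, remaining = [], time_span
    for shot_id, duration, _ in ranked:
        # Greedy fill: take a shot only if it still fits in the time budget.
        if duration <= remaining:
            chosen.append(shot_id)
            remaining -= duration
    # Stitch the selected shots back in their original temporal order.
    order = {s[0]: i for i, s in enumerate(shots)}
    return sorted(chosen, key=order.get)

shots = [
    ("a", 5, [0.8, 0.1, 0.1]),
    ("b", 4, [0.1, 0.8, 0.1]),
    ("c", 3, [0.7, 0.2, 0.1]),
    ("d", 6, [0.1, 0.1, 0.8]),
]
summary = summarize(shots, preference=[1.0, 0.0, 0.0], time_span=9)
```

With a preference concentrated on the first concept and a 9-second budget, shots "a" and "c" are selected and the remaining shots no longer fit.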
Coherent segmentation of video into syntactic regions
In this paper we report on our work in realising an approach to video shot matching that automatically segments video into abstract intertwined shapes in a temporally coherent way. These shapes, which approximate objects and background regions, can then be matched, giving fine-grained shot-to-shot matching. The main contributions of the paper are, first, the extension of our segmentation algorithm for still images to spatial segmentation in video and, second, the introduction of a measurement of the temporal coherency of the spatial segmentation. The latter allows us to quantitatively demonstrate the effectiveness of our approach on real video data.
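The abstract does not define its temporal coherency measurement. As a stand-in, the sketch below scores coherency as the average fraction of pixels that keep the same region label between consecutive frames. This is only a simple proxy chosen for illustration: it ignores motion, and the function name `temporal_coherency` and the label-map input format are assumptions, not the authors' method.

```python
def temporal_coherency(label_frames):
    """Mean fraction of pixels whose region label is unchanged between consecutive frames.

    label_frames: list of 2D lists of integer region labels, all the same size.
    Returns 1.0 for fewer than two frames (nothing to compare).
    """
    if len(label_frames) < 2:
        return 1.0
    scores = []
    for prev, curr in zip(label_frames, label_frames[1:]):
        total = same = 0
        for row_prev, row_curr in zip(prev, curr):
            for p, c in zip(row_prev, row_curr):
                total += 1
                same += (p == c)
        # Guard against empty frames to avoid dividing by zero.
        scores.append(same / total if total else 1.0)
    return sum(scores) / len(scores)

frames = [
    [[0, 0], [1, 1]],
    [[0, 0], [1, 1]],  # identical to the first frame
    [[0, 1], [1, 1]],  # one of four pixels changes label
]
score = temporal_coherency(frames)
```

A score near 1.0 means region labels are stable over time; a practical measure would likely compensate for object and camera motion before comparing labels.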
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval
With the explosive growth of web videos in recent years, large-scale
Content-Based Video Retrieval (CBVR) has become increasingly essential in video
filtering, recommendation, and copyright protection. Segment-level CBVR
(S-CBVR) locates the start and end time of similar segments in finer
granularity, which is beneficial for user browsing efficiency and infringement
detection, especially in long-video scenarios. The challenge of the S-CBVR task
is how to achieve high temporal alignment accuracy with efficient computation
and low storage consumption. In this paper, we propose a Segment Similarity and
Alignment Network (SSAN) to address this challenge; it is the first network
trained end-to-end for S-CBVR. SSAN is based on two newly proposed modules for
video retrieval: (1) an efficient Self-supervised Keyframe Extraction (SKE)
module to reduce redundant frame features, and (2) a robust Similarity Pattern
Detection
(SPD) module for temporal alignment. In comparison with uniform frame
extraction, SKE not only saves feature storage and search time but also
achieves comparable accuracy with limited extra computation time. In terms of
temporal alignment, SPD localizes similar segments with higher accuracy and
efficiency than existing deep learning methods. Furthermore, we jointly train
SSAN with SKE and SPD and achieve an end-to-end improvement. Meanwhile, the two
key modules SKE and SPD can also be effectively inserted into other video
retrieval pipelines and yield considerable performance improvements.
Experimental results on public datasets show that SSAN can obtain higher
alignment accuracy while saving storage and online query computational cost
compared to existing methods.
Comment: Accepted by ACM MM 202
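To make the temporal alignment problem concrete, the sketch below builds a frame-to-frame cosine similarity matrix between a query and a reference video and reports the longest diagonal run of high-similarity frames as the aligned segment. This is a simple hand-written baseline, not SSAN's learned SPD module; the function names, the fixed threshold, and the strict one-to-one diagonal assumption are all illustrative choices.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_matrix(query, reference):
    """sim[i][j] is the similarity of query frame i and reference frame j."""
    return [[cosine(q, r) for r in reference] for q in query]

def longest_diagonal_run(sim, threshold=0.9):
    """Return (query_start, reference_start, length) of the longest diagonal run
    of entries with similarity >= threshold; length is 0 if there is none."""
    rows = len(sim)
    cols = len(sim[0]) if rows else 0
    # run[i + 1][j + 1] is the length of the diagonal run ending at sim[i][j].
    run = [[0] * (cols + 1) for _ in range(rows + 1)]
    best = (0, 0, 0)
    for i in range(rows):
        for j in range(cols):
            if sim[i][j] >= threshold:
                length = run[i][j] + 1
                run[i + 1][j + 1] = length
                if length > best[2]:
                    best = (i - length + 1, j - length + 1, length)
    return best

# Toy frame features: the query reappears in the reference starting at frame 1.
query = [[1, 0], [0, 1], [1, 1]]
reference = [[0, 1], [1, 0], [0, 1], [1, 1], [1, 0]]
segment = longest_diagonal_run(similarity_matrix(query, reference))
```

Here the query's three frames match reference frames 1 to 3. Copied segments are often sped up, slowed down, or edited, so their similarity patterns are not always clean diagonals like this one.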