8,668 research outputs found
A novel user-centered design for personalized video summarization
In the past, several automatic video summarization systems had been proposed to generate video summary. However, a generic video summary that is generated based only on audio, visual and textual saliencies will not satisfy every user. This paper proposes a novel system for generating semantically meaningful personalized video summaries, which are tailored to the individual user's preferences over video semantics. Each video shot is represented using a semantic multinomial which is a vector of posterior semantic concept probabilities. The proposed system stitches video summary based on summary time span and top-ranked shots that are semantically relevant to the user's preferences. The proposed summarization system is evaluated using both quantitative and subjective evaluation metrics. The experimental results on the performance of the proposed video summarization system are encouraging
Recommended from our members
Divide-and-conquer based summarization framework for extracting affective video content
YesRecent advances in multimedia technology have led to tremendous increases in the available volume of video data, thereby creating a major requirement for efficient systems to manage such huge data volumes. Video summarization is one of the key techniques for accessing and managing large video libraries. Video summarization can be used to extract the affective contents of a video sequence to generate a concise representation of its content. Human attention models are an efficient means of affective content extraction. Existing visual attention driven summarization frameworks have high computational cost and memory requirements, as well as a lack of efficiency in accurately perceiving human attention. To cope with these issues, we propose a divide-and-conquer based framework for an efficient summarization of big video data. We divide the original video data into shots, where an attention model is computed from each shot in parallel. Viewer's attention is based on multiple sensory perceptions, i.e., aural and visual, as well as the viewer's neuronal signals. The aural attention model is based on the Teager energy, instant amplitude, and instant frequency, whereas the visual attention model employs multi-scale contrast and motion intensity. Moreover, the neuronal attention is computed using the beta-band frequencies of neuronal signals. Next, an aggregated attention curve is generated using an intra- and inter-modality fusion mechanism. Finally, the affective content in each video shot is extracted. The fusion of multimedia and neuronal signals provides a bridge that links the digital representation of multimedia with the viewer’s perceptions. Our experimental results indicate that the proposed shot-detection based divide-and-conquer strategy mitigates the time and computational complexity. Moreover, the proposed attention model provides an accurate reflection of the user preferences and facilitates the extraction of highly affective and personalized summaries.Supported by the ICT R&D program of MSIP/IITP. [2014(R0112-14-1014), The Development of Open Platform for Service of Convergence Contents]
Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach
Recent years have witnessed a resurgence of interest in video summarization.
However, one of the main obstacles to the research on video summarization is
the user subjectivity - users have various preferences over the summaries. The
subjectiveness causes at least two problems. First, no single video summarizer
fits all users unless it interacts with and adapts to the individual users.
Second, it is very challenging to evaluate the performance of a video
summarizer.
To tackle the first problem, we explore the recently proposed query-focused
video summarization which introduces user preferences in the form of text
queries about the video into the summarization process. We propose a memory
network parameterized sequential determinantal point process in order to attend
the user query onto different video frames and shots. To address the second
challenge, we contend that a good evaluation metric for video summarization
should focus on the semantic information that humans can perceive rather than
the visual features or temporal overlaps. To this end, we collect dense
per-video-shot concept annotations, compile a new dataset, and suggest an
efficient evaluation method defined upon the concept annotations. We conduct
extensive experiments contrasting our video summarizer to existing ones and
present detailed analyses about the dataset and the new evaluation method
- …