
    Co-Regularized Deep Representations for Video Summarization

    Compact keyframe-based video summaries are a popular way of generating viewership on video sharing platforms. Yet, creating relevant and compelling summaries for arbitrarily long videos with a small number of keyframes is a challenging task. We propose a comprehensive keyframe-based summarization framework combining deep convolutional neural networks and restricted Boltzmann machines. An original co-regularization scheme is used to discover meaningful subject-scene associations. The resulting multimodal representations are then used to select highly relevant keyframes. A comprehensive user study is conducted comparing our proposed method to a variety of schemes, including the summarization currently in use by one of the most popular video sharing websites. The results show that our method consistently outperforms the baseline schemes for any given number of keyframes, in terms of both attractiveness and informativeness. The lead is even more significant for smaller summaries.
    Comment: video summarization, deep convolutional neural networks, co-regularized restricted Boltzmann machine
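
    The abstract leaves the exact co-regularization scheme unspecified. Below is a minimal sketch of one plausible variant, assuming an L2 agreement penalty that couples the hidden representations of a subject stream and a scene stream; the function name, tensor shapes, and trade-off weight are illustrative assumptions, not the authors' formulation.

    ```python
    import torch

    def coregularization_penalty(h_subject, h_scene, lam=0.1):
        """Hypothetical co-regularization term: encourage the hidden
        representations of the subject stream and the scene stream to
        agree on each frame. lam is an assumed trade-off weight."""
        return lam * ((h_subject - h_scene) ** 2).sum(dim=1).mean()

    # Usage: add the penalty to each stream's own training objective so
    # that the two representations are learned jointly, not independently.
    h_subject = torch.randn(32, 128)  # per-frame subject features (dummy)
    h_scene = torch.randn(32, 128)    # per-frame scene features (dummy)
    loss_extra = coregularization_penalty(h_subject, h_scene)
    ```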

    VSCAN: An Enhanced Video Summarization using Density-based Spatial Clustering

    In this paper, we present VSCAN, a novel approach for generating static video summaries. This approach is based on a modified DBSCAN clustering algorithm that summarizes the video content using both color and texture features of the video frames. The paper also introduces an enhanced evaluation method that depends on color and texture features. Video summaries generated by VSCAN are compared with summaries generated by other approaches found in the literature and those created by users. Experimental results indicate that the video summaries generated by VSCAN have a higher quality than those generated by other approaches.
    Comment: arXiv admin note: substantial text overlap with arXiv:1401.3590 by other authors without attribution
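
    A minimal sketch of the clustering step, assuming HSV color histograms as frame features and scikit-learn's stock DBSCAN; VSCAN also uses texture features and modifies the DBSCAN algorithm itself, so treat `vscan_like_summary`, `eps`, and `min_samples` as illustrative stand-ins.

    ```python
    import cv2
    import numpy as np
    from sklearn.cluster import DBSCAN

    def vscan_like_summary(frames, eps=0.2, min_samples=3):
        """Cluster frames by color histogram with DBSCAN and return one
        representative frame index per cluster (color features only;
        the actual VSCAN also uses texture)."""
        feats = []
        for f in frames:  # frames: list of BGR images (numpy arrays)
            hsv = cv2.cvtColor(f, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [16, 8], [0, 180, 0, 256])
            feats.append(cv2.normalize(hist, None).flatten())
        X = np.array(feats)
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        keyframes = []
        for lab in set(labels) - {-1}:  # -1 marks DBSCAN noise points
            idx = np.where(labels == lab)[0]
            centroid = X[idx].mean(axis=0)
            # pick the cluster member closest to the cluster centroid
            keyframes.append(idx[np.argmin(np.linalg.norm(X[idx] - centroid, axis=1))])
        return sorted(int(i) for i in keyframes)
    ```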

    Deep attentive video summarization with distribution consistency learning

    This article studies supervised video summarization by formulating it as a sequence-to-sequence learning problem, in which the input and output are the sequence of original video frames and the sequence of their predicted importance scores, respectively. Two critical issues are addressed. The first is short-term contextual attention insufficiency: existing approaches focus largely on long-term encoder-decoder attention and fail to capture short-term contextual attention information within the video sequence itself. The second is distribution inconsistency: the distribution of the predicted importance-score sequence can diverge from that of the ground-truth sequence, which may lead to a suboptimal solution. To mitigate the first issue, we incorporate a self-attention mechanism in the encoder to highlight the important keyframes in a short-term context; this mechanism, alongside the encoder-decoder attention, constitutes our deep attentive model for video summarization. For the second issue, we propose a distribution consistency learning method that employs a simple yet effective regularization loss term to seek a consistent distribution for the two sequences. Our final approach is dubbed Attentive and Distribution-consistent video Summarization (ADSum). Extensive experiments on benchmark data sets demonstrate the superiority of ADSum over state-of-the-art approaches.
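
    A minimal sketch of a distribution consistency term, assuming the two score sequences are softmax-normalized into distributions over frames and compared with a KL divergence; the paper's exact regularization loss may take a different form.

    ```python
    import torch
    import torch.nn.functional as F

    def distribution_consistency_loss(pred_scores, gt_scores):
        """One plausible regularizer: normalize each importance-score
        sequence into a distribution over frames, then penalize the KL
        divergence between the two distributions. Treat this as an
        illustrative stand-in for the paper's loss term."""
        log_p = F.log_softmax(pred_scores, dim=-1)  # predicted distribution (log)
        q = F.softmax(gt_scores, dim=-1)            # ground-truth distribution
        return F.kl_div(log_p, q, reduction='batchmean')

    # Usage: total loss = score regression loss + weighted consistency term.
    pred = torch.randn(4, 120)  # 4 videos, 120 frame scores each (dummy)
    gt = torch.rand(4, 120)
    loss = F.mse_loss(torch.sigmoid(pred), gt) + 0.1 * distribution_consistency_loss(pred, gt)
    ```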

    Novel perspectives and approaches to video summarization

    The increasing volume of videos requires efficient and effective techniques to index and structure them. Video summarization is such a technique: it extracts the essential information from a video so that tasks such as comprehension by users and video content analysis can be conducted more effectively and efficiently. The research presented in this thesis investigates three novel perspectives on the video summarization problem and provides approaches to each.
    Our first perspective is to employ local keypoints for keyframe selection. Two criteria, namely Coverage and Redundancy, are introduced to guide the selection process towards keyframes that represent the maximum video content while sharing minimum redundancy. To efficiently deal with long videos, a top-down strategy is proposed that splits the summarization problem into two sub-problems: scene identification and scene summarization.
    Our second perspective is to formulate video summarization as a sparse dictionary reconstruction problem. Our method uses the true sparsity constraint (the L0 norm) instead of the relaxed constraint (the L2,1 norm), so that keyframes are directly selected as a sparse dictionary that can reconstruct the video frames. In addition, a Percentage Of Reconstruction (POR) criterion is proposed to intuitively guide users in selecting an appropriate summary length, and an L2,0-constrained sparse dictionary selection model is proposed to further verify the effectiveness of sparse dictionary reconstruction for video summarization; a greedy sketch of this idea follows below.
    Lastly, we investigate the multi-modal perspective of multimedia content summarization and enrichment. There are abundant images and videos on the Web, so it is highly desirable to organize such resources effectively for textual content enrichment. With the support of web-scale images, our proposed system, namely StoryImaging, is capable of enriching arbitrary textual stories with visual content.
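
    A minimal sketch of dictionary-based keyframe selection, assuming a greedy SOMP-style approximation in place of the exact L0/L2,0-constrained optimization solved in the thesis; `greedy_keyframe_selection` and the POR computation here are illustrative.

    ```python
    import numpy as np

    def greedy_keyframe_selection(X, k):
        """Greedy (SOMP-style) stand-in for L2,0-constrained sparse
        dictionary selection: choose k columns of X (frame features,
        shape (d, n)) so that all n frames are well reconstructed by
        least squares over the chosen columns."""
        selected = []
        residual = X.copy()
        for _ in range(k):
            scores = np.linalg.norm(X.T @ residual, axis=1)  # correlation per column
            scores[selected] = -np.inf                       # never re-select
            selected.append(int(np.argmax(scores)))
            D = X[:, selected]
            coeffs, *_ = np.linalg.lstsq(D, X, rcond=None)   # reconstruct all frames
            residual = X - D @ coeffs
        # One plausible reading of the POR criterion: fraction of the
        # video's energy captured by the reconstruction.
        por = 1.0 - np.linalg.norm(residual) / np.linalg.norm(X)
        return sorted(selected), por
    ```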

    Key Frame Generation to Generate Activity Strip Based on Similarity Calculation

    Video data is managed for several purposes, for instance to make its information more meaningful. This research manages video by detecting the activities it contains. There are three stages in generating an activity strip: the data-source stage (preparation of the frames), the processing stage (analysis of the activity), and the final stage (collection of the key frames). The activity strip is generated by calculating the difference between the pixel values of two frames to detect similarity; in this research, we use the Sum of Absolute Differences (SAD) method to calculate this difference. Similar frames are grouped into the same cluster, and each cluster contributes one frame (or several frames) to serve as a key frame. The key frames then represent the activity strip, and the activity strips are arranged sequentially and continuously to generate the activity.
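
    A minimal sketch of the SAD-based grouping described above; the threshold value and the choice of the middle frame as each cluster's key frame are assumptions.

    ```python
    import numpy as np

    def sad(frame_a, frame_b):
        """Sum of Absolute Differences between two same-sized grayscale frames."""
        return int(np.abs(frame_a.astype(np.int32) - frame_b.astype(np.int32)).sum())

    def key_frames_by_sad(frames, threshold):
        """Group consecutive frames whose SAD to the previous frame stays
        below a threshold, then take the middle frame of each group as
        its key frame. The threshold is an assumed tuning parameter."""
        clusters = [[0]]
        for i in range(1, len(frames)):
            if sad(frames[i - 1], frames[i]) < threshold:
                clusters[-1].append(i)   # similar: same cluster
            else:
                clusters.append([i])     # dissimilar: start a new cluster
        return [c[len(c) // 2] for c in clusters]
    ```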

    Improved Key Frame Extraction using Discrete Wavelet Transform with Modified Threshold Factor

    Video summarization is used in various applications such as video object recognition and classification. In video processing, numerous frames contain similar information, which wastes time, slows processing, and increases complexity. Using key frames greatly reduces the amount of memory needed for video data processing as well as the complexity. In this paper, key frame extraction for Arabic isolated words using the discrete wavelet transform (DWT) with a modified threshold factor is proposed and evaluated with different wavelet bases. The results for the db, sym, and coif wavelet bases show that the best number of key frames is obtained at a threshold factor value of 0.75.
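
    A minimal sketch, assuming frames are scored by the energy of their single-level 2-D DWT detail sub-bands and selected by a threshold proportional to the factor 0.75; only that factor and the db/sym/coif bases come from the abstract, so `dwt_keyframes` and its selection rule are illustrative.

    ```python
    import numpy as np
    import pywt

    def dwt_keyframes(frames, factor=0.75, wavelet='db4'):
        """Score each grayscale frame by the energy of its detail
        sub-bands after a single-level 2-D DWT, then keep a frame when
        its score differs from the last kept frame by more than
        factor * std of successive score differences (assumed rule)."""
        energies = []
        for f in frames:
            _, (cH, cV, cD) = pywt.dwt2(f.astype(float), wavelet)
            energies.append((cH ** 2).sum() + (cV ** 2).sum() + (cD ** 2).sum())
        energies = np.asarray(energies)
        thresh = factor * np.abs(np.diff(energies)).std()
        keyframes = [0]
        for i in range(1, len(frames)):
            if abs(energies[i] - energies[keyframes[-1]]) > thresh:
                keyframes.append(i)
        return keyframes
    ```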