
    Activity-driven content adaptation for effective video summarisation

    In this paper, we present a novel method for content adaptation and video summarization implemented fully in the compressed domain. First, summarization of generic videos is modeled as the process of extracting human objects under various activities/events. Accordingly, frames are classified into five categories via fuzzy decision, covering shot changes (cut and gradual transitions), motion activities (camera motion and object motion) and others, using two inter-frame measurements. Second, human objects are detected using Haar-like features. From the detected human objects and the assigned frame categories, an activity level is determined for each frame to adapt to the video content. Consecutive frames belonging to the same category are grouped into one activity entry as content of interest (COI), converting the original video into a series of activities. An adjustable overall quota controls the size of the generated summary for efficient streaming. Under this quota, the frames selected for the summary are determined by evenly sampling the accumulated activity levels for content adaptation. Quantitative evaluations demonstrate the effectiveness and efficiency of the proposed approach, which provides a more flexible and general solution for this task, since domain-specific subtasks such as accurate object recognition can be avoided.
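    As an illustration of the final selection step, the sketch below samples a fixed quota of frames evenly along the accumulated activity curve, so high-activity spans contribute more frames. It is a minimal reading of the abstract, not the paper's implementation; the function name and toy activity values are our own.

```python
import numpy as np

def select_summary_frames(activity_levels, quota):
    """Return frame indices chosen by evenly sampling the cumulative
    activity curve, so that high-activity spans contribute more frames."""
    cum = np.cumsum(activity_levels)               # accumulated activity
    targets = np.linspace(0.0, cum[-1], num=quota)  # even samples of the curve
    # first frame whose accumulated activity reaches each target level
    idx = np.searchsorted(cum, targets, side="left")
    # duplicates collapse, so the result may hold fewer than `quota` frames
    return np.unique(np.clip(idx, 0, len(activity_levels) - 1))

# Toy example: the burst of activity in the middle attracts the samples.
levels = [0.1, 0.1, 0.1, 2.0, 2.5, 2.0, 0.1, 0.1]
print(select_summary_frames(levels, quota=4))  # -> [0 4 7]
```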

    Video summarisation: A conceptual framework and survey of the state of the art

    Video summaries provide condensed and succinct representations of the content of a video stream through a combination of still images, video segments, graphical representations and textual descriptors. This paper presents a conceptual framework for video summarisation derived from the research literature and uses it as a means of surveying that literature. The framework distinguishes between video summarisation techniques (the methods used to process content from a source video stream to achieve a summarisation of that stream) and video summaries (the outputs of video summarisation techniques). Video summarisation techniques are considered within three broad categories: internal (analysing information sourced directly from the video stream), external (analysing information not sourced directly from the video stream) and hybrid (analysing a combination of internal and external information). Video summaries are considered as a function of the type of content they are derived from (object, event, perception or feature based) and the functionality offered to the user for their consumption (interactive or static, personalised or generic). It is argued that video summarisation would benefit from greater incorporation of external information, particularly unobtrusively sourced user-based information, in order to overcome longstanding challenges such as the semantic gap and to provide video summaries with greater relevance to individual users.
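    The survey's taxonomy can be stated compactly as a data structure. The following Python sketch is purely an illustrative encoding of the categories named in the abstract; all identifiers are ours, not the paper's.

```python
from dataclasses import dataclass
from enum import Enum

class Technique(Enum):       # where the analysed information comes from
    INTERNAL = "internal"    # sourced directly from the video stream
    EXTERNAL = "external"    # not sourced directly from the stream
    HYBRID = "hybrid"        # a combination of internal and external

class ContentType(Enum):     # what the summary is derived from
    OBJECT = "object"
    EVENT = "event"
    PERCEPTION = "perception"
    FEATURE = "feature"

@dataclass
class VideoSummary:          # the output of a summarisation technique
    technique: Technique
    content: ContentType
    interactive: bool        # interactive vs. static consumption
    personalised: bool       # personalised vs. generic
```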

    Video summarization based on local features

    The keyframe extraction process consists of presenting an abstract of an entire video through its most representative frames. It is one of the basic procedures in video retrieval and summarization. This paper presents a novel method for keyframe extraction based on SURF local features. First, we select a group of candidate frames from a video shot using a leap extraction technique. Then, SURF is used to detect and describe local features on the candidate frames. After that, we analyze those features with the FLANN method to eliminate near-duplicate keyframes, helping to keep the set compact. We conducted a comparative study to evaluate our method against three state-of-the-art approaches based on local features. The results show that our method outperforms those approaches.
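    A minimal OpenCV sketch of the near-duplicate test described here: SURF descriptors of two candidate frames are matched with FLANN and filtered with Lowe's ratio test. The hessian, ratio and overlap thresholds are illustrative assumptions, not the paper's settings.

```python
import cv2

# SURF is provided by opencv-contrib (cv2.xfeatures2d).
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5),  # FLANN_INDEX_KDTREE
                              dict(checks=50))

def is_near_duplicate(gray_a, gray_b, ratio=0.7, overlap=0.5):
    """Flag two grayscale frames as near duplicates when a large share
    of their SURF keypoints agree after Lowe's ratio test."""
    kp_a, des_a = surf.detectAndCompute(gray_a, None)
    kp_b, des_b = surf.detectAndCompute(gray_b, None)
    if des_a is None or des_b is None:
        return False
    matches = flann.knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / max(len(kp_a), 1) > overlap
```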

    Multimodal video abstraction into a static document using deep learning

    Abstraction is a strategy that conveys the essential points of a document in a short time. The video abstraction approach proposed in this research is based on multimodal video data comprising both audio and visual information. The major video abstraction procedures for summarizing the video events into a static document are segmenting the input video into scenes and obtaining a textual and visual summary for each scene. To recognize shot and scene boundaries in a video sequence, a hybrid features method was employed, which improves shot detection performance by selecting strong and flexible features. The most informative keyframes from each scene are then incorporated into the visual summary. A hybrid deep learning model was used for abstractive text summarization. The test videos came from the BBC archive, comprising BBC Learning English and BBC News; in addition, a news summary dataset was used to train the deep model. The textual summary was assessed using the ROUGE metric, achieving a score of 40.49%, while the visual summary was assessed using precision, recall and F-score, achieving 94.9% and outperforming the other methods in the experiments.
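    To show the shape of the per-scene textual step, here is a sketch that runs an off-the-shelf abstractive summarizer over a scene transcript. The model choice and the transcript are our own stand-ins; the paper trains its own hybrid deep model on a news summary dataset.

```python
from transformers import pipeline

# Off-the-shelf abstractive summarizer standing in for the paper's
# hybrid deep model; the transcript below is invented for illustration.
summarize = pipeline("summarization", model="facebook/bart-large-cnn")

scene_transcript = (
    "The presenter explains three common English idioms, gives an "
    "example sentence for each, and invites viewers to practise them "
    "in the comments."
)
summary = summarize(scene_transcript, max_length=40, min_length=10,
                    do_sample=False)[0]["summary_text"]
print(summary)  # one abstractive sentence per scene for the static document
```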

    Deep Features and Clustering Based Keyframes Selection with Security

    The digital world is developing more quickly than ever. Multimedia processing and distribution, however, have become vulnerable due to the enormous quantity and significance of the vital information involved. Extensive technologies and algorithms are therefore required for the safe transmission of messages, images and video files. This paper proposes a secure framework through a tight integration of video summarization and image encryption. The proposed cryptosystem framework comprises three parts. First, the informative frames are extracted using an efficient and lightweight technique that exploits a color histogram-clustering (RGB-HSV) approach. Each frame of a video is represented by deep features based on an enhanced pre-trained Inception-v3 network. A summary is then obtained using the K-means clustering algorithm, and the representative keyframes are extracted from the clusters' highest-entropy nodes. Experimental validation on two well-known standard datasets demonstrates the proposed method's superiority over numerous state-of-the-art approaches. Finally, the framework performs efficient image encryption and decryption by employing the general linear group GLn(F). The analysis and testing outcomes confirm the superiority of the proposed adaptive RSA scheme.
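    A minimal sketch of the deep-features-plus-clustering stage: a pre-trained Inception-v3 yields a pooled 2048-d descriptor per frame, and K-means groups the frames. For simplicity we pick the frame nearest each centroid as the keyframe, a plain substitute for the paper's entropy-based node selection; the histogram pre-filter and encryption stages are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras.applications.inception_v3 import (InceptionV3,
                                                        preprocess_input)

# Pre-trained Inception-v3 as a frame descriptor (global average pooling).
model = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def keyframes_by_clustering(frames, n_clusters=5):
    """frames: array of shape (N, 299, 299, 3), RGB values in [0, 255]."""
    feats = model.predict(preprocess_input(frames.astype("float32")))
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(feats)
    picks = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # nearest-to-centroid frame represents the cluster
        dists = np.linalg.norm(feats[members] - km.cluster_centers_[c], axis=1)
        picks.append(int(members[np.argmin(dists)]))
    return sorted(picks)
```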