
    Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

    Shot boundary detection (SBD) is an important component of many video analysis tasks, such as action recognition, video indexing, summarization and editing. Previous work typically used a combination of low-level features like color histograms, in conjunction with simple models such as SVMs. Instead, we propose to learn shot detection end-to-end, from pixels to final shot boundaries. For training such a model, we rely on our insight that all shot boundaries are generated. Thus, we create a dataset with one million frames and automatically generated transitions such as cuts, dissolves and fades. In order to efficiently analyze hours of video, we propose a Convolutional Neural Network (CNN) which is fully convolutional in time, allowing it to use a large temporal context without repeatedly processing frames. With this architecture, our method obtains state-of-the-art results while running at an unprecedented speed of more than 120x real-time.
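    The synthetic-transition idea above can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: frames are modeled as flat lists of pixel intensities, and the 10-frame dissolve length and linear blend weights are illustrative assumptions.

    ```python
    def make_cut(shot_a, shot_b):
        """A hard cut: the frames of shot_a followed directly by shot_b."""
        return shot_a + shot_b

    def make_dissolve(shot_a, shot_b, length=10):
        """A dissolve: linearly cross-fade the last `length` frames of
        shot_a into the first `length` frames of shot_b."""
        blended = []
        for i in range(length):
            alpha = (i + 1) / (length + 1)  # blend weight toward shot_b
            frame_a, frame_b = shot_a[-length + i], shot_b[i]
            blended.append([(1 - alpha) * a + alpha * b
                            for a, b in zip(frame_a, frame_b)])
        return shot_a[:-length] + blended + shot_b[length:]
    ```

    Because every transition is generated this way, each training frame comes with a free, exact label of whether it lies inside a transition.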

    Propose shot boundary detection methods by using visual hybrid features

    Shot boundary detection is a fundamental technique that plays an important role in a variety of video processing tasks such as summarization, retrieval, and object tracking. It segments a video sequence into shots, each of which is a sequence of interrelated temporal frames. This paper introduces two methods for detecting cut shot boundaries using visual hybrid features and compares them in order to select the strongest features and thereby improve detection performance. The first method uses hybrid features combining statistics of hue-saturation-value (HSV) color-space histograms with the grey-level co-occurrence matrix (GLCM). The second method uses hybrid features combining the discrete wavelet transform (DWT) with the GLCM. Frames were downscaled to reduce computation time, and local adaptive thresholds were used to improve performance. The test videos were obtained from the BBC archive, including BBC Learning English and BBC News. Experimental results indicate that the second method achieves 97.618% accuracy, higher than the first method and other compared methods on the evaluation metrics used.
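    The cut-detection-with-local-adaptive-threshold step can be sketched as follows. This is an illustrative simplification: the paper's hybrid features (HSV histogram statistics, GLCM, DWT) are replaced by a plain intensity histogram, and the window size and mean-plus-k-standard-deviations rule are assumptions, not the paper's exact thresholding scheme.

    ```python
    def histogram(frame, bins=8):
        """Count pixel intensities (0..255) into `bins` equal-width bins."""
        hist = [0] * bins
        for p in frame:
            hist[min(p * bins // 256, bins - 1)] += 1
        return hist

    def hist_diff(f1, f2, bins=8):
        """Sum of absolute bin-wise differences between two histograms."""
        h1, h2 = histogram(f1, bins), histogram(f2, bins)
        return sum(abs(a - b) for a, b in zip(h1, h2))

    def detect_cuts(frames, window=5, k=3.0):
        """Flag a cut where the frame-to-frame difference exceeds the
        local mean plus k standard deviations over a sliding window."""
        diffs = [hist_diff(frames[i - 1], frames[i])
                 for i in range(1, len(frames))]
        cuts = []
        for i, d in enumerate(diffs):
            lo, hi = max(0, i - window), min(len(diffs), i + window + 1)
            local = diffs[lo:hi]
            mean = sum(local) / len(local)
            var = sum((x - mean) ** 2 for x in local) / len(local)
            if d > mean + k * var ** 0.5 and d > 0:
                cuts.append(i + 1)  # diff i sits between frames i and i+1
        return cuts
    ```

    The local window is what makes the threshold adaptive: a difference only counts as a cut when it stands out against the activity of its neighborhood, rather than against one global constant.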

    A New Action Recognition Framework for Video Highlights Summarization in Sporting Events

    To date, machine learning for human action recognition in video has been widely applied to sports activities. Although some previous studies have been successful, precision remains the most significant concern. In this study, we present a high-accuracy framework that automatically clips a sports video stream using a three-level prediction algorithm built on two classical open-source frameworks, YOLO-v3 and OpenPose. We find that with a modest amount of sports video training data, our methodology can clip sports-activity highlights accurately. Compared with previous systems, our methodology shows advantages in accuracy. This study may serve as a new clipping system that extends potential applications of video summarization in sports, and may facilitate the development of match analysis systems. Comment: 18 pages, 3 figures, 4 tables
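    The final clipping step of such a pipeline can be sketched independently of the detection models. This is an assumption-laden illustration, not the paper's three-level algorithm: it presumes each frame already carries a binary highlight flag (e.g. from upstream YOLO-v3/OpenPose predictions) and simply groups consecutive positive frames into clips.

    ```python
    def clip_highlights(flags, min_len=3):
        """Group consecutive positive frames into (start, end) clips,
        discarding runs shorter than `min_len` frames."""
        clips, start = [], None
        for i, flag in enumerate(flags):
            if flag and start is None:
                start = i
            elif not flag and start is not None:
                if i - start >= min_len:
                    clips.append((start, i))
                start = None
        if start is not None and len(flags) - start >= min_len:
            clips.append((start, len(flags)))
        return clips
    ```

    The `min_len` filter suppresses isolated false positives, which is one plausible way a multi-level predictor improves clip-level precision over raw per-frame scores.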

    Deep Features and Clustering Based Keyframes Selection with Security

    The digital world is developing more quickly than ever, and multimedia processing and distribution have become vulnerable due to the enormous quantity and significance of vital information; extensive technologies and algorithms are therefore required for the safe transmission of messages, images, and video files. This paper proposes a secure framework built on a tight integration of video summarization and image encryption. The proposed cryptosystem framework comprises three parts. First, informative frames are extracted using an efficient, lightweight technique based on a color histogram-clustering (RGB-HSV) approach. Each video frame is represented by deep features from an enhanced pre-trained Inception-v3 network. The summary is then obtained with an optimal K-means clustering algorithm, and representative keyframes are extracted from the highest-entropy nodes of each cluster. Experimental validation on two well-known standard datasets demonstrates the proposed method's superiority over numerous state-of-the-art approaches. Finally, the framework performs efficient image encryption and decryption using a general linear group function GLn(F). The analysis and testing outcomes confirm the superiority of the proposed adaptive RSA.
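    The keyframe-selection step (cluster, then pick the highest-entropy member per cluster) can be sketched as below. This is a toy illustration: the deep Inception-v3 features and the encryption stage are out of scope, the feature vectors are hand-made, and treating a normalized feature vector as a distribution for Shannon entropy is an assumption.

    ```python
    import math
    import random

    def entropy(vec):
        """Shannon entropy of a vector normalized into a distribution."""
        total = sum(vec)
        return -sum((v / total) * math.log2(v / total) for v in vec if v > 0)

    def kmeans(points, k, iters=20, seed=0):
        """A small hand-rolled k-means; returns the final clusters."""
        rng = random.Random(seed)
        centers = rng.sample(points, k)
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sum(
                    (a - b) ** 2 for a, b in zip(p, centers[c])))
                groups[j].append(p)
            centers = [[sum(col) / len(g) for col in zip(*g)] if g
                       else centers[j] for j, g in enumerate(groups)]
        return groups

    def keyframes(features, k=2):
        """One keyframe per cluster: the member with maximal entropy."""
        return [max(g, key=entropy) for g in kmeans(features, k) if g]
    ```

    Picking the highest-entropy member is one way to favor the most information-rich frame of each cluster as its representative.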

    PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation

    Highlight detection models are typically trained to identify cues that make visual content appealing or interesting for the general public, with the objective of reducing a video to such moments. However, the "interestingness" of a video segment or image is subjective. Thus, such highlight models provide results of limited relevance for the individual user. On the other hand, training one model per user is inefficient and requires large amounts of personal information which is typically not available. To overcome these limitations, we present a global ranking model which conditions on each particular user's interests. Rather than training one model per user, our model is personalized via its inputs, which allows it to effectively adapt its predictions, given only a few user-specific examples. To train this model, we create a large-scale dataset of users and the GIFs they created, giving us an accurate indication of their interests. Our experiments show that using the user history substantially improves the prediction accuracy. On our test set of 850 videos, our model improves the recall by 8% with respect to generic highlight detectors. Furthermore, our method proves more precise than the user-agnostic baselines even with just one person-specific example. Comment: Accepted for publication at the 2018 ACM Multimedia Conference (MM '18)
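    One way to picture "personalization via inputs" is to blend a generic highlight score with a segment's similarity to the user's history. This is an illustrative stand-in, not the paper's learned architecture: the embeddings, blend weight, and max-cosine aggregation are all assumptions.

    ```python
    import math

    def cosine(u, v):
        """Cosine similarity between two embedding vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    def personalized_score(segment, generic_score, history, weight=0.5):
        """Blend a generic score with the segment's maximum similarity
        to any embedding in the user's history of created GIFs."""
        if not history:
            return generic_score
        personal = max(cosine(segment, h) for h in history)
        return (1 - weight) * generic_score + weight * personal

    def rank_segments(segments, generic_scores, history):
        """Rank segment indices by personalized score, best first."""
        return sorted(range(len(segments)), key=lambda i: personalized_score(
            segments[i], generic_scores[i], history), reverse=True)
    ```

    With an empty history the ranking falls back to the generic detector, which mirrors the paper's setting of adapting from only a few user-specific examples.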

    Using self-supervised algorithms for video analysis and scene detection

    With the increasing amount of available audiovisual content, well-ordered and effective management of video is desired; automatic and accurate solutions for video indexing and retrieval are therefore needed. Self-supervised learning algorithms with 3D convolutional neural networks are a promising solution for these tasks, thanks to their independence from human annotations and their suitability for identifying spatio-temporal features. This work presents a self-supervised algorithm for the analysis of video shots, implemented in two stages: (1) an algorithm that generates pseudo-labels for 20-frame samples containing different automatically generated shot transitions (hard cuts/crop cuts, dissolves, fades in/out, wipes), and (2) a fully convolutional 3D network trained to an overall accuracy greater than 97% on the test set. The implemented model is based on [5], improving the detection of long smooth transitions through a larger temporal context. Detected transitions are centered on the 10th and 11th frames of a 20-frame input window.
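    The pseudo-labeling stage can be sketched for the hard-cut case. This is a minimal sketch under stated assumptions: frames are opaque objects, only one of the listed transition types is shown (dissolves, fades, and wipes would blend frames instead of concatenating), and the per-frame label layout is an illustrative choice consistent with the transition being centered on frames 10 and 11.

    ```python
    def hardcut_sample(shot_a, shot_b, length=20):
        """Build a 20-frame training sample whose hard cut falls between
        frames 10 and 11 (1-based), plus per-frame transition labels."""
        half = length // 2
        frames = shot_a[:half] + shot_b[:half]
        labels = [0] * length
        labels[half - 1] = labels[half] = 1  # frames 10 and 11 (0-indexed 9, 10)
        return frames, labels
    ```

    Since the sampler controls exactly where the transition falls, the labels come for free, which is what makes the training self-supervised.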

    Artificial Intelligence for Multimedia Signal Processing

    Artificial intelligence technologies are actively applied to broadcasting and multimedia processing. Research has been conducted across a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years these efforts have improved image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Technologies for media creation, processing, editing, and scenario generation are likewise important research areas in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing: computer vision, speech/sound/text processing, and content analysis/information mining.