10 research outputs found
Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks
Shot boundary detection (SBD) is an important component of many video
analysis tasks, such as action recognition, video indexing, summarization and
editing. Previous work typically used a combination of low-level features like
color histograms, in conjunction with simple models such as SVMs. Instead, we
propose to learn shot detection end-to-end, from pixels to final shot
boundaries. For training such a model, we rely on our insight that all shot
boundaries are generated. Thus, we create a dataset with one million frames and
automatically generated transitions such as cuts, dissolves and fades. In order
to efficiently analyze hours of videos, we propose a Convolutional Neural
Network (CNN) which is fully convolutional in time, thus allowing to use a
large temporal context without the need to repeatedly processing frames. With
this architecture our method obtains state-of-the-art results while running at
an unprecedented speed of more than 120x real-time
Propose shot boundary detection methods by using visual hybrid features
Shot boundary detection is the fundamental technique that plays an important role in a variety of video processing tasks such as summarization, retrieval, object tracking, and so on. This technique involves segmenting a video sequence into shots, each of which is a sequence of interrelated temporal frames. This paper introduces two methods, where the first is for detecting the cut shot boundary via employing visual hybrid features, while the second method is to compare between them. This enhances the effectiveness of the performance of detecting the shot by selecting the strongest features. The first method was performed by utilizing hybrid features, which included statistics histogram of hue-saturation-value color space and grey level co-occurrence matrix. The second method was performed by utilizing hybrid features that include discrete wavelet transform and grey level co-occurrence matrix. The frame size decreased. This process had the advantage of reducing the computation time. Also used local adaptive thresholds, which enhanced the method’s performance. The tested videos were obtained from the BBC archive, which included BBC Learning English and BBC News. Experimental results have indicated that the second method has achieved (97.618%) accuracy performance, which was higher than the first and other methods using evaluation metrics
A New Action Recognition Framework for Video Highlights Summarization in Sporting Events
To date, machine learning for human action recognition in video has been
widely implemented in sports activities. Although some studies have been
successful in the past, precision is still the most significant concern. In
this study, we present a high-accuracy framework to automatically clip the
sports video stream by using a three-level prediction algorithm based on two
classical open-source structures, i.e., YOLO-v3 and OpenPose. It is found that
by using a modest amount of sports video training data, our methodology can
perform sports activity highlights clipping accurately. Comparing with the
previous systems, our methodology shows some advantages in accuracy. This study
may serve as a new clipping system to extend the potential applications of the
video summarization in sports field, as well as facilitates the development of
match analysis system.Comment: 18 pages, 3 figures, 4 table
Deep Features and Clustering Based Keyframes Selection with Security
The digital world is developing more quickly than ever. Multimedia processing and distribution, however become vulnerable issues due to the enormous quantity and significance of vital information. Therefore, extensive technologies and algorithms are required for the safe transmission of messages, images, and video files. This paper proposes a secure framework by acute integration of video summarization and image encryption. Three parts comprise the proposed cryptosystem framework. The informative frames are first extracted using an efficient and lightweight technique that make use of the color histogram-clustering (RGB-HSV) approach's processing capabilities. Each frame of a video is represented by deep features, which are based on an enhanced pre-trained Inception-v3 network. After that summary is obtain using the K-means optimal clustering algorithm. The representative keyframes then extracted using the clusters highest possible entropy nodes. Experimental validation on two well-known standard datasets demonstrates the proposed methods superiority to numerous state-of-the-art approaches. Finally, the proposed framework performs an efficient image encryption and decryption algorithm by employing a general linear group function GLn (F). The analysis and testing outcomes prove the superiority of the proposed adaptive RSA
PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation
Highlight detection models are typically trained to identify cues that make
visual content appealing or interesting for the general public, with the
objective of reducing a video to such moments. However, the "interestingness"
of a video segment or image is subjective. Thus, such highlight models provide
results of limited relevance for the individual user. On the other hand,
training one model per user is inefficient and requires large amounts of
personal information which is typically not available. To overcome these
limitations, we present a global ranking model which conditions on each
particular user's interests. Rather than training one model per user, our model
is personalized via its inputs, which allows it to effectively adapt its
predictions, given only a few user-specific examples. To train this model, we
create a large-scale dataset of users and the GIFs they created, giving us an
accurate indication of their interests. Our experiments show that using the
user history substantially improves the prediction accuracy. On our test set of
850 videos, our model improves the recall by 8% with respect to generic
highlight detectors. Furthermore, our method proves more precise than the
user-agnostic baselines even with just one person-specific example.Comment: Accepted for publication at the 2018 ACM Multimedia Conference (MM
'18
Using selfsupervised algorithms for video analysis and scene detection
With the increasing available audiovisual content, well-ordered and effective management of video is desired, and therefore, automatic, and accurate solutions for video indexing and retrieval are needed. Self-supervised learning algorithms with 3D convolutional neural networks are a promising solution for these tasks, thanks to its independence from human-annotations and its suitability to identify spatio-temporal features. This work presents a self-supervised algorithm for the analysis of video shots, accomplished by a two-stage implementation: 1- An algorithm that generates pseudo-labels for 20-frame samples with different automatically generated shot transitions (Hardcuts/Cropcuts, Dissolves, Fades in/out, Wipes) and 2- A fully convolutional 3D trained network with an overall achieved accuracy greater than 97% in the testing set. The model implemented is based in [5], improving the detection of large smooth transitions by implementing a larger temporal context. The transitions detected occur centered in the 10th and 11th frames of a 20-frame input window
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing technologies. A lot of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and these attempts have been made in the past two to three years to improve image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and creating scenarios are very important areas of research in multimedia processing and engineering. This book contains a collection of some topics broadly across advanced computational intelligence algorithms and technologies for emerging multimedia signal processing as: Computer vision field, speech/sound/text processing, and content analysis/information mining