
63 research outputs found

Video Shot Boundary Detection Using Generalized Eigenvalue Decomposition and Gaussian Transition Detection

Authors: Amiri Ali, Fathy Mahmood
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012

Shot boundary detection is the first step in video analysis, summarization and retrieval. In this paper, we propose a novel shot boundary detection algorithm that uses Generalized Eigenvalue Decomposition (GED) and models gradual transitions with Gaussian functions. In particular, we focus on the challenges of detecting gradual transitions and extracting appropriate spatio-temporal features, both of which affect the algorithm's ability to detect shot boundaries efficiently. We derive a theorem that establishes some new properties of GED that can be used in video processing algorithms. Our approach uses this theorem to define a new distance metric in eigenspace for comparing video frames. The distance function changes abruptly at hard cuts and shows semi-Gaussian behavior across gradual transitions. The algorithm detects transitions by analyzing this distance function. Finally, we report experimental results on the large-scale test sets provided by TRECVID 2006, which include evaluations for hard cut and gradual shot boundary detection.
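The abstract describes a frame-distance signal that spikes at hard cuts and follows a roughly Gaussian bump across gradual transitions. A minimal sketch of how such a signal could be analysed is given below. It does not implement the authors' GED-based metric: the function name `classify_transitions`, the thresholds and the synthetic signal are all illustrative assumptions, and the rule used here (peak width at half height) is a simple stand-in for the Gaussian modelling described in the paper.

```python
import math

def classify_transitions(dist, peak_thresh=0.5, cut_max_width=2):
    """Label peaks in a frame-distance signal as 'cut' or 'gradual'.

    dist[i] is assumed to be a distance between frame i and frame i + 1.
    A peak whose width at half height is at most cut_max_width samples is
    treated as a hard cut; a wider bump is treated as a gradual transition.
    Both thresholds are hypothetical tuning parameters.
    """
    events = []
    n = len(dist)
    i = 0
    while i < n:
        if dist[i] < peak_thresh:
            i += 1
            continue
        # Extend over the contiguous run of samples above the threshold.
        j = i
        while j + 1 < n and dist[j + 1] >= peak_thresh:
            j += 1
        peak = max(range(i, j + 1), key=lambda k: dist[k])
        half = dist[peak] / 2.0
        # Walk outwards from the peak to measure the width at half height.
        left = peak
        while left > 0 and dist[left - 1] >= half:
            left -= 1
        right = peak
        while right + 1 < n and dist[right + 1] >= half:
            right += 1
        width = right - left + 1
        kind = "cut" if width <= cut_max_width else "gradual"
        events.append((peak, kind, width))
        i = max(j, right) + 1
    return events

# Synthetic signal (an assumption): low baseline, one isolated spike at
# index 10 and one Gaussian-shaped bump centred on index 40.
signal = [0.05] * 60
signal[10] = 1.0
for k in range(60):
    signal[k] += 0.8 * math.exp(-((k - 40) ** 2) / (2 * 4.0 ** 2))

events = classify_transitions(signal)
print(events)
```

On real footage the distance signal would come from the chosen frame metric, and the two thresholds would need tuning against annotated data.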

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Highly efficient low-level feature extraction for video representation and retrieval.

Author: Calie Janko
Publication venue: Queen Mary University of London
Publication date: 01/01/2004

PhD thesis. Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current Content Based Video Indexing and Retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed domain features and on robust scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining high precision and recall in the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.

Queen Mary Research Online

OpenGrey Repository

Robust and efficient techniques for automatic video segmentation.

Author: Lam Cheung Fai
Publication venue: Chinese University of Hong Kong
Publication date: 01/01/1998

by Lam Cheung Fai. Thesis (M.Phil.), Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 174-179). Abstract also in Chinese.

Chapter 1: Introduction
  1.1 Problem Definition
  1.2 Motivation
  1.3 Problems
    1.3.1 Illumination Changes and Motions in Videos
    1.3.2 Variations in Video Scene Characteristics
    1.3.3 High Complexity of Algorithms
    1.3.4 Heterogeneous Approaches to Video Segmentation
  1.4 Objectives and Approaches
  1.5 Organization of the Thesis
Chapter 2: Related Work
  2.1 Algorithms for Uncompressed Videos
    2.1.1 Pixel-based Method
    2.1.2 Histogram-based Method
    2.1.3 Motion-based Algorithms
    2.1.4 Color-ratio Based Algorithms
  2.2 Algorithms for Compressed Videos
    2.2.1 Algorithms based on JPEG Image Sequences
    2.2.2 Algorithms based on MPEG Videos
    2.2.3 Algorithms based on VQ Compressed Videos
  2.3 Frame Difference Analysis Methods
    2.3.1 Scene Cut Detection
    2.3.2 Gradual Transition Detection
  2.4 Speedup Techniques
  2.5 Other Approaches
Chapter 3: Analysis and Enhancement of Existing Algorithms
  3.1 Introduction
  3.2 Video Segmentation Algorithms
    3.2.1 Frame Difference Metrics
    3.2.2 Frame Difference Analysis Methods
  3.3 Analysis of Feature Extraction Algorithms
    3.3.1 Pair-wise pixel comparison
    3.3.2 Color histogram comparison
    3.3.3 Pair-wise block-based comparison of DCT coefficients
    3.3.4 Pair-wise pixel comparison of DC-images
  3.4 Analysis of Scene Change Detection Methods
    3.4.1 Global Threshold Method
    3.4.2 Sliding Window Method
  3.5 Enhancements and Modifications
    3.5.1 Histogram Equalization
    3.5.2 DD Method
    3.5.3 LA Method
    3.5.4 Modification for pair-wise pixel comparison
    3.5.5 Modification for pair-wise DCT block comparison
  3.6 Conclusion
Chapter 4: Color Difference Histogram
  4.1 Introduction
  4.2 Color Difference Histogram
    4.2.1 Definition of Color Difference Histogram
    4.2.2 Sparse Distribution of CDH
    4.2.3 Resolution of CDH
    4.2.4 CDH-based Inter-frame Similarity Measure
    4.2.5 Computational Cost and Discriminating Power
    4.2.6 Suitability in Scene Change Detection
  4.3 Insensitivity to Illumination Changes
    4.3.1 Sensitivity of CDH
    4.3.2 Comparison with other feature extraction algorithms
  4.4 Orientation and Motion Invariant
    4.4.1 Camera Movements
    4.4.2 Object Motion
    4.4.3 Comparison with other feature extraction algorithms
  4.5 Performance of Scene Cut Detection
  4.6 Time Complexity Comparison
  4.7 Extension to DCT-compressed Images
    4.7.1 Performance of scene cut detection
  4.8 Conclusion
Chapter 5: Scene Change Detection
  5.1 Introduction
  5.2 Previous Approaches
    5.2.1 Scene Cut Detection
    5.2.2 Gradual Transition Detection
  5.3 DD Method
    5.3.1 Detecting Scene Cuts
    5.3.2 Detecting 1-frame Transitions
    5.3.3 Detecting Gradual Transitions
  5.4 Local Thresholding
  5.5 Experimental Results
    5.5.1 Performance of CDH+DD and CDH+DL
    5.5.2 Performance of DD on other features
  5.6 Conclusion
Chapter 6: Motion Vector Based Approach
  6.1 Introduction
  6.2 Previous Approaches
  6.3 MPEG-I Video Stream Format
  6.4 Derivation of Frame Differences from Motion Vector Counts
    6.4.1 Types of Frame Pairs
    6.4.2 Conditions for Scene Changes
    6.4.3 Frame Difference Measure
  6.5 Experiment
    6.5.1 Performance of MV
    6.5.2 Performance Enhancement
    6.5.3 Limitations
  6.6 Conclusion
Chapter 7: Conclusion and Future Work
  7.1 Contributions
  7.2 Future Work
  7.3 Conclusion
Bibliography
Chapter A: Sample Videos
Chapter B: List of Abbreviations
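Two of the standard techniques named in the outline, colour histogram comparison and the sliding window method for scene cut detection, can be sketched as follows. This is a generic illustration only and does not reproduce the thesis's CDH, DD or LA methods, whose details are not given in this record. The function names, bin count, window size and ratio are assumptions, and frames are modelled as flat lists of 8-bit grey levels for simplicity.

```python
def histogram(frame, bins=16):
    """Normalised grey-level histogram of a frame given as a flat list of 0..255 values."""
    counts = [0] * bins
    for v in frame:
        counts[v * bins // 256] += 1
    total = float(len(frame))
    return [c / total for c in counts]

def histogram_difference(f1, f2, bins=16):
    """L1 distance between the normalised histograms of two frames (range 0 to 2)."""
    h1, h2 = histogram(f1, bins), histogram(f2, bins)
    return sum(abs(a - b) for a, b in zip(h1, h2))

def sliding_window_cuts(diffs, half_window=3, ratio=3.0, floor=0.1):
    """Declare a cut at index i when diffs[i] exceeds `floor`, is the largest
    value in its window, and is at least `ratio` times the second largest.

    diffs[i] is the difference between frame i and frame i + 1.
    All three parameters are hypothetical tuning values.
    """
    cuts = []
    for i, d in enumerate(diffs):
        lo = max(0, i - half_window)
        hi = min(len(diffs), i + half_window + 1)
        others = [diffs[k] for k in range(lo, hi) if k != i]
        second = max(others) if others else 0.0
        if d > floor and d >= second and d >= ratio * second:
            cuts.append(i)
    return cuts

# Toy sequence (an assumption): five dark frames followed by five bright ones.
dark = [20] * 100
bright = [200] * 100
frames = [dark] * 5 + [bright] * 5
diffs = [histogram_difference(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
cuts = sliding_window_cuts(diffs)
print(cuts)
```

Comparing each difference against its local neighbourhood, rather than against one global threshold, is what lets a window-based rule adapt to sequences whose overall activity level varies.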

CUHK Digital Repository


Content-based Digital Video Processing. Digital Videos Segmentation, Retrieval and Interpretation.

Author: Chen Juan
Publication venue: Department of Computing
Publication date: 01/01/2009

Recent research approaches in semantics-based video content analysis require shot boundary detection as the first step to divide video sequences into sections. Furthermore, with the advances in networking and computing capability, efficient retrieval of multimedia data has become an important issue. Content-based retrieval technologies have been widely implemented to protect intellectual property rights (IPR). In addition, automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this thesis, a paradigm is proposed to segment, retrieve and interpret digital videos. Five algorithms are presented to solve the video segmentation task. Firstly, a simple shot cut detection algorithm is designed for real-time implementation. Secondly, a systematic method is proposed for shot detection using content-based rules and a finite state machine (FSM). Thirdly, shot detection is implemented using local and global indicators. Fourthly, a context awareness approach is proposed to detect shot boundaries. Fifthly, a fuzzy logic method is implemented for shot detection. Furthermore, a novel analysis approach is presented for the detection of video copies. It is robust to complicated distortions and capable of locating copied segments inside original videos. Then, objects and events are extracted from MPEG sequences for video highlights indexing and retrieval. Finally, a human fighting detection algorithm is proposed for movie annotation.

Bradford Scholars

An approach to summarize video data in compressed domain

Author: Şimşek Gökhan
Publication venue: Izmir Institute of Technology
Publication date: 01/01/2007

Thesis (Master), Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2007. Includes bibliographical references (leaves 54-56). Text in English; abstract in Turkish and English. x, 59 leaves. The need to represent digital video and images efficiently and feasibly has attracted great research, development and standardization effort over the past 20 years. These efforts targeted a wide range of applications such as video on demand, digital TV/HDTV broadcasting, multimedia video databases and surveillance. Moreover, these applications demand more efficient algorithms that enable lower bit rates with acceptable quality, depending on application requirements. Today, most video content, whether stored or transmitted, is in compressed form. The increase in the amount of shared video data has attracted researchers' interest in the interrelated problems of video summarization, indexing and abstraction. In this study, scene cut detection in the emerging ISO/ITU H.264/AVC coded bit stream is realized by extracting spatio-temporal prediction information directly in the compressed domain. The syntax and semantics, and the parsing and decoding processes, of the ISO/ITU H.264/AVC bit stream are analyzed to detect scene information. Various video test data are constructed using the Joint Video Team's test model JM encoder, and the implementation is built on the JM decoder. The output of the study is scene information for video summarization, skimming and indexing applications that use the new generation ISO/ITU H.264/AVC video.

A COMPUTATION METHOD/FRAMEWORK FOR HIGH LEVEL VIDEO CONTENT ANALYSIS AND SEGMENTATION USING AFFECTIVE LEVEL INFORMATION

Author: Arifin Sutjipoto
Publication venue: Electrical & Electronic Engineering, Imperial College London
Publication date: 01/10/2008

Video segmentation facilitates efficient video indexing and navigation in large digital video archives. It is an important process in a content-based video indexing and retrieval (CBVIR) system. Many automated solutions performed segmentation by utilizing information about the "facts" of the video. These "facts" come in the form of labels that describe the objects captured by the camera. Solutions of this type were able to achieve good and consistent results for some video genres, such as news programs and informational presentations. The content format of such videos is generally quite standard, and automated solutions were designed to follow these format rules. For example, in [1], the presence of news anchor persons was used as a cue to determine the start and end of a meaningful news segment. The same cannot be said for video genres such as movies and feature films, because the makers of such videos use different filming techniques to elicit certain affective responses from their target audience. Humans usually perform manual video segmentation by trying to relate changes in time and locale to discontinuities in meaning [2]. As a result, viewers usually have doubts about the boundary locations of a meaningful video segment due to their different affective responses. This thesis presents an entirely new view of the problem of high-level video segmentation. We developed a novel probabilistic method for affective-level video content analysis and segmentation. Our method has two stages. In the first stage, affective content labels are assigned to video shots by means of a dynamic Bayesian network (DBN). A novel hierarchical-coupled dynamic Bayesian network (HCDBN) topology is proposed for this stage. The topology is based on the pleasure-arousal-dominance (P-A-D) model of affect representation [3]. In principle, this model can represent a large number of emotions.
In the second stage, the visual, audio and affective information of the video is used to compute a statistical feature vector that represents the content of each shot. Affective-level video segmentation is achieved by applying spectral clustering to the feature vectors. We evaluated the first stage of our proposal by comparing its emotion detection ability with all the existing works related to the field of affective video content analysis. To evaluate the second stage, we used the time adaptive clustering (TAC) algorithm as our performance benchmark. The TAC algorithm was the best high-level video segmentation method [2]. However, it is very computationally intensive. To accelerate it, we developed a modified TAC (modTAC) algorithm designed to be mapped easily onto a field programmable gate array (FPGA) device. Both the TAC and modTAC algorithms were used as performance benchmarks for our proposed method. Since affective video content is a perceptual concept, segmentation performance and human agreement rates were used as our evaluation criteria. To obtain our ground truth data and viewer agreement rates, a pilot panel study based on the work of Gross et al. [4] was conducted. Experimental results show the feasibility of our proposed method. For the first stage, they show an average improvement of as much as 38% over previous works. For the second stage, an improvement of as much as 37% was achieved over the TAC algorithm.

Spiral - Imperial College Digital Repository


MAC-REALM: A video content feature extraction and modelling framework

Author: Parmar Minaz
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. A consequence of the ‘data deluge’ is the exponential increase in digital video footage, while the ability to find relevant video clips diminishes. Traditional text-based search engines are no longer optimal for searching, as they cannot provide a granular search of the content inside video footage. To be able to search the video in a content-based manner, the content features of the video need to be extracted and modelled into a content model, which can then act as a searchable proxy for the video content. This thesis focuses on the extraction of syntactic and semantic content features and content modelling, using machine-driven processes with little or no user interaction. Our abstract framework design extracts syntactic and semantic content features and compiles them into an integrated content model. The framework integrates a four-plane strategy that consists of a pre-processing plane, which removes redundant data and filters the media to improve its feature extraction properties; a syntactic feature extraction plane, which extracts low-level syntactic features and mid-level syntactic features that have semantic attributes; a semantic relationship analysis and linkage plane, where the spatial and temporal relationships of all the content features are defined; and finally a content modelling plane, where the syntactic and semantic content features are integrated into a content model. Each of the four planes can be split into three layers, namely the content layer, where the content to be processed is stored; the application layer, where the content is converted into content descriptions; and the MPEG-7 layer, where content descriptions are serialised. Using MPEG-7 standards to produce the content model will provide wide-ranging interoperability, while facilitating granular multi-content type searches.
The framework aims to ‘bridge’ the semantic gap by integrating the syntactic and semantic content features from extraction through to modelling. The design of the framework has been implemented in a prototype called MAC-REALM, which has been tested and evaluated for its effectiveness in extracting and modelling content features. Conclusions are drawn about the research output as a whole and whether it has met the objectives. Finally, future work is presented on how concept detection and crowd sourcing can be used with MAC-REALM.

Brunel University Research Archive

Semantic segmentation of audiovisual content (Segmentation sémantique des contenus audio-visuels)

Author: NESVADBA Jan
Publication date: 13/01/2021

In this work we developed a method for semantic segmentation of audiovisual content applicable to consumer electronics storage devices. For this solution we first researched a service-oriented distributed multimedia content analysis framework composed of individual content analysis modules, i.e. Service Units. One of these was dedicated to identifying non-content inserts, i.e. commercial blocks, and achieved high performance. In a subsequent step we researched and benchmarked various Shot Boundary Detectors and implemented the best-performing one as a Service Unit. Thereafter, our study of production rules, i.e. film grammar, provided insights into Parallel Shot sequences, i.e. Cross-Cuttings and Shot-Reverse-Shots. We researched and benchmarked four similarity-based clustering methods, two colour-based and two feature-point-based, in order to retain the best one for our final solution. Finally, we researched several audiovisual Scene Boundary Detector methods and achieved the best results by combining a colour-based method with a shot-length-based criterion. This Scene Boundary Detector identified semantic scene boundaries with a robustness of 66% for movies and 80% for series, which proved to be sufficient for our envisioned application, Advanced Content Navigation.

Oskar Bordeaux

A video summarisation system for post-production

Author: Wills Ciaran
Publication date: 01/01/2003

Post-production facilities deal with large amounts of digital video, which presents difficulties when tracking, managing and searching this material. Recent research work in image and video analysis promises to offer help in these tasks, but there is a gap between what these systems can provide and what users actually need. In particular the popular research models for indexing and retrieving visual data do not fit well with how users actually work. In this thesis we explore how image and video analysis can be applied to an online video collection to assist users in reviewing and searching for material faster, rather than purporting to do it for them. We introduce a framework for automatically generating static 2-dimensional storyboards from video sequences. The storyboard consists of a series of frames, one for each shot in the sequence, showing the principal objects and motions of the shot. The storyboards are rendered as vector images in a familiar comic book style, allowing them to be quickly viewed and understood. The process consists of three distinct steps: shot-change detection, object segmentation, and presentation. The nature of the video material encountered in a post-production facility is quite different from other material such as television programmes. Video sequences such as commercials and music videos are highly dynamic with very short shots, rapid transitions and ambiguous edits. Video is often heavily manipulated, causing difficulties for many video processing techniques. We study the performance of a variety of published shot-change detection algorithms on the type of highly dynamic video typically encountered in post-production work. Finding their performance disappointing, we develop a novel algorithm for detecting cuts and fades that operates directly on Motion-JPEG compressed video, exploiting the DCT coefficients to save computation. The algorithm shows superior performance on highly dynamic material while performing comparably to previous algorithms on other material.
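One way working with DCT coefficients can save computation is through DC images: in the 8x8 DCT used by JPEG, the DC coefficient of a block is proportional to that block's mean intensity, so a reduced "DC image" can be compared across frames without running the inverse DCT. The sketch below emulates this on already-decoded pixels by averaging blocks. It is not the algorithm developed in the thesis, which this record does not describe in detail; `dc_image`, `dc_difference` and the frame layout (a list of rows of grey levels) are assumptions made for illustration.

```python
def dc_image(frame, block=8):
    """Block means of a frame given as a list of rows of grey levels.

    Each mean stands in for the (scaled) DC coefficient of that block.
    Height and width are assumed to be multiples of `block`.
    """
    h, w = len(frame), len(frame[0])
    out = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            s = sum(frame[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block))
            row.append(s / float(block * block))
        out.append(row)
    return out

def dc_difference(f1, f2, block=8):
    """Mean absolute difference between the DC images of two equally sized frames."""
    d1, d2 = dc_image(f1, block), dc_image(f2, block)
    n = len(d1) * len(d1[0])
    return sum(abs(a - b) for r1, r2 in zip(d1, d2) for a, b in zip(r1, r2)) / n

# Toy 16x16 frames (an assumption): uniform grey levels 50 and 180.
frame_a = [[50] * 16 for _ in range(16)]
frame_b = [[180] * 16 for _ in range(16)]
print(dc_difference(frame_a, frame_a), dc_difference(frame_a, frame_b))
```

With 8x8 blocks the DC image has 64 times fewer samples than the full frame, which is where the saving comes from; a detector would then threshold or otherwise analyse the sequence of `dc_difference` values.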

Glasgow Theses Service