
    Video Shot Boundary Detection Using Generalized Eigenvalue Decomposition and Gaussian Transition Detection

    Shot boundary detection is the first step of video analysis, summarization and retrieval. In this paper, we propose a novel shot boundary detection algorithm using Generalized Eigenvalue Decomposition (GED) and modeling of gradual transitions by Gaussian functions. In particular, we focus on the challenges of detecting gradual transitions and extracting appropriate spatio-temporal features, both of which affect how efficiently the algorithm can detect shot boundaries. We derive a theorem that establishes new properties of GED which can be used in video processing algorithms, and we use this theorem to define a new distance metric in the eigenspace for comparing video frames. The distance function changes abruptly at hard cuts and shows semi-Gaussian behavior over gradual transitions. The algorithm detects transitions by analyzing this distance function. Finally, we report experimental results on the large-scale test set provided by TRECVID 2006, which includes evaluations for both hard cut and gradual shot boundary detection.
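
    As a rough illustration of the classification step this abstract describes, the Python sketch below labels hard cuts and gradual transitions from a generic, already-computed inter-frame distance signal; the GED-based metric itself is not reproduced, and the thresholds, window length and Gaussian-correlation test are illustrative assumptions, not the authors' values.

    import numpy as np

    def detect_transitions(d, cut_thresh=0.6, grad_thresh=0.25, win=15):
        """d[t] is the distance between frames t and t+1 (assumed roughly in [0, 1])."""
        d = np.asarray(d, dtype=float)
        cuts, graduals = [], []
        t = 0
        while t < len(d):
            if d[t] >= cut_thresh:                      # abrupt spike -> hard cut
                cuts.append(t)
                t += 1
                continue
            seg = d[t:t + win]
            if len(seg) == win and seg.mean() >= grad_thresh and seg.std() > 1e-6:
                # compare the local shape of the signal against a Gaussian bump
                x = np.arange(win)
                g = np.exp(-((x - win / 2.0) ** 2) / (2 * (win / 4.0) ** 2))
                if np.corrcoef(seg, g)[0, 1] > 0.8:     # semi-Gaussian -> gradual transition
                    graduals.append((t, t + win))
                    t += win
                    continue
            t += 1
        return cuts, graduals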

    Highly efficient low-level feature extraction for video representation and retrieval.

    PhD thesis. Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Current content-based video indexing and retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval, with the aim of a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on prediction information extracted directly from compressed-domain features and on robust, scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining high precision and recall of the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking video clips with a limited lexicon of related keywords.
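
    As a rough illustration of adaptive key-frame extraction of the kind mentioned above, the sketch below selects key frames from per-frame colour histograms. It is a generic stand-in, not the thesis's compressed-domain algorithm; the histogram input, the L1 distance and the stopping threshold are assumptions.

    import numpy as np

    def key_frames(histograms, max_keys=3):
        """histograms: (n_frames, n_bins) array of normalised colour histograms for one shot."""
        h = np.asarray(histograms, dtype=float)
        centre = h.mean(axis=0)
        # the frame closest to the shot's average content becomes the first key frame
        dist = np.abs(h - centre).sum(axis=1)
        keys = [int(dist.argmin())]
        # greedily add frames that are most dissimilar to the already chosen key frames
        while len(keys) < max_keys:
            d_to_keys = np.min([np.abs(h - h[k]).sum(axis=1) for k in keys], axis=0)
            cand = int(d_to_keys.argmax())
            if d_to_keys[cand] < 0.2:        # shot is visually uniform: stop early
                break
            keys.append(cand)
        return sorted(keys)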

    Robust and efficient techniques for automatic video segmentation.

    Thesis (M.Phil.), Chinese University of Hong Kong, 1998, by Lam Cheung Fai. Includes bibliographical references (leaves 174-179). Abstract also in Chinese. Table of contents:

    Chapter 1: Introduction
        1.1 Problem Definition
        1.2 Motivation
        1.3 Problems
            1.3.1 Illumination Changes and Motions in Videos
            1.3.2 Variations in Video Scene Characteristics
            1.3.3 High Complexity of Algorithms
            1.3.4 Heterogeneous Approaches to Video Segmentation
        1.4 Objectives and Approaches
        1.5 Organization of the Thesis
    Chapter 2: Related Work
        2.1 Algorithms for Uncompressed Videos
            2.1.1 Pixel-based Method
            2.1.2 Histogram-based Method
            2.1.3 Motion-based Algorithms
            2.1.4 Color-ratio Based Algorithms
        2.2 Algorithms for Compressed Videos
            2.2.1 Algorithms based on JPEG Image Sequences
            2.2.2 Algorithms based on MPEG Videos
            2.2.3 Algorithms based on VQ Compressed Videos
        2.3 Frame Difference Analysis Methods
            2.3.1 Scene Cut Detection
            2.3.2 Gradual Transition Detection
        2.4 Speedup Techniques
        2.5 Other Approaches
    Chapter 3: Analysis and Enhancement of Existing Algorithms
        3.1 Introduction
        3.2 Video Segmentation Algorithms
            3.2.1 Frame Difference Metrics
            3.2.2 Frame Difference Analysis Methods
        3.3 Analysis of Feature Extraction Algorithms
            3.3.1 Pair-wise pixel comparison
            3.3.2 Color histogram comparison
            3.3.3 Pair-wise block-based comparison of DCT coefficients
            3.3.4 Pair-wise pixel comparison of DC-images
        3.4 Analysis of Scene Change Detection Methods
            3.4.1 Global Threshold Method
            3.4.2 Sliding Window Method
        3.5 Enhancements and Modifications
            3.5.1 Histogram Equalization
            3.5.2 DD Method
            3.5.3 LA Method
            3.5.4 Modification for pair-wise pixel comparison
            3.5.5 Modification for pair-wise DCT block comparison
        3.6 Conclusion
    Chapter 4: Color Difference Histogram
        4.1 Introduction
        4.2 Color Difference Histogram
            4.2.1 Definition of Color Difference Histogram
            4.2.2 Sparse Distribution of CDH
            4.2.3 Resolution of CDH
            4.2.4 CDH-based Inter-frame Similarity Measure
            4.2.5 Computational Cost and Discriminating Power
            4.2.6 Suitability in Scene Change Detection
        4.3 Insensitivity to Illumination Changes
            4.3.1 Sensitivity of CDH
            4.3.2 Comparison with other feature extraction algorithms
        4.4 Orientation and Motion Invariant
            4.4.1 Camera Movements
            4.4.2 Object Motion
            4.4.3 Comparison with other feature extraction algorithms
        4.5 Performance of Scene Cut Detection
        4.6 Time Complexity Comparison
        4.7 Extension to DCT-compressed Images
            4.7.1 Performance of scene cut detection
        4.8 Conclusion
    Chapter 5: Scene Change Detection
        5.1 Introduction
        5.2 Previous Approaches
            5.2.1 Scene Cut Detection
            5.2.2 Gradual Transition Detection
        5.3 DD Method
            5.3.1 Detecting Scene Cuts
            5.3.2 Detecting 1-frame Transitions
            5.3.3 Detecting Gradual Transitions
        5.4 Local Thresholding
        5.5 Experimental Results
            5.5.1 Performance of CDH+DD and CDH+DL
            5.5.2 Performance of DD on other features
        5.6 Conclusion
    Chapter 6: Motion Vector Based Approach
        6.1 Introduction
        6.2 Previous Approaches
        6.3 MPEG-I Video Stream Format
        6.4 Derivation of Frame Differences from Motion Vector Counts
            6.4.1 Types of Frame Pairs
            6.4.2 Conditions for Scene Changes
            6.4.3 Frame Difference Measure
        6.5 Experiment
            6.5.1 Performance of MV
            6.5.2 Performance Enhancement
            6.5.3 Limitations
        6.6 Conclusion
    Chapter 7: Conclusion and Future Work
        7.1 Contributions
        7.2 Future Work
        7.3 Conclusion
    Bibliography
    Chapter A: Sample Videos
    Chapter B: List of Abbreviations

    An approach to summarize video data in compressed domain

    Thesis (Master), Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2007. Includes bibliographical references (leaves 54-56). Text in English; abstract in Turkish and English. x, 59 leaves. The requirements for representing digital video and images efficiently and feasibly have driven great efforts in research, development and standardization over the past 20 years. These efforts target a vast area of applications such as video on demand, digital TV/HDTV broadcasting, multimedia video databases and surveillance. Moreover, these applications demand ever more efficient algorithms to reach lower bit rates with quality acceptable for the application at hand. Today, most video content, whether stored or transmitted, is in compressed form. The increase in the amount of video data being shared has attracted researchers to the interrelated problems of video summarization, indexing and abstraction. In this study, scene cut detection in the emerging ISO/ITU H264/AVC coded bit stream is realized by extracting spatio-temporal prediction information directly in the compressed domain. The syntax and semantics and the parsing and decoding processes of the ISO/ITU H264/AVC bit stream are analyzed to detect scene information. Various video test data are constructed using the Joint Video Team's test model (JM) encoder, and implementations are made on the JM decoder. The output of the study is scene information that can serve video summarization, skimming and indexing applications built on the new-generation ISO/ITU H264/AVC video.
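
    A widely used compressed-domain cue for scene cuts in H264/AVC streams is the share of intra-coded macroblocks in a predicted frame, since a cut breaks temporal prediction. The sketch below illustrates only that cue; it assumes the macroblock types have already been parsed from the bit stream (for example with the JM decoder), and the 0.7 threshold is illustrative, not necessarily the rule used in this study.

    def is_scene_cut(frame_type, mb_types, intra_ratio_thresh=0.7):
        """frame_type: 'I', 'P' or 'B'; mb_types: list of 'intra'/'inter' labels per macroblock."""
        if frame_type == 'I' or not mb_types:
            return False                      # I-frames are all intra by design, so uninformative
        intra = sum(1 for t in mb_types if t == 'intra')
        return intra / len(mb_types) >= intra_ratio_thresh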

    A COMPUTATION METHOD/FRAMEWORK FOR HIGH LEVEL VIDEO CONTENT ANALYSIS AND SEGMENTATION USING AFFECTIVE LEVEL INFORMATION

    Video segmentation facilitates efficient video indexing and navigation in large digital video archives. It is an important process in a content-based video indexing and retrieval (CBVIR) system. Many automated solutions perform segmentation by utilizing information about the "facts" of the video. These "facts" come in the form of labels that describe the objects captured by the camera. This type of solution achieves good and consistent results for some video genres such as news programs and informational presentations, whose content format is generally quite standard and whose format rules automated solutions can be designed to follow. For example, in [1] the presence of news anchor persons was used as a cue to determine the start and end of a meaningful news segment. The same cannot be said for genres such as movies and feature films, because their makers use different filming techniques to design their videos so as to elicit particular affective responses from the targeted audience. Humans usually perform manual video segmentation by trying to relate changes in time and locale to discontinuities in meaning [2]. As a result, viewers usually have doubts about the boundary locations of a meaningful video segment due to their different affective responses. This thesis presents an entirely new view of the problem of high level video segmentation. We developed a novel probabilistic method for affective level video content analysis and segmentation. Our method has two stages. In the first stage, affective content labels are assigned to video shots by means of a dynamic Bayesian network (DBN). A novel hierarchical-coupled dynamic Bayesian network (HCDBN) topology is proposed for this stage. The topology is based on the pleasure-arousal-dominance (P-A-D) model of affect representation [3]; in principle, this model can represent a large number of emotions. In the second stage, the visual, audio and affective information of the video is used to compute a statistical feature vector representing the content of each shot, and affective level video segmentation is achieved by applying spectral clustering to these feature vectors. We evaluated the first stage of our proposal by comparing its emotion detection ability with existing work in the field of affective video content analysis. To evaluate the second stage, we used the time adaptive clustering (TAC) algorithm as our performance benchmark. The TAC algorithm was the best high level video segmentation method [2], but it is very computationally intensive; to accelerate it, we developed a modified TAC (modTAC) algorithm designed to map easily onto a field programmable gate array (FPGA) device. Both the TAC and modTAC algorithms were used as performance benchmarks for our proposed method. Since affective video content is a perceptual concept, segmentation performance and human agreement rates were used as evaluation criteria. To obtain ground truth data and viewer agreement rates, a pilot panel study based on the work of Gross et al. [4] was conducted. The experimental results show the feasibility of the proposed method: for the first stage, an average improvement of as high as 38% was achieved over previous works, and for the second stage, an improvement of as high as 37% was achieved over the TAC algorithm.
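
    A minimal sketch of the second-stage clustering described above, assuming the per-shot feature vectors have already been computed; it uses scikit-learn's SpectralClustering with placeholder parameters rather than the exact configuration of the thesis.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def segment_shots(shot_features, n_segments):
        """shot_features: (n_shots, n_dims) array of visual/audio/affective features per shot."""
        labels = SpectralClustering(
            n_clusters=n_segments,
            affinity="rbf",            # Gaussian similarity between shot feature vectors
            random_state=0,
        ).fit_predict(np.asarray(shot_features, dtype=float))
        # turn cluster labels into segment boundaries (indices of shots that start a new label run)
        boundaries = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
        return labels, boundaries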

    Segmentation sémantique des contenus audio-visuels (Semantic segmentation of audiovisual content)

    Abstract also in French. In this work we developed a method for semantic segmentation of audiovisual content applicable to consumer electronics storage devices. We first investigated a service-oriented, distributed multimedia content analysis framework composed of individual content analysis modules, i.e. Service Units. One of these was dedicated to identifying non-content-related inserts, i.e. commercial blocks, and reached high performance. In a subsequent step we benchmarked various shot boundary detectors and implemented the best performing one as a Service Unit. Thereafter, our study of production rules, i.e. film grammar, provided insights into parallel shot sequences, i.e. cross-cuttings and shot-reverse-shots. We benchmarked four similarity-based clustering methods, two colour-based and two feature-point-based, in order to retain the best one for the final solution. Finally, we investigated several audiovisual scene boundary detection methods and achieved the best results by combining a colour-based method with a shot-length criterion. This scene boundary detector identified semantic scene boundaries with a robustness of 66% for movies and 80% for series, which proved sufficient for our envisioned application, Advanced Content Navigation, and justifies its integration into consumer storage devices.
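
    A minimal sketch of the scene boundary criterion described above, combining a colour-based dissimilarity between adjacent shots with a shot-length term; the histogram distance, weight and thresholds are illustrative assumptions, not the values used in this work.

    import numpy as np

    def scene_boundaries(shot_histograms, shot_lengths,
                         colour_thresh=0.5, short_shot=12, weight=0.2):
        """shot_histograms: (n_shots, n_bins) normalised colour histograms; shot_lengths: frames per shot."""
        h = np.asarray(shot_histograms, dtype=float)
        boundaries = []
        for i in range(1, len(h)):
            colour_diff = 0.5 * np.abs(h[i] - h[i - 1]).sum()   # L1 histogram distance, scaled to [0, 1]
            # very short shots (fast cutting) often belong to the same scene,
            # so a short current shot lowers the evidence for a boundary
            penalty = weight if shot_lengths[i] < short_shot else 0.0
            if colour_diff - penalty > colour_thresh:
                boundaries.append(i)           # a new scene starts at shot i
        return boundaries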

    A video summarisation system for post-production

    Post-production facilities deal with large amounts of digital video, which presents difficulties when tracking, managing and searching this material. Recent research work in image and video analysis promises to offer help in these tasks, but there is a gap between what these systems can provide and what users actually need. In particular, the popular research models for indexing and retrieving visual data do not fit well with how users actually work. In this thesis we explore how image and video analysis can be applied to an online video collection to assist users in reviewing and searching for material faster, rather than purporting to do it for them. We introduce a framework for automatically generating static 2-dimensional storyboards from video sequences. The storyboard consists of a series of frames, one for each shot in the sequence, showing the principal objects and motions of the shot. The storyboards are rendered as vector images in a familiar comic book style, allowing them to be quickly viewed and understood. The process consists of three distinct steps: shot-change detection, object segmentation, and presentation. The nature of the video material encountered in a post-production facility is quite different from other material such as television programmes. Video sequences such as commercials and music videos are highly dynamic, with very short shots, rapid transitions and ambiguous edits. Video is often heavily manipulated, causing difficulties for many video processing techniques. We study the performance of a variety of published shot-change detection algorithms on the type of highly dynamic video typically encountered in post-production work. Finding their performance disappointing, we develop a novel algorithm for detecting cuts and fades that operates directly on Motion-JPEG compressed video, exploiting the DCT coefficients to save computation. The algorithm shows superior performance on highly dynamic material while performing comparably to previous algorithms on other material.
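
    The cut and fade detector described above operates on DCT coefficients of Motion-JPEG video. The sketch below illustrates the general idea on DC images (the per-block DC terms), assuming these have already been extracted from the compressed stream; the thresholds and the linear-slope fade test are illustrative, not the thesis's algorithm.

    import numpy as np

    def classify_transitions(dc_images, cut_thresh=30.0, fade_win=20, fade_slope=1.0):
        """dc_images: (n_frames, h, w) array of block-DC luminance values per frame."""
        dc = np.asarray(dc_images, dtype=float)
        mean_lum = dc.reshape(len(dc), -1).mean(axis=1)
        # mean absolute change of the DC image between consecutive frames
        diff = np.abs(np.diff(dc, axis=0)).reshape(len(dc) - 1, -1).mean(axis=1)

        cuts = [t for t in range(len(diff)) if diff[t] > cut_thresh]

        fades = []
        for t in range(len(dc) - fade_win):
            window = mean_lum[t:t + fade_win]
            slope = np.polyfit(np.arange(fade_win), window, 1)[0]
            # a fade shows a steady drift of average luminance with no abrupt jump inside the window
            if abs(slope) > fade_slope and np.all(np.abs(np.diff(window)) < cut_thresh):
                fades.append((t, t + fade_win))
        return cuts, fades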