206 research outputs found

    Video shot boundary detection: seven years of TRECVid activity

    Shot boundary detection (SBD) is the process of automatically detecting the boundaries between shots in video. It is a problem that has attracted much attention since video became available in digital form, as it is an essential pre-processing step to almost all video analysis, indexing, summarisation, search, and other content-based operations. Automatic SBD was one of the tracks of activity within the annual TRECVid benchmarking exercise each year from 2001 to 2007 inclusive. Over those seven years, 57 different research groups from across the world worked to determine the best approaches to SBD while using a common dataset and common scoring metrics. In this paper we present an overview of the TRECVid shot boundary detection task, a high-level overview of the most significant of the approaches taken, and a comparison of performances, focussing on one year (2005) as an example.
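    As a concrete illustration of the task (a minimal sketch, not code from any TRECVid system), the classic baseline compares color histograms of consecutive frames and declares a hard cut where the difference spikes. The bin count and threshold below are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_hard_cuts(video_path, threshold=0.5, bins=8):
    """Flag frame indices whose color histogram differs sharply
    from the previous frame's (a naive hard-cut detector)."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Joint BGR histogram, L1-normalised so differences are comparable.
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [bins] * 3, [0, 256] * 3).flatten()
        hist /= hist.sum() + 1e-9
        if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
            cuts.append(idx)  # boundary between frames idx-1 and idx
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts
```

    Gradual transitions (dissolves, fades) defeat such a single-threshold scheme, which is one reason they remain the hard part of the task.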

    Iconic Indexing for Video Search

    Submitted for the degree of Doctor of Philosophy, Queen Mary, University of London

    Video Categorization Using Semantics and Semiotics

    There is a great need to automatically segment, categorize, and annotate video data, and to develop efficient tools for browsing and searching. We believe that the categorization of videos can be achieved by exploring the concepts and meanings of the videos. This task requires bridging the gap between low-level content and high-level concepts (or semantics). Once a relationship is established between the low-level computable features of the video and its semantics, the user would be able to navigate through videos through the use of concepts and ideas (for example, a user could extract only those scenes in an action film that actually contain fights) rather than sequentially browsing the whole video. However, this relationship must follow the norms of human perception and abide by the rules that are most often followed by the creators (directors) of these videos. These rules are called film grammar in the video production literature. Like any natural language, this grammar has several dialects, but it has been acknowledged to be universal. Therefore, the knowledge of film grammar can be exploited effectively for the understanding of films. To interpret an idea using the grammar, we need first to understand the symbols, as in natural languages, and second, to understand the rules for combining these symbols to represent concepts. In order to develop algorithms that exploit this film grammar, it is necessary to relate the symbols of the grammar to computable video features.

    In this dissertation, we have identified a set of computable features of videos and have developed methods to estimate them. A computable feature of audio-visual data is defined as any statistic of the available data that can be automatically extracted using image/signal processing and computer vision techniques. These features are global in nature and are extracted from whole images; therefore, they do not require any object detection, tracking, or classification. They include video shots, shot length, shot motion content, color distribution, key lighting, and audio energy. We use these features and exploit the knowledge of ubiquitous film grammar to solve three related problems: segmentation and categorization of talk and game shows; classification of movie genres based on previews; and segmentation and representation of full-length Hollywood movies and sitcoms.

    First, we have developed a method for organizing videos of talk and game shows by automatically separating the program segments from the commercials and then classifying each shot as the host's or a guest's shot. In our approach, we rely primarily on information contained in shot transitions and utilize the inherent difference in the scene structure (grammar) of commercials and talk shows. A data structure called a shot connectivity graph is constructed, which links shots over time using temporal proximity and color similarity constraints. Analysis of the shot connectivity graph helps us to separate commercials from program segments. This is done by first detecting stories, and then assigning a weight to each story based on its likelihood of being a commercial or a program segment. We further analyze stories to distinguish shots of the hosts from those of the guests. We have performed extensive experiments on eight full-length talk shows (e.g. Larry King Live, Meet the Press, News Night) and game shows (Who Wants To Be A Millionaire), and have obtained excellent classification with 96% recall and 99% precision.
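    A hedged sketch of the shot connectivity graph idea described above: shots are linked when they are temporally close and their key-frame color histograms are similar, so recurring studio set-ups (host and guest shots) form tightly connected clusters while one-off commercial shots do not. The window size, similarity measure, and cut-off below are illustrative assumptions, not the dissertation's parameters.

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two L1-normalised histograms."""
    return np.minimum(h1, h2).sum()

def shot_connectivity_graph(shots, time_window=30.0, min_similarity=0.6):
    """shots: time-ordered list of (start_time_seconds, key-frame histogram).
    Returns an adjacency list linking temporally close, similar shots."""
    graph = {i: [] for i in range(len(shots))}
    for i, (t_i, h_i) in enumerate(shots):
        for j in range(i + 1, len(shots)):
            t_j, h_j = shots[j]
            if t_j - t_i > time_window:  # shots are time-ordered, so stop early
                break
            if histogram_intersection(h_i, h_j) >= min_similarity:
                graph[i].append(j)
                graph[j].append(i)
    return graph
```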
    http://www.cs.ucf.edu/~vision/projects/LarryKing/LarryKing.html

    Secondly, we have developed a novel method for genre classification of films using film previews. In our approach, we classify previews into four broad categories: comedy, action, drama, or horror films. Computable video features are combined in a framework with cinematic principles to provide a mapping to these four high-level semantic classes. We have developed two methods for genre classification: (a) a hierarchical method and (b) an unsupervised classification method. In the hierarchical method, we first classify movies into action and non-action categories based on the average shot length and motion content in the previews. Next, non-action movies are sub-classified into comedy, horror, or drama categories by examining their lighting key. Finally, action movies are ranked on the basis of the number of explosion/gunfire events. In the unsupervised method, a mean shift classifier is used to discover the structure of the mapping between the computable features and each film genre. We have conducted extensive experiments on over a hundred film previews and demonstrated that low-level features can be efficiently utilized for movie classification, achieving about 87% successful classification. http://www.cs.ucf.edu/~vision/projects/movieClassification/movieClassification.html

    Finally, we have addressed the problem of detecting scene boundaries in full-length feature movies, developing two novel approaches to automatically find scenes in videos. Our first approach is a two-pass algorithm. In the first pass, shots are clustered by computing backward shot coherence, a shot color similarity measure that detects potential scene boundaries (PSBs) in the videos. In the second pass, we compute scene dynamics for each scene as a function of shot length and the motion content in the potential scenes, and a scene-merging criterion is used to remove weak PSBs in order to reduce over-segmentation. In our second approach, we cluster shots into scenes by transforming this task into a graph-partitioning problem. This is achieved by constructing a weighted undirected graph called a shot similarity graph (SSG), where each node represents a shot and the edges between shots are weighted by their similarities (color and motion). The SSG is then split into sub-graphs by applying the normalized cut technique for graph partitioning; the partitions obtained represent individual scenes in the video. We further extend the framework to automatically detect the best representative key frames of the identified scenes, obtaining a compact representation of long videos in a small number of key frames. We have performed experiments on five Hollywood films (Terminator II, Top Gun, Gone In 60 Seconds, Golden Eye, and A Beautiful Mind) and one TV sitcom (Seinfeld) that demonstrate the effectiveness of our approach, achieving about 80% recall and 63% precision. http://www.cs.ucf.edu/~vision/projects/sceneSeg/sceneSeg.htm
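    The normalized-cut step lends itself to a short sketch. Below is a hedged spectral approximation (the Shi-Malik relaxation) applied to a shot similarity matrix; the matrix itself (color/motion edge weights) is assumed given, and real implementations typically recurse until the cut value exceeds a threshold, whereas this sketch simply stops at a minimum partition size.

```python
import numpy as np

def ncut_bipartition(W):
    """Spectral relaxation of the normalized cut: threshold the eigenvector
    with the second-smallest eigenvalue of the normalised Laplacian."""
    d = W.sum(axis=1) + 1e-12
    D_inv_sqrt = np.diag(d ** -0.5)
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, eigvecs = np.linalg.eigh(L)  # eigh returns ascending eigenvalues
    return eigvecs[:, 1] >= 0       # boolean membership of one side

def partition_scenes(W, indices=None, min_size=4):
    """Recursively split a symmetric n-by-n shot similarity matrix W
    into groups of shot indices, each group representing a scene."""
    if indices is None:
        indices = np.arange(len(W))
    if len(indices) <= min_size:
        return [indices.tolist()]
    side = ncut_bipartition(W[np.ix_(indices, indices)])
    if side.all() or not side.any():  # degenerate split: stop recursing
        return [indices.tolist()]
    return (partition_scenes(W, indices[side], min_size) +
            partition_scenes(W, indices[~side], min_size))
```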

    Surveillance centric coding

    PhD thesis. The research work presented in this thesis focuses on the development of techniques specific to surveillance videos for efficient video compression at higher processing speed. Scalable Video Coding (SVC) techniques are explored to achieve higher compression efficiency, and the SVC framework is modified to support Surveillance Centric Coding (SCC). Motion estimation techniques specific to surveillance videos are proposed in order to speed up the compression process of the SCC. The main contributions of the research work presented in this thesis fall into two groups: (i) efficient compression and (ii) efficient motion estimation.

    The paradigm of Surveillance Centric Coding (SCC) is introduced, in which coding aims to achieve bit-rate optimisation and adaptation of surveillance videos for storage and transmission purposes. In the proposed approach, the SCC encoder communicates with a Video Content Analysis (VCA) module that detects events of interest in video captured by the CCTV. Bit-rate optimisation and adaptation are achieved by exploiting the scalability properties of the employed codec. Time segments containing events relevant to the surveillance application are encoded at high spatio-temporal resolution and quality, while the portions irrelevant from the surveillance standpoint are encoded at low spatio-temporal resolution and/or quality. Thanks to the scalability of the resulting compressed bit-stream, additional bit-rate adaptation is possible, for instance for transmission purposes. Experimental evaluation showed that the proposed approach achieves a significant reduction in bit-rate without loss of information relevant to surveillance applications.

    In addition to the more efficient compression strategy, novel approaches to efficient motion estimation specific to surveillance videos are proposed, implemented, and evaluated experimentally. A real-time background subtractor is used to detect the presence of any motion activity in the sequence, and three approaches to selective motion estimation are implemented: GOP-based, frame-based, and block-based. In the GOP-based approach, motion estimation is performed for the whole group of pictures (GOP) only when a moving object is detected in any frame of the GOP; in the frame-based approach, each frame is tested for motion activity and motion estimation is performed selectively; and the idea is explored at a lower level still in block-based selective motion estimation. Experimental evaluation showed that the proposed strategy achieves a significant reduction in computational complexity. In addition to selective motion estimation, tracker-based motion estimation and a fast full search using multiple reference frames are proposed for surveillance videos. Extensive testing on different surveillance videos shows the benefits of applying the proposed approaches to achieve the goals of the SCC.
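    A hedged sketch of the frame-based variant: a background subtractor flags frames with motion activity, and the expensive motion search runs only on those frames. OpenCV's MOG2 subtractor and dense optical flow stand in here for the thesis's real-time background subtractor and block-based motion estimation, and the activity threshold is an illustrative assumption.

```python
import cv2

def selective_motion_estimation(video_path, activity_threshold=0.01):
    """Run motion estimation only on frames where the background
    subtractor reports enough foreground (i.e. moving) pixels."""
    cap = cv2.VideoCapture(video_path)
    backsub = cv2.createBackgroundSubtractorMOG2()
    prev_gray, flows, idx = None, {}, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        fg = backsub.apply(frame)
        active = (fg > 0).mean() > activity_threshold  # moving-pixel fraction
        if active and prev_gray is not None:
            # Dense optical flow stands in for block-based motion estimation.
            flows[idx] = cv2.calcOpticalFlowFarneback(
                prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        prev_gray, idx = gray, idx + 1
    cap.release()
    return flows
```

    Frames with no detected activity skip the motion search entirely, which is where the computational saving comes from.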

    Multimedia Retrieval


    An object-based approach to retrieval of image and video content

    Promising new directions have been opened up for content-based visual retrieval in recent years. Object-based retrieval, which allows users to manipulate video objects as part of their searching and browsing interaction, is one of these. This thesis forms part of a larger stream of research that investigates visual objects as a possible approach to advancing the use of semantics in content-based visual retrieval. The notion of using objects in video retrieval has been seen as desirable for some years, but only very recently has technology started to allow even very basic object-location functions on video. The main hurdles to greater use of objects in video retrieval are the overhead of object segmentation on large amounts of video and the question of whether objects can actually be used efficiently for multimedia retrieval. Despite this, there are already some examples of work which support retrieval based on video objects.

    This thesis investigates an object-based approach to content-based visual retrieval. The main research contributions of this work are a study of shot boundary detection on compressed-domain video, in which a fast detection approach is proposed and evaluated, and a study of the use of objects in interactive image retrieval. An object-based retrieval framework is developed in order to investigate object-based retrieval on a corpus of natural images and video. This framework contains the entire processing chain required to analyse, index, and interactively retrieve images and video via object-to-object matching. The experimental results indicate that object-based searching consistently outperforms image-based search using low-level features. This result goes some way towards validating the approach of allowing users to select objects as a basis for searching video archives when the information need makes it appropriate.
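    A hedged sketch of what object-to-object matching on low-level features can look like: a query object, given as an image region plus a binary mask, is ranked against indexed object regions by masked color-histogram similarity. The hue-saturation feature and intersection metric are illustrative choices, not the thesis's actual descriptors.

```python
import cv2
import numpy as np

def object_descriptor(image_bgr, mask, bins=16):
    """L1-normalised hue-saturation histogram restricted to the object
    mask (mask: 8-bit single-channel, nonzero inside the object)."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], mask, [bins, bins], [0, 180, 0, 256])
    cv2.normalize(hist, hist, 1.0, 0, cv2.NORM_L1)
    return hist.flatten()

def rank_objects(query_desc, indexed_descs):
    """Rank indexed objects (name -> descriptor) by histogram
    intersection with the query descriptor, best match first."""
    scores = [(np.minimum(query_desc, d).sum(), name)
              for name, d in indexed_descs.items()]
    return sorted(scores, reverse=True)
```

    Restricting the histogram to the mask is the whole point: the descriptor captures the object rather than the full frame, which is what lets object-based search beat whole-image search on the same low-level features.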

    Robust and efficient techniques for automatic video segmentation.

    by Lam Cheung Fai. Thesis (M.Phil.), Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 174-179). Abstract also in Chinese.

    Contents:
    Chapter 1 --- Introduction
      1.1 Problem Definition
      1.2 Motivation
      1.3 Problems
        1.3.1 Illumination Changes and Motions in Videos
        1.3.2 Variations in Video Scene Characteristics
        1.3.3 High Complexity of Algorithms
        1.3.4 Heterogeneous Approaches to Video Segmentation
      1.4 Objectives and Approaches
      1.5 Organization of the Thesis
    Chapter 2 --- Related Work
      2.1 Algorithms for Uncompressed Videos
        2.1.1 Pixel-based Method
        2.1.2 Histogram-based Method
        2.1.3 Motion-based Algorithms
        2.1.4 Color-ratio Based Algorithms
      2.2 Algorithms for Compressed Videos
        2.2.1 Algorithms based on JPEG Image Sequences
        2.2.2 Algorithms based on MPEG Videos
        2.2.3 Algorithms based on VQ Compressed Videos
      2.3 Frame Difference Analysis Methods
        2.3.1 Scene Cut Detection
        2.3.2 Gradual Transition Detection
      2.4 Speedup Techniques
      2.5 Other Approaches
    Chapter 3 --- Analysis and Enhancement of Existing Algorithms
      3.1 Introduction
      3.2 Video Segmentation Algorithms
        3.2.1 Frame Difference Metrics
        3.2.2 Frame Difference Analysis Methods
      3.3 Analysis of Feature Extraction Algorithms
        3.3.1 Pair-wise pixel comparison
        3.3.2 Color histogram comparison
        3.3.3 Pair-wise block-based comparison of DCT coefficients
        3.3.4 Pair-wise pixel comparison of DC-images
      3.4 Analysis of Scene Change Detection Methods
        3.4.1 Global Threshold Method
        3.4.2 Sliding Window Method
      3.5 Enhancements and Modifications
        3.5.1 Histogram Equalization
        3.5.2 DD Method
        3.5.3 LA Method
        3.5.4 Modification for pair-wise pixel comparison
        3.5.5 Modification for pair-wise DCT block comparison
      3.6 Conclusion
    Chapter 4 --- Color Difference Histogram
      4.1 Introduction
      4.2 Color Difference Histogram
        4.2.1 Definition of Color Difference Histogram
        4.2.2 Sparse Distribution of CDH
        4.2.3 Resolution of CDH
        4.2.4 CDH-based Inter-frame Similarity Measure
        4.2.5 Computational Cost and Discriminating Power
        4.2.6 Suitability in Scene Change Detection
      4.3 Insensitivity to Illumination Changes
        4.3.1 Sensitivity of CDH
        4.3.2 Comparison with other feature extraction algorithms
      4.4 Orientation and Motion Invariant
        4.4.1 Camera Movements
        4.4.2 Object Motion
        4.4.3 Comparison with other feature extraction algorithms
      4.5 Performance of Scene Cut Detection
      4.6 Time Complexity Comparison
      4.7 Extension to DCT-compressed Images
        4.7.1 Performance of scene cut detection
      4.8 Conclusion
    Chapter 5 --- Scene Change Detection
      5.1 Introduction
      5.2 Previous Approaches
        5.2.1 Scene Cut Detection
        5.2.2 Gradual Transition Detection
      5.3 DD Method
        5.3.1 Detecting Scene Cuts
        5.3.2 Detecting 1-frame Transitions
        5.3.3 Detecting Gradual Transitions
      5.4 Local Thresholding
      5.5 Experimental Results
        5.5.1 Performance of CDH+DD and CDH+DL
        5.5.2 Performance of DD on other features
      5.6 Conclusion
    Chapter 6 --- Motion Vector Based Approach
      6.1 Introduction
      6.2 Previous Approaches
      6.3 MPEG-I Video Stream Format
      6.4 Derivation of Frame Differences from Motion Vector Counts
        6.4.1 Types of Frame Pairs
        6.4.2 Conditions for Scene Changes
        6.4.3 Frame Difference Measure
      6.5 Experiment
        6.5.1 Performance of MV
        6.5.2 Performance Enhancement
        6.5.3 Limitations
      6.6 Conclusion
    Chapter 7 --- Conclusion and Future Work
      7.1 Contributions
      7.2 Future Work
      7.3 Conclusion
    Bibliography
    Appendix A --- Sample Videos
    Appendix B --- List of Abbreviations
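    The contents list names a sliding-window method for scene change detection (section 3.4.2). As a hedged illustration of that generic technique (not the thesis's algorithm), a cut is declared where a frame difference is the local maximum within a window and exceeds the mean of its neighbours by a fixed ratio; the window size and ratio below are illustrative.

```python
import numpy as np

def sliding_window_cuts(frame_diffs, window=7, ratio=3.0):
    """frame_diffs[i] = difference between frames i and i+1 (any metric,
    e.g. a color-difference-histogram distance). Returns cut positions."""
    diffs = np.asarray(frame_diffs, dtype=float)
    cuts, half = [], window // 2
    for i in range(len(diffs)):
        lo, hi = max(0, i - half), min(len(diffs), i + half + 1)
        neighbours = np.delete(diffs[lo:hi], i - lo)  # window minus centre
        if neighbours.size == 0:
            continue
        # Local maximum that dominates its neighbourhood suggests a cut.
        if diffs[i] == diffs[lo:hi].max() and diffs[i] > ratio * neighbours.mean():
            cuts.append(i)
    return cuts
```

    The local threshold adapts to each neighbourhood, which is the advantage such methods have over a single global threshold on fast-motion or flash-heavy footage.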