3,827 research outputs found

    Surveillance centric coding

    Get PDF
    PhDThe research work presented in this thesis focuses on the development of techniques specific to surveillance videos for efficient video compression with higher processing speed. The Scalable Video Coding (SVC) techniques are explored to achieve higher compression efficiency. The framework of SVC is modified to support Surveillance Centric Coding (SCC). Motion estimation techniques specific to surveillance videos are proposed in order to speed up the compression process of the SCC. The main contributions of the research work presented in this thesis are divided into two groups (i) Efficient Compression and (ii) Efficient Motion Estimation. The paradigm of Surveillance Centric Coding (SCC) is introduced, in which coding aims to achieve bit-rate optimisation and adaptation of surveillance videos for storing and transmission purposes. In the proposed approach the SCC encoder communicates with the Video Content Analysis (VCA) module that detects events of interest in video captured by the CCTV. Bit-rate optimisation and adaptation are achieved by exploiting the scalability properties of the employed codec. Time segments containing events relevant to surveillance application are encoded using high spatiotemporal resolution and quality while the irrelevant portions from the surveillance standpoint are encoded at low spatio-temporal resolution and / or quality. Thanks to the scalability of the resulting compressed bit-stream, additional bit-rate adaptation is possible; for instance for the transmission purposes. Experimental evaluation showed that significant reduction in bit-rate can be achieved by the proposed approach without loss of information relevant to surveillance applications. In addition to more optimal compression strategy, novel approaches to performing efficient motion estimation specific to surveillance videos are proposed and implemented with experimental results. A real-time background subtractor is used to detect the presence of any motion activity in the sequence. Different approaches for selective motion estimation, GOP based, Frame based and Block based, are implemented. In the former, motion estimation is performed for the whole group of pictures (GOP) only when a moving object is detected for any frame of the GOP. iii While for the Frame based approach; each frame is tested for the motion activity and consequently for selective motion estimation. The selective motion estimation approach is further explored at a lower level as Block based selective motion estimation. Experimental evaluation showed that significant reduction in computational complexity can be achieved by applying the proposed strategy. In addition to selective motion estimation, a tracker based motion estimation and fast full search using multiple reference frames has been proposed for the surveillance videos. Extensive testing on different surveillance videos shows benefits of application of proposed approaches to achieve the goals of the SCC

    Low-Complexity Driving Event Detection from Side Information of a 3D Video Encoder

    Get PDF
    Mobile phones are often found in cars, for instance when they are used as navigation assistants. This work propose to use their camera, which is often already pointed to the road, to perform some low-complexity analysis of the driving context, with the final aim to detect potentially unsafe conditions. Since content understanding algorithms are typically too complex to run in real time on a mobile device, a driving event detection algorithm is presented based on the side information available from video encoders, which are a highly optimized application in mobile phones. A set of interesting and easy-to-extract features has been identified in the side information and then further reduced and adapted to the specific events of interest. A detection algorithm based on support vector machines has been designed and trained on several hours of video annotated by a human operator to extract the events of interest. The detection algorithm is shown to achieve a good identification rate for the considered events and feature sets. Moreover, results also show that the use of a stereoscopic camera significantly improves the performance of the detection algorithm in most case

    Digital rights management techniques for H.264 video

    Get PDF
    This work aims to present a number of low-complexity digital rights management (DRM) methodologies for the H.264 standard. Initially, requirements to enforce DRM are analyzed and understood. Based on these requirements, a framework is constructed which puts forth different possibilities that can be explored to satisfy the objective. To implement computationally efficient DRM methods, watermarking and content based copy detection are then chosen as the preferred methodologies. The first approach is based on robust watermarking which modifies the DC residuals of 4×4 macroblocks within I-frames. Robust watermarks are appropriate for content protection and proving ownership. Experimental results show that the technique exhibits encouraging rate-distortion (R-D) characteristics while at the same time being computationally efficient. The problem of content authentication is addressed with the help of two methodologies: irreversible and reversible watermarks. The first approach utilizes the highest frequency coefficient within 4×4 blocks of the I-frames after CAVLC en- tropy encoding to embed a watermark. The technique was found to be very effect- ive in detecting tampering. The second approach applies the difference expansion (DE) method on IPCM macroblocks within P-frames to embed a high-capacity reversible watermark. Experiments prove the technique to be not only fragile and reversible but also exhibiting minimal variation in its R-D characteristics. The final methodology adopted to enforce DRM for H.264 video is based on the concept of signature generation and matching. Specific types of macroblocks within each predefined region of an I-, B- and P-frame are counted at regular intervals in a video clip and an ordinal matrix is constructed based on their count. The matrix is considered to be the signature of that video clip and is matched with longer video sequences to detect copies within them. Simulation results show that the matching methodology is capable of not only detecting copies but also its location within a longer video sequence. Performance analysis depict acceptable false positive and false negative rates and encouraging receiver operating charac- teristics. Finally, the time taken to match and locate copies is significantly low which makes it ideal for use in broadcast and streaming applications

    Video indexing and summarization using motion activity

    Get PDF
    In this dissertation, video-indexing techniques using low-level motion activity characteristics and their application to video summarization are presented. The MPEG-7 motion activity feature is defined as the subjective level of activity or motion in a video segment. First, a novel psychophysical and analytical framework for automatic measurement of motion activity in compliance with its subjective perception is developed. A psychophysically sound subjective ground truth for motion activity and a test-set of video clips is constructed for this purpose. A number of low-level, compressed domain motion vector based, known and novel descriptors are then described. It is shown that these descriptors successfully estimate the subjective level of motion activity of video clips. Furthermore, the individual strengths and limitations of the proposed descriptors are determined using a novel pair wise comparison framework. It is verified that the intensity of motion activity descriptor of the MPEG-7 standard is one of the best performers, while a novel descriptor proposed in this dissertation performs comparably or better. A new descriptor for the spatial distribution of motion activity in a scene is proposed. This descriptor is supplementary to the intensity of motion activity descriptor. The new descriptor is shown to have comparable query retrieval performance to the current spatial distribution of motion activity descriptor of the MPEG-7 standard. The insights obtained from the motion activity investigation are applied to video summarization. A novel approach to summarizing and skimming through video using motion activity is presented. The approach is based on allocation of playback time to video segments proportional to the motion activity of the segments. Low activity segments are played faster than high activity segments in such a way that a constant level of activity is maintained throughout the video. Since motion activity is a low-complexity descriptor, the proposed summarization techniques are extremely fast. The summarization techniques are successfully used on surveillance video, The proposed techniques can also be used as a preprocessing stage for more complex summarization and content analysis techniques, thus providing significant cost gains

    A Joint Learning Approach to Face Detection in Wavelet Compressed Domain

    Get PDF
    Face detection has been an important and active research topic in computer vision and image processing. In recent years, learning-based face detection algorithms have prevailed with successful applications. In this paper, we propose a new face detection algorithm that works directly in wavelet compressed domain. In order to simplify the processes of image decompression and feature extraction, we modify the AdaBoost learning algorithm to select a set of complimentary joint-coefficient classifiers and integrate them to achieve optimal face detection. Since the face detection on the wavelet compression domain is restricted by the limited discrimination power of the designated feature space, the proposed learning mechanism is developed to achieve the best discrimination from the restricted feature space. The major contributions in the proposed AdaBoost face detection learning algorithm contain the feature space warping, joint feature representation, ID3-like plane quantization, and weak probabilistic classifier, which dramatically increase the discrimination power of the face classifier. Experimental results on the CBCL benchmark and the MIT + CMU real image dataset show that the proposed algorithm can detect faces in the wavelet compressed domain accurately and efficiently

    Dimensionality reduction and sparse representations in computer vision

    Get PDF
    The proliferation of camera equipped devices, such as netbooks, smartphones and game stations, has led to a significant increase in the production of visual content. This visual information could be used for understanding the environment and offering a natural interface between the users and their surroundings. However, the massive amounts of data and the high computational cost associated with them, encumbers the transfer of sophisticated vision algorithms to real life systems, especially ones that exhibit resource limitations such as restrictions in available memory, processing power and bandwidth. One approach for tackling these issues is to generate compact and descriptive representations of image data by exploiting inherent redundancies. We propose the investigation of dimensionality reduction and sparse representations in order to accomplish this task. In dimensionality reduction, the aim is to reduce the dimensions of the space where image data reside in order to allow resource constrained systems to handle them and, ideally, provide a more insightful description. This goal is achieved by exploiting the inherent redundancies that many classes of images, such as faces under different illumination conditions and objects from different viewpoints, exhibit. We explore the description of natural images by low dimensional non-linear models called image manifolds and investigate the performance of computer vision tasks such as recognition and classification using these low dimensional models. In addition to dimensionality reduction, we study a novel approach in representing images as a sparse linear combination of dictionary examples. We investigate how sparse image representations can be used for a variety of tasks including low level image modeling and higher level semantic information extraction. Using tools from dimensionality reduction and sparse representation, we propose the application of these methods in three hierarchical image layers, namely low-level features, mid-level structures and high-level attributes. Low level features are image descriptors that can be extracted directly from the raw image pixels and include pixel intensities, histograms, and gradients. In the first part of this work, we explore how various techniques in dimensionality reduction, ranging from traditional image compression to the recently proposed Random Projections method, affect the performance of computer vision algorithms such as face detection and face recognition. In addition, we discuss a method that is able to increase the spatial resolution of a single image, without using any training examples, according to the sparse representations framework. In the second part, we explore mid-level structures, including image manifolds and sparse models, produced by abstracting information from low-level features and offer compact modeling of high dimensional data. We propose novel techniques for generating more descriptive image representations and investigate their application in face recognition and object tracking. In the third part of this work, we propose the investigation of a novel framework for representing the semantic contents of images. This framework employs high level semantic attributes that aim to bridge the gap between the visual information of an image and its textual description by utilizing low level features and mid level structures. This innovative paradigm offers revolutionary possibilities including recognizing the category of an object from purely textual information without providing any explicit visual example
    corecore