131 research outputs found

    Surveillance centric coding

    Get PDF
    PhDThe research work presented in this thesis focuses on the development of techniques specific to surveillance videos for efficient video compression with higher processing speed. The Scalable Video Coding (SVC) techniques are explored to achieve higher compression efficiency. The framework of SVC is modified to support Surveillance Centric Coding (SCC). Motion estimation techniques specific to surveillance videos are proposed in order to speed up the compression process of the SCC. The main contributions of the research work presented in this thesis are divided into two groups (i) Efficient Compression and (ii) Efficient Motion Estimation. The paradigm of Surveillance Centric Coding (SCC) is introduced, in which coding aims to achieve bit-rate optimisation and adaptation of surveillance videos for storing and transmission purposes. In the proposed approach the SCC encoder communicates with the Video Content Analysis (VCA) module that detects events of interest in video captured by the CCTV. Bit-rate optimisation and adaptation are achieved by exploiting the scalability properties of the employed codec. Time segments containing events relevant to surveillance application are encoded using high spatiotemporal resolution and quality while the irrelevant portions from the surveillance standpoint are encoded at low spatio-temporal resolution and / or quality. Thanks to the scalability of the resulting compressed bit-stream, additional bit-rate adaptation is possible; for instance for the transmission purposes. Experimental evaluation showed that significant reduction in bit-rate can be achieved by the proposed approach without loss of information relevant to surveillance applications. In addition to more optimal compression strategy, novel approaches to performing efficient motion estimation specific to surveillance videos are proposed and implemented with experimental results. A real-time background subtractor is used to detect the presence of any motion activity in the sequence. Different approaches for selective motion estimation, GOP based, Frame based and Block based, are implemented. In the former, motion estimation is performed for the whole group of pictures (GOP) only when a moving object is detected for any frame of the GOP. iii While for the Frame based approach; each frame is tested for the motion activity and consequently for selective motion estimation. The selective motion estimation approach is further explored at a lower level as Block based selective motion estimation. Experimental evaluation showed that significant reduction in computational complexity can be achieved by applying the proposed strategy. In addition to selective motion estimation, a tracker based motion estimation and fast full search using multiple reference frames has been proposed for the surveillance videos. Extensive testing on different surveillance videos shows benefits of application of proposed approaches to achieve the goals of the SCC

    Low-Complexity Saliency Detection Algorithm for Fast Perceptual Video Coding

    Get PDF
    A low-complexity saliency detection algorithm for perceptual video coding is proposed; low-level encoding information is adopted as the characteristics of visual perception analysis. Firstly, this algorithm employs motion vector (MV) to extract temporal saliency region through fast MV noise filtering and translational MV checking procedure. Secondly, spatial saliency region is detected based on optimal prediction mode distributions in I-frame and P-frame. Then, it combines the spatiotemporal saliency detection results to define the video region of interest (VROI). The simulation results validate that the proposed algorithm can avoid a large amount of computation work in the visual perception characteristics analysis processing compared with other existing algorithms; it also has better performance in saliency detection for videos and can realize fast saliency detection. It can be used as a part of the video standard codec at medium-to-low bit-rates or combined with other algorithms in fast video coding

    Reconfigurable Architecture For H.264/avc Variable Block Size Motion Estimation Based On Motion Activity And Adaptive Search Range

    Get PDF
    Motion Estimation (ME) technique plays a key role in the video coding systems to achieve high compression ratios by removing temporal redundancies among video frames. Especially in the newest H.264/AVC video coding standard, ME engine demands large amount of computational capabilities due to its support for wide range of different block sizes for a given macroblock in order to increase accuracy in finding best matching block in the previous frames. We propose scalable architecture for H.264/AVC Variable Block Size (VBS) Motion Estimation with adaptive computing capability to support various search ranges, input video resolutions, and frame rates. Hardware architecture of the proposed ME consists of scalable Sum of Absolute Difference (SAD) arrays which can perform Full Search Block Matching Algorithm (FSBMA) for smaller 4x4 blocks. It is also shown that by predicting motion activity and adaptively adjusting the Search Range (SR) on the reconfigurable hardware platform, the computational cost of ME required for inter-frame encoding in H.264/AVC video coding standard can be reduced significantly. Dynamic Partial Reconfiguration is a unique feature of Field Programmable Gate Arrays (FPGAs) that makes best use of hardware resources and power by allowing adaptive algorithm to be implemented during run-time. We exploit this feature of FPGA to implement the proposed reconfigurable architecture of ME and maximize the architectural benefits through prediction of motion activities in the video sequences ,adaptation of SR during run-time, and fractional ME refinement. The implemented ME architecture can support real time applications at a maximum frequency of 90MHz with multiple reconfigurable regions. iv When compared to reconfiguration of complete design, partial reconfiguration process results in smaller bitstream size which allows FPGA to implement different configurations at higher speed. The proposed architecture has modular structure, regular data flow, and efficient memory organization with lower memory accesses. By increasing the number of active partial reconfigurable modules from one to four, there is a 4 fold increase in data re-use. Also, by introducing adaptive SR reduction algorithm at frame level, the computational load of ME is reduced significantly with only small degradation in PSNR (≤0.1dB)

    Robust and fast selective encryption for HEVC videos

    Get PDF
    Emerging High efficiency video coding (HEVC) is expected to be widely adopted in network applications for high definition devices and mobile terminals. Thus, construction of HEVC's encryption schemes that maintain format compliance and bit rate of encrypted bitstream becomes an active security's researches area. This paper presents a novel selective encryption technique for HEVC videos, based on enciphering the bins of selected Golomb–Rice code’s suffixes with the Advanced Encryption Standard (AES) in a CBC operating mode. The scheme preserves format compliance and size of the encrypted HEVC bitstream, and provides high visual degradation with optimized encryption space defined by selected Golomb–Rice suffixes. Experimental results show reliability and robustness of the proposed technique

    Modified inter prediction H.264 video encoding for maritime surveillance

    Get PDF
    Video compression has evolved since it is first being standardized. The most popular CODEC, H.264 can compress video effectively according to the quality that is required. This is due to the motion estimation (ME) process that has impressive features like variable block sizes varying from 4×4 to 16×16 and quarter pixel motion compensation. However, the disadvantage of H.264 is that, it is very complex and impractical for hardware implementation. Many efforts have been made to produce low complexity encoding by compromising on the bitrate and decoded quality. Two notable methods are Fast Search Mode and Early Termination. In Early Termination concept, the encoder does not have to perform ME on every macroblock for every block size. If certain criteria are reached, the process could be terminated and the Mode Decision could select the best block size much faster. This project proposes on using background subtraction to maximize the Early Termination process. When recording using static camera, the background remains the same for a long period of time where most macroblocks will produce minimum residual. Thus in this thesis, the ME process for the background macroblock is terminated much earlier using the maximum 16×16 macroblock size. The accuracy of the background segmentation for maritime surveillance video case study is 88.43% and the true foreground rate is at 41.74%. The proposed encoder manages to reduce 73.5% of the encoding time and 80.5% of the encoder complexity. The bitrate of the output is also reduced, in the range of 20%, compared to the H.264 baseline encoder. The results show that the proposed method achieves the objectives of improving the compression rate and the encoding time

    Schémas de tatouage d'images, schémas de tatouage conjoint à la compression, et schémas de dissimulation de données

    Get PDF
    In this manuscript we address data-hiding in images and videos. Specifically we address robust watermarking for images, robust watermarking jointly with compression, and finally non robust data-hiding.The first part of the manuscript deals with high-rate robust watermarking. After having briefly recalled the concept of informed watermarking, we study the two major watermarking families : trellis-based watermarking and quantized-based watermarking. We propose, firstly to reduce the computational complexity of the trellis-based watermarking, with a rotation based embedding, and secondly to introduce a trellis-based quantization in a watermarking system based on quantization.The second part of the manuscript addresses the problem of watermarking jointly with a JPEG2000 compression step or an H.264 compression step. The quantization step and the watermarking step are achieved simultaneously, so that these two steps do not fight against each other. Watermarking in JPEG2000 is achieved by using the trellis quantization from the part 2 of the standard. Watermarking in H.264 is performed on the fly, after the quantization stage, choosing the best prediction through the process of rate-distortion optimization. We also propose to integrate a Tardos code to build an application for traitors tracing.The last part of the manuscript describes the different mechanisms of color hiding in a grayscale image. We propose two approaches based on hiding a color palette in its index image. The first approach relies on the optimization of an energetic function to get a decomposition of the color image allowing an easy embedding. The second approach consists in quickly obtaining a color palette of larger size and then in embedding it in a reversible way.Dans ce manuscrit nous abordons l’insertion de données dans les images et les vidéos. Plus particulièrement nous traitons du tatouage robuste dans les images, du tatouage robuste conjointement à la compression et enfin de l’insertion de données (non robuste).La première partie du manuscrit traite du tatouage robuste à haute capacité. Après avoir brièvement rappelé le concept de tatouage informé, nous étudions les deux principales familles de tatouage : le tatouage basé treillis et le tatouage basé quantification. Nous proposons d’une part de réduire la complexité calculatoire du tatouage basé treillis par une approche d’insertion par rotation, ainsi que d’autre part d’introduire une approche par quantification basée treillis au seind’un système de tatouage basé quantification.La deuxième partie du manuscrit aborde la problématique de tatouage conjointement à la phase de compression par JPEG2000 ou par H.264. L’idée consiste à faire en même temps l’étape de quantification et l’étape de tatouage, de sorte que ces deux étapes ne « luttent pas » l’une contre l’autre. Le tatouage au sein de JPEG2000 est effectué en détournant l’utilisation de la quantification basée treillis de la partie 2 du standard. Le tatouage au sein de H.264 est effectué à la volée, après la phase de quantification, en choisissant la meilleure prédiction via le processus d’optimisation débit-distorsion. Nous proposons également d’intégrer un code de Tardos pour construire une application de traçage de traîtres.La dernière partie du manuscrit décrit les différents mécanismes de dissimulation d’une information couleur au sein d’une image en niveaux de gris. Nous proposons deux approches reposant sur la dissimulation d’une palette couleur dans son image d’index. La première approche consiste à modéliser le problème puis à l’optimiser afin d’avoir une bonne décomposition de l’image couleur ainsi qu’une insertion aisée. La seconde approche consiste à obtenir, de manière rapide et sûre, une palette de plus grande dimension puis à l’insérer de manière réversible

    Adapting Computer Vision Models To Limitations On Input Dimensionality And Model Complexity

    Get PDF
    When considering instances of distributed systems where visual sensors communicate with remote predictive models, data traffic is limited to the capacity of communication channels, and hardware limits the processing of collected data prior to transmission. We study novel methods of adapting visual inference to limitations on complexity and data availability at test time, wherever the aforementioned limitations exist. Our contributions detailed in this thesis consider both task-specific and task-generic approaches to reducing the data requirement for inference, and evaluate our proposed methods on a wide range of computer vision tasks. This thesis makes four distinct contributions: (i) We investigate multi-class action classification via two-stream convolutional neural networks that directly ingest information extracted from compressed video bitstreams. We show that selective access to macroblock motion vector information provides a good low-dimensional approximation of the underlying optical flow in visual sequences. (ii) We devise a bitstream cropping method by which AVC/H.264 and H.265 bitstreams are reduced to the minimum amount of necessary elements for optical flow extraction, while maintaining compliance with codec standards. We additionally study the effect of codec rate-quality control on the sparsity and noise incurred on optical flow derived from resulting bitstreams, and do so for multiple coding standards. (iii) We demonstrate degrees of variability in the amount of data required for action classification, and leverage this to reduce the dimensionality of input volumes by inferring the required temporal extent for accurate classification prior to processing via learnable machines. (iv) We extend the Mixtures-of-Experts (MoE) paradigm to adapt the data cost of inference for any set of constituent experts. We postulate that the minimum acceptable data cost of inference varies for different input space partitions, and consider mixtures where each expert is designed to meet a different set of constraints on input dimensionality. To take advantage of the flexibility of such mixtures in processing different input representations and modalities, we train biased gating functions such that experts requiring less information to make their inferences are favoured to others. We finally note that, our proposed data utility optimization solutions include a learnable component which considers specified priorities on the amount of information to be used prior to inference, and can be realized for any combination of tasks, modalities, and constraints on available data

    Variable Block Size Motion Compensation In The Redundant Wavelet Domain

    Get PDF
    Video is one of the most powerful forms of multimedia because of the extensive information it delivers. Video sequences are highly correlated both temporally and spatially, a fact which makes the compression of video possible. Modern video systems employ motion estimation and motion compensation (ME/MC) to de-correlate a video sequence temporally. ME/MC forms a prediction of the current frame using the frames which have been already encoded. Consequently, one needs to transmit the corresponding residual image instead of the original frame, as well as a set of motion vectors which describe the scene motion as observed at the encoder. The redundant wavelet transform (RDWT) provides several advantages over the conventional wavelet transform (DWT). The RDWT overcomes the shift invariant problem in DWT. Moreover, RDWT retains all the phase information of wavelet coefficients and provides multiple prediction possibilities for ME/MC in wavelet domain. The general idea of variable size block motion compensation (VSBMC) technique is to partition a frame in such a way that regions with uniform translational motions are divided into larger blocks while those containing complicated motions into smaller blocks, leading to an adaptive distribution of motion vectors (MV) across the frame. The research proposed new adaptive partitioning schemes and decision criteria in RDWT that utilize more effectively the motion content of a frame in terms of various block sizes. The research also proposed a selective subpixel accuracy algorithm for the motion vector using a multiband approach. The selective subpixel accuracy reduces the computations produced by the conventional subpixel algorithm while maintaining the same accuracy. In addition, the method of overlapped block motion compensation (OBMC) is used to reduce blocking artifacts. Finally, the research extends the applications of the proposed VSBMC to the 3D video sequences. The experimental results obtained here have shown that VSBMC in the RDWT domain can be a powerful tool for video compression
    corecore