131 research outputs found
Surveillance centric coding
PhD thesis. The research work presented in this thesis focuses on the development of techniques
specific to surveillance videos for efficient video compression with higher processing
speed. The Scalable Video Coding (SVC) techniques are explored to achieve higher
compression efficiency. The framework of SVC is modified to support Surveillance
Centric Coding (SCC). Motion estimation techniques specific to surveillance videos
are proposed in order to speed up the compression process of the SCC.
The main contributions of the research work presented in this thesis are divided into
two groups: (i) Efficient Compression and (ii) Efficient Motion Estimation. The
paradigm of Surveillance Centric Coding (SCC) is introduced, in which coding aims
to achieve bit-rate optimisation and adaptation of surveillance videos for storage and
transmission purposes. In the proposed approach, the SCC encoder communicates
with the Video Content Analysis (VCA) module that detects events of interest in
video captured by the CCTV. Bit-rate optimisation and adaptation are achieved by
exploiting the scalability properties of the employed codec. Time segments
containing events relevant to the surveillance application are encoded at high spatio-temporal
resolution and quality, while the portions irrelevant from the surveillance
standpoint are encoded at low spatio-temporal resolution and/or quality. Thanks to
the scalability of the resulting compressed bit-stream, additional bit-rate adaptation is
possible, for instance for transmission purposes. Experimental evaluation showed
that a significant reduction in bit-rate can be achieved by the proposed approach
without loss of information relevant to surveillance applications.
In addition to a more efficient compression strategy, novel approaches to performing
efficient motion estimation specific to surveillance videos are proposed and
implemented, with experimental results. A real-time background subtractor is used to
detect the presence of any motion activity in the sequence. Three approaches to
selective motion estimation are implemented: GOP-based, frame-based and block-based.
In the GOP-based approach, motion estimation is performed for the whole group of
pictures (GOP) only when a moving object is detected in any frame of the GOP.
In the frame-based approach, each frame is tested for motion activity, and
motion estimation is performed only for frames in which activity is detected. The selective motion estimation
approach is further explored at a lower level as block-based selective motion
estimation. Experimental evaluation showed that a significant reduction in
computational complexity can be achieved by applying the proposed strategy. In
addition to selective motion estimation, tracker-based motion estimation and a fast
full search using multiple reference frames have been proposed for surveillance
videos.
Extensive testing on different surveillance videos shows the benefits of
applying the proposed approaches to achieve the goals of the SCC.
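The difference between the GOP-based and frame-based selection rules described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-frame motion flags are assumed to come from some background subtractor, and the function name `frames_needing_me` and its parameters are hypothetical, not taken from the thesis.

```python
from typing import List


def frames_needing_me(motion_flags: List[bool], gop_size: int, mode: str) -> List[bool]:
    """Decide, per frame, whether motion estimation (ME) should run.

    motion_flags[i] is True when the (assumed) background subtractor
    reported motion activity in frame i.
    """
    if gop_size <= 0:
        raise ValueError("gop_size must be positive")
    if mode == "frame":
        # Frame-based: ME runs only for frames that contain motion.
        return list(motion_flags)
    if mode == "gop":
        # GOP-based: ME runs for every frame of a GOP as soon as
        # any frame in that GOP contains motion.
        decisions: List[bool] = []
        for start in range(0, len(motion_flags), gop_size):
            gop = motion_flags[start:start + gop_size]
            decisions.extend([any(gop)] * len(gop))
        return decisions
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical flags for two GOPs of four frames each.
flags = [False, False, True, False, False, False, False, False]
gop_decisions = frames_needing_me(flags, gop_size=4, mode="gop")
frame_decisions = frames_needing_me(flags, gop_size=4, mode="frame")
```

In this sketch the frame-based rule never runs ME on more frames than the GOP-based rule. A block-based variant would apply the same test to individual blocks within a frame.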
Low-Complexity Saliency Detection Algorithm for Fast Perceptual Video Coding
A low-complexity saliency detection algorithm for perceptual video coding is proposed, in which low-level encoding information is used as the basis for visual perception analysis. First, the algorithm uses motion vectors (MVs) to extract the temporally salient region through fast MV noise filtering and a translational MV checking procedure. Second, the spatially salient region is detected from the distributions of optimal prediction modes in I-frames and P-frames. The spatial and temporal saliency detection results are then combined to define the video region of interest (VROI). The simulation results show that, compared with existing algorithms, the proposed algorithm avoids a large amount of computation in the visual perception analysis, performs better in saliency detection for videos, and achieves fast saliency detection. It can be used as part of a standard video codec at medium-to-low bit-rates, or combined with other algorithms for fast video coding.
Reconfigurable Architecture For H.264/AVC Variable Block Size Motion Estimation Based On Motion Activity And Adaptive Search Range
Motion Estimation (ME) plays a key role in video coding systems, achieving high compression ratios by removing temporal redundancies among video frames. In the newest H.264/AVC video coding standard in particular, the ME engine demands a large amount of computation because it supports a wide range of block sizes for a given macroblock in order to find the best matching block in previous frames more accurately. We propose a scalable architecture for H.264/AVC Variable Block Size (VBS) Motion Estimation with adaptive computing capability to support various search ranges, input video resolutions, and frame rates. The hardware architecture of the proposed ME consists of scalable Sum of Absolute Difference (SAD) arrays which can perform the Full Search Block Matching Algorithm (FSBMA) for the smaller 4x4 blocks. It is also shown that by predicting motion activity and adaptively adjusting the Search Range (SR) on the reconfigurable hardware platform, the computational cost of ME required for inter-frame encoding in the H.264/AVC video coding standard can be reduced significantly. Dynamic Partial Reconfiguration is a unique feature of Field Programmable Gate Arrays (FPGAs) that makes the best use of hardware resources and power by allowing an adaptive algorithm to be implemented at run-time. We exploit this feature of the FPGA to implement the proposed reconfigurable ME architecture and maximize the architectural benefits through prediction of motion activity in the video sequences, adaptation of the SR during run-time, and fractional ME refinement. The implemented ME architecture can support real-time applications at a maximum frequency of 90 MHz with multiple reconfigurable regions. Compared to reconfiguration of the complete design, partial reconfiguration results in a smaller bitstream size, which allows the FPGA to switch between configurations at higher speed.
The proposed architecture has a modular structure, regular data flow, and efficient memory organization with fewer memory accesses. Increasing the number of active partial reconfigurable modules from one to four gives a four-fold increase in data re-use. In addition, introducing the adaptive SR reduction algorithm at frame level significantly reduces the computational load of ME with only a small degradation in PSNR (≤0.1 dB).
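As a software illustration of the two ideas named in this abstract, the following Python sketch shows full-search block matching with a SAD cost, plus a search-range adaptation rule. It is a minimal sketch, not the paper's hardware design: frames are assumed to be 2D lists of luma samples, and `adapt_search_range` is a hypothetical heuristic (largest motion vector component of the previous frame plus a margin), since the abstract does not specify the actual prediction rule.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement within +/- search_range and return
    ((dx, dy), cost) for the candidate with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def adapt_search_range(prev_mvs, min_sr=2, max_sr=16, margin=1):
    """Hypothetical rule: shrink the search range to the largest motion
    vector component seen in the previous frame plus a safety margin."""
    if not prev_mvs:
        return max_sr  # no history yet, so search the full range
    peak = max(max(abs(dx), abs(dy)) for dx, dy in prev_mvs)
    return max(min_sr, min(max_sr, peak + margin))
```

The number of candidates tested by `full_search` grows with the square of the search range, which is why reducing the range for low-activity frames lowers the computational load.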
Robust and fast selective encryption for HEVC videos
The emerging High Efficiency Video Coding (HEVC) standard is expected to be widely adopted in network applications for high-definition devices and mobile terminals. The construction of HEVC encryption schemes that maintain the format compliance and bit rate of the encrypted bitstream has therefore become an active area of security research. This paper presents a novel selective encryption technique for HEVC videos, based on enciphering the bins of selected Golomb–Rice code suffixes with the Advanced Encryption Standard (AES) in CBC operating mode. The scheme preserves the format compliance and size of the encrypted HEVC bitstream, and provides high visual degradation with an optimized encryption space defined by the selected Golomb–Rice suffixes. Experimental results show the reliability and robustness of the proposed technique.
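To see why enciphering only the suffix bins can preserve bitstream size and parseability, the Python sketch below uses a plain Golomb–Rice code. It rests on several assumptions: the unary prefix convention (ones terminated by a zero) is chosen for illustration, this is not the exact HEVC binarization, and `scramble_suffix` uses an XOR with caller-supplied key bits as a stand-in for the AES-CBC step described in the abstract.

```python
def rice_encode(value, k):
    """Split a non-negative integer into a unary prefix and a k-bit suffix."""
    quotient = value >> k
    prefix = "1" * quotient + "0"
    suffix = format(value & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return prefix, suffix


def rice_decode(prefix, suffix, k):
    """Inverse of rice_encode."""
    quotient = len(prefix) - 1
    remainder = int(suffix, 2) if suffix else 0
    return (quotient << k) | remainder


def scramble_suffix(suffix, key_bits):
    """Stand-in for the cipher: XOR the suffix bits with key bits.
    The output always has the same length as the input suffix."""
    if len(key_bits) < len(suffix):
        raise ValueError("not enough key bits")
    return "".join(str(int(b) ^ int(kb)) for b, kb in zip(suffix, key_bits))


prefix, suffix = rice_encode(13, k=2)
encrypted_suffix = scramble_suffix(suffix, "11")
decoded_without_key = rice_decode(prefix, encrypted_suffix, k=2)
```

Because the prefix is left untouched and the suffix keeps its fixed length of `k` bits, a decoder without the key still parses a codeword of the same size, but recovers a different value.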
Modified inter prediction H.264 video encoding for maritime surveillance
Video compression has evolved since it was first standardized. The most popular codec, H.264, can compress video effectively according to the required quality. This is due to the motion estimation (ME) process, which has powerful features such as variable block sizes ranging from 4×4 to 16×16 and quarter-pixel motion compensation. However, the disadvantage of H.264 is that it is very complex and impractical for hardware implementation. Many efforts have been made to produce low-complexity encoding by compromising on the bitrate and decoded quality. Two notable methods are Fast Search Mode and Early Termination. With Early Termination, the encoder does not have to perform ME on every macroblock for every block size. If certain criteria are met, the process can be terminated and the Mode Decision can select the best block size much faster. This project proposes using background subtraction to maximize the benefit of Early Termination. When recording with a static camera, the background remains the same for long periods of time, so most macroblocks produce minimal residual. Thus, in this thesis, the ME process for background macroblocks is terminated much earlier using the maximum 16×16 macroblock size. In the maritime surveillance video case study, the accuracy of the background segmentation is 88.43% and the true foreground rate is 41.74%. The proposed encoder reduces the encoding time by 73.5% and the encoder complexity by 80.5%. The bitrate of the output is also reduced, by around 20%, compared to the H.264 baseline encoder. The results show that the proposed method achieves the objectives of improving the compression rate and the encoding time.
Schémas de tatouage d'images, schémas de tatouage conjoint à la compression, et schémas de dissimulation de données
In this manuscript we address data hiding in images and videos. Specifically, we address robust watermarking for images, robust watermarking jointly with compression, and finally non-robust data hiding.
The first part of the manuscript deals with high-rate robust watermarking. After briefly recalling the concept of informed watermarking, we study the two major watermarking families: trellis-based watermarking and quantization-based watermarking. We propose, first, to reduce the computational complexity of trellis-based watermarking with a rotation-based embedding, and second, to introduce trellis-based quantization into a quantization-based watermarking system.
The second part of the manuscript addresses the problem of watermarking jointly with a JPEG2000 compression step or an H.264 compression step. The quantization step and the watermarking step are performed simultaneously, so that these two steps do not fight against each other. Watermarking in JPEG2000 is achieved by using the trellis quantization from Part 2 of the standard. Watermarking in H.264 is performed on the fly, after the quantization stage, by choosing the best prediction through the rate-distortion optimization process. We also propose integrating a Tardos code to build a traitor-tracing application.
The last part of the manuscript describes different mechanisms for hiding color in a grayscale image. We propose two approaches based on hiding a color palette in its index image. The first approach relies on the optimization of an energy function to obtain a decomposition of the color image that allows easy embedding. The second approach consists in quickly obtaining a larger color palette and then embedding it in a reversible way.
Research and developments of Dirac video codec
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.
In digital video compression, apart from storage, successful transmission of the compressed video
data over bandwidth-limited, error-prone channels is another important issue. To enable a video
codec for broadcasting applications, the corresponding coding tools (e.g.
error-resilient coding, rate control, etc.) must be implemented. They are normally non-normative parts of a video codec, and
hence their specifications are not defined in the standard. In Dirac as well, the original codec is
optimized for storage purposes only, and so several non-normative encoding tools are still
required before it can be used in other types of application.
Under the research title "Research and Developments of the Dirac Video Codec", phase I of
the project focuses mainly on error-resilient transmission over a noisy channel. The error-resilient
coding method used here is a simple, low-complexity coding scheme which provides
error-resilient transmission of the compressed video bitstream of the Dirac video encoder over a packet-erasure
wired network. The scheme combines source and channel coding: error-resilient
source coding is achieved by data partitioning in the wavelet-transformed domain, and
channel coding is achieved through the application of either a Rate-Compatible Punctured
Convolutional (RCPC) code or a Turbo Code (TC), using unequal error protection between the header plus
MV and the data. The scheme is designed mainly for the packet-erasure channel, i.e. it targets
Internet broadcasting applications.
For a bandwidth-limited channel, however, it is still necessary to limit the number of bits generated by
the encoder according to the available bandwidth, in addition to the error-resilient coding. So, in the
second phase of the project, a rate control algorithm is presented. The algorithm is based on a Quality
Factor (QF) optimization method in which the QF of the encoded video is changed adaptively in order to
achieve an average bitrate that is constant over each Group of Pictures (GOP). A relation between the
bitrate, R, and the QF, called the Rate-QF (R-QF) model, is derived in order to estimate the
optimum QF of the current encoding frame for a given target bitrate, R.
In some applications, such as video conferencing, real-time encoding and decoding with minimum
delay is crucial, but the ability to do real-time encoding/decoding is largely determined by the
complexity of the encoder/decoder. The motion estimation process inside the encoder
is the most time-consuming stage, so reducing the complexity of the motion estimation stage
brings the codec one step closer to real-time application. As a partial contribution toward real-time
application, in the final phase of the research, a fast Motion Estimation (ME) strategy is designed
and implemented. It combines a modified adaptive search with a semi-hierarchical approach to
motion estimation. The same strategy was implemented in both Dirac and H.264 in order to
investigate its performance on different codecs. Together with this fast ME strategy, a method
called partial cost function calculation was presented to further reduce the computational load of the
cost function calculation. The calculation is based on a pre-defined set of patterns
chosen so that they cover as much of the
whole block as possible.
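The partial cost function idea can be illustrated with a short Python sketch in which the cost is accumulated only over a fixed subset of pixel positions. The checkerboard pattern here is an assumption chosen for illustration; the abstract does not say which patterns the thesis uses, and the function names are hypothetical.

```python
def checkerboard_pattern(n):
    """Hypothetical sampling pattern: every other pixel of an n x n block,
    spread evenly so that all rows and columns are represented."""
    return [(x, y) for y in range(n) for x in range(n) if (x + y) % 2 == 0]


def partial_sad(block_a, block_b, pattern):
    """SAD accumulated only at the (x, y) positions listed in `pattern`."""
    return sum(abs(block_a[y][x] - block_b[y][x]) for x, y in pattern)


def full_sad(block_a, block_b):
    """Reference cost over every pixel, for comparison."""
    n = len(block_a)
    return sum(abs(block_a[y][x] - block_b[y][x])
               for y in range(n) for x in range(n))


pattern = checkerboard_pattern(4)
block_a = [[10] * 4 for _ in range(4)]
block_b = [[7] * 4 for _ in range(4)]
partial_cost = partial_sad(block_a, block_b, pattern)
full_cost = full_sad(block_a, block_b)
```

With this pattern the cost function touches half of the pixels, so it needs roughly half the arithmetic of the full SAD, at the price of a less exact match measure.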
In summary, this research work has contributed to the error-resilient transmission of compressed
bitstreams of the Dirac video encoder over a bandwidth-limited, error-prone channel. In addition,
the final phase of the research has partially contributed toward the real-time application of the Dirac
video codec by implementing a fast motion estimation strategy together with the partial cost function
calculation idea.
BBC R&D and Brunel University
Adapting Computer Vision Models To Limitations On Input Dimensionality And Model Complexity
When considering instances of distributed systems where visual sensors communicate with remote predictive models, data traffic is limited to the capacity of communication channels, and hardware limits the processing of collected data prior to transmission. We study novel methods of adapting visual inference to limitations on complexity and data availability at test time, wherever the aforementioned limitations exist. Our contributions detailed in this thesis consider both task-specific and task-generic approaches to reducing the data requirement for inference, and evaluate our proposed methods on a wide range of computer vision tasks. This thesis makes four distinct contributions: (i) We investigate multi-class action classification via two-stream convolutional neural networks that directly ingest information extracted from compressed video bitstreams. We show that selective access to macroblock motion vector information provides a good low-dimensional approximation of the underlying optical flow in visual sequences. (ii) We devise a bitstream cropping method by which AVC/H.264 and H.265 bitstreams are reduced to the minimum amount of necessary elements for optical flow extraction, while maintaining compliance with codec standards. We additionally study the effect of codec rate-quality control on the sparsity and noise incurred on optical flow derived from resulting bitstreams, and do so for multiple coding standards. (iii) We demonstrate degrees of variability in the amount of data required for action classification, and leverage this to reduce the dimensionality of input volumes by inferring the required temporal extent for accurate classification prior to processing via learnable machines. (iv) We extend the Mixtures-of-Experts (MoE) paradigm to adapt the data cost of inference for any set of constituent experts. 
We postulate that the minimum acceptable data cost of inference varies for different input space partitions, and consider mixtures where each expert is designed to meet a different set of constraints on input dimensionality. To take advantage of the flexibility of such mixtures in processing different input representations and modalities, we train biased gating functions such that experts requiring less information to make their inferences are favoured over others. Finally, we note that our proposed data utility optimization solutions include a learnable component which considers specified priorities on the amount of information to be used prior to inference, and can be realized for any combination of tasks, modalities, and constraints on available data.
Variable Block Size Motion Compensation In The Redundant Wavelet Domain
Video is one of the most powerful forms of multimedia because of the extensive information it delivers. Video sequences are highly correlated both temporally and spatially, which makes the compression of video possible. Modern video systems employ motion estimation and motion compensation (ME/MC) to de-correlate a video sequence temporally. ME/MC forms a prediction of the current frame from frames which have already been encoded. Consequently, one needs to transmit only the corresponding residual image instead of the original frame, together with a set of motion vectors which describe the scene motion as observed at the encoder. The redundant wavelet transform (RDWT) provides several advantages over the conventional discrete wavelet transform (DWT). The RDWT overcomes the lack of shift invariance of the DWT. Moreover, the RDWT retains all the phase information of the wavelet coefficients and provides multiple prediction possibilities for ME/MC in the wavelet domain. The general idea of the variable size block motion compensation (VSBMC) technique is to partition a frame so that regions with uniform translational motion are divided into larger blocks while those containing complicated motion are divided into smaller blocks, leading to an adaptive distribution of motion vectors (MVs) across the frame. The research proposed new adaptive partitioning schemes and decision criteria in the RDWT domain that use the motion content of a frame more effectively in terms of various block sizes. The research also proposed a selective subpixel accuracy algorithm for the motion vector using a multiband approach. The selective subpixel accuracy algorithm reduces the computation required by the conventional subpixel algorithm while maintaining the same accuracy. In addition, overlapped block motion compensation (OBMC) is used to reduce blocking artifacts. Finally, the research extends the application of the proposed VSBMC to 3D video sequences.
The experimental results have shown that VSBMC in the RDWT domain can be a powerful tool for video compression.
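The general partitioning idea behind variable block sizes (large blocks where motion is uniform, small blocks where it is not) can be sketched as a quadtree split in Python. This is a simplified illustration under stated assumptions: it works on a dense per-pixel motion field in the spatial domain and not in the RDWT domain, and the split criterion (all vectors in the block identical) is hypothetical, not the decision criterion proposed in the research.

```python
def partition(mv_field, x, y, size, min_size):
    """Recursively split a square block until its motion is uniform or
    the minimum block size is reached.

    mv_field[row][col] holds a (dx, dy) motion vector for each position.
    Returns a list of (x, y, size) leaf blocks.
    """
    vectors = {mv_field[j][i]
               for j in range(y, y + size)
               for i in range(x, x + size)}
    if len(vectors) == 1 or size <= min_size:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for offset_y in (0, half):
        for offset_x in (0, half):
            leaves += partition(mv_field, x + offset_x, y + offset_y,
                                half, min_size)
    return leaves


# Hypothetical 8 x 8 field: static everywhere except a 2 x 2 moving patch.
field = [[(0, 0)] * 8 for _ in range(8)]
for row in range(2):
    for col in range(2):
        field[row][col] = (1, 0)

blocks = partition(field, 0, 0, size=8, min_size=2)
```

In this example the static regions stay as 4 x 4 blocks while the quadrant containing the moving patch is split into 2 x 2 blocks, so motion vectors are concentrated where the motion is more complicated.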
Intelligent Side Information Generation in Distributed Video Coding
Distributed video coding (DVC) reverses the traditional coding paradigm of complex encoders allied with basic decoding to one where the computational cost is largely incurred by the decoder. This is attractive as the proven theoretical work of Wyner-Ziv (WZ) and Slepian-Wolf (SW) shows that the performance of such a system should be exactly the same as that of a conventional coder. Despite the solid theoretical foundations, current DVC qualitative and quantitative performance falls short of existing conventional coders and there remain crucial limitations. A key constraint governing DVC performance is the quality of side information (SI), a coarse representation of original video frames which are not available at the decoder. Techniques to generate SI have usually been based on linear motion compensated temporal interpolation (LMCTI), though these do not always produce satisfactory SI quality, especially in sequences exhibiting non-linear motion.
This thesis presents an intelligent higher order piecewise trajectory temporal interpolation (HOPTTI) framework for SI generation with original contributions that afford better SI quality in comparison to existing LMCTI-based approaches. The major elements in this framework are: (i) a cubic trajectory interpolation model that significantly improves the accuracy of motion vector estimations; (ii) an adaptive overlapped block motion compensation (AOBMC) model which reduces both blocking and overlapping artefacts in the SI emanating from the block matching algorithm; (iii) the development of an empirical mode switching algorithm; and (iv) an intelligent switching mechanism to construct SI by automatically selecting the best macroblock from the intermediate SI generated by HOPTTI and AOBMC algorithms. Rigorous analysis and evaluation confirm that significant quantitative and perceptual improvements in SI quality are achieved with the new framework.