
    Fast coding strategy for HEVC by motion features and saliency applied on difference between successive image blocks

    Introducing a number of innovative and powerful coding tools, the High Efficiency Video Coding (HEVC) standard promises double the compression efficiency of its predecessor, H.264, at similar perceptual quality. However, its increased computational complexity is an important concern for the video coding research community. This paper attempts to reduce HEVC's complexity by efficiently selecting appropriate block-partitioning modes based on motion features and on saliency applied to the difference between successive image blocks. Because this difference exposes explicitly visible motion and salient information, we develop a cost function that combines the motion features with a salient feature of the image difference. The combined features are then converted into an area-of-interest (AOI)-based binary pattern for the current block. This pattern is compared with a predefined codebook of binary pattern templates to select a subset of modes. Motion estimation (ME) and motion compensation (MC) are performed only on the selected subset of modes, without exhaustively exploring all modes available in HEVC. Experimental results show a 42% reduction in the encoding time complexity of the HEVC encoder with similar subjective and objective image quality.
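The codebook-based mode-subset selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the 4x4 sub-block grid, the equal weighting of motion and saliency, the 0.5 threshold, the hypothetical `CODEBOOK` templates with their mode lists, and Hamming distance as the matching criterion are all illustrative choices.

```python
from typing import Dict, List, Sequence

# Hypothetical codebook: each entry pairs a 4x4 binary template (row-major)
# with an illustrative subset of HEVC inter partition modes to evaluate.
CODEBOOK: Dict[str, tuple] = {
    "static":    ([0] * 16, ["2Nx2N"]),
    "top_half":  ([1] * 8 + [0] * 8, ["2NxN", "2NxnU"]),
    "left_half": ([1, 1, 0, 0] * 4, ["Nx2N", "nLx2N"]),
    "full":      ([1] * 16, ["2Nx2N", "2NxN", "Nx2N", "NxN"]),
}

def binary_pattern(motion: Sequence[float], saliency: Sequence[float],
                   w: float = 0.5, threshold: float = 0.5) -> List[int]:
    """Combine per-sub-block motion and saliency scores (assumed in [0, 1])
    with weight w, then threshold them into a binary AOI pattern."""
    return [1 if w * m + (1 - w) * s >= threshold else 0
            for m, s in zip(motion, saliency)]

def select_modes(pattern: Sequence[int]) -> List[str]:
    """Return the mode subset of the template closest in Hamming distance."""
    def hamming(template: Sequence[int]) -> int:
        return sum(a != b for a, b in zip(pattern, template))
    best = min(CODEBOOK.values(), key=lambda entry: hamming(entry[0]))
    return best[1]
```

Only the returned modes would then go through ME/MC; ties resolve to the first matching template in insertion order, which is an arbitrary choice here.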

    Efficient video coding using visual sensitive information for HEVC coding standard

    The latest High Efficiency Video Coding (HEVC) standard introduces a large number of inter-mode block-partitioning modes. The HEVC reference test model (HM) uses a partially exhaustive, tree-structured mode selection that still explores a large number of prediction unit (PU) modes for each coding unit (CU). This increases encoding time, which prevents many electronic devices with limited processing resources from using various features of HEVC. By analysing homogeneity, residuals, and different statistical correlations among modes, many researchers have sped up the encoding process by reducing the number of PU modes. However, these approaches could not match the rate-distortion (RD) performance of the HM because they depend on the existing Lagrangian cost function (LCF) within the HEVC framework. In this paper, to avoid complete dependency on the LCF in the initial phase, we exploit a visually sensitive foreground motion and spatial salient metric (FMSSM) for each block. To capture its motion and saliency features, we use dynamic background modelling and visual saliency modelling, respectively. According to the FMSSM values, a subset of PU modes is then explored for encoding the CU. This preprocessing phase is independent of the existing LCF. Because the proposed coding technique further reduces the number of PU modes using two simple criteria (i.e., motion and saliency), it outperforms the HM in terms of encoding time reduction. Because it also encodes uncovered and static background areas using the dynamic background frame as a substitute reference frame, it does not sacrifice quality. Test results show that the proposed method achieves a 32% average reduction in encoding time relative to the HM without any quality loss for a wide range of videos.
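A rough sketch of the preprocessing idea above (a background model supplying a motion cue, combined with saliency to pick a PU mode subset before any LCF evaluation) might look like this. All specifics are assumptions for illustration: the running-average background update, the pixel difference threshold, taking the maximum of motion and saliency as the combined score, and the hypothetical thresholds and mode lists in `pu_mode_subset`.

```python
from typing import List, Sequence

def update_background(background: Sequence[float], frame: Sequence[float],
                      alpha: float = 0.05) -> List[float]:
    """Running-average background model: B <- (1 - alpha) * B + alpha * F."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(background, frame)]

def foreground_fraction(block: Sequence[float], bg_block: Sequence[float],
                        diff_threshold: float = 20.0) -> float:
    """Share of pixels differing from the background by more than diff_threshold."""
    moving = sum(abs(p - b) > diff_threshold for p, b in zip(block, bg_block))
    return moving / len(block)

def pu_mode_subset(motion: float, saliency: float) -> List[str]:
    """Hypothetical rule: the more moving or salient the block, the more PU modes."""
    score = max(motion, saliency)
    if score < 0.1:
        return ["SKIP", "2Nx2N"]  # static, non-salient background
    if score < 0.5:
        return ["2Nx2N", "2NxN", "Nx2N"]  # symmetric partitions only
    # Moving or salient blocks also try the asymmetric motion partitions.
    return ["2Nx2N", "2NxN", "Nx2N", "2NxnU", "2NxnD", "nLx2N", "nRx2N"]
```

The point of the design is that the mode subset is chosen from cheap motion and saliency statistics, so the rate-distortion search only runs over the returned list.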

    Visual Saliency Estimation Via HEVC Bitstream Analysis

    With the dramatic development of information technology since the 1950s, digital images and video have become ubiquitous. In the last decade, image and video processing have become increasingly popular in biomedical, industrial, artistic and other fields, and progress has been made in the display, storage and transmission of visual information such as images and video. The attendant problem is that video processing tasks in the time domain have become particularly arduous. Building on existing compressed-domain video saliency detection models, this work presents a new saliency estimation model for video based on High Efficiency Video Coding (HEVC). First, the relevant features are extracted from the HEVC-encoded bitstream. A naive Bayesian model is then trained and tested on these features using the original YUV videos and ground truth. The intra frame saliency map is obtained by training and testing on intra features, and inter frame saliency is obtained by combining intra saliency with motion vectors. The ROC score of the proposed intra model is 0.9561. Other classification methods, such as support vector machines (SVM), k-nearest neighbours (KNN) and decision trees, are used for comparison. The effect of varying the compression ratio on saliency is also analysed.
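The classification step described above can be sketched with a small from-scratch Gaussian naive Bayes classifier. This is an illustrative assumption rather than the authors' model: the abstract does not say which likelihood was used, and the feature vectors here (for example per-block bit counts or CU depth) and the label convention `1 = salient` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[int]):
    """Estimate class priors and per-feature Gaussian means and variances."""
    by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(columns, means)]
        model[label] = (n / len(y), means, variances)
    return model

def predict_saliency(model, features: Sequence[float]) -> float:
    """Posterior probability of the salient class (label 1) for one block."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for x, m, v in zip(features, means, variances):
            lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        log_post[label] = lp
    peak = max(log_post.values())  # log-sum-exp for numerical stability
    total = sum(math.exp(lp - peak) for lp in log_post.values())
    return math.exp(log_post.get(1, float("-inf")) - peak) / total
```

Evaluating `predict_saliency` for every block of a frame would yield a per-block saliency map; the SVM, KNN and decision-tree baselines mentioned in the abstract would slot in at the same point.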

    Attention Driven Solutions for Robust Digital Watermarking Within Media

    As digital technologies have expanded dramatically within the last decade, content recognition now plays a major role in the control of media. Of the systems currently available, digital watermarking provides a robust, maintainable solution to enhance media security. The two main properties of digital watermarking, imperceptibility and robustness, compete with each other, but by employing visual attention-based mechanisms within the watermarking framework, highly robust watermarking solutions are obtainable while also maintaining high media quality. This thesis first provides suitable bottom-up saliency models for raw image and video. The image and video saliency algorithms are estimated directly within the wavelet domain for enhanced compatibility with the watermarking framework. By combining colour, orientation and intensity contrasts for the image model, and globally compensated object motion for the video model, novel wavelet-based visual saliency algorithms are provided. The work extends these saliency models into a unique visual attention-based watermarking scheme by increasing the watermark weighting parameter within visually uninteresting regions. Watermark robustness increases by up to 40% against various filtering attacks, JPEG2000 and H.264/AVC compression while media quality is maintained, as verified by various objective and subjective evaluation tools. As most video sequences are stored in an encoded format, this thesis also studies watermarking schemes within the compressed domain. First, the work provides a compressed-domain saliency model formulated directly within the HEVC codec, utilizing coding decisions such as block partition size, residual magnitude, intra frame angular prediction mode and motion vector difference magnitude. Large computational savings, of 50% or greater, are obtained compared with existing methodologies, because the saliency maps are generated from partially decoded bitstreams.
Finally, the saliency maps formulated within the compressed HEVC domain are studied within the watermarking framework. A joint encoder and a frame domain watermarking scheme are both proposed by embedding data into the quantised transform residual data or wavelet coefficients, respectively, which exhibit low visual salience
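The core idea above, a watermark weighting parameter that increases in visually uninteresting regions, can be sketched with a generic additive, non-blind scheme. The exact embedding rule is not given in the abstract, so the strength formula `base_strength * (1 + boost * (1 - s))`, the saliency normalisation to [0, 1], and the sign-based detector are assumptions chosen for illustration.

```python
from typing import List, Sequence

def embed(coeffs: Sequence[float], saliency: Sequence[float], bits: Sequence[int],
          base_strength: float = 1.0, boost: float = 2.0) -> List[float]:
    """Add a +/-1 watermark symbol to each coefficient, with embedding strength
    growing as saliency falls (saliency assumed in [0, 1], 1 = most salient)."""
    marked = []
    for c, s, b in zip(coeffs, saliency, bits):
        strength = base_strength * (1.0 + boost * (1.0 - s))
        marked.append(c + strength * (1 if b else -1))
    return marked

def detect(original: Sequence[float], marked: Sequence[float]) -> List[int]:
    """Non-blind detection: the sign of the coefficient change recovers each bit."""
    return [1 if m - o > 0 else 0 for o, m in zip(original, marked)]
```

Here `coeffs` could stand for wavelet coefficients or quantised residual values; non-salient positions receive the largest perturbation, which improves robustness while keeping changes where viewers are least likely to look.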

    Error resilience and concealment techniques for high-efficiency video coding

    This thesis investigates the problem of robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study of error robustness revealed that HEVC offers weak protection against network losses, with a significant impact on video quality. Based on this evidence, the first contribution of this work is a new method that reduces the temporal dependencies between motion vectors, thereby improving decoded video quality without compromising compression efficiency. The second contribution is a two-stage approach for reducing the mismatch of temporal predictions when video streams are received with errors or lost data. At the encoding stage, reference pictures are dynamically distributed based on a constrained Lagrangian rate-distortion optimization to reduce the number of predictions from a single reference. At the streaming stage, a prioritization algorithm based on spatial dependencies selects a reduced set of motion vectors to be transmitted as side information, reducing mismatched motion predictions at the decoder. Error concealment-aware video coding is also investigated to enhance overall error robustness. A new approach based on scalable coding and optimal error concealment selection is proposed, in which the optimal error concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimisation. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder for use in case of data loss. Finally, an adaptive error resilience scheme is proposed that dynamically predicts the video stream achieving the highest decoded quality for a particular loss case.
A neural network selects among various video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream and the transmission network. Overall, the robust video coding methods investigated in this thesis yield consistent quality gains compared with existing methods, including those implemented in the HEVC reference software. Furthermore, the proposed methods achieve a better trade-off between coding efficiency and error robustness.
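The encoder-side idea of finding the best error concealment mode by simulating a loss can be sketched for a single block as follows. This is a simplified illustration, not the thesis's method: the candidate reconstructions (for example a co-located copy or a zero fill) are assumed to be produced elsewhere, mean squared error is an assumed distortion measure, and weighting the chosen mode's distortion by a per-block saliency value is one hypothetical reading of the "saliency-weighted optimisation".

```python
from typing import Dict, Sequence, Tuple

def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized pixel blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def choose_concealment(original: Sequence[float],
                       candidates: Dict[str, Sequence[float]],
                       saliency: float) -> Tuple[str, float]:
    """Simulate losing one block at the encoder: compare every candidate
    concealment with the original and keep the lowest-distortion mode.
    The returned cost is that distortion weighted by the block's saliency."""
    best = min(candidates, key=lambda name: mse(original, candidates[name]))
    return best, saliency * mse(original, candidates[best])
```

The selected mode name is what would be signalled as side information, and the saliency-weighted cost could, for instance, be used to rank blocks when deciding where recovery residuals are most worth spending bits.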

    Fast and Efficient Foveated Video Compression Schemes for H.264/AVC Platform

    Some fast and efficient foveated video compression schemes for the H.264/AVC platform are presented in this dissertation. The exponential growth in networking technologies and the widespread use of video-based multimedia content over the internet for mass communication applications such as social networking, e-commerce and education have greatly promoted the development of video coding. Recently, foveated image and video compression schemes have been in high demand, as they not only match the perception of the human visual system (HVS) but also yield higher compression ratios. Important or salient regions are compressed with higher visual quality, while non-salient regions are compressed with a higher compression ratio. Among foveated video compression developments in the last few years, saliency-detection-based foveated schemes have been an area of intense research. With this in mind, we propose two multi-scale saliency detection schemes: (1) multi-scale phase spectrum based saliency detection (FTPBSD); (2) sign-DCT multi-scale pseudo-phase spectrum based saliency detection (SDCTPBSD). In the FTPBSD scheme, a saliency map is determined using the phase spectrum of a given image or video frame with a unity magnitude spectrum. The proposed SDCTPBSD method, on the other hand, uses the sign information of the discrete cosine transform (DCT), also known as sign-DCT (SDCT), which resembles the response of receptive field neurons of the HVS. A bottom-up spatio-temporal saliency map is obtained as a linear weighted sum of the spatial and temporal saliency maps. Based on these saliency detection techniques, foveated video compression (FVC) schemes (FVC-FTPBSD and FVC-SDCTPBSD) are developed to further improve compression performance. Moreover, the 2D discrete cosine transform (2D-DCT) is widely used in various video coding standards for block-based transformation of spatial data.
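The phase-spectrum step of FTPBSD (unity magnitude spectrum, original phase) and the linear spatio-temporal fusion described above can be sketched at a single scale. This is a simplified sketch, not the dissertation's implementation: the multi-scale pyramid and the usual post-smoothing are omitted, a naive pure-Python DFT stands in for an FFT so the example needs no third-party packages, and the fusion weight `w` is an assumed parameter.

```python
import cmath
import math
from typing import List

Matrix = List[List[complex]]

def dft2(a: Matrix, inverse: bool = False) -> Matrix:
    """Naive 2-D DFT; O((HW)^2), so only suitable for tiny illustrative images."""
    h, w = len(a), len(a[0])
    sign = 1.0 if inverse else -1.0
    out: Matrix = []
    for u in range(h):
        row = []
        for v in range(w):
            acc = 0j
            for x in range(h):
                for y in range(w):
                    acc += a[x][y] * cmath.exp(sign * 2j * math.pi * (u * x / h + v * y / w))
            row.append(acc / (h * w) if inverse else acc)
        out.append(row)
    return out

def phase_spectrum_saliency(img: List[List[float]]) -> List[List[float]]:
    """Set the magnitude spectrum to 1, keep the phase, invert, and square."""
    spectrum = dft2([[complex(p) for p in row] for row in img])
    phase_only = [[cmath.exp(1j * cmath.phase(z)) if abs(z) > 1e-12 else 0j
                   for z in row] for row in spectrum]
    return [[abs(z) ** 2 for z in row] for row in dft2(phase_only, inverse=True)]

def spatio_temporal(spatial: List[List[float]], temporal: List[List[float]],
                    w: float = 0.5) -> List[List[float]]:
    """Linear weighted sum of spatial and temporal saliency maps."""
    return [[w * s + (1 - w) * t for s, t in zip(rs, rt)]
            for rs, rt in zip(spatial, temporal)]
```

For an 8x8 flat image with one bright pixel, the phase-only reconstruction concentrates its energy at that pixel, which is the intuition behind phase-based saliency: a uniform background contributes mainly to the DC term, while the phase preserves where the unusual structure is.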
However, for blocks with directional features, the 2D-DCT offers sub-optimal performance and may not be able to represent video data efficiently with few coefficients, which degrades the compression ratio. Various directional transform schemes have been proposed in the literature for efficiently encoding such directional blocks. However, these directional transform schemes suffer from issues such as the ‘mean weighting defect’, the use of a large number of DCTs, and a large number of scanning patterns. We propose a directional transform scheme based on a direction-adaptive fixed length discrete cosine transform (DAFL-DCT) for intra- and inter-frame coding to achieve higher coding efficiency for blocks with directional features. Furthermore, the proposed DAFL-DCT has two encoding modes: (1) a direction-adaptive fixed length high-efficiency (DAFL-HE) mode for higher compression performance; (2) a direction-adaptive fixed length low-complexity (DAFL-LC) mode for low complexity with a fair compression ratio. Motion estimation (ME), meanwhile, exploits the temporal correlation between video frames and yields a significant improvement in compression ratio while sustaining high visual quality in video coding. Block-matching motion estimation (BMME) is the most popular approach owing to its simplicity and efficiency. However, real-world video sequences may contain slow, medium and/or fast motion activity, and a single search pattern does not prove efficient in finding the best-matched block for all motion types. In addition, most BMME schemes assume a uni-modal error surface, whereas real-world video sequences may exhibit a large number of local minima within a search window and thus possess a multi-modal error surface (MES). Hence, the following two uni-modal and multi-modal error surface based motion estimation schemes are developed.
(1) Direction-adaptive motion estimation (DAME) scheme; (2) Pattern-based modified particle swarm optimization motion estimation (PMPSO-ME) scheme. Subsequently, various fast and efficient foveated video compression schemes are developed with combination of these schemes to improve the video coding performance further while maintaining high visual quality to salient regions. All schemes are incorporated into the H.264/AVC video coding platform. Various experiments have been carried out on H.264/AVC joint model reference software (version JM 18.6). Computing various benchmark metrics, the proposed schemes are compared with other existing competitive schemes in terms of rate-distortion curves, Bjontegaard metrics (BD-PSNR, BD-SSIM and BD-bitrate), encoding time, number of search points and subjective evaluation to derive an overall conclusion
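For reference, the baseline that block-matching schemes such as DAME and PMPSO-ME aim to accelerate is exhaustive (full-search) block matching with a sum-of-absolute-differences (SAD) criterion. The sketch below shows only that generic baseline, not either proposed scheme; the block size, search range and integer-pel accuracy are illustrative assumptions.

```python
from typing import List, Tuple

Frame = List[List[int]]

def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))

def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                search_range: int) -> Tuple[Tuple[int, int], int]:
    """Test every displacement within +/- search_range that keeps the reference
    block inside the frame; return (best motion vector (dx, dy), its SAD)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= bx + dx and bx + dx + n <= w
                      and 0 <= by + dy and by + dy + n <= h)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Full search visits (2R + 1)^2 candidate positions per block, which is the cost that pattern-based and swarm-based searches try to cut. On a multi-modal error surface, a fast search that follows the local descent direction can stop in a local minimum that full search would not.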

    A computational model of visual attention.

    Visual attention is the process by which the Human Visual System (HVS) selects the most important information from a scene. Visual attention models are computational or mathematical models developed to predict this information. The performance of state-of-the-art visual attention models is limited in terms of prediction accuracy and computational complexity. Despite a significant amount of active research in this area, modelling visual attention remains an open research challenge. This thesis proposes a novel computational model of visual attention that achieves higher prediction accuracy with low computational complexity. A new bottom-up visual attention model based on in-focus regions is proposed. To develop the model, an image dataset is created by capturing images with in-focus and out-of-focus regions. The Discrete Cosine Transform (DCT) spectrum of these images is investigated qualitatively and quantitatively to discover the key frequency coefficients that correspond to in-focus regions. The model detects these key coefficients by formulating a novel relation between in-focus and out-of-focus regions in the frequency domain. These frequency coefficients are used to detect the salient in-focus regions. Simulation results show that this attention model achieves good prediction accuracy with low complexity. The prediction accuracy of the proposed in-focus visual attention model is further improved by incorporating the sensitivity of the HVS towards the image centre and human faces. Moreover, the computational complexity is further reduced by using the Integer Cosine Transform (ICT). The model's parameters are tuned using a hill-climbing approach to optimise accuracy. Performance has been analysed qualitatively and quantitatively using two large image datasets with eye-tracking fixation ground truth.
The results show that the model achieves higher prediction accuracy with lower computational complexity than state-of-the-art visual attention models. The proposed model is useful for predicting human fixations in computationally constrained environments, particularly in applications such as perceptual video coding, image quality assessment, object recognition and image segmentation.
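To illustrate why an integer cosine transform lowers complexity, the sketch below uses the well-known 4x4 core integer transform of H.264/AVC, which needs only integer additions and small multiplications. The abstract does not specify which ICT the thesis uses, and the `ac_energy` focus cue (sum of absolute AC coefficients) is a hypothetical stand-in for the thesis's key in-focus coefficients.

```python
from typing import List

Block = List[List[int]]

# Core matrix of the 4x4 integer transform used in H.264/AVC
# (the normalising scale factors are omitted here).
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]

def matmul(a: Block, b: Block) -> Block:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: Block) -> Block:
    return [list(row) for row in zip(*m)]

def ict4x4(block: Block) -> Block:
    """Integer-only forward core transform Y = C X C^T."""
    return matmul(matmul(C, block), transpose(C))

def ac_energy(block: Block) -> int:
    """Hypothetical focus cue: sum of absolute AC coefficients."""
    y = ict4x4(block)
    return sum(abs(v) for i, row in enumerate(y) for j, v in enumerate(row)
               if (i, j) != (0, 0))
```

A flat (defocused-looking) block yields only a DC coefficient, whereas a block with fine detail spreads energy into AC coefficients. Frequency-domain focus cues rely on this contrast.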

    Quality of Experience in Immersive Video Technologies

    Over the last decades, several technological revolutions have impacted the television industry, such as the shifts from black & white to color and from standard to high definition. Nevertheless, considerable further improvements can still be achieved to provide a better multimedia experience, for example with ultra-high definition, high dynamic range & wide color gamut, or 3D. These so-called immersive technologies aim to provide better, more realistic, and emotionally stronger experiences. To measure quality of experience (QoE), subjective evaluation is the ultimate means, since it relies on a pool of human subjects. However, reliable and meaningful results can only be obtained if experiments are properly designed and conducted following a strict methodology. In this thesis, we build a rigorous framework for subjective evaluation of new types of image and video content. We propose different procedures and analysis tools for measuring QoE in immersive technologies. As immersive technologies capture more information than conventional technologies, they can provide more detail, enhanced depth perception, and better color, contrast, and brightness. To measure the impact of immersive technologies on viewers' QoE, we apply the proposed framework to design experiments and analyze the collected subjects' ratings. We also analyze eye movements to study human visual attention during immersive content playback. Since immersive content carries more information than conventional content, efficient compression algorithms are needed for storage and transmission over existing infrastructures. To determine the bandwidth required for high-quality transmission of immersive content, we use the proposed framework to conduct meticulous evaluations of recent image and video codecs in the context of immersive technologies. Subjective evaluation is time-consuming, expensive, and not always feasible.
Consequently, researchers have developed objective metrics to predict quality automatically. To measure how well objective metrics assess immersive content quality, we perform several in-depth benchmarks of state-of-the-art and commonly used objective metrics, using ground truth quality scores collected under our subjective evaluation framework. To improve QoE, we propose different systems, in particular for stereoscopic and autostereoscopic 3D displays. The proposed systems can help reduce the artifacts generated at the visualization stage, which affect picture quality, depth quality, and visual comfort. To demonstrate the effectiveness of these systems, we use the proposed framework to measure viewers' preference between these systems and standard 2D & 3D modes. In summary, this thesis tackles the problems of measuring, predicting, and improving QoE in immersive technologies. To address these problems, we build a rigorous framework and apply it through several in-depth investigations. We place essential concepts of multimedia QoE under this framework. These concepts are not only of a fundamental nature but have also shown their impact in very practical applications. In particular, the JPEG, MPEG, and VCEG standardization bodies have adopted these concepts to select technologies proposed for standardization and to validate the resulting standards in terms of compression efficiency.
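Two basic computations underlie the analysis described above: a mean opinion score (MOS) with a confidence interval from subjects' ratings, and the correlation between an objective metric and that ground truth. The sketch below is a generic illustration, not the thesis's framework. It uses a normal-approximation 95% interval, z·s/√n, similar in form to the interval in ITU-R BT.500; outlier screening and nonlinear fitting before correlation, common in practice, are omitted.

```python
import math
import statistics
from typing import Sequence, Tuple

def mos_with_ci(scores: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Mean opinion score and the half-width of a normal-approximation
    95% confidence interval (z * s / sqrt(n), s = sample standard deviation)."""
    mos = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mos, half_width

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation between objective scores and MOS values."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

With small panels, a Student-t quantile in place of 1.96 gives a more conservative interval.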