63 research outputs found

    Image and Video Coding/Transcoding: A Rate Distortion Approach

    Due to the lossy nature of image/video compression and the expensive bandwidth and computation resources in a multimedia system, one of the key design issues for image and video coding/transcoding is to optimize the trade-off among distortion, rate, and/or complexity. This thesis studies the application of rate distortion (RD) optimization approaches to image and video coding/transcoding, both for exploring the best RD performance of a video codec compatible with the newest video coding standard, H.264, and for designing computationally efficient down-sampling algorithms with high visual fidelity in the discrete cosine transform (DCT) domain. RD optimization for video coding in this thesis considers two objectives: to achieve the best encoding efficiency in terms of minimizing the actual RD cost, and to maintain decoding compatibility with H.264. By the actual RD cost, we mean a cost based on the final reconstruction error and the entire coding rate. Specifically, an operational RD method is proposed based on a soft decision quantization (SDQ) mechanism, which has its roots in a fundamental RD theoretic study on fixed-slope lossy data compression. Using SDQ instead of hard decision quantization, we establish a general framework in which motion prediction, quantization, and entropy coding in a hybrid video coding scheme such as H.264 are jointly designed to minimize the actual RD cost on a frame basis. The proposed framework is applicable to optimizing any hybrid video coding scheme, provided that specific algorithms are designed corresponding to the coding syntaxes of a given standard codec, so as to maintain compatibility with the standard.
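The difference between hard decision quantization and a rate-aware choice can be illustrated with a minimal per-coefficient sketch. It assumes a Lagrangian cost of the form J = D + lambda * R and a made-up rate model; the names `rate_bits`, `rd_cost`, `hard_decision`, and `rd_aware_decision` are hypothetical. The SDQ described in the abstract is graph-based and uses the actual entropy coder's rate over whole blocks, which this sketch does not attempt to reproduce.

```python
import math

def rate_bits(level):
    """Hypothetical rate model (an assumption): larger magnitudes cost more bits."""
    return 1.0 + 2.0 * math.log2(1 + abs(level))

def rd_cost(coeff, level, step, lam):
    """Lagrangian cost J = D + lam * R for quantizing coeff to level * step."""
    distortion = (coeff - level * step) ** 2
    return distortion + lam * rate_bits(level)

def hard_decision(coeff, step):
    """Nearest reconstruction level, chosen without looking at the rate."""
    return int(round(coeff / step))

def rd_aware_decision(coeff, step, lam):
    """Choose the cheapest level among the hard decision, its neighbour
    toward zero, and zero itself."""
    hard = hard_decision(coeff, step)
    candidates = {hard, 0}
    if hard != 0:
        candidates.add(hard - (1 if hard > 0 else -1))
    return min(candidates, key=lambda q: (rd_cost(coeff, q, step, lam), abs(q)))
```

Because the hard-decision level is always among the candidates, the rate-aware choice can never have a higher cost J than the hard decision under the same rate model; with `lam = 0` the two coincide whenever the nearest level is unique.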
Corresponding to the baseline profile syntaxes and the main profile syntaxes of H.264, respectively, we have proposed three RD algorithms: a graph-based algorithm for SDQ given motion prediction and quantization step sizes, an algorithm for residual coding optimization given motion prediction, and an iterative overall algorithm for jointly optimizing motion prediction, quantization, and entropy coding, with each embedded in the next in the indicated order. Among the three algorithms, the SDQ design is the core, and it is developed for a given entropy coding method. Specifically, two SDQ algorithms have been developed, based on the context adaptive variable length coding (CAVLC) in the H.264 baseline profile and the context adaptive binary arithmetic coding (CABAC) in the H.264 main profile, respectively. Experimental results for the H.264 baseline codec optimization show that, for a set of typical test sequences, the proposed RD method for H.264 baseline coding achieves a better trade-off between rate and distortion, i.e., a 12% rate reduction on average at the same distortion (ranging from 30 dB to 38 dB in PSNR) when compared with the RD optimization method implemented in the H.264 baseline reference codec. Experimental results for optimizing H.264 main profile coding with CABAC show a 10% rate reduction over a main profile reference codec using CABAC, which also suggests a 20% rate reduction over the RD optimization method implemented in the H.264 baseline reference codec, leading to our claim of having developed the best codec in terms of RD performance while maintaining compatibility with H.264. By investigating the trade-off between distortion and complexity, we have also proposed a design framework for image/video transcoding with spatial resolution reduction, i.e., for down-sampling compressed images/video with an arbitrary ratio in the DCT domain.
First, we derive a set of DCT-domain down-sampling methods, which can be represented by a linear transform with double-sided matrix multiplication (LTDS) in the DCT domain. Then, for a pre-selected pixel-domain down-sampling method, we formulate an optimization problem for finding an LTDS that approximates the given pixel-domain method with the best trade-off between visual quality and computational complexity. The problem is then solved by modeling an LTDS as a multi-layer perceptron network and training the network with a structural learning with forgetting algorithm. Finally, by selecting a pixel-domain reference method with the popular Butterworth lowpass filtering and cubic B-spline interpolation, the proposed framework discovers an LTDS with better visual quality and lower computational complexity when compared with state-of-the-art methods in the literature.
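The double-sided form Y = A X B can be illustrated with a minimal sketch. It assumes an orthonormal DCT-II and plain 2:1 pixel averaging as the pixel-domain reference, rather than the Butterworth filtering and cubic B-spline interpolation named in the abstract, and it derives A and B exactly instead of learning them with a network. All function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix T (n x n), so that X = T x T^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def averaging_matrix(m, n):
    """m x n matrix that averages adjacent pairs (assumes n == 2 * m)."""
    return [[0.5 if j // 2 == i else 0.0 for j in range(n)] for i in range(m)]

def dct2(block):
    """2-D DCT of a square block."""
    t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))

def ltds_pair(m, n):
    """Left/right matrices (A, B) such that A X B is the DCT of the
    pixel-averaged block, where X is the DCT of the n x n input block."""
    t_n, t_m = dct_matrix(n), dct_matrix(m)
    d = averaging_matrix(m, n)
    a = matmul(matmul(t_m, d), transpose(t_n))  # m x n
    b = transpose(a)                            # n x m
    return a, b
```

For an 8x8 block `x`, `matmul(matmul(a, dct2(x)), b)` with `a, b = ltds_pair(4, 8)` matches the DCT of the averaged 4x4 block up to floating-point error, because the DCT matrices are orthonormal and cancel between the two domains.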

    A NOVEL JOINT PERCEPTUAL ENCRYPTION AND WATERMARKING SCHEME (JPEW) WITHIN JPEG FRAMEWORK

    Due to the rapid growth in internet and multimedia technologies, many new commercial applications such as video on demand (VOD), pay-per-view, and real-time multimedia broadcast have emerged. To ensure the integrity and confidentiality of the multimedia content, the content is usually watermarked and then encrypted, or vice versa. If the multimedia content needs to be watermarked and encrypted at the same time, the watermarking function needs to be performed first, followed by the encryption function. Hence, if the watermark needs to be extracted, the multimedia data must be decrypted first, followed by extraction of the watermark. This results in a large computational overhead. The solution provided in the literature for this problem is partial encryption, in which the media data are partitioned into two parts: one to be watermarked and the other to be encrypted. In addition, some multimedia applications, e.g., video on demand (VOD), Pay-TV, and pay-per-view, allow a preview of the multimedia content, which involves "perceptual" encryption, wherein all or some selected part of the content is perceptually distorted with an encryption key. Until now, no joint perceptual encryption and watermarking scheme has been proposed in the literature. In this thesis, a novel Joint Perceptual Encryption and Watermarking (JPEW) scheme is proposed that is integrated within the JPEG standard. The design of JPEW involves the design and development of both perceptual encryption and watermarking schemes that are integrated in JPEG and feasible within the "partial" encryption framework. The perceptual encryption scheme exploits the energy distribution of the AC components and the DC component bitplanes of continuous-tone images, and is carried out by selectively encrypting these AC coefficients and DC component bitplanes. The encryption itself is based on a chaos-based permutation reported in an earlier work.
Similarly, in contrast to traditional watermarking schemes, the proposed watermarking scheme makes use of the DC component of the image and is carried out by selectively substituting certain bitplanes of the DC components with watermark bits. Apart from the aforesaid JPEW, an additional perceptual encryption scheme, integrated in JPEG, has also been proposed. This scheme is outside the joint framework and implements perceptual encryption on a region of interest (ROI) by scrambling the DCT blocks of the chosen ROI. The performance of both the perceptual encryption and the watermarking schemes is evaluated and compared with a Quantization Index Modulation (QIM) based watermarking scheme and a Reversible Histogram Spreading (RHS) based perceptual encryption scheme. The results show that the proposed watermarking scheme is imperceptible and robust, and suitable for authentication. Similarly, the proposed perceptual encryption scheme outperforms the RHS based scheme in terms of the number of operations required to achieve a given level of perceptual encryption, and it provides control over the amount of perceptual encryption. The overall security of the JPEW has also been evaluated. Additionally, the performance of the proposed separate perceptual encryption scheme has been thoroughly evaluated in terms of security and compression efficiency. The scheme is found to be simpler to implement, to have an insignificant effect on compression ratios, and to provide more options for the selection of the control factor.
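Bitplane substitution on DC components can be sketched as follows. This is a minimal illustration that assumes the DC values are available as non-negative integers and that a single bitplane index is chosen in advance; the function names are hypothetical, and the thesis's rule for selecting bitplanes and its chaos-based encryption are not reproduced.

```python
def embed_bits(dc_values, watermark_bits, plane):
    """Overwrite bit `plane` of the first len(watermark_bits) DC values.

    Assumes non-negative integer DC values (an assumption of this sketch).
    """
    if len(watermark_bits) > len(dc_values):
        raise ValueError("more watermark bits than DC values")
    mask = 1 << plane
    marked = list(dc_values)
    for i, bit in enumerate(watermark_bits):
        # Clear the chosen bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~mask) | (mask if bit else 0)
    return marked

def extract_bits(dc_values, count, plane):
    """Read bit `plane` back from the first `count` DC values."""
    return [(value >> plane) & 1 for value in dc_values[:count]]
```

Each substituted value changes by at most `2 ** plane`, so a lower bitplane gives a less visible change, while a higher one is more likely to survive requantization; this is a general property of bitplane substitution rather than a result taken from the thesis.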

    Digital Multimedia Forensics and Anti-Forensics

    As the use of digital multimedia content such as images and video has increased, so have the means and the incentive to create digital forgeries. Presently, powerful editing software allows forgers to create perceptually convincing digital forgeries. Accordingly, there is a great need for techniques capable of authenticating digital multimedia content. In response to this, researchers have begun developing digital forensic techniques capable of identifying digital forgeries. These forensic techniques operate by detecting imperceptible traces left by editing operations in digital multimedia content. In this dissertation, we propose several new digital forensic techniques to detect evidence of editing in digital multimedia content. We begin by identifying the fingerprints left by pixel value mappings and show how these can be used to detect the use of contrast enhancement in images. We use these fingerprints to perform a number of additional forensic tasks such as identifying cut-and-paste forgeries, detecting the addition of noise to previously JPEG compressed images, and estimating the contrast enhancement mapping used to alter an image. Additionally, we consider the problem of multimedia security from the forger's point of view. We demonstrate that an intelligent forger can design anti-forensic operations to hide editing fingerprints and fool forensic techniques. We propose an anti-forensic technique to remove compression fingerprints from digital images and show that this technique can be used to fool several state-of-the-art forensic algorithms. We examine the problem of detecting frame deletion in digital video and develop both a technique to detect frame deletion and an anti-forensic technique to hide frame deletion fingerprints. We show that this anti-forensic operation leaves behind fingerprints of its own and propose a technique to detect the use of frame deletion anti-forensics.
The ability of a forensic investigator to detect both editing and the use of anti-forensics results in a dynamic interplay between the forger and the forensic investigator. We develop a game theoretic framework to analyze this interplay and identify the set of actions that each party will rationally choose. Additionally, we show that anti-forensics can be used to protect against reverse engineering. To demonstrate this, we propose an anti-forensic module that can be integrated into digital cameras to protect color interpolation methods.
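One way a pixel value mapping can leave a trace is in the image histogram: a nonlinear map applied to 8-bit values merges some input levels into one output level and skips other output levels entirely. The sketch below only illustrates that effect, assuming a gamma curve as the contrast enhancement; it is not the dissertation's detector, and the function names are hypothetical.

```python
def gamma_map(value, gamma):
    """Map an 8-bit value through a gamma curve and round back to 8 bits."""
    return int(round(255.0 * (value / 255.0) ** gamma))

def histogram(pixels):
    """Count occurrences of each 8-bit value."""
    counts = [0] * 256
    for p in pixels:
        counts[p] += 1
    return counts

def empty_bins(counts):
    """Number of unused values between the smallest and largest used value."""
    used = [i for i, c in enumerate(counts) if c]
    lo, hi = used[0], used[-1]
    return sum(1 for c in counts[lo:hi + 1] if c == 0)
```

For a synthetic image whose histogram is flat over 0..255, `empty_bins` is zero before the mapping and positive after `gamma_map(p, 0.5)`, and some bins collect several input levels. Real images have uneven histograms to begin with, so a practical detector would need a more careful statistic than this count.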

    Cubic-panorama image dataset analysis for storage and transmission

    Scalable and perceptual audio compression

    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable-to-lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal, whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner and in a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the psychoacoustic parameters of the original signal are compared with those of the synthesized signal. The psychoacoustic parameters used are loudness, sharpness, tonality and roughness. This analysis technique is a novel method introduced in this thesis, and it gives an insight into the perceptual distortion introduced by any coder analyzed in this manner.

    Nouvelles méthodes de prédiction inter-images pour la compression d’images et de vidéos

    Due to the wide availability of video cameras and new social media practices, as well as the emergence of cloud services, images and videos today constitute a significant share of the total data transmitted over the internet. Video streaming applications account for more than 70% of the world's internet bandwidth, while billions of images are already stored in the cloud and millions are uploaded every day. The ever growing streaming and storage requirements of these media require constant improvement of image and video coding tools. This thesis aims at exploring novel approaches for improving current inter-prediction methods. Such methods leverage redundancies between similar frames, and were originally developed in the context of video compression. In a first approach, novel global and local inter-prediction tools are combined to improve the efficiency of image set compression schemes based on video codecs. By combining a global geometric and photometric compensation with a locally linear prediction, significant improvements can be obtained. A second approach is then proposed, which introduces a region-based inter-prediction scheme. The proposed method is able to improve coding performance compared to existing solutions by estimating and compensating geometric and photometric distortions on a semi-local level. This approach is then adapted and validated in the context of video compression. Bit-rate improvements are obtained, especially for sequences displaying complex real-world motions such as zooms and rotations. The last part of the thesis focuses on deep learning approaches for inter-prediction. Deep neural networks have shown striking results for a large number of computer vision tasks over the last years. Deep learning based methods proposed for frame interpolation applications are studied here in the context of video compression.
Coding performance improvements over traditional motion estimation and compensation methods highlight the strong potential of these deep architectures for video compression.