5 research outputs found

    Design and Optimization of Graph Transform for Image and Video Compression

    Get PDF
    The main contribution of this thesis is the introduction of new methods for designing adaptive transforms for image and video compression. Exploiting graph signal processing techniques, we develop new graph construction methods targeted for image and video compression applications. In this way, we obtain a graph that is, at the same time, a good representation of the image and easy to transmit to the decoder. To do so, we investigate different research directions. First, we propose a new method for graph construction that employs innovative edge metrics, quantization and edge prediction techniques. Then, we propose to use a graph learning approach and we introduce a new graph learning algorithm targeted for image compression that defines the connectivities between pixels by taking into consideration the coding of the image signal and the graph topology in rate-distortion term. Moreover, we also present a new superpixel-driven graph transform that uses clusters of superpixel as coding blocks and then computes the graph transform inside each region. In the second part of this work, we exploit graphs to design directional transforms. In fact, an efficient representation of the image directional information is extremely important in order to obtain high performance image and video coding. In this thesis, we present a new directional transform, called Steerable Discrete Cosine Transform (SDCT). This new transform can be obtained by steering the 2D-DCT basis in any chosen direction. Moreover, we can also use more complex steering patterns than a single pure rotation. In order to show the advantages of the SDCT, we present a few image and video compression methods based on this new directional transform. The obtained results show that the SDCT can be efficiently applied to image and video compression and it outperforms the classical DCT and other directional transforms. Along the same lines, we present also a new generalization of the DFT, called Steerable DFT (SDFT). Differently from the SDCT, the SDFT can be defined in one or two dimensions. The 1D-SDFT represents a rotation in the complex plane, instead the 2D-SDFT performs a rotation in the 2D Euclidean space

    Transforms for prediction residuals in video coding

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.Cataloged from PDF version of thesis.Includes bibliographical references (p. 135-140).Typically the same transform, the 2-D Discrete Cosine Transform (DCT), is used to compress both image intensities in image coding and prediction residuals in video coding. Major prediction residuals include the motion compensated prediction residual, the resolution enhancement residual in scalable video coding, and the intra prediction residual in intra-frame coding. The 2-D DCT is efficient at decorrelating images, but the spatial characteristics of prediction residuals can be significantly different from the spatial characteristics of images, and developing transforms that are adapted to the characteristics of prediction residuals can improve their compression efficiency. In this thesis, we explore the differences between the characteristics of images and prediction residuals by analyzing their local anisotropic characteristics and develop transforms adapted to the local anisotropic characteristics of some types of prediction residuals. The analysis shows that local regions in images have 2-D anisotropic characteristics and many regions in several types of prediction residuals have 1-D anisotropic characteristics. Based on this insight, we develop 1-D transforms for these residuals. We perform experiments to evaluate the potential gains achievable from using these transforms within the H.264 codec, and the experimental results indicate that these transforms can increase the compression efficiency of these residuals.by Fatih Kamışlı.Ph.D

    Power consumption reduction techniques for H.264 video compression hardware

    Get PDF
    Video compression systems are used in many commercial products such as digital camcorders, cellular phones and video teleconferencing systems. H.264 / MPEG4 Part 10, the recently developed international standard for video compression, offers significantly better compression efficiency than previous video compression standards. However, this compression efficiency comes with an increase in encoding complexity and therefore in power consumption. Since portable devices operate with battery, it is important to reduce power consumption so that battery life can be increased. In addition, consuming excessive power degrades the performance of integrated circuits, increases packaging and cooling costs, reduces reliability and may cause device failures. In this thesis, we propose novel computational complexity and power reduction techniques for intra prediction, deblocking filter (DBF), and intra mode decision modules of an H.264 video encoder hardware, and intra prediction with template matching (TM) hardware. We quantified the computation reductions achieved by these techniques using H.264 Joint Model reference software encoder. We designed efficient hardware architectures for these video compression algorithms and implemented them in Verilog HDL. We mapped these hardware implementations to Xilinx Virtex FPGAs and estimated their power consumptions using Xilinx XPower Analyzer tool. We integrated the proposed techniques to these hardware implementations and quantified their impact on the power consumptions of these hardware implementations on Xilinx Virtex FPGAs. The proposed techniques significantly reduced the power consumptions of these FPGA implementations in some cases with no PSNR loss and in some cases with very small PSNR loss

    Efficient Motion Estimation and Mode Decision Algorithms for Advanced Video Coding

    Get PDF
    H.264/AVC video compression standard achieved significant improvements in coding efficiency, but the computational complexity of the H.264/AVC encoder is drastically high. The main complexity of encoder comes from variable block size motion estimation (ME) and rate-distortion optimized (RDO) mode decision methods. This dissertation proposes three different methods to reduce computation of motion estimation. Firstly, the computation of each distortion measure is reduced by proposing a novel two step edge based partial distortion search (TS-EPDS) algorithm. In this algorithm, the entire macroblock is divided into different sub-blocks and the calculation order of partial distortion is determined based on the edge strength of the sub-blocks. Secondly, we have developed an early termination algorithm that features an adaptive threshold based on the statistical characteristics of rate-distortion (RD) cost regarding current block and previously processed blocks and modes. Thirdly, this dissertation presents a novel adaptive search area selection method by utilizing the information of the previously computed motion vector differences (MVDs). In H.264/AVC intra coding, DC mode is used to predict regions with no unified direction and the predicted pixel values are same and thus smooth varying regions are not well de-correlated. This dissertation proposes an improved DC prediction (IDCP) mode based on the distance between the predicted and reference pixels. On the other hand, using the nine prediction modes in intra 4x4 and 8x8 block units needs a lot of overhead bits. In order to reduce the number of overhead bits, an intra mode bit rate reduction method is suggested. This dissertation also proposes an enhanced algorithm to estimate the most probable mode (MPM) of each block. The MPM is derived from the prediction mode direction of neighboring blocks which have different weights according to their positions. This dissertation also suggests a fast enhanced cost function for mode decision of intra encoder. The enhanced cost function uses sum of absolute Hadamard-transformed differences (SATD) and mean absolute deviation of the residual block to estimate distortion part of the cost function. A threshold based large coefficients count is also used for estimating the bit-rate part
    corecore