236 research outputs found

    Mathematical transforms and image compression: A review

    Get PDF
    It is well known that images, often used in a variety of computer and other scientific and engineering applications, are difficult to store and transmit due to their sizes. One possible solution to overcome this problem is to use an efficient digital image compression technique where an image is viewed as a matrix and then the operations are performed on the matrix. All the contemporary digital image compression systems use various mathematical transforms for compression. The compression performance is closely related to the performance by these mathematical transforms in terms of energy compaction and spatial frequency isolation by exploiting inter-pixel redundancies present in the image data. Through this paper, a comprehensive literature survey has been carried out and the pros and cons of various transform-based image compression models have also been discussed

    Determination of the visibility thresholds for subband image coding

    Get PDF
    IEEE International Symposium on Circuits and Systems, Hong Kong, China, 9-12 June 1997In this paper, a method to determine the visibility thresholds for a given subband system is introduced. Our approach identifies the important worse case combination of the quantization error in each basis function and determine the corresponding visibility thresholds using results previously measured in the DCT domain. Using the proposed method, it is relatively simple to determine the visibility thresholds for any subband system in the YCrCb domain. Simulation results on the Lapped Orthogonal Transform (LOT) show that the visibility thresholds are useful in visually lossless coding and perceptual weighted quantization.published_or_final_versio

    A Wavelet Visible Difference Predictor

    Get PDF
    In this paper, we describe a model of the human visual system (HVS) based on the wavelet transform. This model is largely based on a previously proposed model, but has a number of modifications that make it more amenable to potential integration into a wavelet based image compression scheme. These modifications include the use of a separable wavelet transform instead of the cortex transform, the application of a wavelet contrast sensitivity function (CSF), and a simplified definition of subband contrast that allows us to predict noise visibility directly from wavelet coefficients. Initially, we outline the luminance, frequency, and masking sensitivities of the HVS and discuss how these can be incorporated into the wavelet transform. We then outline a number of limitations of the wavelet transform as a model of the HVS, namely the lack of translational invariance and poor orientation sensitivity. In order to investigate the efficacy of this wavelet based model, a wavelet visible difference predictor (WVDP) is described. The WVDP is then used to predict visible differences between an original and compressed (or noisy) image. Results are presented to emphasize the limitations of commonly used measures of image quality and to demonstrate the performance of the WVDP. The paper concludes with suggestions on how the WVDP can be used to determine a visually optimal quantization strategy for wavelet coefficients and produce a quantitative measure of image quality

    State of the art in 2D content representation and compression

    Get PDF
    Livrable D1.3 du projet ANR PERSEECe rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D3.1 du projet

    Contributions in image and video coding

    Get PDF
    Orientador: Max Henrique Machado CostaTese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de ComputaçãoResumo: A comunidade de codificação de imagens e vídeo vem também trabalhando em inovações que vão além das tradicionais técnicas de codificação de imagens e vídeo. Este trabalho é um conjunto de contribuições a vários tópicos que têm recebido crescente interesse de pesquisadores na comunidade, nominalmente, codificação escalável, codificação de baixa complexidade para dispositivos móveis, codificação de vídeo de múltiplas vistas e codificação adaptativa em tempo real. A primeira contribuição estuda o desempenho de três transformadas 3-D rápidas por blocos em um codificador de vídeo de baixa complexidade. O codificador recebeu o nome de Fast Embedded Video Codec (FEVC). Novos métodos de implementação e ordens de varredura são propostos para as transformadas. Os coeficiente 3-D são codificados por planos de bits pelos codificadores de entropia, produzindo um fluxo de bits (bitstream) de saída totalmente embutida. Todas as implementações são feitas usando arquitetura com aritmética inteira de 16 bits. Somente adições e deslocamentos de bits são necessários, o que reduz a complexidade computacional. Mesmo com essas restrições, um bom desempenho em termos de taxa de bits versus distorção pôde ser obtido e os tempos de codificação são significativamente menores (em torno de 160 vezes) quando comparados ao padrão H.264/AVC. A segunda contribuição é a otimização de uma recente abordagem proposta para codificação de vídeo de múltiplas vistas em aplicações de video-conferência e outras aplicações do tipo "unicast" similares. O cenário alvo nessa abordagem é fornecer vídeo com percepção real em 3-D e ponto de vista livre a boas taxas de compressão. Para atingir tal objetivo, pesos são atribuídos a cada vista e mapeados em parâmetros de quantização. Neste trabalho, o mapeamento ad-hoc anteriormente proposto entre pesos e parâmetros de quantização é mostrado ser quase-ótimo para uma fonte Gaussiana e um mapeamento ótimo é derivado para fonte típicas de vídeo. A terceira contribuição explora várias estratégias para varredura adaptativa dos coeficientes da transformada no padrão JPEG XR. A ordem de varredura original, global e adaptativa do JPEG XR é comparada com os métodos de varredura localizados e híbridos propostos neste trabalho. Essas novas ordens não requerem mudanças nem nos outros estágios de codificação e decodificação, nem na definição da bitstream A quarta e última contribuição propõe uma transformada por blocos dependente do sinal. As transformadas hierárquicas usualmente exploram a informação residual entre os níveis no estágio da codificação de entropia, mas não no estágio da transformada. A transformada proposta neste trabalho é uma técnica de compactação de energia que também explora as similaridades estruturais entre os níveis de resolução. A idéia central da técnica é incluir na transformada hierárquica um número de funções de base adaptativas derivadas da resolução menor do sinal. Um codificador de imagens completo foi desenvolvido para medir o desempenho da nova transformada e os resultados obtidos são discutidos neste trabalhoAbstract: The image and video coding community has often been working on new advances that go beyond traditional image and video architectures. This work is a set of contributions to various topics that have received increasing attention from researchers in the community, namely, scalable coding, low-complexity coding for portable devices, multiview video coding and run-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low complexity video codec. The codec has received the name Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit-plane by bit-plane by entropy coders, producing a fully embedded output bitstream. All implementation is performed using 16-bit integer arithmetic. Only additions and bit shifts are necessary, thus lowering computational complexity. Even with these constraints, reasonable rate versus distortion performance can be achieved and the encoding time is significantly smaller (around 160 times) when compared to the H.264/AVC standard. The second contribution is the optimization of a recent approach proposed for multiview video coding in videoconferencing applications or other similar unicast-like applications. The target scenario in this approach is providing realistic 3-D video with free viewpoint video at good compression rates. To achieve such an objective, weights are computed for each view and mapped into quantization parameters. In this work, the previously proposed ad-hoc mapping between weights and quantization parameters is shown to be quasi-optimum for a Gaussian source and an optimum mapping is derived for a typical video source. The third contribution exploits several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order applied in JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders do not require changes in either the other coding and decoding stages or in the bitstream definition. The fourth and last contribution proposes an hierarchical signal dependent block-based transform. Hierarchical transforms usually exploit the residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that can also exploit these cross-resolution-level structural similarities. The core idea of the technique is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec is developed in order to measure the performance of the new transform and the obtained results are discussed in this workDoutoradoTelecomunicações e TelemáticaDoutor em Engenharia Elétric

    Directional edge and texture representations for image processing

    Get PDF
    An efficient representation for natural images is of fundamental importance in image processing and analysis. The commonly used separable transforms such as wavelets axe not best suited for images due to their inability to exploit directional regularities such as edges and oriented textural patterns; while most of the recently proposed directional schemes cannot represent these two types of features in a unified transform. This thesis focuses on the development of directional representations for images which can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Based on a previous MFT-based linear feature model, the work extends the extraction method into the situation when the image is corrupted by noise. The problem is tackled by the combination of a "Signal+Noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms called the multiscale polar cosine transforms (MPCT) are also proposed in order to represent textures. The MPCT can be regarded as real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying a Gaussian frequency filter, which matches the disperson of the magnitude spectrum, on the local MFT coefficients. This is particularly effective in denoising natural images, due to its ability to preserve both types of feature. Further improvements can be made by employing the information given by the linear feature extraction process in the filter's configuration. The denoising results compare favourably against other state-of-the-art directional representations

    Evaluating Effect of Block Size in Compressed Sensing for Grayscale Images

    Get PDF
    Compressed sensing is an evolving methodology that enables sampling at sub-Nyquist rates and still provides decent signal reconstruction. During the last decade, the reported works have suggested to improve time efficiency by adopting Block based Compressed Sensing (BCS) and reconstruction performance improvement through new algorithms. A trade-off is required between the time efficiency and reconstruction quality. In this paper we have evaluated the significance of block size in BCS to improve reconstruction performance for grayscale images. A parameter variant of BCS [15] based sampling followed by reconstruction through Smoothed Projected Landweber (SPL) technique [16] involving use of Weiner smoothing filter and iterative hard thresholding is applied in this paper. The BCS variant is used to evaluate the effect of block size on image reconstruction quality by carrying out extensive testing on 9200 images acquired from online resources provided by Caltech101 [6], University of Granada [7] and Florida State University [8]. The experimentation showed some consistent results which can improve reconstruction performance in all BCS frameworks including BCS-SPL [17] and its variants [19], [27]. Firstly, the effect of varying block size (4x4, 8x8, 16x16, 32x32 and 64x64) results in changing the Peak Signal to Noise Ratio (PSNR) of reconstructed images from at least 1 dB to a maximum of 16 dB. This challenges the common notion that bigger block sizes always result in better reconstruction performance. Secondly, the variation in reconstruction quality with changing block size is mostly dependent on the image visual contents. Thirdly, images having similar visual contents, irrespective of the size, e.g., those from the same category of Caltech101 [6] gave majority vote for the same Optimum Block Size (OBS). These focused notes may help improve BCS based image capturing at many of the existing applications. For example, experimental results suggest using block size of 8x8 or 16x16 to capture facial identity using BCS. Fourthly, the average processing time taken for BCS and reconstruction through SPL with Lapped transform of Discrete Cosine Transform as the sparifying basis remained 300 milli-seconds for block size of 4x4 to 5 seconds for block size of 64x64. Since the processing time variation remains less than 5 seconds, selecting the OBS may not affect the time constraint in many applications. Analysis reveals that no particular block size is able to provide optimum reconstruction for all images with varying nature of visual contents. Therefore, the selection of block size should be made specific to the particular type of application images depending upon their visual contents

    A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

    Full text link
    The richness of natural images makes the quest for optimal representations in image processing and computer vision challenging. The latter observation has not prevented the design of image representations, which trade off between efficiency and complexity, while achieving accurate rendering of smooth regions as well as reproducing faithful contours and textures. The most recent ones, proposed in the past decade, share an hybrid heritage highlighting the multiscale and oriented nature of edges and patterns in images. This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. They typically exhibit redundancy to improve sparsity in the transformed domain and sometimes its invariance with respect to simple geometric deformations (translation, rotation). Oriented multiscale dictionaries extend traditional wavelet processing and may offer rotation invariance. Highly redundant dictionaries require specific algorithms to simplify the search for an efficient (sparse) representation. We also discuss the extension of multiscale geometric decompositions to non-Euclidean domains such as the sphere or arbitrary meshed surfaces. The etymology of panorama suggests an overview, based on a choice of partially overlapping "pictures". We hope that this paper will contribute to the appreciation and apprehension of a stream of current research directions in image understanding.Comment: 65 pages, 33 figures, 303 reference

    VHDL modeling and synthesis of the JPEG-XR inverse transform

    Get PDF
    This work presents a pipelined VHDL implementation of the inverse lapped biorthogonal transform used in the decompression process of the soon to be released JPEG-XR still image standard format. This inverse transform involves integer only calculations using lifting operations and Kronecker products. Divisions and multiplications by small integer coefficients are implemented using a bit shift and add technique resulting in a multiplier-less implementation with 736 instances of addition. When targeted to an Altera Stratix II FPGA with a 50 MHz system clock, this design is capable of completing the inverse transform of an 8400 x 6600 pixel image in less than 70 ms
    corecore