
    Contributions in image and video coding

    Advisor: Max Henrique Machado Costa. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: The image and video coding community has been working on advances that go beyond traditional image and video coding techniques.
    This work is a set of contributions to several topics that have received increasing attention from researchers in the community, namely scalable coding, low-complexity coding for portable devices, multiview video coding, and real-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low-complexity video codec, named the Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit-plane by bit-plane by entropy coders, producing a fully embedded output bitstream. The entire implementation uses 16-bit integer arithmetic; only additions and bit shifts are needed, which lowers the computational complexity. Even under these constraints, good rate-distortion performance is achieved, and encoding times are significantly smaller (around 160 times) than those of the H.264/AVC standard. The second contribution is the optimization of a recently proposed approach to multiview video coding for videoconferencing and similar unicast applications. The target scenario is to provide realistic 3-D perception and free-viewpoint video at good compression rates. To this end, weights are assigned to each view and mapped into quantization parameters. In this work, the previously proposed ad hoc mapping between weights and quantization parameters is shown to be quasi-optimal for a Gaussian source, and an optimal mapping is derived for typical video sources. The third contribution explores several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order of JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders require no changes to the other coding and decoding stages or to the bitstream definition. The fourth and last contribution proposes a hierarchical, signal-dependent block-based transform. Hierarchical transforms usually exploit residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that also exploits these cross-resolution structural similarities. Its core idea is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec was developed to measure the performance of the new transform, and the results obtained are discussed in this work. Doctoral program in Telecommunications and Telematics; degree of Doctor in Electrical Engineering.
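    The FEVC transforms themselves are not reproduced in this abstract, but the add-and-shift constraint it describes is the same one satisfied by the well-known H.264/AVC 4-point core transform. A minimal sketch of that style of transform, assuming 16-bit integer samples (the function name and framing are illustrative, not FEVC's actual code):

```python
import numpy as np

def fwd_transform_4pt(x):
    # A 4-point integer transform built only from additions, subtractions,
    # and bit shifts (this butterfly computes the H.264/AVC core transform;
    # it is shown to illustrate the constraint, not as FEVC's actual code).
    x = np.asarray(x, dtype=np.int16)
    s0, s1 = x[0] + x[3], x[1] + x[2]   # sums of mirrored samples
    d0, d1 = x[0] - x[3], x[1] - x[2]   # differences of mirrored samples
    y0 = s0 + s1                        # basis row [ 1  1  1  1]
    y1 = (d0 << 1) + d1                 # basis row [ 2  1 -1 -2], <<1 == *2
    y2 = s0 - s1                        # basis row [ 1 -1 -1  1]
    y3 = d0 - (d1 << 1)                 # basis row [ 1 -2  2 -1]
    return np.array([y0, y1, y2, y3], dtype=np.int16)

# A separable 3-D transform applies a 1-D transform like this along the
# rows, columns, and temporal axis of each video block in turn; real
# 16-bit implementations must also manage dynamic range between stages.
```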

    A Survey of Signal Processing Problems and Tools in Holographic Three-Dimensional Television

    Get PDF
    Diffraction and holography are fertile areas for the application of signal theory and processing. Recent work on 3DTV displays has posed particularly challenging signal processing problems. Various procedures to compute Rayleigh-Sommerfeld, Fresnel, and Fraunhofer diffraction exist in the literature, and diffraction between parallel planes and between tilted planes can be computed efficiently. Discretization and quantization of diffraction fields yield interesting theoretical and practical results, and allow schemes more efficient than commonly used Nyquist sampling. The literature on computer-generated holography provides a good resource for holographic 3DTV related issues. Fast algorithms to compute Fourier, Walsh-Hadamard, fractional Fourier, linear canonical, Fresnel, and wavelet transforms, as well as optimization-based techniques such as best orthogonal basis, matching pursuit, and basis pursuit, are especially relevant signal processing techniques for wave propagation, diffraction, holography, and related problems. Atomic decompositions, multiresolution techniques, Gabor functions, and Wigner distributions are among the signal processing techniques that have been or may be applied to problems in optics. Research aimed at solving such problems at the intersection of wave optics and signal processing promises not only to facilitate the development of 3DTV systems, but also to contribute to fundamental advances in optics and signal processing theory. © 2007 IEEE
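    As one concrete instance of the surveyed procedures, Fresnel diffraction between parallel planes can be computed with two FFTs via the standard transfer-function (convolution) method. A minimal sketch, assuming a square sampled field with pixel pitch dx (the function name is mine):

```python
import numpy as np

def fresnel_propagate(u0, wavelength, z, dx):
    # Fresnel diffraction between two parallel planes via the
    # transfer-function (convolution) method: one forward FFT, a
    # quadratic-phase multiplication, and one inverse FFT. The constant
    # exp(ikz) phase factor is dropped.
    n = u0.shape[0]                      # square n x n sampled field
    fx = np.fft.fftfreq(n, d=dx)         # spatial frequencies
    FX, FY = np.meshgrid(fx, fx)
    H = np.exp(-1j * np.pi * wavelength * z * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(u0) * H)
```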

    Compressive sensing based image processing and energy-efficient hardware implementation with application to MRI and JPEG 2000

    In the present age of technology, the buzzwords are low-power, energy-efficient, and compact systems. This directly constrains the data processing and hardware techniques employed at the core of these devices. One of the most power-hungry and space-consuming tasks is image/video processing, due to its high quality requirements. In current design methodologies, a point has nearly been reached at which physical and physiological effects limit the ability to simply encode data faster. These limits have motivated research into methods that reduce the amount of acquired data without degrading image quality or increasing energy consumption. Compressive sensing (CS) has emerged as an efficient signal compression and recovery technique that can substantially reduce data acquisition and processing. It exploits the sparsity of a signal in a transform domain to perform sampling and stable recovery. This is an alternative paradigm to conventional data processing and is robust in nature. Unlike conventional methods, CS provides an information-capturing paradigm that combines sampling and compression. It permits signals to be sampled below the Nyquist rate while still allowing optimal reconstruction of the signal. The required measurements are far fewer than in conventional methods, and the process is non-adaptive, making the sampling process faster and universal. In this thesis, CS methods are applied to magnetic resonance imaging (MRI) and JPEG 2000, widely used techniques in clinical imaging and image compression, respectively. Over the years, MRI has improved dramatically in both imaging quality and speed, further revolutionizing the field of diagnostic medicine. However, imaging speed, which is essential to many MRI applications, remains a major challenge. The specific challenge addressed in this work is the use of non-Fourier, complex measurement-based data acquisition. This method offers the possibility of reconstructing high-quality MRI data from minimal measurements, due to the high incoherence between the two chosen matrices. Similarly, JPEG 2000, though already providing high compression, can be further improved by compressive sampling, and the image quality also improves; moreover, an optimized JPEG 2000 architecture reduces the overall processing and yields faster computation when combined with CS. Considering these requirements, this thesis is presented in two parts. In the first part: (1) a complex Hadamard matrix (CHM) based 2D and 3D MRI data acquisition with recovery using a greedy algorithm is proposed; the CHM measurement matrix is shown to satisfy the necessary condition for CS known as the restricted isometry property (RIP), and sparse recovery is done using compressive sampling matching pursuit (CoSaMP); (2) an optimized matrix and a modified CoSaMP are presented, which enhance MRI performance compared with conventional sampling; (3) an energy-efficient, cost-efficient hardware design based on a field-programmable gate array (FPGA) is proposed, providing a platform for low-cost MRI processing hardware. At every stage, the design is shown to be superior to other commonly used MRI-CS methods and comparable with conventional MRI sampling. In the second part, CS techniques are applied to image processing and combined with the JPEG 2000 coder.
    While CS can reduce the encoding time, its effect on the overall JPEG 2000 encoder is limited by some of the more complex JPEG 2000 algorithms. One bottleneck is JPEG 2000 arithmetic encoding (AE), which is based entirely on bit-level operations. In this work, this problem is tackled by proposing a two-symbol AE with an efficient FPGA-based hardware design. This design is energy-efficient, fast, and of lower complexity than conventional JPEG 2000 encoding.
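    The CoSaMP recovery step used in the first part is a published greedy algorithm (Needell and Tropp), so a compact generic sketch is possible. This version accepts any measurement matrix A; the thesis pairs it with complex Hadamard measurements and a modified variant, neither of which is reproduced here:

```python
import numpy as np

def cosamp(A, y, K, iters=30, tol=1e-6):
    # Compressive Sampling Matching Pursuit: recover a K-sparse x from y = A x.
    m, n = A.shape
    x = np.zeros(n, dtype=A.dtype)
    r = y.astype(A.dtype)
    for _ in range(iters):
        proxy = A.conj().T @ r                       # correlate residual
        omega = np.argsort(np.abs(proxy))[-2 * K:]   # 2K strongest atoms
        T = np.union1d(omega, np.flatnonzero(x))     # merge with current support
        b = np.zeros(n, dtype=A.dtype)
        b[T] = np.linalg.lstsq(A[:, T], y, rcond=None)[0]  # least squares on T
        keep = np.argsort(np.abs(b))[-K:]            # prune to best K terms
        x = np.zeros(n, dtype=A.dtype)
        x[keep] = b[keep]
        r = y - A @ x                                # update residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(y):
            break
    return x
```

    With a random m × n measurement matrix and m comfortably above K log(n/K), this typically recovers K-sparse test vectors; a complex Hadamard A drops in unchanged.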

    Optimization of Pattern Matching Algorithms for Multi- and Many-Core Platforms

    Image and video compression play a major role in the world today, allowing the storage and transmission of large volumes of multimedia content. However, processing this information requires substantial computational resources, so improving the computational performance of these compression algorithms is very important. The Multidimensional Multiscale Parser (MMP) is a pattern-matching-based compression algorithm for multimedia content, namely images, achieving high compression ratios while maintaining good image quality (Rodrigues et al. [2008]). However, compared with other existing algorithms, it is slow to execute. Two parallel implementations for GPUs were therefore proposed by Ribeiro [2016] and Silva [2015], in CUDA and OpenCL-GPU respectively. In this dissertation, to complement that work, we propose two parallel versions that run the MMP algorithm on the CPU: one using OpenMP and another that converts the existing OpenCL-GPU code into OpenCL-CPU. The proposed solutions improve the computational performance of MMP by factors of 3 and 2.7, respectively. High Efficiency Video Coding (HEVC/H.265) is the most recent standard for image and video compression. Its impressive compression performance makes it a target for many adaptations, particularly for holoscopic (light field) image/video processing. Some of the proposed modifications for encoding this new multimedia content are based on geometry-based disparity compensation (SS), developed by Conti et al. [2014], and a Geometric Transformations (GT) module, proposed by Monteiro et al. [2015]. These HEVC-based compression algorithms for holoscopic images implement a specific search for similar micro-images that is more efficient than the one performed by HEVC, but considerably slower. To improve execution times, we use the OpenCL API as the GPU-enabling language to increase the module's performance. At its most costly setting, we reduce the GT module's execution time from 6.9 days to less than 4 hours, effectively attaining a speedup of 45x.
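    The inner loop that both the OpenMP and OpenCL versions parallelize, matching each image block against a dictionary of patterns, is independent per block. A language-neutral sketch of that data-parallel structure (shown in Python with a fixed dictionary; MMP's real dictionary adapts as coding proceeds, and the function names are mine):

```python
import numpy as np
from multiprocessing import Pool

def best_match(args):
    # Sum-of-squared-error search of one target block against every
    # dictionary pattern. Each block's search is independent, so it maps
    # directly onto OpenMP threads, OpenCL work-items, or (here) processes.
    block, dictionary = args
    sse = ((dictionary - block) ** 2).sum(axis=(1, 2))
    return int(np.argmin(sse))

def match_all(blocks, dictionary, workers=4):
    # One task per block; stages outside the search stay serial, which is
    # why overall speedups stop short of the core count (Amdahl's law).
    with Pool(workers) as pool:
        return pool.map(best_match, [(b, dictionary) for b in blocks])
```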

    Novel Fast Algorithms For Low Rank Matrix Approximation

    Recent advances in matrix approximation have emphasized randomization techniques whose goal is to create a sketch of an input matrix. This sketch, a random submatrix of the input with far fewer rows or columns, still preserves its relevant features. In one such technique, random projections approximate the range of the input matrix. Dimension-reduction transforms are obtained by multiplying the input matrix by one or more matrices that can be orthogonal, random, and amenable to fast multiplication by a vector. The Subsampled Randomized Hadamard Transform (SRHT) is the most popular of these transforms: an m × n matrix can be multiplied by an n × l SRHT matrix in O(mn log l) arithmetic operations, where typically l << min(m, n). This dissertation introduces an alternative, which we call the Subsampled Randomized Approximate Hadamard Transform (SRAHT), for which the complexity of multiplication by an input matrix decreases to O((2n + l log n) m) operations. We also prove that our sublinear-cost variants of a popular subspace sampling algorithm output accurate low-rank approximations (hereafter LRA) for a large class of inputs. Finally, we introduce new sublinear algorithms for the CUR LRA matrix factorization, which consists of a column subset C and a row subset R of an input matrix together with a connector matrix U. We prove that these CUR algorithms provide close LRA with high probability on a random input matrix admitting LRA.
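    For reference, the standard SRHT that the SRAHT improves upon can be sketched directly: sign-flip the columns, apply a fast Walsh-Hadamard transform, and subsample. A minimal version assuming n is a power of two (the SRAHT itself, with its lower operation count, is not reproduced here):

```python
import numpy as np

def fwht(a):
    # Fast Walsh-Hadamard transform along the last axis (length 2**k),
    # O(n log n) per row via in-place butterflies on a copy.
    a = a.copy()
    n = a.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            x = a[..., i:i + h].copy()
            y = a[..., i + h:i + 2 * h].copy()
            a[..., i:i + h] = x + y
            a[..., i + h:i + 2 * h] = x - y
        h *= 2
    return a

def srht_sketch(A, l, rng=np.random.default_rng(0)):
    # Y = A @ (n x l SRHT matrix): flip column signs (D), transform each
    # row with the orthonormal Hadamard (H), then keep l random columns
    # (R), scaled by sqrt(n/l) -- O(mn log n) overall.
    m, n = A.shape
    signs = rng.choice([-1.0, 1.0], size=n)
    Y = fwht(A * signs) / np.sqrt(n)
    cols = rng.choice(n, size=l, replace=False)
    return Y[:, cols] * np.sqrt(n / l)
```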

    Structured Compressed Sensing Using Deterministic Sequences

    The problem of estimating sparse signals from an incomplete set of noiseless or noisy measurements has long been investigated from different perspectives. In this dissertation, after a review of the theory of compressed sensing (CS) and of existing structured sensing matrices, a new class of convolutional sensing matrices based on deterministic sequences is developed in the first part. The proposed matrices achieve a near-optimal bound with O(K log(N)) measurements for non-uniform recovery. Not only can they approximate compressible signals in the time domain, they can also recover sparse signals in the frequency and discrete cosine transform domains. Candidate deterministic sequences include the maximum-length sequence (also called the m-sequence), Golay complementary sequences, and the Legendre sequence, each of which is investigated. In the second part, Golay-paired Hadamard matrices are introduced as structured sensing matrices; they are constructed from the Hadamard matrix followed by diagonal Golay sequences, and their properties and performance are analyzed. Their strong structure ensures special isometry properties and potentially makes them easier to implement in hardware. Finally, we successfully exploit these novel CS principles in several real applications, including radar imaging and distributed source coding. The performance and effectiveness in each scenario are verified both in theory and in simulations.
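    The first part's construction can be illustrated concretely: a convolutional sensing matrix is a row-subsampled circulant matrix whose generator row is a deterministic ±1 sequence. A sketch using an m-sequence via scipy's max_len_seq (the function name and the random row selection are illustrative, not the dissertation's exact scheme):

```python
import numpy as np
from scipy.signal import max_len_seq

def mseq_measure(x, m, nbits=10, rng=np.random.default_rng(0)):
    # Convolutional compressed sensing: circularly convolve the length
    # (2**nbits - 1) signal with a +/-1 m-sequence, then keep m random
    # output samples. This equals applying m rows of a circulant matrix,
    # so it is fast to apply and hardware-friendly.
    n = 2**nbits - 1
    assert x.size == n, "signal length must match the m-sequence period"
    h = 2.0 * max_len_seq(nbits)[0] - 1.0            # {0,1} -> {-1,+1}
    y_full = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real  # circular conv
    rows = rng.choice(n, size=m, replace=False)      # random subsampling
    return y_full[rows], rows
```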

    Exposing a waveform interface to the wireless channel for scalable video broadcast

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Includes bibliographical references (p. 157-167). Video broadcast and mobile video challenge the conventional wireless design. In broadcast and mobile scenarios the bit rate supported by the channel differs across receivers and varies quickly over time. The conventional design, however, forces the source to pick a single bit rate and degrades sharply when the channel cannot support it. This thesis presents SoftCast, a clean-slate design for wireless video in which the source transmits one video stream that each receiver decodes to a video quality commensurate with its specific instantaneous channel quality. To do so, SoftCast ensures that the samples of the digital video signal transmitted on the channel are linearly related to the pixels' luminance. Thus, when channel noise perturbs the transmitted signal samples, the perturbation naturally translates into approximation in the original video pixels. Hence, a receiver with a good channel (low noise) obtains a high-fidelity video, and a receiver with a bad channel (high noise) obtains a low-fidelity video. SoftCast's linear design in essence resembles the traditional analog approach to communication, which was abandoned in most major communication systems since it enjoys neither the theoretical optimality of the digital separate design in point-to-point channels nor its effectiveness at compressing the source data. In this thesis, I show that, in combination with the decorrelating transforms common to modern digital video compression, the analog approach can achieve performance competitive with the prevalent digital design in a wide variety of practical point-to-point scenarios, and outperforms it in broadcast and mobile scenarios. Since the conventional bit-pipe interface of the wireless physical layer (PHY) forces the separation of source and channel coding, realizing SoftCast requires architectural changes to the wireless PHY. This thesis discusses the design of RawPHY, a reorganization of the PHY that exposes a waveform interface to the channel while shielding the designers of the higher layers from much of the perplexity of the wireless channel. I implement SoftCast and RawPHY using the GNURadio software and the USRP platform. Results from a 20-node testbed show that SoftCast improves the average video quality (i.e., PSNR) across diverse broadcast receivers by up to 5.5 dB in comparison to conventional single- or multi-layer video. Even for a single receiver, it eliminates video glitches caused by mobility and increases robustness to packet loss by an order of magnitude. By Szymon Kazimierz Jakubczak, Ph.D.
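    The linear pipeline at the heart of SoftCast can be caricatured in a few lines: decorrelate the pixels, send the coefficients as scaled channel samples, and let each receiver's noise level set its quality. A deliberately stripped-down sketch that omits SoftCast's per-chunk power allocation, Hadamard whitening, and LLSE decoding (the function name is mine):

```python
import numpy as np
from scipy.fft import dctn, idctn

def softcast_like(frame, snr_db, rng=np.random.default_rng(0)):
    # Linear "analog" transmission in the spirit of SoftCast: a
    # decorrelating transform of the pixels is sent as real-valued channel
    # samples, so channel noise becomes pixel-domain approximation error
    # and quality tracks each receiver's SNR with no digital cliff.
    coeffs = dctn(frame.astype(float), norm='ortho')   # decorrelate pixels
    power = np.mean(coeffs**2)
    noise = rng.normal(0.0, np.sqrt(power / 10**(snr_db / 10)), coeffs.shape)
    return idctn(coeffs + noise, norm='ortho')         # receiver-side decode
```

    Because the transform is orthonormal, the pixel-domain MSE equals the channel noise variance, so running this at, say, 5 dB and 25 dB on the same frame reproduces the graceful, cliff-free quality scaling described above.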

    Data Compression over Seismic Sensor Networks
