96 research outputs found

    A VLSI architecture of JPEG2000 encoder

    Get PDF
    Copyright @ 2004 IEEEThis paper proposes a VLSI architecture of JPEG2000 encoder, which functionally consists of two parts: discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). For DWT, a spatial combinative lifting algorithm (SCLA)-based scheme with both 5/3 reversible and 9/7 irreversible filters is adopted to reduce 50% and 42% multiplication computations, respectively, compared with the conventional lifting-based implementation (LBI). For EBCOT, a dynamic memory control (DMC) strategy of Tier-1 encoding is adopted to reduce 60% scale of the on-chip wavelet coefficient storage and a subband parallel-processing method is employed to speed up the EBCOT context formation (CF) process; an architecture of Tier-2 encoding is presented to reduce the scale of on-chip bitstream buffering from full-tile size down to three-code-block size and considerably eliminate the iterations of the rate-distortion (RD) truncation.This work was supported in part by the China National High Technologies Research Program (863) under Grant 2002AA1Z142

    GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000

    Get PDF
    Modern image and video compression standards employ computationally intensive algorithms that provide advanced features to the coding system. Current standards often need to be implemented in hardware or using expensive solutions to meet the real-time requirements of some environments. Contrarily to this trend, this paper proposes an end-to-end codec architecture running on inexpensive Graphics Processing Units (GPUs) that is based on, though not compatible with, the JPEG2000 international standard for image and video compression. When executed in a commodity Nvidia GPU, it achieves real time processing of 12K video. The proposed S/W architecture utilizes four CUDA kernels that minimize memory transfers, use registers instead of shared memory, and employ a double-buffer strategy to optimize the streaming of data. The analysis of throughput indicates that the proposed codec yields results at least 10× superior on average to those achieved with JPEG2000 implementations devised for CPUs, and approximately 4× superior to those achieved with hardwired solutions of the HEVC/H.265 video compression standard

    Accelerating BPC-PaCo through visually lossless techniques

    Get PDF
    Fast image codecs are a current need in applications that deal with large amounts of images. Graphics Processing Units (GPUs) are suitable processors to speed up most kinds of algorithms, especially when they allow fine-grain parallelism. Bitplane Coding with Parallel Coefficient processing (BPC-PaCo) is a recently proposed algorithm for the core stage of wavelet-based image codecs tailored for the highly parallel architectures of GPUs. This algorithm provides complexity scalability to allow faster execution at the expense of coding efficiency. Its main drawback is that the speedup and loss in image quality is controlled only roughly, resulting in visible distortion at low and medium rates. This paper addresses this issue by integrating techniques of visually lossless coding into BPC-PaCo. The resulting method minimizes the visual distortion introduced in the compressed file, obtaining higher-quality images to a human observer. Experimental results also indicate 12% speedups with respect to BPC-PaCo

    High-speed EBCOT with dual context-modeling coding architecture for JPEG2000

    Get PDF
    [[abstract]]This work presents a parallel context-modeling coding architecture and a matching arithmetic coder (MQ coder) for the embedded block coding (EBCOT) unit of the JPEG2000 encoder. The tier-1 of the EBCOT consumes most of the computation time in a JPEG2000 encoding system, and the proposed parallel architecture can increase the throughput rate of the context-modeling. To match the high throughput rate of the parallel context-modeling architecture, and efficient pipelined architecture for context-based adaptive arithmetic encoder is proposed. This encoder of JPEG2000 can work at 185MHz to encode one symbol each cycle. Compared with the conventional context-modeling architecture, our parallel architecture can decrease the execution time about 25%.[[conferencetype]]國際[[conferencedate]]20040523~20040526[[conferencelocation]]溫哥華, 加拿

    Complexity scalable bitplane image coding with parallel coefficient processing

    Get PDF
    Very fast image and video codecs are a pursued goal both in the academia and the industry. This paper presents a complexity scalable and parallel bitplane coding engine for wavelet-based image codecs. The proposed method processes the coefficients in parallel, suiting hardware architectures based on vector instructions. Our previous work is extended with a mechanism that provides complexity scalability to the system. Such a feature allows the coder to regulate the throughput achieved at the expense of slightly penalizing compression effi- ciency. Experimental results suggests that, when using the fastest speed, the method almost doubles the throughput of our previous engine while penalizing compression efficiency by about 10

    [[alternative]]The Research and Implementation of the Wireless Optical Communications Transceiver(II)

    Get PDF
    計畫編號:NSC94-2745-E032-002-URD研究期間:200508~200607研究經費:739,000[[sponsorship]]行政院國家科學委員

    Multiplierless, Folded 9/7 - 5/3 Wavelet VLSI Architecture

    Get PDF

    Result-Biased Distributed-Arithmetic-Based Filter Architectures for Approximately Computing the DWT

    Get PDF
    The discrete wavelet transform is a fundamental block in several schemes for image compression. Its implementation relies on filters that usually require multiplications leading to a relevant hardware complexity. Distributed arithmetic is a general and effective technique to implement multiplierless filters and has been exploited in the past to implement the discrete wavelet transform as well. This work proposes a general method to implement a discrete wavelet transform architecture based on distributed arithmetic to produce approximate results. The novelty of the proposed method relies on the use of result-biasing techniques (inspired by the ones used in fixed-width multiplier architectures), which cause a very small loss of quality of the compressed image (average loss of 0.11 dB and 0.20 dB in terms of PSNR for the 9/7 and 10/18 wavelet filters, respectively). Compared with previously proposed distributed-arithmetic-based architectures for the computation of the discrete wavelet transform, this technique saves from about 20% to 25% of hardware complexity
    corecore