97 research outputs found

    A VLSI architecture of JPEG2000 encoder

    Copyright © 2004 IEEE. This paper proposes a VLSI architecture of a JPEG2000 encoder, which functionally consists of two parts: discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). For DWT, a spatial combinative lifting algorithm (SCLA)-based scheme with both 5/3 reversible and 9/7 irreversible filters is adopted to reduce multiplication computations by 50% and 42%, respectively, compared with the conventional lifting-based implementation (LBI). For EBCOT, a dynamic memory control (DMC) strategy of Tier-1 encoding is adopted to reduce the on-chip wavelet coefficient storage by 60%, and a subband parallel-processing method is employed to speed up the EBCOT context formation (CF) process; an architecture of Tier-2 encoding is presented to reduce the on-chip bitstream buffering from full-tile size down to three-code-block size and to largely eliminate the iterations of the rate-distortion (RD) truncation. This work was supported in part by the China National High Technologies Research Program (863) under Grant 2002AA1Z142.
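For orientation, the reversible 5/3 filter mentioned in this abstract can be written as two integer lifting steps (predict, then update). The sketch below is a plain one-level, 1-D lifting transform in the conventional style, not the paper's SCLA scheme; the function names, the even-length restriction, and the boundary handling shown are assumptions made for illustration.

```python
def dwt53_forward(x):
    """One level of the 1-D reversible 5/3 lifting transform (illustrative sketch).

    Assumes an even-length list of integers and symmetric boundary extension.
    Returns (s, d): low-pass and high-pass coefficients.
    """
    n = len(x)
    if n < 2 or n % 2 != 0:
        raise ValueError("this sketch expects an even-length signal")
    half = n // 2
    even, odd = x[0::2], x[1::2]

    # Predict step: d[i] = odd[i] - floor((even[i] + even[i+1]) / 2)
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # mirror at the right edge
        d.append(odd[i] - ((even[i] + right) >> 1))

    # Update step: s[i] = even[i] + floor((d[i-1] + d[i] + 2) / 4)
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]  # mirror at the left edge
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d


def dwt53_inverse(s, d):
    """Undo dwt53_forward exactly by running the lifting steps in reverse."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - ((left + d[i] + 2) >> 2))
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(d[i] + ((even[i] + right) >> 1))
    return x
```

Because each step only adds or subtracts an integer computed from the other half of the samples, the inverse recovers the input bit for bit, which is what makes this filter suitable for lossless coding.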

    Sample-Parallel Execution of EBCOT in Fast Mode

    JPEG 2000’s most computationally expensive building block is the Embedded Block Coder with Optimized Truncation (EBCOT). This paper evaluates how encoders targeting a parallel architecture such as a GPU can increase their throughput in use cases where very high data rates are used. The compression efficiency in the less significant bit-planes is then often poor and it is beneficial to enable the Selective Arithmetic Coding Bypass style (fast mode) in order to trade a small loss in compression efficiency for a reduction of the computational complexity. More importantly, this style exposes a more finely grained parallelism that can be exploited to execute the raw coding passes, including bit-stuffing, in a sample-parallel fashion. For a latency- or memory-critical application that encodes one frame at a time, EBCOT’s tier-1 is sped up between 1.1x and 2.4x compared to an optimized GPU-based implementation. When a low GPU occupancy has already been addressed by encoding multiple frames in parallel, the throughput can still be improved by 5% for high-entropy images and 27% for low-entropy images. Best results are obtained when enabling the fast mode after the fourth significant bit-plane. For most of the test images the compression rate is within 1% of the original.
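The bit-stuffing mentioned for the raw coding passes can be pictured with a sequential reference sketch: bits are packed most significant bit first, and the byte following a 0xFF byte carries only seven payload bits, with its top bit left at zero. This is not the paper's sample-parallel method. The function name `pack_raw_bits` and the zero padding of the final partial byte are assumptions made for illustration.

```python
def pack_raw_bits(bits):
    """Pack a sequence of 0/1 values into bytes, MSB first, with bit-stuffing.

    After a 0xFF byte is emitted, the next byte holds only 7 payload bits
    (its most significant bit stays 0). The last partial byte is padded
    with zeros, which is an illustrative choice.
    """
    out = bytearray()
    capacity = 8          # payload bits the current byte can hold
    free = capacity       # payload bit positions still unused
    acc = 0               # byte being assembled
    for bit in bits:
        free -= 1
        acc |= (bit & 1) << free
        if free == 0:
            out.append(acc)
            capacity = 7 if acc == 0xFF else 8
            acc, free = 0, capacity
    if free != capacity:  # flush a partially filled byte
        out.append(acc)
    return bytes(out)
```

For example, sixteen consecutive one-bits occupy three bytes rather than two, because the byte after 0xFF only has room for seven of them.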

    Parallel architectural design space exploration for real-time image compression

    Embedded block coding with optimized truncation (EBCOT) is a coding algorithm used in JPEG2000. EBCOT operates on the wavelet-transformed data to generate a highly scalable compressed bit stream. Sub-band samples obtained from the wavelet transform are partitioned into smaller blocks called code-blocks. EBCOT encoding is done on blocks to avoid error propagation through the bands and to increase robustness. Block-wise encoding provides flexibility for parallel hardware implementation of EBCOT. The encoding process in JPEG2000 is divided into two phases: Tier 1 coding (entropy encoding) and Tier 2 coding (tag tree coding). This thesis deals with design space exploration and implementation of a parallel hardware architecture of the Tier 1 encoder used in JPEG2000. The parallel capabilities of the Tier 1 encoder motivate the exploration of a high-performance real-time image compression architecture in hardware. The design space covers two investigations: the effect of block size in terms of resources, speed, and compression performance; and computational performance. The key computational performance parameters targeted by the architecture are significant speedup compared to a sequential implementation, minimum processing latency, and minimum logic resource utilization. The proposed architecture is developed for an embedded application system, coded in VHDL, and synthesized for implementation on a Xilinx FPGA system.
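The partitioning of a sub-band into code-blocks described in this abstract can be sketched in a few lines. This is a simplified illustration, not the thesis's hardware design: the function name `partition_code_blocks`, the 64x64 default size, and anchoring the grid at the sub-band's top-left corner are assumptions.

```python
def partition_code_blocks(width, height, block_w=64, block_h=64):
    """Split a width x height sub-band into code-block rectangles.

    Returns a list of (x0, y0, w, h) tuples in raster order. Blocks on the
    right and bottom edges are cropped to the sub-band. The grid is anchored
    at (0, 0), which is a simplification for illustration.
    """
    blocks = []
    for y0 in range(0, height, block_h):
        for x0 in range(0, width, block_w):
            w = min(block_w, width - x0)
            h = min(block_h, height - y0)
            blocks.append((x0, y0, w, h))
    return blocks
```

Since each rectangle is coded without reference to its neighbours, the returned list can be handed to independent workers, which is the property the abstract relies on for parallel hardware.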

    Low cost architecture for JPEG2000 encoder without code-block memory

    The amount of memory required for code-blocks is one of the most important issues in JPEG2000 encoder chip implementation. This work unifies the output scanning order of the 2D-DWT and the processing order of the EBCOT, and thereby eliminates the code-block memory completely. We also propose a new architecture for embedded block coding (EBC), code-block switch adaptive embedded block coding (CS-AEBC), which can skip the insignificant bit-planes to reduce the computation time and save power. In addition, a new dynamic rate distortion optimization (RDO) approach is proposed to reduce the computation time when the EBC performs lossy compression. The total memory required for the proposed JPEG2000 encoder is only 2 KB of internal memory, and the bandwidth required for the external memory is 2.1 B/cycle. Conference type: international. Conference date: 2008-06-23 to 2008-06-26. Call for papers: yes. Conference location: Hannover, Germany.
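Skipping insignificant bit-planes rests on a simple observation: if the largest coefficient magnitude in a code-block needs only k bits, every bit-plane above the k-th is all zero and needs no coding passes. The sketch below shows only that counting step and is not the CS-AEBC architecture itself; the function names and the `total_planes` parameter are assumptions made for illustration.

```python
def significant_bit_planes(coeffs):
    """Number of magnitude bit-planes that contain at least one set bit."""
    max_mag = max((abs(c) for c in coeffs), default=0)
    return max_mag.bit_length()


def skippable_bit_planes(coeffs, total_planes):
    """Count the leading all-zero bit-planes of a code-block.

    total_planes is the nominal magnitude bit depth of the sub-band
    (a hypothetical parameter for this sketch).
    """
    used = significant_bit_planes(coeffs)
    if used > total_planes:
        raise ValueError("coefficients exceed the nominal bit depth")
    return total_planes - used
```

A block whose coefficients are all zero reports every plane as skippable, so the coder could bypass it entirely.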

    Evaluation of GPU/CPU Co-Processing Models for JPEG 2000 Packetization

    With the bottom-line goal of increasing the throughput of a GPU-accelerated JPEG 2000 encoder, this paper evaluates whether the post-compression rate control and packetization routines should be carried out on the CPU or on the GPU. Three co-processing models that differ in how the workload is split among the CPU and GPU are introduced. Both routines are discussed and algorithms for executing them in parallel are presented. Experimental results for compressing a detail-rich UHD sequence to 4 bits/sample indicate speed-ups of 200x for the rate control and 100x for the packetization compared to the single-threaded implementation in the commercial Kakadu library. These two routines executed on the CPU take 4x as long as all remaining coding steps on the GPU and therefore present a bottleneck. Even if the CPU bottleneck could be avoided with multi-threading, it is still beneficial to execute all coding steps on the GPU as this minimizes the required device-to-host transfer and thereby speeds up the critical path from 17.2 fps to 19.5 fps for 4 bits/sample and to 22.4 fps for 0.16 bits/sample

    The Research and Implementation of the Wireless Optical Communications Transceiver (II)

    Project number: NSC94-2745-E032-002-URD. Research period: 2005-08 to 2006-07. Research funding: 739,000. Sponsorship: National Science Council, Executive Yuan.

    High-speed EBCOT with dual context-modeling coding architecture for JPEG2000

    This work presents a parallel context-modeling coding architecture and a matching arithmetic coder (MQ coder) for the embedded block coding (EBCOT) unit of the JPEG2000 encoder. The tier-1 of the EBCOT consumes most of the computation time in a JPEG2000 encoding system, and the proposed parallel architecture can increase the throughput rate of the context modeling. To match the high throughput rate of the parallel context-modeling architecture, an efficient pipelined architecture for the context-based adaptive arithmetic encoder is proposed. This JPEG2000 encoder can work at 185 MHz and encode one symbol each cycle. Compared with the conventional context-modeling architecture, our parallel architecture can decrease the execution time by about 25%. Conference type: international. Conference date: 2004-05-23 to 2004-05-26. Conference location: Vancouver, Canada.

    High Efficiency Concurrent Embedded Block Coding Architecture for JPEG 2000

    Embedded block coding with optimized truncation (EBCOT) is the most important part of JPEG 2000. Due to the bit-level operation and the three-pass scanning technique, the EBCOT may take more than 50% of the operation time in JPEG 2000. This paper presents a high efficiency concurrent EBCOT (HECEBC) entropy encoder hardware architecture. The proposed HECEBC can concurrently process the four samples in a stripe column. Furthermore, this architecture can be extended to process several stripe columns concurrently so that JPEG 2000 can handle high-resolution applications in real time. In addition, the HECEBC uses a concentrated context window technique to stabilize the context-decision (CX-D) output, relaxing the load between the arithmetic encoder (AE) and the parallel-in-serial-out (PISO) buffer and tripling the EBC performance. Notice: corrections completed. Citation index: EI. Book type: print.
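The "four samples in a stripe column" refers to the scan pattern used by the bit-plane coder: a code-block is cut into horizontal stripes four rows high, and each stripe is visited column by column, top to bottom within a column. The generator below is a minimal software sketch of that visiting order, not the HECEBC hardware; the function name and the (x, y) coordinate convention are assumptions.

```python
def stripe_scan_order(width, height, stripe_height=4):
    """Yield (x, y) sample positions of a code-block in stripe-column order.

    Stripes are stripe_height rows tall; inside a stripe, columns are
    visited left to right and each column top to bottom. The last stripe
    is shorter when height is not a multiple of stripe_height.
    """
    for y0 in range(0, height, stripe_height):
        y_end = min(y0 + stripe_height, height)
        for x in range(width):
            for y in range(y0, y_end):
                yield (x, y)
```

Each inner run of up to four positions is one stripe column, the unit that the abstract says is processed concurrently.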

    High efficiency architecture of ESCOT with pass concurrent context modeling scheme for scalable video coding

    In this work, we propose a high efficiency hardware architecture of embedded sub-band coding with optimal truncation (ESCOT) with a pass concurrent context modeling (PCCM) scheme for wavelet-based scalable video coding (SVC). PCCM can merge the three-pass process of bit-plane coding into a single-pass process. It improves the efficiency of the ESCOT algorithm and reduces the frequency of memory accesses, which can reduce the power consumption. Furthermore, we use the parallel architecture scheme of PCCM to encode 4 samples concurrently, which improves the operation speed and reduces the internal memory requirement by 40%. We use the Artisan TSMC 0.18 μm 1P6M standard cell library to design and implement the proposed concurrent context modeling. The simulation results indicate that PCCM can have an operation speedup of 9.5 compared to the standard context modeling of ESCOT, and it can operate for 1080p with a frame rate of 30 fps at a clock rate of 125 MHz. Conference type: international. Conference date: 2008-05-18 to 2008-05-21. Call for papers: yes. Conference location: Seattle, WA, US.

    Accelerating BPC-PaCo through visually lossless techniques

    Fast image codecs are a current need in applications that deal with large amounts of images. Graphics Processing Units (GPUs) are suitable processors to speed up most kinds of algorithms, especially when they allow fine-grain parallelism. Bitplane Coding with Parallel Coefficient processing (BPC-PaCo) is a recently proposed algorithm for the core stage of wavelet-based image codecs tailored for the highly parallel architectures of GPUs. This algorithm provides complexity scalability to allow faster execution at the expense of coding efficiency. Its main drawback is that the speedup and the loss in image quality are controlled only roughly, resulting in visible distortion at low and medium rates. This paper addresses this issue by integrating techniques of visually lossless coding into BPC-PaCo. The resulting method minimizes the visual distortion introduced in the compressed file, obtaining images of higher quality for a human observer. Experimental results also indicate 12% speedups with respect to BPC-PaCo.