88 research outputs found

    A VLSI architecture of JPEG2000 encoder

    Get PDF
    Copyright @ 2004 IEEEThis paper proposes a VLSI architecture of JPEG2000 encoder, which functionally consists of two parts: discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). For DWT, a spatial combinative lifting algorithm (SCLA)-based scheme with both 5/3 reversible and 9/7 irreversible filters is adopted to reduce 50% and 42% multiplication computations, respectively, compared with the conventional lifting-based implementation (LBI). For EBCOT, a dynamic memory control (DMC) strategy of Tier-1 encoding is adopted to reduce 60% scale of the on-chip wavelet coefficient storage and a subband parallel-processing method is employed to speed up the EBCOT context formation (CF) process; an architecture of Tier-2 encoding is presented to reduce the scale of on-chip bitstream buffering from full-tile size down to three-code-block size and considerably eliminate the iterations of the rate-distortion (RD) truncation.This work was supported in part by the China National High Technologies Research Program (863) under Grant 2002AA1Z142

    Sample-Parallel Execution of EBCOT in Fast Mode

    Get PDF
    JPEG 2000’s most computationally expensive building block is the Embedded Block Coder with Optimized Truncation (EBCOT). This paper evaluates how encoders targeting a parallel architecture such as a GPU can increase their throughput in use cases where very high data rates are used. The compression efficiency in the less significant bit-planes is then often poor and it is beneficial to enable the Selective Arithmetic Coding Bypass style (fast mode) in order to trade a small loss in compression efficiency for a reduction of the computational complexity. More importantly, this style exposes a more finely grained parallelism that can be exploited to execute the raw coding passes, including bit-stuffing, in a sample-parallel fashion. For a latency- or memory critical application that encodes one frame at a time, EBCOT’s tier-1 is sped up between 1.1x and 2.4x compared to an optimized GPU-based implementation. When a low GPU occupancy has already been addressed by encoding multiple frames in parallel, the throughput can still be improved by 5% for high-entropy images and 27% for low-entropy images. Best results are obtained when enabling the fast mode after the fourth significant bit-plane. For most of the test images the compression rate is within 1% of the original

    Low cost architecture for JPEG2000 encoder without code-block memory

    Get PDF
    [[abstract]]The amount of memory required for code-block is one of the most important issues in JPEG2000 encoder chip implementation. This work tries to unify the output scanning order of the 2D-DWT and the processing order of the EBCOT and further to eliminate the code-block memory completely eliminated. We also propose a new architecture for embedded block coding (EBC), code-block switch adaptive embedded block coding (CS-AEBC), which can skip the insignificant bit-planes to reduce the computation time and save power consumption. Besides, a new dynamic rate distortion optimization (RDO) approach is proposed to reduce the computation time when the EBC processes lossy compression operation. The total memory required for the proposed JPEG2000 is only 2KB of internal memory, and the bandwidth required for the external memory is 2.1 B/cycle.[[conferencetype]]國際[[conferencedate]]20080623-20080626[[iscallforpapers]]Y[[conferencelocation]]Hannover, German

    Evaluation of GPU/CPU Co-Processing Models for JPEG 2000 Packetization

    Get PDF
    With the bottom-line goal of increasing the throughput of a GPU-accelerated JPEG 2000 encoder, this paper evaluates whether the post-compression rate control and packetization routines should be carried out on the CPU or on the GPU. Three co-processing models that differ in how the workload is split among the CPU and GPU are introduced. Both routines are discussed and algorithms for executing them in parallel are presented. Experimental results for compressing a detail-rich UHD sequence to 4 bits/sample indicate speed-ups of 200x for the rate control and 100x for the packetization compared to the single-threaded implementation in the commercial Kakadu library. These two routines executed on the CPU take 4x as long as all remaining coding steps on the GPU and therefore present a bottleneck. Even if the CPU bottleneck could be avoided with multi-threading, it is still beneficial to execute all coding steps on the GPU as this minimizes the required device-to-host transfer and thereby speeds up the critical path from 17.2 fps to 19.5 fps for 4 bits/sample and to 22.4 fps for 0.16 bits/sample

    High Efficiency Concurrent Embedded Block Coding Architecture for JPEG 2000

    Get PDF
    [[abstract]]Embedded block coding with optimized truncation (EBCOT) is the most important part of JPEG 2000. Due to the bit level operation and the three-pass scanning technique, the EBCOT may take more than 50% operation time in the JPEG 2000. This paper presents a high efficiency concurrent EBCOT(HECEBC) entropy encoder hardware architecture. The proposed HECEBC can concurrently process the four samples in a stripe column. Furthermore this architecture can be extended to process several stripe columns concurrently for the JPEG 2000 to accomplish high resolution applications in real time. Besides, the HECEBC uses the technique of concentrated context window to stabilize the Context-Decision (CX-D) output to relax the load in between the arithmetic encoder (AE) and the parallel-in-serial-out (PISO) buffer to triple the EBC performance.[[notice]]補正完畢[[incitationindex]]EI[[booktype]]紙

    High efficiency architecture of ESCOT with pass concurrent context modeling scheme for scalable video coding

    Get PDF
    [[abstract]]In this work, we propose a high efficiency hardware architecture of embedded sub-band coding with optimal truncation (ESCOT) with pass concurrent context modeling (PCCM) scheme for wavelet-based scalable video coding (SVC). PCCM can merge the three-pass process of bit-plane coding into a single pass process. It improves the efficiency of the ESCOT algorithm and reduces the frequencies of memory access, which can reduce the power consumption. Furthermore we use the parallel architecture scheme of PCCM to encode 4 samples concurrently, which improves the operation speed and can reduce 40% of internal memory requirement. We use Artison TSMC 0.18 mum 1P6M standard cell library to design and implement the proposed concurrent context modeling. The simulation results indicate that PCCM can have an operation speedup of 9.5 compared to the standard context modeling of ESCOT, and it can operate for 1080 p with frame rate of 30 fps at clock rate of 125 MHz.[[conferencetype]]國際[[conferencedate]]20080518~20080521[[iscallforpapers]]Y[[conferencelocation]]Seattle, WA, US

    Parallel architectural design space exploration for real-time image compression

    Get PDF
    Embedded block coding with optimized truncation (EBCOT) is a coding algorithm used in JPEG2000. EBCOT operates on the wavelet transformed data to generate highly scalable compressed bit stream. Sub-band samples obtained from wavelet transform are partitioned into smaller blocks called code-blocks. EBCOT encoding is done on blocks to avoid error propagation through the bands and to increase robustness. Block wise encoding provides flexibility for parallel hardware implementation of EBCOT. The encoding process in JPEG2000 is divided into two phases: Tier 1 coding (Entropy encoding) and Tier 2 coding (Tag tree coding). This thesis deals with design space exploration and implementation of parallel hardware architecture of Tier 1 encoder used in JPEG2000. Parallel capabilities of Tier-1 encoder is the motivation for exploration of high performance real time image compression architecture in hardware. The design space covers the following investigations: - The effect of block-size in terms of resources, speed, and compression performance, - Computational performance. The key computational performance parameters targeted by the architecture are - significant speedup compared to a sequential implementation, - minimum processing latency and, - minimum logic resource utilization. The proposed architecture is developed for an embedded application system, coded in VHDL and synthesized for implementation on Xilinx FPGA system

    Accelerating BPC-PaCo through visually lossless techniques

    Get PDF
    Fast image codecs are a current need in applications that deal with large amounts of images. Graphics Processing Units (GPUs) are suitable processors to speed up most kinds of algorithms, especially when they allow fine-grain parallelism. Bitplane Coding with Parallel Coefficient processing (BPC-PaCo) is a recently proposed algorithm for the core stage of wavelet-based image codecs tailored for the highly parallel architectures of GPUs. This algorithm provides complexity scalability to allow faster execution at the expense of coding efficiency. Its main drawback is that the speedup and loss in image quality is controlled only roughly, resulting in visible distortion at low and medium rates. This paper addresses this issue by integrating techniques of visually lossless coding into BPC-PaCo. The resulting method minimizes the visual distortion introduced in the compressed file, obtaining higher-quality images to a human observer. Experimental results also indicate 12% speedups with respect to BPC-PaCo

    Toward Free and Open Source Film Projection for Digital Cinema

    Get PDF
    International audienceCinema industry has chosen Digital Cinema Package (DCP) as encoding format for the distribution of digital films. DCP uses JPEG2000 for video compression. An efficient implementation of coding and decoding for this format is complex, however. Currently deployed equipment is expensive and has high maintenance costs, preventing art-house cinema theaters from acquiring it. Therefore, we conduct this research activity in cooperation with Utopia cinemas, a group of art-house cinemas, whose main requirement (besides functional ones) is to provide Free and Open Source Software (FOSS). This paper presents a solution that achieves real-time JPEG2000 decoding and DCP presentation based on widespread open source multimedia tools, namely VLC and libavcodec library. We present the improvements that were made in VLC to support the DCP packaging format, as well as details on JPEG2000 decoding inside libavcodec (optimization and lossy decoding). We also evaluate the performance of the decoding chai

    On the design of architecture-aware algorithms for emerging applications

    Get PDF
    This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in the problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also find several limitations of current system software and architectures and directions to improve those. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation participates in the efforts by providing benchmarks and suggestions to improve system software and architectures.Ph.D.Committee Chair: Bader, David; Committee Member: Hong, Bo; Committee Member: Riley, George; Committee Member: Vuduc, Richard; Committee Member: Wills, Scot
    corecore