455 research outputs found

    Low cost architecture for JPEG2000 encoder without code-block memory

    Get PDF
    [[abstract]]The amount of memory required for code-block is one of the most important issues in JPEG2000 encoder chip implementation. This work tries to unify the output scanning order of the 2D-DWT and the processing order of the EBCOT and further to eliminate the code-block memory completely eliminated. We also propose a new architecture for embedded block coding (EBC), code-block switch adaptive embedded block coding (CS-AEBC), which can skip the insignificant bit-planes to reduce the computation time and save power consumption. Besides, a new dynamic rate distortion optimization (RDO) approach is proposed to reduce the computation time when the EBC processes lossy compression operation. The total memory required for the proposed JPEG2000 is only 2KB of internal memory, and the bandwidth required for the external memory is 2.1 B/cycle.[[conferencetype]]國際[[conferencedate]]20080623-20080626[[iscallforpapers]]Y[[conferencelocation]]Hannover, German

    Sample-Parallel Execution of EBCOT in Fast Mode

    Get PDF
    JPEG 2000’s most computationally expensive building block is the Embedded Block Coder with Optimized Truncation (EBCOT). This paper evaluates how encoders targeting a parallel architecture such as a GPU can increase their throughput in use cases where very high data rates are used. The compression efficiency in the less significant bit-planes is then often poor and it is beneficial to enable the Selective Arithmetic Coding Bypass style (fast mode) in order to trade a small loss in compression efficiency for a reduction of the computational complexity. More importantly, this style exposes a more finely grained parallelism that can be exploited to execute the raw coding passes, including bit-stuffing, in a sample-parallel fashion. For a latency- or memory critical application that encodes one frame at a time, EBCOT’s tier-1 is sped up between 1.1x and 2.4x compared to an optimized GPU-based implementation. When a low GPU occupancy has already been addressed by encoding multiple frames in parallel, the throughput can still be improved by 5% for high-entropy images and 27% for low-entropy images. Best results are obtained when enabling the fast mode after the fourth significant bit-plane. For most of the test images the compression rate is within 1% of the original

    GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000

    Get PDF
    Modern image and video compression standards employ computationally intensive algorithms that provide advanced features to the coding system. Current standards often need to be implemented in hardware or using expensive solutions to meet the real-time requirements of some environments. Contrarily to this trend, this paper proposes an end-to-end codec architecture running on inexpensive Graphics Processing Units (GPUs) that is based on, though not compatible with, the JPEG2000 international standard for image and video compression. When executed in a commodity Nvidia GPU, it achieves real time processing of 12K video. The proposed S/W architecture utilizes four CUDA kernels that minimize memory transfers, use registers instead of shared memory, and employ a double-buffer strategy to optimize the streaming of data. The analysis of throughput indicates that the proposed codec yields results at least 10× superior on average to those achieved with JPEG2000 implementations devised for CPUs, and approximately 4× superior to those achieved with hardwired solutions of the HEVC/H.265 video compression standard

    Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

    Full text link
    We propose a method for lossy image compression based on recurrent, convolutional neural networks that outperforms BPG (4:2:0 ), WebP, JPEG2000, and JPEG as measured by MS-SSIM. We introduce three improvements over previous research that lead to this state-of-the-art result. First, we show that training with a pixel-wise loss weighted by SSIM increases reconstruction quality according to several metrics. Second, we modify the recurrent architecture to improve spatial diffusion, which allows the network to more effectively capture and propagate image information through the network's hidden state. Finally, in addition to lossless entropy coding, we use a spatially adaptive bit allocation algorithm to more efficiently use the limited number of bits to encode visually complex image regions. We evaluate our method on the Kodak and Tecnick image sets and compare against standard codecs as well recently published methods based on deep neural networks

    A Real Time Image Processing Subsystem: GEZGIN

    Get PDF
    In this study, a real-time image processing subsystem, GEZGIN, which is currently being developed for BILSAT-1, a 100kg class micro-satellite, is presented. BILSAT-1 is being constructed in accordance with a technology transfer agreement between TÜBITAK-BILTEN (Turkey) and SSTL (UK) and planned to be placed into a 650 km sunsynchronous orbit in Summer 2003. GEZGIN is one of the two Turkish R&D payloads to be hosted on BILSAT-1. One of the missions of BILSAT-1 is constructing a Digital Elevation Model of Turkey using both multi-spectral and panchromatic imagers. Due to limited down-link bandwidth and on-board storage capacity, employment of a realtime image compression scheme is highly advantageous for the mission. GEZGIN has evolved as an implementation to achieve image compression tasks that would lead to an efficient utilization of both the down-link and on-board storage. The image processing on GEZGIN includes capturing of 4-band multi-spectral images of size 2048x2048 8- bit pixels, compressing them simultaneously with the new industry standard JPEG2000 algorithm and forwarding the compressed multi-spectral image to Solid State Data Recorders (SSDR) of BILSAT-1 for storage and down-link transmission. The mission definition together with orbital parameters impose a 6.5 seconds constraint on real-time image compression. GEZGIN meets this constraint by exploiting the parallelism among image processing units and assigning compute intensive tasks to dedicated hardware. The proposed hardware also allows for full reconfigurability of all processing units

    Analysis of runtime re-configuration systems

    Full text link
    In recent years Programmable Logic Devices (PLD) and in particular Field Programmable Gate Arrays (FPGAs) have seen a tremendous increase in sales and applications in the area of embedded systems. The main advantage of FPGAs is the flexibility that they offer a designer in reconfiguring the hardware. The flexibility achieved through re-configuration of FPGAs usually incurs an overhead of extra execution time, data memory and also power dissipation; FPGAs provide an ideal template for run-time reconfigurable (RTR) designs. Only recently have RTR enabling design tools that bypass the traditional synthesis and bitstream generation process for FPGAs become available, JBits is one of them. With run-time reconfiguration of FPGAs, we can perform partial reconfiguration, which allows reconfiguration of a part of an FPGA while the other part is executing some functional computation. The partial reconfiguration of a function can be performed earlier than the time when the function is really needed. Such configuration pre-fetch can hide the reconfiguration overhead more effectively; This thesis will implement a reconfigurable system and study the effect of runtime reconfiguration using VERILOG and a new Java based tool JBITS. This work will provide pointers to high level synthesis tools targeting runtime re-configuration
    corecore