400 research outputs found

    Bitplane image coding with parallel coefficient processing

    Get PDF
    Image coding systems have been traditionally tailored for multiple instruction, multiple data (MIMD) computing. In general, they partition the (transformed) image in codeblocks that can be coded in the cores of MIMD-based processors. Each core executes a sequential flow of instructions to process the coefficients in the codeblock, independently and asynchronously from the others cores. Bitplane coding is a common strategy to code such data. Most of its mechanisms require sequential processing of the coefficients. The last years have seen the upraising of processing accelerators with enhanced computational performance and power efficiency whose architecture is mainly based on the single instruction, multiple data (SIMD) principle. SIMD computing refers to the execution of the same instruction to multiple data in a lockstep synchronous way. Unfortunately, current bitplane coding strategies cannot fully profit from such processors due to inherently sequential coding task. This paper presents bitplane image coding with parallel coefficient (BPC-PaCo) processing, a coding method that can process many coefficients within a codeblock in parallel and synchronously. To this end, the scanning order, the context formation, the probability model, and the arithmetic coder of the coding engine have been re-formulated. The experimental results suggest that the penalization in coding performance of BPC-PaCo with respect to the traditional strategies is almost negligible

    Complexity scalable bitplane image coding with parallel coefficient processing

    Get PDF
    Very fast image and video codecs are a pursued goal both in the academia and the industry. This paper presents a complexity scalable and parallel bitplane coding engine for wavelet-based image codecs. The proposed method processes the coefficients in parallel, suiting hardware architectures based on vector instructions. Our previous work is extended with a mechanism that provides complexity scalability to the system. Such a feature allows the coder to regulate the throughput achieved at the expense of slightly penalizing compression effi- ciency. Experimental results suggests that, when using the fastest speed, the method almost doubles the throughput of our previous engine while penalizing compression efficiency by about 10

    Bitplane Image Coding With Parallel Coefficient Processing

    Full text link

    Strategy of microscopic parallelism for Bitplane Image Coding

    Get PDF
    Recent years have seen the upraising of a new type of processors strongly relying on the Single Instruction, Multiple Data (SIMD) architectural principle. The main idea behind SIMD computing is to apply a flow of instructions to multiple pieces of data in parallel and synchronously. This permits the execution of thousands of operations in parallel, achieving higher computational performance than with traditional Multiple Instruction, Multiple Data (MIMD) architectures. The level of parallelism required in SIMD computing can only be achieved in image coding systems via microscopic parallel strategies that code multiple coefficients in parallel. Until now, the only way to achieve microscopic parallelism in bitplane coding engines was by executing multiple coding passes in parallel. Such a strategy does not suit well SIMD computing because each thread executes different instructions. This paper introduces the first bitplane coding engine devised for the fine grain of parallelism required in SIMD computing. Its main insight is to allow parallel coefficient processing in a coding pass. Experimental tests show coding performance results similar to those of JPEG2000

    Accelerating BPC-PaCo through visually lossless techniques

    Get PDF
    Fast image codecs are a current need in applications that deal with large amounts of images. Graphics Processing Units (GPUs) are suitable processors to speed up most kinds of algorithms, especially when they allow fine-grain parallelism. Bitplane Coding with Parallel Coefficient processing (BPC-PaCo) is a recently proposed algorithm for the core stage of wavelet-based image codecs tailored for the highly parallel architectures of GPUs. This algorithm provides complexity scalability to allow faster execution at the expense of coding efficiency. Its main drawback is that the speedup and loss in image quality is controlled only roughly, resulting in visible distortion at low and medium rates. This paper addresses this issue by integrating techniques of visually lossless coding into BPC-PaCo. The resulting method minimizes the visual distortion introduced in the compressed file, obtaining higher-quality images to a human observer. Experimental results also indicate 12% speedups with respect to BPC-PaCo

    Stationary probability model for microscopic parallelism in JPEG2000

    Get PDF
    Parallel processing is key to augmenting the throughput of image codecs. Despite numerous efforts to parallelize wavelet-based image coding systems, most attempts fail at the parallelization of the bitplane coding engine, which is the most computationally intensive stage of the coding pipeline. The main reason for this failure is the causality with which current coding strategies are devised, which assumes that one coefficient is coded after another. This work analyzes the mechanisms employed in bitplane coding and proposes alternatives to enhance opportunities for parallelism. We describe a stationary probability model that, without sacrificing the advantages of current approaches, removes the main obstacle to the parallelization of most coding strategies. Experimental tests evaluate the coding performance achieved by the proposed method in the framework of JPEG2000 when coding different types of images. Results indicate that the stationary probability model achieves similar coding performance, with slight increments or decrements depending on the image type and the desired level of parallelism

    GPU implementation of bitplane coding with parallel coefficient processing for high performance image compression

    Get PDF
    The fast compression of images is a requisite in many applications like TV production, teleconferencing, or digital cinema. Many of the algorithms employed in current image compression standards are inherently sequential. High performance implementations of such algorithms often require specialized hardware like field integrated gate arrays. Graphics Processing Units (GPUs) do not commonly achieve high performance on these algorithms because they do not exhibit fine-grain parallelism. Our previous work introduced a new core algorithm for wavelet-based image coding systems. It is tailored for massive parallel architectures. It is called bitplane coding with parallel coefficient processing (BPC-PaCo). This paper introduces the first high performance, GPU-based implementation of BPC-PaCo. A detailed analysis of the algorithm aids its implementation in the GPU. The main insights behind the proposed codec are an efficient thread-to-data mapping, a smart memory management, and the use of efficient cooperation mechanisms to enable inter-thread communication. Experimental results indicate that the proposed implementation matches the requirements for high resolution (4 K) digital cinema in real time, yielding speedups of 30x with respect to the fastest implementations of current compression standards. Also, a power consumption evaluation shows that our implementation consumes 40 x less energy for equivalent performance than state-of-the-art methods

    Distributed Video Coding: Iterative Improvements

    Get PDF

    GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000

    Get PDF
    Modern image and video compression standards employ computationally intensive algorithms that provide advanced features to the coding system. Current standards often need to be implemented in hardware or using expensive solutions to meet the real-time requirements of some environments. Contrarily to this trend, this paper proposes an end-to-end codec architecture running on inexpensive Graphics Processing Units (GPUs) that is based on, though not compatible with, the JPEG2000 international standard for image and video compression. When executed in a commodity Nvidia GPU, it achieves real time processing of 12K video. The proposed S/W architecture utilizes four CUDA kernels that minimize memory transfers, use registers instead of shared memory, and employ a double-buffer strategy to optimize the streaming of data. The analysis of throughput indicates that the proposed codec yields results at least 10× superior on average to those achieved with JPEG2000 implementations devised for CPUs, and approximately 4× superior to those achieved with hardwired solutions of the HEVC/H.265 video compression standard
    corecore