
    Parallel Lossless Image Compression Using Huffman and Arithmetic Coding

    We show that high-resolution images can be encoded and decoded efficiently in parallel. We present an algorithm based on the hierarchical MLP method, used either with Huffman coding or with a new variant of arithmetic coding called quasi-arithmetic coding. The coding step can be parallelized, even though the codes for different pixels are of different lengths; parallelization of the prediction and error modeling components is straightforward.
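
    A minimal sketch of the key idea that makes variable-length coding parallelizable (this illustrates the general prefix-sum technique, not the paper's hierarchical MLP coder; the Huffman table below is a toy example): compute every symbol's code length first, take an exclusive prefix sum to get each code's bit offset, and then all codes can be written independently.

        # Sketch: parallel emission of variable-length codes via a prefix sum.
        # The Huffman table is hypothetical, not taken from the paper.
        from itertools import accumulate

        CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}

        def encode_parallel(symbols):
            lengths = [len(CODE[s]) for s in symbols]      # pass 1: code lengths
            offsets = [0] + list(accumulate(lengths))      # exclusive prefix sum
            bits = bytearray(offsets[-1])                  # one slot per output bit
            for i, s in enumerate(symbols):                # pass 2: every write lands at a
                for j, c in enumerate(CODE[s]):            # precomputed offset, so all symbols
                    bits[offsets[i] + j] = ord(c) - 48     # could be processed concurrently
            return bytes(bits)

        print(list(encode_parallel("abacd")))              # [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]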

    Arithmetic Coding for Data Compression

    Arithmetic coding provides an effective mechanism for removing redundancy in the encoding of data. We show how arithmetic coding works and describe an efficient implementation that uses table lookup as a fast alternative to arithmetic operations. The reduced-precision arithmetic has a provably negligible effect on the amount of compression achieved. We can speed up the implementation further by use of parallel processing. We discuss the role of probability models and how they provide probability information to the arithmetic coder. We conclude with perspectives on the comparative advantages and disadvantages of arithmetic coding.
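
    A minimal sketch of reduced-precision interval coding, assuming an 8-bit probability: the product below is exactly the operation that a small lookup table can replace. Carry handling is left to Python's big integers rather than done byte-by-byte as a real implementation would.

        # Sketch: binary arithmetic encoding with an 8-bit probability.
        def encode(bits, p0):
            """Encode 0/1 bits; p0 = P(bit = 0), scaled to 1..255."""
            low, rng, shifts = 0, 1 << 16, 0
            for b in bits:
                r0 = max(1, (rng * p0) >> 8)     # reduced-precision split of the range
                if b == 0:
                    rng = r0
                else:
                    low, rng = low + r0, rng - r0
                while rng < 1 << 8:              # renormalize (big ints absorb carries)
                    low, rng, shifts = low << 8, rng << 8, shifts + 8
            return low, rng, shifts              # any integer in [low, low + rng)
                                                 # identifies the input sequence

        print(encode([0, 1, 0, 0, 1], p0=160))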

    High-speed EBCOT with dual context-modeling coding architecture for JPEG2000

    This work presents a parallel context-modeling coding architecture and a matching arithmetic coder (MQ coder) for the embedded block coding (EBCOT) unit of the JPEG2000 encoder. Tier-1 of the EBCOT consumes most of the computation time in a JPEG2000 encoding system, and the proposed parallel architecture can increase the throughput rate of context modeling. To match the high throughput rate of the parallel context-modeling architecture, an efficient pipelined architecture for the context-based adaptive arithmetic encoder is proposed. This encoder can work at 185 MHz, encoding one symbol each cycle. Compared with the conventional context-modeling architecture, our parallel architecture can decrease the execution time by about 25%.
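
    A rough sketch of the context-modeling side that such architectures parallelize: in EBCOT's significance coding, each sample's context depends on the significance states of its eight neighbors, so the neighbor counts for a whole codeblock can be formed at once and streamed into the MQ coder one context/decision pair per cycle. The mapping below is simplified, not the normative JPEG2000 context table.

        # Sketch: data-parallel significance-context formation (simplified).
        import numpy as np

        def neighbor_counts(sig):
            """sig: 2-D 0/1 array marking already-significant samples."""
            p = np.pad(sig, 1)
            h = p[1:-1, :-2] + p[1:-1, 2:]                 # horizontal neighbors
            v = p[:-2, 1:-1] + p[2:, 1:-1]                 # vertical neighbors
            d = (p[:-2, :-2] + p[:-2, 2:]
                 + p[2:, :-2] + p[2:, 2:])                 # diagonal neighbors
            return h, v, d                                 # a fixed table then maps
                                                           # (h, v, d) to a context

        sig = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
        print(neighbor_counts(sig))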

    On-line Digit Set Conversion for Rational Digit Number

    A well-designed number system can affect both computation time and hardware implementation. An interesting number system called Round-to-Nearest coding (RN-coding) was proposed to reduce the time consumed by the rounding process: rounding to nearest in an RN-coding can be done by simple truncation at any position in a sequence of digits (representation). This property can save considerable time in parallel or pipelined computation. However, RN-coding does not support on-line arithmetic computation. In this paper, we propose a rational digit number system whose digit set is composed of rational signed digits. This new system preserves the round-to-nearest property and is suitable for on-line arithmetic computation. On-line elementary arithmetic operations in our system can be performed via an on-line digit set conversion algorithm. We show that our new algorithm, an improvement of the on-line addition algorithm from our previous work, can be realized by an on-line finite automaton with a finite on-line delay k.
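
    The round-to-nearest-by-truncation property is easy to demonstrate with balanced ternary (digits -1, 0, 1), a classical RN-coding, rather than the paper's rational digit system: the tail discarded at position k is at most 3^-k / 2, half a unit in the last place, so truncation always rounds to nearest. A small check:

        # Sketch: truncation = round-to-nearest in balanced ternary.
        from fractions import Fraction

        def balanced_ternary(x, ndigits):
            """Digits in {-1, 0, 1} for x in [-1/2, 1/2]."""
            digits = []
            for _ in range(ndigits):
                x *= 3
                d = round(x)                   # nearest of -1, 0, 1
                digits.append(d)
                x -= d                         # remainder stays in [-1/2, 1/2]
            return digits

        def value(digits):
            return sum(Fraction(d, 3 ** (i + 1)) for i, d in enumerate(digits))

        x = Fraction(1, 7)
        digits = balanced_ternary(x, 10)
        for k in range(1, 11):                 # truncate at every position
            err = abs(x - value(digits[:k]))
            assert err <= Fraction(1, 2 * 3 ** k)   # within half an ulp
        print(digits)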

    Sample-Parallel Execution of EBCOT in Fast Mode

    JPEG 2000’s most computationally expensive building block is the Embedded Block Coder with Optimized Truncation (EBCOT). This paper evaluates how encoders targeting a parallel architecture such as a GPU can increase their throughput in use cases with very high data rates. The compression efficiency in the less significant bit-planes is then often poor, and it is beneficial to enable the Selective Arithmetic Coding Bypass style (fast mode) to trade a small loss in compression efficiency for a reduction in computational complexity. More importantly, this style exposes a more finely grained parallelism that can be exploited to execute the raw coding passes, including bit-stuffing, in a sample-parallel fashion. For a latency- or memory-critical application that encodes one frame at a time, EBCOT’s tier-1 is sped up between 1.1x and 2.4x compared to an optimized GPU-based implementation. When low GPU occupancy has already been addressed by encoding multiple frames in parallel, the throughput can still be improved by 5% for high-entropy images and 27% for low-entropy images. Best results are obtained when enabling the fast mode after the fourth significant bit-plane. For most of the test images, the compression rate is within 1% of the original.
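
    The raw coding passes of the bypass style write bits almost verbatim; the only sequential-looking dependency is JPEG 2000's bit-stuffing rule (a byte following an 0xFF carries only 7 payload bits, so no marker codes can appear in the stream). The paper's contribution is executing such passes sample-parallel; the sketch below is just the plain sequential rule for reference.

        # Sketch: raw (bypass) bit packing with JPEG 2000-style bit-stuffing.
        def raw_pack(bits):
            out, acc, room = [], 0, 8          # room = bits left in current byte
            for b in bits:
                acc, room = (acc << 1) | b, room - 1
                if room == 0:
                    out.append(acc)
                    room = 7 if acc == 0xFF else 8   # stuff a 0 bit after 0xFF
                    acc = 0
            if room < (7 if out and out[-1] == 0xFF else 8):
                out.append(acc << room)        # flush a partially filled byte
            return bytes(out)

        print(raw_pack([1] * 8 + [1, 0, 1]).hex())   # 'ff50': stuffed 0 after 0xFF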

    Bitplane image coding with parallel coefficient processing

    Image coding systems have traditionally been tailored for multiple instruction, multiple data (MIMD) computing. In general, they partition the (transformed) image into codeblocks that can be coded in the cores of MIMD-based processors. Each core executes a sequential flow of instructions to process the coefficients in the codeblock, independently and asynchronously from the other cores. Bitplane coding is a common strategy to code such data, and most of its mechanisms require sequential processing of the coefficients. Recent years have seen the rise of processing accelerators with enhanced computational performance and power efficiency whose architecture is mainly based on the single instruction, multiple data (SIMD) principle. SIMD computing refers to the execution of the same instruction on multiple data in a lockstep, synchronous way. Unfortunately, current bitplane coding strategies cannot fully profit from such processors due to their inherently sequential coding tasks. This paper presents bitplane image coding with parallel coefficient (BPC-PaCo) processing, a coding method that can process many coefficients within a codeblock in parallel and synchronously. To this end, the scanning order, the context formation, the probability model, and the arithmetic coder of the coding engine have been reformulated. The experimental results suggest that the penalty in coding performance of BPC-PaCo with respect to traditional strategies is almost negligible.
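
    A toy illustration of the lockstep idea, using numpy as a stand-in for SIMD lanes: every coefficient in the codeblock executes the same per-bit-plane instructions, with significance tracked as a mask. The actual BPC-PaCo scanning order, contexts, and per-lane arithmetic coders are not reproduced here.

        # Sketch: SIMD-style bit-plane scanning; one instruction, all samples.
        import numpy as np

        coeffs = np.array([13, -7, 2, 0, 25, -18, 4, 1])
        mag, planes = np.abs(coeffs), 5
        significant = np.zeros(coeffs.shape, dtype=bool)
        for p in reversed(range(planes)):              # MSB plane first
            bits = (mag >> p) & 1                      # same op on every sample
            newly = bits.astype(bool) & ~significant   # samples turning significant
            significant |= newly
            # here each (bit, context) pair would go to a parallel coder
            print(f"plane {p}: bits={bits.tolist()} newly={newly.astype(int).tolist()}")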

    Design and Implementation of a High-Throughput CABAC Hardware Accelerator for the HEVC Decoder

    HEVC is the new video coding standard of the Joint Collaborative Team on Video Coding. As in its predecessor H.264/AVC, Context-based Adaptive Binary Arithmetic Coding (CABAC) is a throughput bottleneck. This paper presents a hardware acceleration approach for transform coefficient decoding, the most time-consuming part of CABAC in HEVC. In addition to a baseline design, a pipelined architecture and a parallel algorithm are implemented in an FPGA to evaluate the gain of these optimizations. The resulting baseline hardware design decodes 62 Mbins/s and achieves a 10× speed-up compared to an optimized software decoder for a typical workload, at only a tenth of the processor's clock frequency. The pipelined design gives an additional 13.5% and the parallel design a 10% throughput improvement over the baseline. According to these results, HEVC CABAC decoding offers good hardware acceleration opportunities that should be further exploited in future work.
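
    For reference, one context-adaptive binary decode iteration looks roughly like the sketch below. It is simplified: the normative HEVC coder uses a 64-state probability machine and a 4x64 range lookup table instead of the multiplication and exponential update here. The serial chain in each iteration (range split, compare, renormalize, context update) is what makes CABAC the bottleneck and what pipelining attacks.

        # Sketch: one simplified CABAC-style decode step (not the normative
        # HEVC tables; the context holds a 15-bit LPS probability estimate).
        def decode_bin(st, ctx):
            r_lps = max(1, (st["range"] * ctx["p_lps"]) >> 15)
            r_mps = st["range"] - r_lps
            if st["offset"] < r_mps:                     # MPS decoded
                bit, st["range"] = ctx["mps"], r_mps
                ctx["p_lps"] -= ctx["p_lps"] >> 5        # adapt toward MPS
            else:                                        # LPS decoded
                bit = 1 - ctx["mps"]
                st["offset"] -= r_mps
                st["range"] = r_lps
                ctx["p_lps"] += (32768 - ctx["p_lps"]) >> 5
                if ctx["p_lps"] > 16384:                 # LPS now more likely: swap MPS
                    ctx["mps"] = 1 - ctx["mps"]
                    ctx["p_lps"] = 32768 - ctx["p_lps"]
            while st["range"] < 256:                     # renormalize
                st["range"] <<= 1
                st["offset"] = (st["offset"] << 1) | next(st["bits"], 0)
            return bit

        st = {"range": 1 << 9, "offset": 300, "bits": iter([1, 0, 1, 1])}
        ctx = {"mps": 0, "p_lps": 8192}
        print(decode_bin(st, ctx))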

    Bridging Hamming Distance Spectrum with Coset Cardinality Spectrum for Overlapped Arithmetic Codes

    Overlapped arithmetic codes, featured by overlapped intervals, are a variant of arithmetic codes that can be used to implement Slepian-Wolf coding. To analyze overlapped arithmetic codes, we have proposed two theoretical tools: the Coset Cardinality Spectrum (CCS) and the Hamming Distance Spectrum (HDS). The former describes how the source space is partitioned into cosets (equally or unequally), and the latter describes how codewords are structured within each coset (densely or sparsely). Until now, however, these two tools have remained largely parallel to each other, with seemingly no intersection between them. The main contribution of this paper is to bridge HDS and CCS through a rigorous mathematical proof: in some cases, HDS can be calculated quickly and accurately from CCS. All theoretical analyses are verified by simulation results.
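
    A brute-force toy makes both spectra concrete. The sketch below uses obvious enumeration definitions over a toy overlapped code with interval length 3/4 and a truncated-low-end codeword, not the paper's analytical machinery: sequences sharing codeword bits fall into one coset, the histogram of coset sizes is an empirical CCS, and pairwise distances inside cosets give an empirical HDS.

        # Sketch: empirical CCS and HDS of a toy overlapped arithmetic code
        # (symbol 0 -> [0, 3/4), symbol 1 -> [1/4, 1); intervals overlap).
        from fractions import Fraction
        from itertools import product

        Q = Fraction(3, 4)                     # interval length > 1/2 => overlap

        def final_interval(seq):
            lo, length = Fraction(0), Fraction(1)
            for b in seq:
                if b:
                    lo += length * (1 - Q)     # upper subinterval for a 1
                length *= Q
            return lo, lo + length

        n, m = 6, 4                            # 6 source bits -> 4 codeword bits
        cosets = {}
        for seq in product((0, 1), repeat=n):
            lo, _ = final_interval(seq)
            cw = int(lo * 2 ** m)              # truncate the low end to m bits
            cosets.setdefault(cw, []).append(seq)

        print("coset cardinalities:", sorted(len(c) for c in cosets.values()))
        hds = {}
        for c in cosets.values():              # Hamming distances inside cosets
            for i, a in enumerate(c):
                for b in c[i + 1:]:
                    d = sum(x != y for x, y in zip(a, b))
                    hds[d] = hds.get(d, 0) + 1
        print("empirical HDS:", dict(sorted(hds.items())))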

    Joint Algorithm-Architecture Optimization of CABAC

    This paper uses joint algorithm and architecture design to enable high coding efficiency in conjunction with high processing speed and low area cost. Specifically, it presents several optimizations that can be performed on Context Adaptive Binary Arithmetic Coding (CABAC), a form of entropy coding used in H.264/AVC, to achieve the throughput necessary for real-time low-power high-definition video coding. The combination of syntax element partitions and interleaved entropy slices, referred to as Massively Parallel CABAC, increases the number of binary symbols that can be processed in a cycle. Subinterval reordering is used to reduce the cycle time required to process each binary symbol. Under common conditions using the JM12.0 software, Massively Parallel CABAC increases the bins per cycle by 2.7 to 32.8× at a cost of 0.25 to 6.84% coding loss compared with sequential single-slice H.264/AVC CABAC. It also provides a 2× reduction in area cost and reduces memory bandwidth. Subinterval reordering reduces the critical path delay by 14 to 22%, while modifications to context selection reduce the memory requirement by 67%. This work demonstrates that accounting for implementation cost during video coding algorithm design can enable higher processing speed and reduced hardware cost, while still delivering high coding efficiency in the next-generation video coding standard.
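
    A sketch of the subinterval reordering idea, assuming a simplified decoder without the normative state tables: with the conventional ordering, the comparison threshold needs the subtraction range - r_lps first; with the LPS subinterval placed at the bottom, the offset can be compared directly against the looked-up r_lps, shortening the critical path.

        # Sketch: conventional vs. reordered subintervals in binary
        # arithmetic decoding (renormalization and adaptation omitted).
        def decode_conventional(rng, offset, r_lps):
            r_mps = rng - r_lps                 # subtraction feeds the compare
            if offset < r_mps:
                return "MPS", r_mps, offset
            return "LPS", r_lps, offset - r_mps

        def decode_reordered(rng, offset, r_lps):
            if offset < r_lps:                  # compare straight from the LUT value
                return "LPS", r_lps, offset
            return "MPS", rng - r_lps, offset - r_lps

        # Both orderings partition [0, rng) the same way, just mirrored, so a
        # matching encoder makes either a valid code.
        print(decode_conventional(512, 300, 128), decode_reordered(512, 300, 128))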