A VLSI architecture of JPEG2000 encoder
Copyright © 2004 IEEE. This paper proposes a VLSI architecture of a JPEG2000 encoder, which functionally consists of two parts: the discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). For the DWT, a spatial combinative lifting algorithm (SCLA)-based scheme supporting both the 5/3 reversible and 9/7 irreversible filters is adopted, reducing multiplication computations by 50% and 42%, respectively, compared with the conventional lifting-based implementation (LBI). For EBCOT, a dynamic memory control (DMC) strategy for Tier-1 encoding reduces the on-chip wavelet coefficient storage by 60%, and a subband parallel-processing method is employed to speed up the EBCOT context formation (CF) process. A Tier-2 encoding architecture is also presented that reduces on-chip bitstream buffering from full-tile size down to three-code-block size and largely eliminates the iterations of the rate-distortion (RD) truncation. This work was supported in part by the China National High Technologies Research Program (863) under Grant 2002AA1Z142.
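For reference, the conventional lifting-based 5/3 reversible transform (the LBI baseline named above, not the paper's SCLA scheme) can be sketched in Python. This is a minimal one-level, 1-D sketch assuming an even-length integer signal and whole-sample symmetric extension at the borders; the function names `fwd_53` and `inv_53` are hypothetical. The 5/3 lifting steps need only additions and shifts (the floor divisions by 2 and 4), and the integer arithmetic makes the transform exactly invertible.

```python
def fwd_53(x):
    """One level of the reversible 5/3 lifting DWT on an even-length list of ints."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    half = n // 2
    # Predict step: high-pass (detail) coefficients.
    d = []
    for i in range(half):
        left = x[2 * i]
        # Whole-sample symmetric extension: x[n] mirrors to x[n - 2].
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - (left + right) // 2)
    # Update step: low-pass (approximation) coefficients.
    s = []
    for i in range(half):
        dl = d[i - 1] if i > 0 else d[0]  # symmetric extension: d[-1] = d[0]
        s.append(x[2 * i] + (dl + d[i] + 2) // 4)
    return s, d


def inv_53(s, d):
    """Undo fwd_53 exactly by running the lifting steps in reverse order."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    # Undo the update step to recover the even samples.
    for i in range(half):
        dl = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (dl + d[i] + 2) // 4
    # Undo the predict step to recover the odd samples.
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + (left + right) // 2
    return x


if __name__ == "__main__":
    s, d = fwd_53([0, 1, 2, 3, 4, 5, 6, 7])
    print(s, d)  # [0, 2, 4, 6] [0, 0, 0, 1]
```

On a linear ramp the detail coefficients are zero except at the mirrored right border, which is the energy compaction the transform is meant to provide.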
Sample-Parallel Execution of EBCOT in Fast Mode
JPEG 2000’s most computationally expensive building
block is the Embedded Block Coder with Optimized Truncation
(EBCOT). This paper evaluates how encoders targeting a parallel
architecture such as a GPU can increase their throughput in use
cases where very high data rates are used. The compression
efficiency in the less significant bit-planes is then often poor and
it is beneficial to enable the Selective Arithmetic Coding Bypass
style (fast mode) in order to trade a small loss in compression
efficiency for a reduction of the computational complexity. More
importantly, this style exposes a more finely grained parallelism
that can be exploited to execute the raw coding passes, including
bit-stuffing, in a sample-parallel fashion. For a latency- or
memory-critical application that encodes one frame at a time,
EBCOT’s tier-1 is sped up between 1.1x and 2.4x compared to an
optimized GPU-based implementation. When a low GPU
occupancy has already been addressed by encoding multiple
frames in parallel, the throughput can still be improved by 5%
for high-entropy images and 27% for low-entropy images. Best
results are obtained when enabling the fast mode after the fourth
significant bit-plane. For most of the test images, the compression
rate is within 1% of the original.
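The raw (bypass) coding passes mentioned above emit bits without arithmetic coding, but still apply bit stuffing to keep marker codes out of the bitstream: after a 0xFF byte, the next byte carries only 7 data bits and its most significant bit is 0. The sequential Python sketch below illustrates that rule. It is modeled on JPEG 2000 raw-segment packing but is not the paper's sample-parallel GPU algorithm; `raw_pack` is a hypothetical name, and padding the final byte with zero bits is a simplifying assumption. It also shows why parallelizing is non-trivial: the byte in which each bit lands depends on how many 0xFF bytes precede it.

```python
def raw_pack(bits):
    """Pack 0/1 bits into bytes with bit stuffing after every 0xFF byte.

    Sequential reference only; termination is simplified to zero padding.
    """
    out = bytearray()
    byte = 0       # byte under construction
    free = 8       # bit slots still available in `byte`
    capacity = 8   # slots the current byte started with (7 right after a 0xFF)
    for b in bits:
        if free == 0:
            out.append(byte)
            # Bit stuffing: a byte following 0xFF holds only 7 bits, MSB = 0.
            capacity = 7 if byte == 0xFF else 8
            byte, free = 0, capacity
        byte = (byte << 1) | (b & 1)
        free -= 1
    if free < capacity:
        # Flush the partially filled last byte, padding with zero bits.
        out.append(byte << free)
    return bytes(out)


if __name__ == "__main__":
    # Sixteen 1-bits: 0xFF, then a stuffed byte holding 7 ones, then the last bit.
    print(raw_pack([1] * 16).hex())  # ff7f80
```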
Hybrid Neural Network Predictive-Wavelet Image Compression System
This paper considers a novel image compression technique called hybrid predictive wavelet coding. The
proposed technique combines the properties of predictive coding and discrete wavelet coding. In
contrast to JPEG2000, the image data values are first pre-processed using predictive coding to remove interpixel
redundancy. The error values, which are the differences between the original and the predicted
values, are then transformed with the discrete wavelet transform. A nonlinear neural network predictor is
used in the predictive coding system. Simulation results indicate that the proposed technique
can produce good-quality compressed images at high decomposition levels compared with JPEG2000.
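The predict-then-transform order described above can be sketched in Python on a 1-D signal. This is a minimal sketch with loudly stated substitutions: a previous-sample predictor stands in for the paper's nonlinear neural network, and a single level of the integer Haar (S-) transform stands in for a multi-level 2-D DWT. All function names are hypothetical.

```python
def predict_residuals(x):
    """Prediction errors e[n] = x[n] - x[n-1] (stand-in for the neural predictor)."""
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]


def undo_prediction(e):
    """Rebuild the signal by accumulating the prediction errors."""
    x = [e[0]]
    for v in e[1:]:
        x.append(x[-1] + v)
    return x


def haar_fwd(e):
    """One level of the integer Haar (S-) transform on an even-length list."""
    assert len(e) % 2 == 0, "sketch assumes an even-length signal"
    s, d = [], []
    for a, b in zip(e[0::2], e[1::2]):
        diff = a - b
        s.append(b + diff // 2)  # equals floor((a + b) / 2)
        d.append(diff)
    return s, d


def haar_inv(s, d):
    """Exact integer inverse of haar_fwd."""
    e = []
    for lo, diff in zip(s, d):
        b = lo - diff // 2
        e.extend([b + diff, b])
    return e


def encode(x):
    # Predictive coding first, then the wavelet transform of the errors.
    return haar_fwd(predict_residuals(x))


def decode(s, d):
    return undo_prediction(haar_inv(s, d))


if __name__ == "__main__":
    x = [10, 12, 13, 13, 15, 20, 22, 21]
    s, d = encode(x)
    print(s, d, decode(s, d) == x)
```

Because both stages use integer arithmetic with exact inverses, this toy pipeline is lossless; a real codec would quantize and entropy-code the subbands.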
Evaluation of GPU/CPU Co-Processing Models for JPEG 2000 Packetization
With the bottom-line goal of increasing the
throughput of a GPU-accelerated JPEG 2000 encoder, this paper
evaluates whether the post-compression rate control and
packetization routines should be carried out on the CPU or on
the GPU. Three co-processing models that differ in how the
workload is split among the CPU and GPU are introduced. Both
routines are discussed and algorithms for executing them in
parallel are presented. Experimental results for compressing a
detail-rich UHD sequence to 4 bits/sample indicate speed-ups of
200x for the rate control and 100x for the packetization
compared to the single-threaded implementation in the
commercial Kakadu library. These two routines executed on the
CPU take 4x as long as all remaining coding steps on the GPU
and therefore present a bottleneck. Even if the CPU bottleneck
could be avoided with multi-threading, it is still beneficial to
execute all coding steps on the GPU as this minimizes the
required device-to-host transfer and thereby speeds up the
critical path from 17.2 fps to 19.5 fps for 4 bits/sample and to
22.4 fps for 0.16 bits/sample.
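In EBCOT, the post-compression rate control referred to above is typically a PCRD-opt search: each code-block offers candidate truncation points (cumulative bytes, remaining distortion), and a Lagrange multiplier λ is adjusted until the selected points fit the byte budget. The single-threaded Python sketch below bisects on λ. It is not the paper's CPU or GPU implementation; the data layout and function names are hypothetical, and distortions are assumed to be precomputed and to decrease as rate increases.

```python
from typing import List, Tuple

# Hypothetical layout: per code-block, candidate truncation points
# (cumulative_bytes, distortion) ordered by increasing rate; index 0 = 0 bytes.
Block = List[Tuple[int, float]]


def choose_points(blocks: List[Block], lam: float) -> List[int]:
    """Per code-block, pick the truncation point minimising D + lam * R."""
    return [min(range(len(pts)), key=lambda k: pts[k][1] + lam * pts[k][0])
            for pts in blocks]


def total_rate(blocks: List[Block], choice: List[int]) -> int:
    return sum(pts[k][0] for pts, k in zip(blocks, choice))


def pcrd_opt(blocks: List[Block], budget: int, iters: int = 50) -> List[int]:
    """Bisect on lambda so the selected truncation points fit `budget` bytes.

    Assumes budget >= 0, so a large enough lambda always selects 0 bytes.
    """
    lo, hi = 0.0, 1.0
    # A larger lambda penalises bytes more; grow hi until the rate fits.
    while total_rate(blocks, choose_points(blocks, hi)) > budget:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_rate(blocks, choose_points(blocks, mid)) > budget:
            lo = mid
        else:
            hi = mid
    return choose_points(blocks, hi)  # hi always satisfies the budget


if __name__ == "__main__":
    blocks = [[(0, 100.0), (10, 40.0), (20, 20.0), (30, 15.0)],
              [(0, 80.0), (5, 50.0), (15, 10.0)]]
    print(pcrd_opt(blocks, 30))  # [1, 2] -> 10 + 15 = 25 bytes
```

Each call to `choose_points` treats the code-blocks independently, which is why this step lends itself to parallel execution; only the λ search itself is sequential.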
Multiprocessor DSP Implementation of the JPEG 2000 Codec
The transition to JPEG2000 from other image formats such as standard JPEG offers improved compression and image quality, yet JPEG2000 has not been widely adopted in practice. This is mainly due to the complexity of the JPEG2000 algorithm. Standard JPEG uses the Discrete Cosine Transform (DCT) and Huffman encoding to achieve its compression, whereas JPEG2000 uses the wavelet transform and arithmetic encoding. Due to the wide acceptance of JPEG, there are processors such as Equator Technology's BSP-15 digital signal processor (DSP) that have been designed with features specifically for JPEG applications. In some current digital printing applications where JPEG is used, images must be encoded and decoded at rates exceeding 100 pages per minute. A multiprocessor environment consisting of Equator Technology's BSP-15 processors may offer acceptable performance for the JPEG2000 codec. The aim of this work is to design a JPEG2000 codec for the BSP-15 processor and to determine whether this processor is capable of delivering the performance required by high-end digital printers. The features of the BSP-15 that are well suited to the JPEG2000 algorithm are discussed, as well as future improvements that could be incorporated into the architecture. An analysis of the advantages and disadvantages of this processor may help the next generation of processors offer features that allow them to excel in JPEG2000 processing. A multiprocessor DSP implementation of the JPEG2000 codec is the main result of this work. The resulting codec provides more than double the processing throughput of existing JPEG2000 software.
Data Compression in the Petascale Astronomy Era: a GERLUMPH case study
As the volume of data grows, astronomers are increasingly faced with choices
on what data to keep -- and what to throw away. Recent work evaluating the
JPEG2000 (ISO/IEC 15444) standards as a future data format standard in
astronomy has shown promising results on observational data. However, there is
still a need to evaluate its potential on other types of astronomical data, such
as from numerical simulations. GERLUMPH (the GPU-Enabled High Resolution
cosmological MicroLensing parameter survey) represents an example of a data
intensive project in theoretical astrophysics. In the next phase of processing,
the ~27 terabyte GERLUMPH dataset is set to grow by a factor of 100 -- well
beyond the current storage capabilities of the supercomputing facility on which
it resides. In order to minimise bandwidth usage, file transfer time, and
storage space, this work evaluates several data compression techniques.
Specifically, we investigate off-the-shelf and custom lossless compression
algorithms as well as the lossy JPEG2000 compression format. Results of
lossless compression algorithms on GERLUMPH data products show small
compression ratios (1.35:1 to 4.69:1 of input file size) varying with the
nature of the input data. Our results suggest that JPEG2000 could be suitable
for other numerical datasets stored as gridded data or volumetric data. When
approaching lossy data compression, one should keep in mind the intended
purposes of the data to be compressed, and evaluate the effect of the loss on
future analysis. In our case study, lossy compression and a high compression
ratio do not significantly compromise the intended use of the data for
constraining quasar source profiles from cosmological microlensing.
Comment: 15 pages, 9 figures, 5 tables. Published in the Special Issue of
Astronomy & Computing on The future of astronomical data format.
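Compression ratios like the 1.35:1 to 4.69:1 quoted above are conventionally the input size divided by the compressed size. The Python sketch below measures this for three off-the-shelf lossless codecs from the standard library on a synthetic float32 grid. The grid is a hypothetical stand-in, not GERLUMPH data, and its ratios say nothing about the paper's results; `make_grid` and `compression_ratios` are hypothetical names.

```python
import bz2
import lzma
import math
import struct
import zlib


def make_grid(n=128):
    """Synthetic n x n float32 field (hypothetical stand-in, NOT GERLUMPH data)."""
    vals = [math.sin(i / 9.0) * math.cos(j / 13.0) + 0.01 * ((i * 31 + j * 17) % 7)
            for i in range(n) for j in range(n)]
    # Little-endian float32, like a raw binary dump of a gridded map.
    return struct.pack(f"<{len(vals)}f", *vals)


def compression_ratios(raw):
    """Return {codec: input_size / compressed_size} for stdlib lossless codecs."""
    codecs = {"zlib": (zlib.compress, zlib.decompress),
              "bz2": (bz2.compress, bz2.decompress),
              "lzma": (lzma.compress, lzma.decompress)}
    ratios = {}
    for name, (comp, decomp) in codecs.items():
        packed = comp(raw)
        assert decomp(packed) == raw  # lossless: exact round trip
        ratios[name] = len(raw) / len(packed)  # reported as "ratio:1"
    return ratios


if __name__ == "__main__":
    for name, r in compression_ratios(make_grid()).items():
        print(f"{name}: {r:.2f}:1")
```

Floating-point mantissas tend to look close to random to general-purpose byte-oriented compressors, which is one plausible reason lossless ratios on such grids stay modest.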
- …