ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression
Over the last few years, neural image compression has gained wide attention
from research and industry, yielding promising end-to-end deep neural codecs
outperforming their conventional counterparts in rate-distortion performance.
Despite significant advances, current methods, including attention-based
transform coding, still leave room for improvement in reducing the coding rate
while preserving reconstruction fidelity, especially in non-homogeneous
textured image areas. These models also require more parameters and longer
decoding times. To tackle these challenges, we propose ConvNeXt-ChARM, an
efficient ConvNeXt-based transform coding framework paired with a
compute-efficient channel-wise auto-regressive prior that captures both global and local contexts
from the hyper and quantized latent representations. The proposed architecture
can be optimized end-to-end to fully exploit the context information and
extract compact latent representation while reconstructing higher-quality
images. Experimental results on four widely-used datasets show that
ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions,
estimated at 5.24% and 1.22% on average over the versatile video coding (VVC)
reference encoder (VTM-18.0) and the state-of-the-art learned image compression
method SwinT-ChARM, respectively. Moreover, we provide model scaling studies to
verify the computational efficiency of our approach and conduct several
objective and subjective analyses to bring to the fore the performance gap
between the next-generation ConvNet, namely ConvNeXt, and Swin Transformer.
Comment: arXiv admin note: substantial text overlap with arXiv:2307.02273, text overlap with arXiv:2307.0609
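A minimal PyTorch sketch of a channel-wise auto-regressive ("ChARM"-style) prior as described in the abstract above; the module name, slice count, and layer sizes are illustrative assumptions, not the authors' implementation. The quantized latent is split into channel slices, and each slice's entropy parameters (mean, scale) are predicted from the hyperprior features together with the previously decoded slices; in a full codec these parameters would drive a conditional Gaussian entropy model for arithmetic coding.

```python
# Illustrative ChARM-style prior: hypothetical sizes, not the paper's exact model.
import torch
import torch.nn as nn

class ChannelARPrior(nn.Module):
    def __init__(self, latent_ch=320, num_slices=10, hyper_ch=192):
        super().__init__()
        self.num_slices = num_slices
        self.slice_ch = latent_ch // num_slices
        # One small network per slice: sees hyper features + all previous slices,
        # outputs (mean, scale) for the current slice.
        self.param_nets = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(hyper_ch + i * self.slice_ch, 224, 3, padding=1),
                nn.GELU(),
                nn.Conv2d(224, 2 * self.slice_ch, 3, padding=1),
            )
            for i in range(num_slices)
        ])

    def forward(self, y_hat, hyper_feat):
        # y_hat: quantized latent (B, latent_ch, H, W); hyper_feat: (B, hyper_ch, H, W)
        slices = y_hat.chunk(self.num_slices, dim=1)
        means, scales, decoded = [], [], []
        for i, net in enumerate(self.param_nets):
            ctx = torch.cat([hyper_feat] + decoded, dim=1)
            mean, scale = net(ctx).chunk(2, dim=1)
            means.append(mean)
            scales.append(nn.functional.softplus(scale))
            decoded.append(slices[i])  # at decode time this would be the decoded slice
        return torch.cat(means, dim=1), torch.cat(scales, dim=1)
```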
MobileNVC: Real-time 1080p Neural Video Compression on a Mobile Device
Neural video codecs have recently become competitive with standard codecs
such as HEVC in the low-delay setting. However, most neural codecs are large
floating-point networks that use pixel-dense warping operations for temporal
modeling, making them too computationally expensive for deployment on mobile
devices. Recent work has demonstrated that running a neural decoder in real
time on mobile is feasible, but shows this only for 720p RGB video. This work
presents the first neural video codec that decodes 1080p YUV420 video in real
time on a mobile device. Our codec relies on two major contributions. First, we
design an efficient codec that uses a block-based motion compensation algorithm
available on the warping core of the mobile accelerator, and we show how to
quantize this model to integer precision. Second, we implement a fast decoder
pipeline that concurrently runs neural network components on the neural signal
processor, parallel entropy coding on the mobile GPU, and warping on the
warping core. Our codec outperforms the previous on-device codec by a large
margin, with up to 48% BD-rate savings, while substantially reducing the MAC
count on the receiver side. We perform a careful ablation to demonstrate the
effect of the introduced motion compensation scheme, and ablate the effect of
model quantization.
Comment: Matches version published at WACV 202
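A small PyTorch sketch of block-based motion compensation as contrasted with pixel-dense warping in the abstract above; the block size, tensor shapes, and function name are assumptions for illustration, not the MobileNVC implementation. Each fixed-size block of the current frame is predicted by copying a block from the reference frame at an integer motion-vector offset.

```python
# Illustrative block-based motion compensation (integer MVs, assumed 16x16 blocks).
import torch

def block_motion_compensation(ref, mvs, block=16):
    """ref: (C, H, W) reference frame; mvs: (H//block, W//block, 2) integer (dy, dx) per block."""
    C, H, W = ref.shape
    pred = torch.zeros_like(ref)
    for by in range(H // block):
        for bx in range(W // block):
            dy, dx = int(mvs[by, bx, 0]), int(mvs[by, bx, 1])
            # Clamp the source block so it stays inside the reference frame.
            y0 = min(max(by * block + dy, 0), H - block)
            x0 = min(max(bx * block + dx, 0), W - block)
            pred[:, by*block:(by+1)*block, bx*block:(bx+1)*block] = \
                ref[:, y0:y0+block, x0:x0+block]
    return pred

# Example: a 1080p-sized tensor with zero motion reproduces the reference exactly.
ref = torch.rand(3, 1088, 1920)  # 1080 rounded up to a multiple of the block size
mvs = torch.zeros(1088 // 16, 1920 // 16, 2, dtype=torch.long)
assert torch.equal(block_motion_compensation(ref, mvs), ref)
```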
GPU-oriented architecture for an end-to-end image/video codec based on JPEG2000
Modern image and video compression standards employ computationally intensive algorithms that provide advanced features to the coding system. Current standards often need to be implemented in hardware or using expensive solutions to meet the real-time requirements of some environments. Contrary to this trend, this paper proposes an end-to-end codec architecture running on inexpensive Graphics Processing Units (GPUs) that is based on, though not compatible with, the JPEG2000 international standard for image and video compression. When executed on a commodity Nvidia GPU, it achieves real-time processing of 12K video. The proposed software architecture utilizes four CUDA kernels that minimize memory transfers, use registers instead of shared memory, and employ a double-buffer strategy to optimize the streaming of data. The analysis of throughput indicates that the proposed codec yields results at least 10× superior on average to those achieved with JPEG2000 implementations devised for CPUs, and approximately 4× superior to those achieved with hardwired solutions of the HEVC/H.265 video compression standard.
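The double-buffer streaming strategy mentioned above can be sketched generically; the following PyTorch snippet is an illustration under assumed chunk handling and is not the paper's CUDA kernels. While the GPU processes chunk i on the default stream, chunk i+1 is uploaded on a separate copy stream from pinned host memory, so transfers overlap with compute.

```python
# Illustrative double-buffered host-to-device streaming (not the paper's CUDA code).
import torch

def process(x):  # stand-in for the codec's GPU kernels
    return x.float().mean()

def stream_chunks(chunks):
    device = torch.device("cuda")
    copy_stream = torch.cuda.Stream()
    # Pre-upload the first chunk on the default stream.
    bufs = [chunks[0].pin_memory().to(device, non_blocking=True), None]
    results = []
    for i in range(len(chunks)):
        nxt = (i + 1) % 2
        if i + 1 < len(chunks):
            # Upload the next chunk on the copy stream while this one is processed.
            with torch.cuda.stream(copy_stream):
                bufs[nxt] = chunks[i + 1].pin_memory().to(device, non_blocking=True)
        cur = bufs[i % 2]
        results.append(process(cur))                     # compute on the current buffer
        cur.record_stream(torch.cuda.current_stream())   # keep allocator from reusing it early
        torch.cuda.current_stream().wait_stream(copy_stream)  # next buffer ready before reuse
    return results

# Usage (requires a CUDA device): stream_chunks([torch.rand(1 << 20) for _ in range(8)])
```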
Asymmetrically-powered Neural Image Compression with Shallow Decoders
Neural image compression methods have seen increasingly strong performance in
recent years. However, they suffer orders of magnitude higher computational
complexity compared to traditional codecs, which stands in the way of
real-world deployment. This paper takes a step forward in closing this gap in
decoding complexity by adopting shallow or even linear decoding transforms. To
compensate for the resulting drop in compression performance, we exploit the
often asymmetrical computation budget between encoding and decoding, by
adopting more powerful encoder networks and iterative encoding. We
theoretically formalize the intuition behind this asymmetric design, and our
experimental results establish a new frontier in the trade-off between
rate-distortion and decoding
complexity for neural image compression. Specifically, we achieve
rate-distortion performance competitive with the established mean-scale
hyperprior architecture of Minnen et al. (2018), while reducing the overall
decoding complexity by 80 %, or over 90 % for the synthesis transform alone.
Our code can be found at https://github.com/mandt-lab/shallow-ntc.
Comment: Preprint
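A toy PyTorch sketch of the asymmetric idea described above; the layer counts and channel sizes are illustrative assumptions, not the paper's architecture. A relatively deep analysis transform on the encoder side is paired with a single-layer, essentially linear synthesis transform on the decoder side, shifting almost all computation to the sender.

```python
# Illustrative asymmetric transforms: deep encoder, shallow (one-layer) decoder.
import torch
import torch.nn as nn

class DeepAnalysis(nn.Module):
    def __init__(self, ch=192):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2),
        )

    def forward(self, x):
        return self.net(x)

class ShallowSynthesis(nn.Module):
    """Single (linear) transposed convolution: very cheap to run at decode time."""
    def __init__(self, ch=192):
        super().__init__()
        self.net = nn.ConvTranspose2d(ch, 3, 16, stride=16)

    def forward(self, y_hat):
        return self.net(y_hat)

x = torch.rand(1, 3, 256, 256)
y = DeepAnalysis()(x)                        # (1, 192, 16, 16) latent
x_hat = ShallowSynthesis()(torch.round(y))   # rounding as a rough stand-in for quantization
print(y.shape, x_hat.shape)                  # reconstruction back at (1, 3, 256, 256)
```

Since only the sender pays the encoding cost, one could additionally refine the latent y by gradient descent on a rate-distortion objective before quantization (iterative encoding) without affecting decoding complexity.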