Practical Full Resolution Learned Lossless Image Compression
We propose the first practical learned lossless image compression system,
L3C, and show that it outperforms the popular engineered codecs, PNG, WebP and
JPEG 2000. At the core of our method is a fully parallelizable hierarchical
probabilistic model for adaptive entropy coding which is optimized end-to-end
for the compression task. In contrast to recent autoregressive discrete
probabilistic models such as PixelCNN, our method i) models the image
distribution jointly with learned auxiliary representations instead of
exclusively modeling the image distribution in RGB space, and ii) only requires
three forward-passes to predict all pixel probabilities instead of one for each
pixel. As a result, L3C obtains over two orders of magnitude speedups when
sampling compared to the fastest PixelCNN variant (Multiscale-PixelCNN).
Furthermore, we find that learning the auxiliary representation is crucial and
outperforms predefined auxiliary representations such as an RGB pyramid
significantly.
Comment: Updated preprocessing and Table 1, see A.1 in supplementary. Code and
models: https://github.com/fab-jul/L3C-PyTorc
Language Modeling Is Compression
It has long been established that predictive models can be transformed into
lossless compressors and vice versa. Incidentally, in recent years, the machine
learning community has focused on training increasingly large and powerful
self-supervised (language) models. Since these large language models exhibit
impressive predictive capabilities, they are well-positioned to be strong
compressors. In this work, we advocate for viewing the prediction problem
through the lens of compression and evaluate the compression capabilities of
large (foundation) models. We show that large language models are powerful
general-purpose predictors and that the compression viewpoint provides novel
insights into scaling laws, tokenization, and in-context learning. For example,
Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to
43.4% and LibriSpeech samples to 16.4% of their raw size, beating
domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively.
Finally, we show that the prediction-compression equivalence allows us to use
any compressor (like gzip) to build a conditional generative model.
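The prediction-compression equivalence described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the adaptive order-0 byte model, the function names, and the rule for turning gzip output lengths into probabilities are all assumptions chosen for illustration. The first function computes the ideal code length a predictor would give an arithmetic coder (the sum of -log2 p over the symbols); the second goes the other way and derives a next-byte distribution from a compressor.

```python
import gzip
import math
from collections import Counter


def adaptive_code_length_bits(data: bytes) -> float:
    """Ideal code length of `data` under an adaptive order-0 byte model.

    Each byte costs -log2 p(byte | counts so far), the length an arithmetic
    coder driven by this (hypothetical) predictor would approach.
    """
    counts = Counter()
    total_bits = 0.0
    for seen, byte in enumerate(data):
        # Laplace smoothing over the 256 possible byte values.
        p = (counts[byte] + 1) / (seen + 256)
        total_bits -= math.log2(p)
        counts[byte] += 1
    return total_bits


def gzip_next_byte_distribution(context: bytes, alphabet: bytes) -> dict:
    """Derive a next-byte distribution from gzip (illustrative assumption).

    A candidate byte that makes context + byte compress to fewer bits
    receives a proportionally higher probability.
    """
    lengths = {c: 8 * len(gzip.compress(context + bytes([c]))) for c in alphabet}
    shortest = min(lengths.values())
    # Weight each candidate by 2^-(extra bits relative to the best candidate).
    weights = {c: 2.0 ** -(n - shortest) for c, n in lengths.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


if __name__ == "__main__":
    sample = b"a" * 1000
    rate = adaptive_code_length_bits(sample) / (8 * len(sample))
    print(f"compressed size as a fraction of raw size: {rate:.3f}")
    print(gzip_next_byte_distribution(b"abababab", b"ab"))
```

The compression rate printed here is the same kind of quantity as the percentages quoted in the abstract (compressed size relative to raw size). Because gzip output lengths are whole bytes, short contexts often produce ties between candidates, so the derived distribution is coarse.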
Leveraging progressive model and overfitting for efficient learned image compression
Deep learning has been overwhelmingly dominant in the field of computer vision
and image/video processing over the last decade. However, for image and video
compression, it lags behind the traditional techniques based on discrete cosine
transform (DCT) and linear filters. Built on top of an autoencoder
architecture, learned image compression (LIC) systems have drawn enormous
attention in recent years. Nevertheless, the proposed LIC systems are still
inferior to the state-of-the-art traditional techniques, for example, the
Versatile Video Coding (VVC/H.266) standard, due to either their compression
performance or decoding complexity. Although claimed to outperform the
VVC/H.266 on a limited bit rate range, some proposed LIC systems take over 40
seconds to decode a 2K image on a GPU system. In this paper, we introduce a
powerful and flexible LIC framework with a multi-scale progressive (MSP)
probability model and a latent representation overfitting (LOF) technique. With
different predefined profiles, the proposed framework can achieve various
balance points between compression efficiency and computational complexity.
Experiments show that the proposed framework achieves 2.5%, 1.0%, and 1.3%
Bjontegaard delta bit rate (BD-rate) reduction over the VVC/H.266 standard on
three benchmark datasets over a wide bit rate range. More importantly, the
decoding complexity is reduced from O(n) to O(1) compared to many other LIC
systems, resulting in a speedup of over 20 times when decoding 2K images.
Inter-Frame Video Compression based on Adaptive Fuzzy Inference System Compression of Multiple Frame Characteristics
Video compression improves storage and bandwidth efficiency for video clips and involves an encoder and a decoder. Common approaches are intra-frame, inter-frame, and block-based methods, and they build on motion estimation, motion compensation, and frame differencing. Inter-frame compression merges neighboring frame pairs into a single compressed frame; this study pairs neighboring odd and even frames. An adaptive FIS (Fuzzy Inference System) compresses and decompresses each odd-even frame pair. The adaptive FIS is first trained on all feature pairings of each odd-even frame pair, and the trained adaptive FIS then serves as the codec for compression and decompression. The features used are "mean", "std (standard deviation)", "mad (mean absolute deviation)", and "mean (std)". The average DCT (Discrete Cosine Transform) components over all video frames serve as the quality parameter. The compression ratio varies with the feature used to train the adaptive FIS and with the number of odd-even frame pairs. The proposed approach achieves CR=25.39% and P=80.13%. "Mean" performs best overall (P=87.15%). "Mean (mad)" has the best compression ratio (CR=24.68%) for storage efficiency. The "std" feature compresses the video without decompression since it has the lowest quality change (Q_dct=10.39%).
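The odd-even frame pairing and the per-frame statistics named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: frames are represented as flat lists of pixel intensities, the function names are hypothetical, and the abstract does not say how these features are fed to the adaptive FIS, so that step is not shown.

```python
from statistics import fmean, pstdev


def odd_even_pairs(frames):
    """Group frames as (1st, 2nd), (3rd, 4th), ...; a trailing unpaired frame is dropped."""
    return list(zip(frames[0::2], frames[1::2]))


def frame_features(pixels):
    """Return the mean, standard deviation, and mean absolute deviation of one frame.

    `pixels` is a flat sequence of intensity values (an assumed representation).
    """
    mu = fmean(pixels)
    return {
        "mean": mu,
        "std": pstdev(pixels, mu=mu),  # population standard deviation
        "mad": fmean(abs(p - mu) for p in pixels),  # mean absolute deviation
    }


if __name__ == "__main__":
    # Four tiny made-up "frames" of four pixels each, for demonstration only.
    frames = [[0, 2, 4, 6], [1, 3, 5, 7], [10, 10, 10, 10], [9, 11, 9, 11]]
    for odd, even in odd_even_pairs(frames):
        print(frame_features(odd), frame_features(even))
```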