    Practical Full Resolution Learned Lossless Image Compression

    We propose the first practical learned lossless image compression system, L3C, and show that it outperforms the popular engineered codecs, PNG, WebP and JPEG 2000. At the core of our method is a fully parallelizable hierarchical probabilistic model for adaptive entropy coding which is optimized end-to-end for the compression task. In contrast to recent autoregressive discrete probabilistic models such as PixelCNN, our method i) models the image distribution jointly with learned auxiliary representations instead of exclusively modeling the image distribution in RGB space, and ii) only requires three forward-passes to predict all pixel probabilities instead of one for each pixel. As a result, L3C obtains over two orders of magnitude speedups when sampling compared to the fastest PixelCNN variant (Multiscale-PixelCNN). Furthermore, we find that learning the auxiliary representation is crucial and significantly outperforms predefined auxiliary representations such as an RGB pyramid.
    Comment: Updated preprocessing and Table 1, see A.1 in supplementary. Code and models: https://github.com/fab-jul/L3C-PyTorc
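The link between the probability model and the bitrate in systems like L3C is that an entropy coder spends roughly -log2 p(symbol) bits per symbol, so a model that predicts pixels well directly yields a shorter code. A minimal sketch of that relationship (the `prob_model` interface and the toy models are ours, not the paper's):

```python
import math

def code_length_bits(symbols, prob_model):
    """Ideal code length (in bits) of a sequence under a predictive model.

    An entropy coder assigns about -log2 p(s) bits to each symbol s,
    so sharper probability estimates mean a shorter compressed stream.
    prob_model maps a symbol to its predicted probability (hypothetical
    interface for illustration).
    """
    return sum(-math.log2(prob_model(s)) for s in symbols)

# A uniform model over 256 pixel values costs exactly 8 bits per pixel.
uniform = lambda s: 1.0 / 256
pixels = [0, 10, 255, 42]
assert code_length_bits(pixels, uniform) == 32.0

# A model that concentrates mass on likely values codes them more cheaply.
skewed = lambda s: 0.5 if s == 0 else 0.5 / 255
assert code_length_bits(pixels, skewed) < 32.0
```

The real system replaces these toy models with a learned hierarchical predictor and an arithmetic coder, but the accounting is the same.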

    Language Modeling Is Compression

    It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
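The closing claim runs the equivalence in the other direction: any compressor induces a crude conditional "predictor", because the continuation that adds the fewest compressed bytes to a context is the one the compressor implicitly finds most likely. A minimal sketch using gzip (function name and scoring are ours, not the paper's method):

```python
import gzip

def rank_continuations(context, candidates):
    """Score candidate continuations by how many extra compressed bytes
    each adds on top of the context, and return the cheapest one.
    This is a toy illustration of the prediction-compression equivalence,
    not the paper's generative sampling procedure.
    """
    base = len(gzip.compress(context.encode()))
    scored = [(len(gzip.compress((context + c).encode())) - base, c)
              for c in candidates]
    return min(scored)[1]

ctx = "the quick brown fox jumps over the lazy dog. the quick brown "
# The repeated phrase can be back-referenced, so it compresses cheaply.
print(rank_continuations(ctx, ["fox jumps over", "zebra runs fast"]))
```

A real language-model compressor would replace gzip's match-finding with the model's next-token probabilities fed to an arithmetic coder.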

    Leveraging progressive model and overfitting for efficient learned image compression

    Deep learning has been overwhelmingly dominant in the field of computer vision and image/video processing for the last decade. However, for image and video compression, it lags behind the traditional techniques based on the discrete cosine transform (DCT) and linear filters. Built on top of an autoencoder architecture, learned image compression (LIC) systems have drawn enormous attention in recent years. Nevertheless, the proposed LIC systems are still inferior to the state-of-the-art traditional techniques, for example, the Versatile Video Coding (VVC/H.266) standard, due to either their compression performance or their decoding complexity. Although claimed to outperform VVC/H.266 on a limited bit rate range, some proposed LIC systems take over 40 seconds to decode a 2K image on a GPU system. In this paper, we introduce a powerful and flexible LIC framework with a multi-scale progressive (MSP) probability model and a latent representation overfitting (LOF) technique. With different predefined profiles, the proposed framework can achieve various balance points between compression efficiency and computational complexity. Experiments show that the proposed framework achieves 2.5%, 1.0%, and 1.3% Bjontegaard delta bit rate (BD-rate) reduction over the VVC/H.266 standard on three benchmark datasets over a wide bit rate range. More importantly, the decoding complexity is reduced from O(n) to O(1) compared to many other LIC systems, resulting in over 20 times speedup when decoding 2K images.
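The idea behind latent representation overfitting (LOF) is that the encoder can afford per-image optimization: the decoder stays fixed, and only the latent for the specific image being compressed is fine-tuned to reduce distortion. A toy sketch of that pattern, with a frozen random linear "decoder" standing in for the real network (W, z, x, and the step size are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "decoder" W and one target image x; we fine-tune only the
# latent z for this image, the way LOF adapts latents at encode time.
W = rng.normal(size=(8, 4))
x = rng.normal(size=8)
z = np.zeros(4)

def mse(z):
    """Reconstruction distortion of decoding latent z."""
    return float(np.mean((W @ z - x) ** 2))

before = mse(z)
for _ in range(200):                 # plain gradient descent on z only
    grad = 2 * W.T @ (W @ z - x) / len(x)
    z -= 0.05 * grad
after = mse(z)

assert after < before                # per-image fitting reduces distortion
print(before, after)
```

The decoder never changes, so nothing extra needs to be transmitted beyond the (better) latent; the cost is paid entirely at encode time.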

    Inter-Frame Video Compression based on Adaptive Fuzzy Inference System Compression of Multiple Frame Characteristics

    Video compression is used for storage or bandwidth efficiency of video clip information. Video compression involves encoders and decoders and uses intra-frame, inter-frame, and block-based methods. With inter-frame compression, pairs of neighboring frames are compressed into one compressed frame; this study pairs odd and even neighboring frames. Motion estimation, motion compensation, and frame differencing underpin video compression methods. In this study, an adaptive FIS (Fuzzy Inference System) compresses and decompresses each odd-even frame pair. First, the adaptive FIS is trained on all feature pairings of each odd-even frame pair; the trained adaptive FIS is then used as the codec for compression and decompression. The features used are "mean", "std (standard deviation)", "mad (mean absolute deviation)", and "mean (std)". The study uses the average DCT (Discrete Cosine Transform) components of all video frames as a quality parameter. The choice of training feature and the number of odd-even frame pairings affect the compression ratio. The proposed approach achieves CR=25.39% and P=80.13%. "Mean" performs best overall (P=87.15%); "mean (mad)" gives the best compression ratio (CR=24.68%) for storage efficiency; and "std" yields the lowest quality change (Q_dct=10.39%), allowing the compressed video to be used without decompression.
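The per-frame statistics the study feeds to its adaptive FIS are standard descriptive measures. A minimal sketch of computing three of them for a frame (the function name is ours; the composite "mean (std)" feature is omitted because the abstract does not define it):

```python
import numpy as np

def frame_features(frame):
    """Per-frame statistics of the kind used as FIS inputs:
    mean, standard deviation (std), and mean absolute deviation (mad).
    """
    frame = np.asarray(frame, dtype=float)
    mean = frame.mean()
    std = frame.std()
    mad = np.abs(frame - mean).mean()
    return mean, std, mad

# A flat frame has zero spread under both std and mad.
assert frame_features([[5, 5], [5, 5]])[1:] == (0.0, 0.0)

print(frame_features([[0, 2], [4, 6]]))  # mean=3.0, std=sqrt(5), mad=2.0
```

In the study these statistics are computed per odd-even frame pair and used both to train the FIS and as its runtime inputs.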