
    Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set

    We introduce a video compression algorithm based on instance-adaptive learning. On each video sequence to be transmitted, we finetune a pretrained compression model. The optimal parameters are transmitted to the receiver along with the latent code. By entropy-coding the parameter updates under a suitable mixture model prior, we ensure that the network parameters can be encoded efficiently. This instance-adaptive compression algorithm is agnostic to the choice of base model and has the potential to improve any neural video codec. On the UVG, HEVC, and Xiph datasets, our codec improves the performance of a scale-space flow model by 21% to 27% in BD-rate savings, and that of a state-of-the-art B-frame model by 17% to 20%. We also demonstrate that instance-adaptive finetuning improves robustness to domain shift. Finally, our approach reduces the capacity requirements of compression models: we show that it enables competitive performance even after reducing the network size by 70%. Comment: Matches version published in TML
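The abstract's key mechanism is entropy-coding the finetuned parameter updates under a mixture prior, so that updates near zero cost almost nothing to send. The sketch below is illustrative, not the paper's code: it quantizes each update and prices it under a hypothetical spike-and-slab mixture (a point mass at zero plus a Gaussian slab); the constants `p_zero`, `sigma`, and `step` are assumptions.

```python
import math

def mixture_bits(delta_q, p_zero=0.9, sigma=1.0, step=0.01):
    """Approximate bits to encode one quantized parameter update delta_q.

    Spike-and-slab mixture: a point mass at zero (spike) plus a Gaussian
    density integrated over one quantization bin (slab).
    """
    slab = (step / (sigma * math.sqrt(2 * math.pi))) * \
        math.exp(-0.5 * (delta_q / sigma) ** 2)
    if delta_q == 0.0:
        p = p_zero + (1 - p_zero) * slab  # spike mass plus slab contribution
    else:
        p = (1 - p_zero) * slab
    return -math.log2(p)

def update_rate(deltas, step=0.01):
    """Total bits for a list of parameter updates after uniform quantization."""
    total = 0.0
    for d in deltas:
        q = round(d / step) * step  # uniform scalar quantization
        total += mixture_bits(q, step=step)
    return total
```

Under such a prior, an update that stays at zero costs a fraction of a bit, which is why finetuning only a sparse subset of parameters keeps the overhead small.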

    Overfitting for Fun and Profit: Instance-Adaptive Data Compression

    Neural data compression has been shown to outperform classical methods in terms of rate-distortion (RD) performance, with results still improving rapidly. At a high level, neural compression is based on an autoencoder that tries to reconstruct the input instance from a (quantized) latent representation, coupled with a prior that is used to losslessly compress these latents. Due to limitations on model capacity and imperfect optimization and generalization, such models will in general compress test data suboptimally. However, one of the great strengths of learned compression is that if the test-time data distribution is known and relatively low-entropy (e.g. a camera watching a static scene, a dash cam in an autonomous car, etc.), the model can easily be finetuned or adapted to this distribution, leading to improved RD performance. In this paper we take this concept to the extreme, adapting the full model to a single video, and sending model updates (quantized and compressed using a parameter-space prior) along with the latent representation. Unlike previous work, we finetune not only the encoder/latents but the entire model, and - during finetuning - take into account both the effect of model quantization and the additional costs incurred by sending the model updates. We evaluate an image compression model on I-frames (sampled at 2 fps) from videos of the Xiph dataset, and demonstrate that full-model adaptation improves RD performance by ~1 dB with respect to encoder-only finetuning. Comment: Accepted at International Conference on Learning Representations 202
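The adaptation objective described here folds the cost of sending model updates into the usual rate-distortion trade-off. A minimal sketch of that objective, assuming the model-update rate is amortized over the frames it helps compress (function name and `lam` default are illustrative, not the paper's):

```python
def adaptation_loss(distortion, latent_bits, model_bits, n_frames, lam=0.01):
    """RD loss for full-model finetuning on a single video.

    distortion:  reconstruction error of the adapted model
    latent_bits: rate of the latent code (per frame)
    model_bits:  rate of the quantized, compressed parameter updates,
                 amortized over all n_frames of the video
    """
    rate = latent_bits + model_bits / n_frames
    return distortion + lam * rate
```

The amortization term captures why full-model adaptation pays off on long or low-entropy videos: the one-time model-update cost shrinks per frame as the sequence grows.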

    Content Adaptive NN-Based In-Loop Filter for VVC

    The most recent video coding standard, VVC, contains five in-loop filters to reduce the compression artifacts that stem from the common drawbacks of a block-based hybrid compression framework. However, these traditional in-loop filters are insufficient to deal with complicated compression artifacts. The emergence of Neural Networks (NNs) has brought significant advancements to image and video processing, offering a promising avenue for improving video compression. Many prior studies in this domain have focused on training models on large datasets to achieve generalization, rather than catering to specific content characteristics. In this work, we introduce a content-adaptive in-loop filter for Versatile Video Coding (VVC) that works alongside the other in-loop filters. The content adaptation is achieved by over-fitting a pre-trained model at the encoder side on the test data. To reduce the bitrate overhead, we employ the Neural Network Compression and Representation (NNR) standard, which focuses on compressing NNs efficiently. Furthermore, rather than over-fitting all parameters within the NN model, we introduce a set of learnable parameters known as multipliers, which serve to further reduce the bitrate overhead. The proposed model takes auxiliary information, including Boundary Strength (BS) and Quantization Parameter (QP), as input. Additionally, we have conducted a comprehensive series of experiments to identify the optimal combination of hyperparameters for this approach. The results indicate coding gains of -2.07% (Y), -5.54% (Cb), -1.95% (Cr) Bjøntegaard Delta rate (BD-rate) for Class B and -1.34% (Y), -1.88% (Cb), -0.52% (Cr) BD-rate for Class D with respect to the Peak Signal-to-Noise Ratio (PSNR), on top of the VVC Test Model (VTM) 12.0 with NN-based Video Coding (NNVC) 5.0, in the Random Access (RA) configuration.
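The multiplier idea above trades adaptation capacity for bitrate: instead of over-fitting every weight of a layer, only one scalar per output channel is learned and transmitted. A hypothetical sketch (plain Python, names illustrative) of applying per-channel multipliers and of the parameter-count saving:

```python
def apply_multipliers(features, multipliers):
    """Scale each feature channel by its learnable multiplier.

    features:    list of channels, each a 2D list of floats
    multipliers: one scalar per channel
    """
    assert len(features) == len(multipliers)
    return [[[m * v for v in row] for row in ch]
            for ch, m in zip(features, multipliers)]

def overhead_params(c_out, c_in, k):
    """Parameters to transmit: full conv layer vs. multipliers only."""
    full = c_out * c_in * k * k  # over-fitting every weight of a k x k conv
    mult = c_out                 # over-fitting one multiplier per channel
    return full, mult
```

For a typical 64-channel 3x3 layer this cuts the transmitted parameters from 36,864 to 64, which is why the bitstream overhead of content adaptation stays manageable.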

    SLSNet: Skin lesion segmentation using a lightweight generative adversarial network

    The determination of precise skin lesion boundaries in dermoscopic images using automated methods faces many challenges, most importantly the presence of hair, inconspicuous lesion edges, low contrast in dermoscopic images, and variability in the color, texture, and shape of skin lesions. Existing deep learning-based skin lesion segmentation algorithms are expensive in terms of computation time and memory. Consequently, running such segmentation algorithms requires a powerful GPU and high-bandwidth memory, which are not available in dermoscopy devices. This article therefore aims to achieve precise skin lesion segmentation with minimal resources: a lightweight, efficient generative adversarial network (GAN) model called SLSNet, which combines 1-D kernel factorized networks, position and channel attention, and multiscale aggregation mechanisms with a GAN model. The 1-D kernel factorized network reduces the computational cost of 2-D filtering. The position and channel attention modules enhance the discriminative ability between lesion and non-lesion feature representations in the spatial and channel dimensions, respectively. A multiscale block is also used to aggregate the coarse-to-fine features of input skin images and reduce the effect of artifacts. SLSNet is evaluated on two publicly available datasets: ISBI 2017 and ISIC 2018. Although SLSNet has only 2.35 million parameters, the experimental results demonstrate that it achieves segmentation results on a par with state-of-the-art skin lesion segmentation methods, with an accuracy of 97.61% and Dice and Jaccard similarity coefficients of 90.63% and 81.98%, respectively. SLSNet can run at more than 110 frames per second (FPS) on a single GTX 1080Ti GPU, which is faster than well-known deep learning-based image segmentation models such as FCN. Therefore, SLSNet can be used for practical dermoscopic applications.
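The 1-D kernel factorization mentioned above replaces a k x k filter with a k x 1 pass followed by a 1 x k pass, cutting per-pixel multiplies from k*k to 2*k for separable filters. A minimal sketch of the idea on plain 2-D lists (a 3-tap average filter stands in for the learned kernels, which is an assumption for illustration):

```python
def filter_rows(img, taps):
    """Apply a 1-D horizontal filter with clamped (replicated) borders."""
    h, w, r = len(img), len(img[0]), len(taps) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(taps[i] * img[y][min(max(x + i - r, 0), w - 1)]
                            for i in range(len(taps)))
    return out

def filter_cols(img, taps):
    """Vertical pass: transpose, filter rows, transpose back."""
    t = [list(c) for c in zip(*img)]
    return [list(c) for c in zip(*filter_rows(t, taps))]

def separable_filter(img, taps):
    """k x k separable filtering as a k x 1 pass then a 1 x k pass."""
    return filter_cols(filter_rows(img, taps), taps)
```

For k = 3 this is 6 multiplies per pixel instead of 9; for larger kernels the saving grows linearly in k, which is what makes the factorized network lightweight enough for low-resource devices.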

    Motion Compensated Self Supervised Deep Learning for Highly Accelerated 3D Ultrashort Echo Time Pulmonary MRI

    Purpose: To investigate motion-compensated, self-supervised, model-based deep learning (MBDL) as a method to reconstruct free-breathing, 3D pulmonary ultrashort echo time (UTE) acquisitions. Theory and Methods: A self-supervised eXtra Dimension MBDL architecture (XD-MBDL) was developed that combines respiratory states to reconstruct a single high-quality 3D image. Non-rigid, GPU-based motion fields were incorporated into this architecture by estimating motion fields from a low-resolution, motion-resolved (XD-GRASP) iterative reconstruction. Motion-compensated XD-MBDL was evaluated on lung UTE datasets with and without contrast, and was compared to constrained reconstructions and to variants of self-supervised MBDL that do not consider respiratory motion. Results: Images reconstructed using XD-MBDL demonstrate improved image quality, as measured by apparent SNR, CNR, and visual assessment, relative to self-supervised MBDL approaches that do not account for dynamic respiratory states, to XD-GRASP, and to a recently proposed motion-compensated iterative reconstruction strategy (iMoCo). Additionally, XD-MBDL reduced reconstruction time relative to both XD-GRASP and iMoCo. Conclusion: A method was developed that allows self-supervised MBDL to combine multiple respiratory states to reconstruct a single image. This method was combined with GPU-based image registration to further improve reconstruction quality. This approach showed promising results in reconstructing a user-selected respiratory phase from free-breathing 3D pulmonary UTE acquisitions.
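The core idea of combining respiratory states is that each state is warped back to a reference phase with its estimated motion field before the states are merged, so that data from all phases contributes to one sharp image. The toy sketch below is an assumption-laden simplification: integer shifts stand in for the non-rigid motion fields, and plain averaging stands in for the learned reconstruction.

```python
def warp(img, dy, dx):
    """Shift a 2D list by (dy, dx) with edge clamping (toy motion model)."""
    h, w = len(img), len(img[0])
    return [[img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]

def combine_states(states, shifts):
    """Average respiratory states after warping each to the reference phase."""
    warped = [warp(s, dy, dx) for s, (dy, dx) in zip(states, shifts)]
    h, w = len(states[0]), len(states[0][0])
    return [[sum(wp[y][x] for wp in warped) / len(warped)
             for x in range(w)] for y in range(h)]
```

In the actual method the motion fields are non-rigid and GPU-estimated from an XD-GRASP reconstruction, and the combination happens inside the unrolled MBDL network rather than as a simple average.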