Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set
We introduce a video compression algorithm based on instance-adaptive
learning. On each video sequence to be transmitted, we finetune a pretrained
compression model. The optimal parameters are transmitted to the receiver along
with the latent code. By entropy-coding the parameter updates under a suitable
mixture model prior, we ensure that the network parameters can be encoded
efficiently. This instance-adaptive compression algorithm is agnostic about the
choice of base model and has the potential to improve any neural video codec.
On UVG, HEVC, and Xiph datasets, our codec improves the performance of a
scale-space flow model by between 21% and 27% BD-rate savings, and that of a
state-of-the-art B-frame model by 17% to 20% BD-rate savings. We also
demonstrate that instance-adaptive finetuning improves the robustness to domain
shift. Finally, our approach reduces the capacity requirements of compression
models. We show that it enables a competitive performance even after reducing
the network size by 70%.
Comment: Matches version published in TMLR
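The key to making this scheme efficient is that the quantized parameter updates are entropy-coded under a prior concentrated around zero. A minimal numpy sketch, assuming a spike-and-slab prior (a point mass at zero plus a Gaussian slab); all distribution parameters here are hypothetical, not the paper's actual mixture model:

```python
import numpy as np

def delta_bits(delta, spike_prob=0.9, sigma=0.05, step=0.01):
    """Approximate bit cost of quantized parameter updates under a
    spike-and-slab prior: a point mass at zero (most weights are left
    unchanged) mixed with a Gaussian on the nonzero updates."""
    q = np.round(delta / step) * step                 # uniform quantization
    bits = 0.0
    for d in q:
        if d == 0.0:
            p = spike_prob
        else:
            # probability mass of this quantization bin under the Gaussian slab
            density = np.exp(-0.5 * (d / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
            p = (1.0 - spike_prob) * density * step
        bits += -np.log2(max(p, 1e-12))
    return q, bits

rng = np.random.default_rng(0)
# simulated finetuning deltas: ~90% exactly zero, the rest small Gaussian
delta = np.where(rng.random(1000) < 0.9, 0.0, rng.normal(0.0, 0.05, 1000))
q, bits = delta_bits(delta)
print(f"{bits / delta.size:.2f} bits per parameter")
```

Because most updates hit the zero spike, the average cost stays well under one bit per parameter, which is why transmitting finetuned weights can pay for itself in BD-rate.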
Overfitting for Fun and Profit: Instance-Adaptive Data Compression
Neural data compression has been shown to outperform classical methods in
terms of rate-distortion performance, with results still improving rapidly. At a high
level, neural compression is based on an autoencoder that tries to reconstruct
the input instance from a (quantized) latent representation, coupled with a
prior that is used to losslessly compress these latents. Due to limitations on
model capacity and imperfect optimization and generalization, such models will
suboptimally compress test data in general. However, one of the great strengths
of learned compression is that if the test-time data distribution is known and
relatively low-entropy (e.g. a camera watching a static scene, a dash cam in an
autonomous car, etc.), the model can easily be finetuned or adapted to this
distribution, leading to improved performance. In this paper we take this
concept to the extreme, adapting the full model to a single video, and sending
model updates (quantized and compressed using a parameter-space prior) along
with the latent representation. Unlike previous work, we finetune not only the
encoder/latents but the entire model, and - during finetuning - take into
account both the effect of model quantization and the additional costs incurred
by sending the model updates. We evaluate an image compression model on
I-frames (sampled at 2 fps) from videos of the Xiph dataset, and demonstrate
that full-model adaptation improves performance by ~1 dB, with respect to
encoder-only finetuning.
Comment: Accepted at International Conference on Learning Representations 2021
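The point of accounting for the model-update bits during finetuning can be shown with a toy selection problem. All distortion and rate numbers below are hypothetical; the sketch only illustrates that the objective combines distortion with the rate of both the latents and the quantized model update:

```python
import numpy as np

def total_loss(distortion, latent_bits, model_bits, lam=0.001):
    # rate-distortion objective: distortion + lambda * (latent rate + model rate)
    return distortion + lam * (latent_bits + model_bits)

# hypothetical candidate settings: (distortion, latent bits, model-update bits)
candidates = {
    "no adaptation":      (4.0, 8000,    0),
    "encoder-only":       (3.2, 7800,    0),
    "full model, coarse": (2.1, 7500,  900),
    "full model, fine":   (1.9, 7400, 6000),
}
best = min(candidates, key=lambda k: total_loss(*candidates[k]))
print(best)
```

In this toy example the coarsely quantized full-model update wins: it spends a few extra model bits for a large distortion drop, while the finely quantized update spends more bits than its distortion gain is worth.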
Content Adaptive NN-Based In-Loop Filter for VVC
The most recent video coding standard, Versatile Video Coding (VVC), contains five in-loop filters to reduce compression artifacts that stem from the common drawbacks of the block-based hybrid compression framework. However, these traditional in-loop filters are insufficient to deal with complicated compression artifacts. The emergence of Neural Networks (NNs) has brought significant advancements to image and video processing, offering a promising avenue for improving video compression. Many prior studies in this domain have focused on training models on large datasets to achieve generalization, rather than catering to specific content characteristics. In this work, we introduce a content-adaptive in-loop filter for VVC that operates alongside the other in-loop filters. The content adaptation is achieved by over-fitting a pre-trained model at the encoder side on the test data. To reduce the bitrate overhead, the parameter updates are compressed with the Neural Network Compression and Representation (NNR) standard, which focuses on compressing NNs efficiently. Furthermore, rather than over-fitting all parameters within the NN model, we introduce a set of learnable parameters known as multipliers, which further reduce the bitrate overhead. The proposed model takes auxiliary information, including Boundary Strength (BS) and Quantization Parameter (QP), as input. Additionally, we have conducted a comprehensive series of experiments to identify the optimal combination of hyperparameters for this approach. The results indicate Bjøntegaard Delta rate (BD-rate) coding gains, measured against Peak Signal-to-Noise Ratio (PSNR), of -2.07% (Y), -5.54% (Cb), and -1.95% (Cr) for Class B, and -1.34% (Y), -1.88% (Cb), and -0.52% (Cr) for Class D, on top of the VVC Test Model (VTM) 12.0 with NN-based Video Coding (NNVC) 5.0 in the Random Access (RA) configuration.
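The multiplier idea can be sketched in a few lines. This is a hypothetical numpy illustration (shapes, learning rate, and the pseudo-target are made up, and it is not the NNVC implementation): the pretrained kernels stay frozen, and only a small per-output-channel scale is overfit on the content and transmitted:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 3, 3, 3))    # frozen pretrained conv kernels
multipliers = np.ones(16)                   # learnable, one per output channel

def adapted_weights(w, m):
    # scale each output channel of the frozen kernels by its multiplier
    return w * m[:, None, None, None]

# one SGD step on the multipliers alone, against a pseudo-target
# (the target stands in for whatever the overfitting loss prefers)
target = weights * 1.1
residual = adapted_weights(weights, multipliers) - target
grad = np.einsum('ocij,ocij->o', residual, weights)   # dL/dm per channel
multipliers -= 0.01 * grad / weights[0].size

# only 16 scalars would be sent, versus 432 frozen kernel weights
print(multipliers.shape, weights.size)
```

Since each multiplier is a single scalar per channel, the transmitted overhead is orders of magnitude smaller than sending full weight updates.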
SLSNet: Skin lesion segmentation using a lightweight generative adversarial network
The determination of precise skin lesion boundaries in dermoscopic images using automated methods faces many challenges, most importantly the presence of hair, inconspicuous lesion edges, low contrast, and variability in the color, texture, and shape of skin lesions. Existing deep learning-based skin lesion segmentation algorithms are expensive in terms of computational time and memory. Consequently, running such segmentation algorithms requires a powerful GPU and high-bandwidth memory, which are not available in dermoscopy devices. This article therefore aims to achieve precise skin lesion segmentation with minimal resources: a lightweight, efficient generative adversarial network (GAN) model called SLSNet, which combines 1-D kernel factorized networks, position and channel attention, and multiscale aggregation mechanisms with a GAN model. The 1-D kernel factorized network reduces the computational cost of 2-D filtering. The position and channel attention modules enhance the discriminative ability between lesion and non-lesion feature representations in the spatial and channel dimensions, respectively. A multiscale block is also used to aggregate the coarse-to-fine features of input skin images and reduce the effect of artifacts. SLSNet is evaluated on two publicly available datasets: ISBI 2017 and ISIC 2018. Although SLSNet has only 2.35 million parameters, the experimental results demonstrate that it achieves segmentation results on a par with state-of-the-art skin lesion segmentation methods, with an accuracy of 97.61% and Dice and Jaccard similarity coefficients of 90.63% and 81.98%, respectively. SLSNet can run at more than 110 frames per second (FPS) on a single GTX 1080 Ti GPU, which is faster than well-known deep learning-based image segmentation models such as FCN. Therefore, SLSNet can be used in practical dermoscopic applications.
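The savings from 1-D kernel factorization can be illustrated with a quick parameter count. The channel counts and kernel size below are hypothetical, not SLSNet's actual configuration; the idea is that a k×k convolution is replaced by a k×1 convolution followed by a 1×k convolution:

```python
def conv_params(c_in, c_out, k):
    # weights in a standard k x k convolution (biases ignored)
    return c_in * c_out * k * k

def factorized_params(c_in, c_out, k):
    # k x 1 conv (c_in -> c_out) followed by 1 x k conv (c_out -> c_out)
    return c_in * c_out * k + c_out * c_out * k

full = conv_params(64, 64, 7)
fact = factorized_params(64, 64, 7)
print(full, fact, f"ratio {fact / full:.2f}")
```

For a 7×7 kernel the factorized pair needs roughly 2k/k² of the weights (and multiplies), which is how such designs keep the parameter count and per-frame compute low.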
Motion Compensated Self Supervised Deep Learning for Highly Accelerated 3D Ultrashort Echo Time Pulmonary MRI
Purpose: To investigate motion-compensated, self-supervised, model-based deep
learning (MBDL) as a method to reconstruct free-breathing 3D pulmonary
ultrashort echo time (UTE) acquisitions.
Theory and Methods: A self-supervised eXtra Dimension MBDL architecture
(XD-MBDL) was developed that combined respiratory states to reconstruct a
single high-quality 3D image. Non-rigid, GPU-based motion fields were
incorporated into this architecture by estimating motion fields from a
low-resolution motion-resolved (XD-GRASP) iterative reconstruction.
Motion-compensated XD-MBDL was evaluated on lung UTE datasets with and without
contrast and was compared to constrained reconstructions and variants of
self-supervised MBDL that do not consider respiratory motion.
Results: Images reconstructed using XD-MBDL demonstrate improved image
quality as measured by apparent SNR, CNR, and visual assessment relative to
self-supervised MBDL approaches that do not account for dynamic respiratory
states, XD-GRASP and a recently proposed motion compensated iterative
reconstruction strategy (iMoCo). Additionally, XD-MBDL reduced reconstruction
time relative to both XD-GRASP and iMoCo.
Conclusion: A method was developed to allow self-supervised MBDL to combine
multiple respiratory states to reconstruct a single image. This method was
combined with GPU-based image registration to further improve reconstruction
quality. This approach showed promising results reconstructing a user-selected
respiratory phase from free-breathing 3D pulmonary UTE acquisitions.
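The combination step in the conclusion, warping several respiratory states to a user-selected reference phase and merging them, can be sketched with a toy numpy example in which each "motion field" is just an integer translation, a gross simplification of the non-rigid fields estimated in the paper:

```python
import numpy as np

def warp(img, shift):
    # apply a rigid integer translation as a stand-in for a motion field
    return np.roll(img, shift, axis=(0, 1))

rng = np.random.default_rng(0)
reference = rng.random((32, 32))                       # reference-phase image
shifts = [(0, 0), (2, 1), (-1, 3)]                     # one "field" per state
states = [warp(reference, s) for s in shifts]          # simulated resp. states

# invert each motion field, warp every state back to the reference phase,
# then combine the aligned states into a single image
aligned = [warp(img, (-dy, -dx)) for img, (dy, dx) in zip(states, shifts)]
combined = np.mean(aligned, axis=0)
print(np.allclose(combined, reference))  # exact recovery in this toy case
```

In the real pipeline the warps are non-rigid and the combination happens inside the unrolled reconstruction, so recovery is approximate rather than exact, but the data-sharing benefit of pooling all respiratory states into one image is the same.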