Neural data compression has been shown to outperform classical methods in
terms of RD performance, with results still improving rapidly. At a high
level, neural compression is based on an autoencoder that tries to reconstruct
the input instance from a (quantized) latent representation, coupled with a
prior that is used to losslessly compress these latents. Due to limitations on
model capacity and imperfect optimization and generalization, such models will
suboptimally compress test data in general. However, one of the great strengths
of learned compression is that if the test-time data distribution is known and
relatively low-entropy (e.g. a camera watching a static scene, a dash cam in an
autonomous car, etc.), the model can easily be finetuned or adapted to this
distribution, leading to improved RD performance. In this paper we take this
concept to the extreme, adapting the full model to a single video, and sending
model updates (quantized and compressed using a parameter-space prior) along
with the latent representation. Unlike previous work, we finetune not only the
encoder/latents but the entire model, and - during finetuning - take into
account both the effect of model quantization and the additional costs incurred
by sending the model updates. We evaluate an image compression model on
I-frames (sampled at 2 fps) from videos of the Xiph dataset, and demonstrate
that full-model adaptation improves RD performance by ~1 dB, with respect to
encoder-only finetuning.Comment: Accepted at International Conference on Learning Representations 202