Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set
We introduce a video compression algorithm based on instance-adaptive
learning. On each video sequence to be transmitted, we finetune a pretrained
compression model. The optimal parameters are transmitted to the receiver along
with the latent code. By entropy-coding the parameter updates under a suitable
mixture model prior, we ensure that the network parameters can be encoded
efficiently. This instance-adaptive compression algorithm is agnostic about the
choice of base model and has the potential to improve any neural video codec.
On the UVG, HEVC, and Xiph datasets, our codec improves the performance of a
scale-space flow model by 21% to 27% BD-rate savings, and that of a
state-of-the-art B-frame model by 17% to 20% BD-rate savings. We also
demonstrate that instance-adaptive finetuning improves robustness to domain
shift. Finally, our approach reduces the capacity requirements of compression
models: we show that it enables competitive performance even after reducing
the network size by 70%.
Comment: Matches version published in TMLR
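The core coding step described above can be sketched in a few lines: the finetuned weights are expressed as quantized deltas against the pretrained model, and their ideal code length is measured under a mixture prior that puts most mass on zero. The spike-and-slab prior, quantization step, and scale values below are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def delta_bits(pretrained, finetuned, step=0.01, p_zero=0.9, scale=0.05):
    """Estimate the bit cost of transmitting quantized parameter updates
    under a toy spike-and-slab mixture prior (hypothetical hyperparameters).
    """
    delta = finetuned - pretrained
    q = np.round(delta / step) * step            # uniform quantization
    # Mixture prior: a point mass at zero plus a Laplace "slab";
    # bin probability is approximated by density * bin width.
    slab = (1 - p_zero) * step / (2 * scale) * np.exp(-np.abs(q) / scale)
    prob = np.where(q == 0, p_zero + slab, slab)
    return float(np.sum(-np.log2(prob)))         # ideal entropy-coded length
```

Because most deltas round to zero after finetuning with a rate penalty, the spike absorbs them at a fraction of a bit each, which is what makes transmitting per-sequence updates affordable.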
Practical Full Resolution Learned Lossless Image Compression
We propose the first practical learned lossless image compression system,
L3C, and show that it outperforms the popular engineered codecs, PNG, WebP and
JPEG 2000. At the core of our method is a fully parallelizable hierarchical
probabilistic model for adaptive entropy coding which is optimized end-to-end
for the compression task. In contrast to recent autoregressive discrete
probabilistic models such as PixelCNN, our method i) models the image
distribution jointly with learned auxiliary representations instead of
exclusively modeling the image distribution in RGB space, and ii) only requires
three forward-passes to predict all pixel probabilities instead of one for each
pixel. As a result, L3C obtains over two orders of magnitude speedups when
sampling compared to the fastest PixelCNN variant (Multiscale-PixelCNN).
Furthermore, we find that learning the auxiliary representation is crucial and
outperforms predefined auxiliary representations such as an RGB pyramid
significantly.
Comment: Updated preprocessing and Table 1, see A.1 in supplementary. Code and
models: https://github.com/fab-jul/L3C-PyTorch
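The speed argument above hinges on one forward pass per scale rather than one per pixel. The toy below makes that concrete with a fixed average-pooling pyramid standing in for L3C's learned auxiliary representations; the pooling and the placeholder prediction step are assumptions for illustration, not the actual network.

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling: a toy stand-in for L3C's learned downsampling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def hierarchical_passes(image, levels=3):
    """Count prediction passes in an L3C-style hierarchy: one forward pass
    per scale (coarse to fine), independent of the number of pixels."""
    feats = [image]
    for _ in range(levels - 1):
        feats.append(avg_pool2(feats[-1]))   # auxiliary representation
    passes = 0
    for z in reversed(feats):                # condition finer scales on coarser
        _ = z * 0.0                          # placeholder prediction pass
        passes += 1
    return passes
```

A PixelCNN-style autoregressive model would instead need one pass per pixel (4096 for a 64x64 image), which is the source of the two-orders-of-magnitude sampling speedup the abstract cites.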
HNeRV: A Hybrid Neural Representation for Videos
Implicit neural representations store videos as neural networks and have
performed well for various vision tasks such as video compression and
denoising. With a frame index or positional index as input, implicit
representations (NeRV, E-NeRV, etc.) reconstruct video from fixed and
content-agnostic embeddings. Such embeddings largely limit the regression
capacity and internal generalization for video interpolation. In this paper, we
propose a Hybrid Neural Representation for Videos (HNeRV), where a learnable
encoder generates content-adaptive embeddings, which act as the decoder input.
Besides the input embedding, we introduce HNeRV blocks, which ensure model
parameters are evenly distributed across the entire network, such that higher
layers (layers near the output) can have more capacity to store high-resolution
content and video details. With content-adaptive embeddings and re-designed
architecture, HNeRV outperforms implicit methods in video regression tasks in
both reconstruction quality (PSNR) and convergence speed, and shows better
internal generalization. As a simple and efficient
video representation, HNeRV also shows decoding advantages for speed,
flexibility, and deployment, compared to traditional codecs~(H.264, H.265) and
learning-based compression methods. Finally, we explore the effectiveness of
HNeRV on downstream tasks such as video compression and video inpainting. We
provide a project page at https://haochen-rye.github.io/HNeRV and code at
https://github.com/haochen-rye/HNeRV.
Comment: CVPR 2023
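The contrast between the two embedding types above can be shown in a few lines. The random linear encoder below is a hypothetical, untrained stand-in for HNeRV's learned encoder; the point is only that a positional embedding depends on the frame index alone, while a content-adaptive embedding changes with the frame itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def positional_embedding(t, dim=16):
    """Content-agnostic embedding of NeRV-style implicit models:
    a function of the frame index t only."""
    freq = 2.0 ** np.arange(dim // 2)
    return np.concatenate([np.sin(freq * t), np.cos(freq * t)])

def content_embedding(frame, W):
    """Content-adaptive embedding in the HNeRV spirit: a (here random,
    untrained) linear encoder maps the frame itself to the decoder input."""
    return W @ frame.ravel()

frame_a = rng.random((4, 4))
frame_b = rng.random((4, 4))
W = rng.random((16, 16))
# Positional embeddings are identical for any frames sharing an index,
# while content embeddings differ whenever the frames differ.
```

This is why content-adaptive embeddings help internal generalization: two visually different frames get different decoder inputs instead of nearby points on a fixed positional curve.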
Building the big picture: enhanced resolution from coding
Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1994. Includes bibliographical references (leaves 105-107). By Roger George Kermode.
Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression
The latest advancements in neural image compression show great potential in
surpassing the rate-distortion performance of conventional standard codecs.
Nevertheless, there exists an indelible domain gap between the datasets
utilized for training (i.e., natural images) and those utilized for inference
(e.g., artistic images). Our proposal involves a low-rank adaptation approach
aimed at addressing the rate-distortion drop observed in out-of-domain
datasets. Specifically, we perform low-rank matrix decomposition to update
certain adaptation parameters of the client's decoder. These updated
parameters, along with image latents, are encoded into a bitstream and
transmitted to the decoder in practical scenarios. Due to the low-rank
constraint imposed on the adaptation parameters, the resulting bit rate
overhead is small. Furthermore, the bit rate allocation of low-rank adaptation
is \emph{non-trivial}, considering the diverse inputs require varying
adaptation bitstreams. We thus introduce a dynamic gating network on top of the
low-rank adaptation method, in order to decide which decoder layer should
employ adaptation. The dynamic adaptation network is optimized end-to-end using
rate-distortion loss. Our proposed method exhibits universality across diverse
image datasets. Extensive results demonstrate that this paradigm significantly
mitigates the domain gap, surpassing non-adaptive methods in average
BD-rate on out-of-domain images. Furthermore, it also outperforms the most
advanced instance-adaptive methods in BD-rate. Ablation studies confirm our
method's ability to universally enhance various image compression architectures.
Comment: Accepted by ACM MM 2023, 13 pages, 12 figures