Bridging the gap between image coding for machines and humans
Image coding for machines (ICM) aims at reducing the bitrate required to
represent an image while minimizing the drop in machine vision analysis
accuracy. In many use cases, such as surveillance, it is also important that
the visual quality is not drastically deteriorated by the compression process.
Recent works on using neural network (NN) based ICM codecs have shown
significant coding gains against traditional methods; however, the decompressed
images, especially at low bitrates, often contain checkerboard artifacts. We
propose an effective decoder finetuning scheme based on adversarial training to
significantly enhance the visual quality of ICM codecs, while preserving the
machine analysis accuracy, without adding extra bitcost or parameters at the
inference phase. The results show complete removal of the checkerboard
artifacts at the negligible cost of -1.6% relative change in task performance
score. In the cases where some amount of artifacts is tolerable, such as when
machine consumption is the primary target, this technique can enhance both
pixel-fidelity and feature-fidelity scores without losing task performance.
Overfitting NN loop-filters in video coding
Overfitting is usually regarded as a negative condition since it impairs the generalisation power of a model. Nevertheless, overfitting a Neural Network (NN) on test data may be advantageous to improve the compression efficiency of image/video coding tools and systems. Previous research has demonstrated the benefits of NN overfitting for post-processing operations, i.e. post-filters, but not yet for actual decoding tools. Generally, the NN is overfitted on test data at the encoder end, and the weight update is coded and sent to the decoder end along with the image/video bitstream. The proposed approach follows this strategy. In particular, the overfitting of the Low Operation Point (LOP) loop-filter in the NN-based Video Coding (NNVC) software is studied. The overall approach yields Bjontegaard Delta rates (BD-rates) of -7.74%, -13.73% and -12.49% for the Y, U and V components, respectively. Of these coding gains, 1.21%, 6.43% and 5.52% for the Y, U and V components, respectively, are attributed to the overfitting. The boost in the coding gains comes with only 1.5% more complexity, due to the multiplier parameters introduced during the overfitting.
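As a rough illustration of the multiplier idea in the abstract above, the following minimal sketch fits a single scalar multiplier on the output of a frozen filter so that the filtered signal better matches the original content. It is an assumption-laden toy, not the NNVC LOP filter: the names `fit_multiplier`, `apply_filter` and `filter_residual` are hypothetical, the "filter output" is a hand-written list, and a closed-form least-squares fit stands in for whatever overfitting procedure the paper actually uses. In the scheme the abstract describes, the fitted value would be computed at the encoder and signalled to the decoder.

```python
def fit_multiplier(filter_residual, target_residual):
    """Least-squares scalar m minimising sum((t - m * r) ** 2).

    filter_residual: correction predicted by the frozen filter (hypothetical).
    target_residual: original minus decoded samples, known only at the encoder.
    """
    num = sum(r * t for r, t in zip(filter_residual, target_residual))
    den = sum(r * r for r in filter_residual)
    return num / den if den > 0 else 1.0  # fall back to the unscaled filter


def apply_filter(decoded, filter_residual, m=1.0):
    """Add the (scaled) filter correction to the decoded samples."""
    return [d + m * r for d, r in zip(decoded, filter_residual)]


def mse(a, b):
    """Mean squared error between two equally long sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy data: the frozen filter points in the right direction but is too weak.
original = [10.0, 12.0, 9.0, 14.0]
decoded = [9.0, 11.0, 10.0, 12.0]
filter_residual = [0.5, 0.5, -0.5, 1.0]  # assumed output of a frozen NN filter

target_residual = [o - d for o, d in zip(original, decoded)]
m = fit_multiplier(filter_residual, target_residual)

baseline_mse = mse(apply_filter(decoded, filter_residual), original)
overfit_mse = mse(apply_filter(decoded, filter_residual, m), original)
```

Only the scalar `m` changes per content item in this sketch, which is why the extra decoder-side cost stays small: the frozen filter runs once and its output is rescaled.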
Adaptation and Attention for Neural Video Coding
Neural image coding now represents the state-of-the-art image compression approach. However, a lot of work is still to be done in the video domain. In this work, we propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties, revolving around the concepts of adaptation and attention. Our codec is organized as an intra-frame codec paired with an inter-frame codec. As one architectural novelty, we propose to train the inter-frame codec model to adapt the motion estimation process based on the resolution of the input video. A second architectural novelty is a new neural block that combines concepts from split-attention based neural networks and from DenseNets. Finally, we propose to overfit a set of decoder-side multiplicative parameters at inference time. Through ablation studies and comparisons to prior art, we show the benefits of our proposed techniques in terms of coding gains. We compare our codec to VVC/H.266 and RLVC, which represent the state-of-the-art traditional and end-to-end learned codecs, respectively, and to the top performing end-to-end learned approach in the 2021 CLIC competition, E2E_T_OL. Our codec clearly outperforms E2E_T_OL, and compares favorably to VVC and RLVC in some settings.
Learned Enhancement Filters for Image Coding for Machines
Machine-to-Machine (M2M) communication applications and use cases, such as object detection and instance segmentation, are becoming mainstream nowadays. As a consequence, the majority of multimedia content is likely to be consumed by machines in the coming years. This opens up new challenges in the efficient compression of this type of data. Two main directions are being explored in the literature, one being based on existing traditional codecs, such as the Versatile Video Coding (VVC) standard, that are optimized for human-targeted use cases, and another based on end-to-end trained neural networks. However, traditional codecs have significant benefits in terms of interoperability, real-time decoding, and availability of hardware implementations over end-to-end learned codecs. Therefore, in this paper, we propose learned post-processing filters that are targeted at enhancing the performance of machine vision tasks for images reconstructed by the VVC codec. The proposed enhancement filters provide significant improvements on the target tasks compared to VVC coded images. The conducted experiments show that the proposed post-processing filters provide about 45% and 49% Bjontegaard Delta Rate gains over VVC in instance segmentation and object detection tasks, respectively.
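The Bjontegaard Delta rate figures quoted in these abstracts summarise the average bitrate difference between two rate-quality curves at equal quality. The sketch below shows the idea under simplifying assumptions: it interpolates log-bitrate piecewise-linearly over the quality axis, whereas the commonly used formulation fits a cubic polynomial or a piecewise-cubic curve, so its numbers would not match a reference tool exactly. The function names and the sample points are hypothetical, and the quality axis can be PSNR or a task metric such as mAP.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            w = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + w * (ys[i + 1] - ys[i])
    raise ValueError("x outside interpolation range")


def bd_rate(anchor, test, steps=1000):
    """Average bitrate difference of `test` vs `anchor`, in percent.

    anchor, test: lists of (bitrate, quality) points with distinct qualities.
    Negative values mean the test codec needs fewer bits at equal quality.
    """
    a = sorted((q, math.log(r)) for r, q in anchor)
    t = sorted((q, math.log(r)) for r, q in test)
    aq, ar = zip(*a)
    tq, tr = zip(*t)
    lo, hi = max(aq[0], tq[0]), min(aq[-1], tq[-1])
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    # Midpoint rule over the overlapping quality interval.
    total = 0.0
    for k in range(steps):
        q = lo + (k + 0.5) * (hi - lo) / steps
        total += _interp(q, tq, tr) - _interp(q, aq, ar)
    avg_log_diff = total / steps
    return (math.exp(avg_log_diff) - 1.0) * 100.0


# Hypothetical points: the test codec uses 10% fewer bits at every quality.
anchor_points = [(100.0, 30.0), (200.0, 33.0), (400.0, 36.0), (800.0, 39.0)]
test_points = [(90.0, 30.0), (180.0, 33.0), (360.0, 36.0), (720.0, 39.0)]
result = bd_rate(anchor_points, test_points)
```

Averaging in the log-bitrate domain is what makes the result a relative (percentage) difference rather than an absolute one, so a uniform 10% saving yields a value close to -10 regardless of the operating point.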