Spatial-Temporal Residue Network Based In-Loop Filter for Video Coding
Deep learning has demonstrated tremendous breakthroughs in the area of
image/video processing. In this paper, a spatial-temporal residue network
(STResNet) based in-loop filter is proposed to suppress visual artifacts such
as blocking and ringing in video coding. Specifically, the spatial and temporal
information is jointly exploited by taking both current block and co-located
block in reference frame into consideration during the processing of in-loop
filter. The architecture of STResNet consists of only four convolution layers,
which keeps both memory cost and coding complexity low. Moreover, to fully
adapt to the input content and improve the performance of the proposed in-loop
filter, a coding tree unit (CTU) level control flag is applied in the sense of
rate-distortion optimization. Extensive experimental results show that our
scheme provides up to 5.1% bit-rate reduction compared to the state-of-the-art
video coding standard.
Comment: 4 pages, 2 figures, accepted by VCIP201
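The CTU-level control flag amounts to a per-block Lagrangian rate-distortion
test: the filtered reconstruction is signaled only where it lowers the RD cost.
A minimal pure-Python sketch of that decision rule (not the paper's
implementation; the function names, lambda value, and per-CTU distortions are
illustrative):

```python
def rd_cost(distortion, rate_bits, lmbda):
    """Standard Lagrangian rate-distortion cost: J = D + lambda * R."""
    return distortion + lmbda * rate_bits

def choose_ctu_flags(dist_unfiltered, dist_filtered, lmbda=10.0, flag_bits=1):
    """For each CTU, enable the in-loop filter only if it lowers RD cost.

    dist_unfiltered / dist_filtered: per-CTU distortions (e.g. SSD) of the
    reconstruction without / with the filter. The one-bit flag is signaled
    either way. Returns a list of on/off flags (1 = apply filter).
    """
    flags = []
    for d_off, d_on in zip(dist_unfiltered, dist_filtered):
        j_off = rd_cost(d_off, flag_bits, lmbda)
        j_on = rd_cost(d_on, flag_bits, lmbda)
        flags.append(1 if j_on < j_off else 0)
    return flags

flags = choose_ctu_flags([100.0, 80.0, 50.0], [70.0, 85.0, 49.0])
print(flags)  # prints [1, 0, 1]: the filter is skipped where it hurts
```

In a real codec the flag decision would also account for the filtered block's
effect on later frames; this sketch only shows the per-CTU comparison.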
Texture Segmentation Based Video Compression Using Convolutional Neural Networks
There has been a growing interest in using different approaches to improve
the coding efficiency of modern video codecs in recent years as demand for
web-based video consumption increases. In this paper, we propose a model-based
approach that uses texture analysis/synthesis to reconstruct blocks in texture
regions of a video to achieve potential coding gains using the AV1 codec
developed by the Alliance for Open Media (AOM). The proposed method uses
convolutional neural networks to extract texture regions in a frame, which are
then reconstructed using a global motion model. Our preliminary results show an
increase in coding efficiency while maintaining satisfactory visual quality.
Deep Reference Generation with Multi-Domain Hierarchical Constraints for Inter Prediction
Inter prediction is an important module in video coding for temporal
redundancy removal, where similar reference blocks are searched from previously
coded frames and employed to predict the block to be coded. Although
traditional video codecs can estimate and compensate for block-level motions,
their inter prediction performance is still heavily affected by the remaining
inconsistent pixel-wise displacement caused by irregular rotation and
deformation. In this paper, we address the problem by proposing a deep frame
interpolation network to generate additional reference frames in coding
scenarios. First, we summarize the previous adaptive convolutions used for
frame interpolation and propose a factorized kernel convolutional network to
improve the modeling capacity and simultaneously keep its compact form. Second,
to better train this network, multi-domain hierarchical constraints are
introduced to regularize the training of our factorized kernel convolutional
network. For spatial domain, we use a gradually down-sampled and up-sampled
auto-encoder to generate the factorized kernels for frame interpolation at
different scales. For quality domain, considering the inconsistent quality of
the input frames, the factorized kernel convolution is modulated with
quality-related features to learn to exploit more information from high quality
frames. For frequency domain, a sum of absolute transformed difference loss
that performs frequency transformation is utilized to facilitate network
optimization from the view of coding performance. With the well-designed frame
interpolation network regularized by multi-domain hierarchical constraints, our
method surpasses HEVC, achieving 6.1% BD-rate saving on average and up to
11.0% BD-rate saving for the luma component under the random access
configuration.
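The frequency-domain constraint above uses a sum of absolute transformed
differences (SATD) loss. The classic building block for SATD is the 4x4
Hadamard transform; the following stand-alone sketch (a generic illustration,
not the paper's network loss) computes it in pure Python:

```python
# 4x4 Hadamard matrix (symmetric, so H == H^T).
H4 = [[1,  1,  1,  1],
      [1, -1,  1, -1],
      [1,  1, -1, -1],
      [1, -1, -1,  1]]

def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd_4x4(block_a, block_b):
    """SATD of one 4x4 block pair: Hadamard-transform the difference
    and sum the absolute values of the transform coefficients."""
    diff = [[block_a[i][j] - block_b[i][j] for j in range(4)]
            for i in range(4)]
    t = matmul4(matmul4(H4, diff), H4)  # H * D * H^T
    return sum(abs(t[i][j]) for i in range(4) for j in range(4))
```

A constant difference of 1 lands entirely in the DC coefficient, giving
SATD 16; identical blocks give 0. Encoders favor SATD over plain SAD because
it better tracks the bit cost after the coding transform, which is why it also
makes a coding-oriented training loss.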
Deep Learning-Based Video Coding: A Review and A Case Study
The past decade has witnessed great success of deep learning technology in
many disciplines, especially in computer vision and image processing. However,
deep learning-based video coding remains in its infancy. This paper reviews the
representative works about using deep learning for image/video coding, which
has been an actively developing research area since 2015. We divide
the related works into two categories: new coding schemes that are built
primarily upon deep networks (deep schemes), and deep network-based coding
tools (deep tools) that are used within traditional coding schemes or
together with traditional coding tools. For deep schemes, pixel probability
modeling and auto-encoder are the two main approaches, which can be viewed as
predictive coding and transform coding schemes, respectively. For deep
tools, there have been several proposed techniques using deep learning to
perform intra-picture prediction, inter-picture prediction, cross-channel
prediction, probability distribution prediction, transform, post- or in-loop
filtering, down- and up-sampling, as well as encoding optimizations. In the
hope of advocating the research of deep learning-based video coding, we present
a case study of our developed prototype video codec, namely Deep Learning Video
Coding (DLVC). DLVC features two deep tools that are both based on
convolutional neural network (CNN), namely CNN-based in-loop filter (CNN-ILF)
and CNN-based block adaptive resolution coding (CNN-BARC). Both tools help
improve the compression efficiency by a significant margin. With the two deep
tools as well as other non-deep coding tools, DLVC is able to achieve on
average 39.6% and 33.0% bit-rate saving compared with HEVC, under
random-access and low-delay configurations, respectively. The source code of
DLVC has been released for future research.
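The "bit-rate saving" figures quoted throughout these abstracts are
Bjontegaard delta (BD) rate numbers: the average bitrate difference between
two rate-distortion curves over their overlapping quality range. The sketch
below is an illustrative simplification that uses piecewise-linear
interpolation of log-rate rather than the standard cubic fit, so it only
approximates the official metric:

```python
import math

def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be ascending."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside curve range")

def bd_rate_linear(anchor, test, samples=100):
    """Approximate BD-rate between two RD curves.

    anchor, test: lists of (bitrate, psnr) points, psnr ascending.
    Returns the average percent bitrate change of `test` relative to
    `anchor` over the common PSNR range (negative = bitrate saving).
    """
    pa = [p for _, p in anchor]
    ra = [math.log(r) for r, _ in anchor]
    pt = [p for _, p in test]
    rt = [math.log(r) for r, _ in test]
    lo, hi = max(pa[0], pt[0]), min(pa[-1], pt[-1])
    acc = 0.0
    for k in range(samples + 1):
        p = lo + (hi - lo) * k / samples
        acc += _interp(p, pt, rt) - _interp(p, pa, ra)
    return (math.exp(acc / (samples + 1)) - 1.0) * 100.0
```

As a sanity check, a codec that halves the bitrate at every quality point
comes out at exactly -50%.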
Deep Predictive Video Compression with Bi-directional Prediction
Recently, deep image compression has shown great progress in terms of coding
efficiency and image quality. However, relatively little attention has been
paid to video compression using deep learning networks. In this paper,
we first propose a deep learning based bi-predictive coding network, called
BP-DVC Net, for video compression. Drawing on the lessons of conventional
video coding, a B-frame coding structure is incorporated in our BP-DVC Net.
While bi-predictive coding in conventional video codecs requires transmitting
the motion vectors for block motion and the prediction residues to the
decoder, our BP-DVC Net incorporates optical flow estimation networks at both
the encoder and decoder sides so that no motion information needs to be
transmitted, improving coding efficiency. Also, a bi-prediction
network in the BP-DVC Net is proposed and used to precisely predict the current
frame and to keep the resulting residues as small as possible. Furthermore,
our BP-DVC Net allows for the compressive feature maps to be entropy-coded
using the temporal context among the feature maps of adjacent frames. The
BP-DVC Net has an end-to-end video compression architecture with newly designed
flow and prediction losses. Experimental results show that the compression
performance of our proposed method is comparable to those of H.264 and HEVC in
terms of PSNR and MS-SSIM.
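The key idea of not transmitting motion is that the decoder can estimate
motion itself from frames it has already decoded. The toy below illustrates
this principle on 1-D signals with simple block matching standing in for the
paper's learned optical flow (all names and the half-shift warping are
illustrative assumptions):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length windows."""
    return sum(abs(x - y) for x, y in zip(a, b))

def estimate_shift(past, future, max_shift=3):
    """Decoder-side motion estimate: the integer shift that best aligns
    the two already-decoded reference frames (1-D for brevity)."""
    n = len(past)
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: sad(past[max_shift:n - max_shift],
                                 future[max_shift + s:n - max_shift + s]))

def bi_predict(past, future, shift):
    """Predict the middle frame by averaging the two references, each
    warped by half the estimated displacement (clipped at borders)."""
    h = shift // 2
    n = len(past)
    return [(past[max(0, min(n - 1, i - h))] +
             future[max(0, min(n - 1, i + shift - h))]) / 2
            for i in range(n)]

past = [0, 0, 0, 9, 0, 0, 0, 0]      # object at position 3
future = [0, 0, 0, 0, 0, 9, 0, 0]    # object at position 5
s = estimate_shift(past, future)      # total displacement: 2
pred = bi_predict(past, future, s)    # object predicted at position 4
```

Since both encoder and decoder run the same estimator on the same decoded
references, the motion field never needs to be coded into the bitstream.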
Combining Progressive Rethinking and Collaborative Learning: A Deep Framework for In-Loop Filtering
In this paper, we aim to address issues of (1) joint spatial-temporal
modeling and (2) side information injection for deep learning-based in-loop
filter. For (1), we design a deep network with both progressive rethinking and
collaborative learning mechanisms to improve quality of the reconstructed
intra-frames and inter-frames, respectively. For intra coding, a Progressive
Rethinking Network (PRN) is designed to simulate the human decision mechanism
for effective spatial modeling. Our designed block introduces an additional
inter-block connection that bypasses a high-dimensional informative feature
around the bottleneck module, allowing subsequent blocks to review the
complete past memorized experience and rethink progressively. For inter
coding, the current
reconstructed frame interacts with reference frames (peak quality frame and the
nearest adjacent frame) collaboratively at the feature level. For (2), we
extract both intra-frame and inter-frame side information for better context
modeling. A coarse-to-fine partition map based on HEVC partition trees is built
as the intra-frame side information. Furthermore, the warped features of the
reference frames are offered as the inter-frame side information. Our PRN with
intra-frame side information provides 9.0% BD-rate reduction on average
compared to the HEVC baseline under the All-Intra (AI) configuration, while
under the Low-Delay B (LDB), Low-Delay P (LDP) and Random Access (RA)
configurations, our PRN with inter-frame side information provides 9.0%, 10.6%
and 8.0% BD-rate reduction on average, respectively. Our project webpage is
https://dezhao-wang.github.io/PRN-v2/.
Comment: Accepted for publication in IEEE Transactions on Image Processing
(TIP). Website available at https://dezhao-wang.github.io/PRN-v2
Towards Modality Transferable Visual Information Representation with Optimal Model Compression
Compactly representing the visual signals is of fundamental importance in
various image/video-centered applications. Although numerous approaches have
been developed to improve image and video coding performance by removing the
redundancies within visual signals, much less work has been dedicated to the
transformation of the visual signals to another well-established modality for
better representation capability. In this paper, we propose a new scheme for
visual signal representation that leverages the philosophy of transferable
modality. In particular, the deep learning model, which characterizes and
absorbs the statistics of the input scene with online training, could be
efficiently represented in the sense of rate-utility optimization to serve as
the enhancement layer in the bitstream. As such, the overall performance can be
further guaranteed by optimizing the new modality incorporated. The proposed
framework is implemented on the state-of-the-art video coding standard (i.e.,
versatile video coding), and significantly better representation capability has
been observed based on extensive evaluations.
Comment: Accepted in ACM Multimedia 202
Residue guided loop filter for HEVC post processing
The block-based coding structure in the hybrid coding framework gives rise to
obvious artifacts such as blocking and ringing. Recently, some
Convolutional Neural Network (CNN) based works use the reconstruction as the
only input to reduce these artifacts. Although these works, relying on
powerful learning ability, surpass traditional loop-filter based methods in
the High Efficiency Video Coding (HEVC) standard, how to enhance the high
frequency signal is still not addressed. In addition to the reconstruction, we
first propose using the residue as the other input of our CNN-based loop
filter. In essence, the residual signal as a high frequency indicator guides
the CNN to augment the high frequency signal such as sharp shape and edge
information. Second, we find out that the reconstruction and residue signals
have different characteristics and should be handled with different network
structures. For the reconstruction, we develop an All Frequency
(reconstruction) CNN (AF-CNN) adopting down-sampling and up-sampling pairs
to learn the all-frequency signal with global information. For the residue, we
devise a High Frequency (residual) CNN (HF-CNN) customizing the Residual Blocks
to adapt to the high frequency signal information. To the best of our
knowledge, this is the first work that employs the residual signal as a vital
independent high frequency input to direct the learning of CNN-based loop
filtering. We implement the proposed algorithms in the HEVC reference software.
The experimental results show that our proposed approach of dual inputs of
Residual and Reconstruction with HF-CNN and AF-CNN respectively (RRHA) presents
significant BD-rate savings compared with the current CNN-based scheme.
MFQE 2.0: A New Approach for Multi-frame Quality Enhancement on Compressed Video
The past few years have witnessed great success in applying deep learning to
enhance the quality of compressed image/video. The existing approaches mainly
focus on enhancing the quality of a single frame, not considering the
similarity between consecutive frames. Since heavy fluctuation exists across
compressed video frames as investigated in this paper, frame similarity can be
utilized for quality enhancement of low-quality frames given their neighboring
high-quality frames. This task is Multi-Frame Quality Enhancement (MFQE).
Accordingly, this paper proposes an MFQE approach for compressed video, as the
first attempt in this direction. In our approach, we first develop a
Bidirectional Long Short-Term Memory (BiLSTM) based detector to locate Peak
Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame
Convolutional Neural Network (MF-CNN) is designed to enhance the quality of
compressed video, in which the non-PQF and its nearest two PQFs are the input.
In MF-CNN, motion between the non-PQF and PQFs is compensated by a motion
compensation subnet. Subsequently, a quality enhancement subnet fuses the
non-PQF and compensated PQFs, and then reduces the compression artifacts of the
non-PQF. Also, PQF quality is enhanced in the same way. Finally, experiments
validate the effectiveness and generalization ability of our MFQE approach in
advancing the state-of-the-art quality enhancement of compressed video. The
code is available at https://github.com/RyanXingQL/MFQEv2.0.git.
Comment: Accepted to TPAMI in September, 2019. v6 updates: correct units in
Fig. 11; correct author info; delete bio photos. arXiv admin note: text
overlap with arXiv:1803.0468
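The MFQE pipeline first locates Peak Quality Frames, then enhances each
non-PQF from its two nearest PQFs. The paper trains a BiLSTM detector for the
first step; as a simple stand-in, the sketch below marks frames whose PSNR is
no lower than both neighbours (function names and the local-maximum rule are
illustrative assumptions, not the paper's detector):

```python
def detect_pqfs(psnr):
    """Naive PQF detector: indices that are local maxima of the
    per-frame PSNR sequence (boundary frames compare one side only)."""
    n = len(psnr)
    return [i for i in range(n)
            if (i == 0 or psnr[i] >= psnr[i - 1])
            and (i == n - 1 or psnr[i] >= psnr[i + 1])]

def nearest_pqfs(i, pqfs):
    """For a non-PQF index i, pick the two PQFs that MF-CNN would use:
    the nearest preceding and the nearest following peak."""
    prev = max((p for p in pqfs if p < i), default=None)
    nxt = min((p for p in pqfs if p > i), default=None)
    return prev, nxt

psnr = [30, 33, 31, 29, 34, 32]
pqfs = detect_pqfs(psnr)          # frames 1 and 4 are peaks
print(nearest_pqfs(2, pqfs))      # prints (1, 4)
```

The quality fluctuation across frames is exactly what makes this pairing
useful: low-quality frames 2 and 3 both borrow detail from peaks 1 and 4.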
A Convolutional Neural Network-Based Low Complexity Filter
Convolutional Neural Network (CNN)-based filters have achieved significant
performance in video artifact reduction. However, the high complexity of
existing methods makes them difficult to apply in practice. In this paper,
a CNN-based low complexity filter is proposed. We utilize depthwise separable
convolution (DSC) merged with batch normalization (BN) as the backbone of
our proposed CNN-based network. Besides, a weight initialization method is
proposed to enhance training performance. To solve the well-known
over-smoothing problem for inter frames, a frame-level residual mapping (RM) is
presented. We analyze some of the mainstream methods like frame-level and
block-level based filters quantitatively and build our CNN-based filter with
frame-level control to avoid the extra complexity and artificial boundaries
caused by block-level control. In addition, a novel module called RM is
designed to restore the distortion from the learned residuals. As a result, we
can effectively improve the generalization ability of the learning-based filter
and reach an adaptive filtering effect. Moreover, this module is flexible and
can be combined with other learning-based filters. The experimental results
show that our proposed method achieves significant BD-rate reduction compared
with H.265/HEVC. It achieves about 1.2% BD-rate reduction and a 79.1% decrease
in FLOPs compared with VR-CNN. Finally, measurements on H.266/VVC and ablation
studies are also conducted to verify the effectiveness of the proposed method.
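The FLOP reduction from depthwise separable convolution follows directly from
its factorization: a depthwise k x k pass plus a pointwise 1 x 1 pass replaces
one dense k x k convolution, shrinking the multiply-accumulate count by a
factor of roughly 1/cout + 1/k^2. A quick arithmetic check (the layer shape is
an arbitrary example, not taken from the paper):

```python
def conv_flops(h, w, cin, cout, k):
    """Multiply-accumulates of a standard k x k convolution layer
    on an h x w feature map with cin input / cout output channels."""
    return h * w * cin * cout * k * k

def dsc_flops(h, w, cin, cout, k):
    """Depthwise separable convolution: a depthwise k x k pass (one
    filter per input channel) plus a pointwise 1 x 1 pass across
    channels."""
    depthwise = h * w * cin * k * k
    pointwise = h * w * cin * cout
    return depthwise + pointwise

std = conv_flops(64, 64, 32, 32, 3)
dsc = dsc_flops(64, 64, 32, 32, 3)
print(f"DSC uses {dsc / std:.1%} of the standard conv FLOPs")
# prints "DSC uses 14.2% of the standard conv FLOPs"
```

For a 3 x 3 kernel with 32 output channels the ratio is 1/32 + 1/9, about
14%, which is why DSC backbones like this one can cut FLOPs so sharply while
keeping the receptive field.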