Spatial-Temporal Residue Network Based In-Loop Filter for Video Coding
Deep learning has demonstrated tremendous breakthroughs in the area of
image/video processing. In this paper, a spatial-temporal residue network
(STResNet) based in-loop filter is proposed to suppress visual artifacts such
as blocking and ringing in video coding. Specifically, the spatial and temporal
information is jointly exploited by taking both current block and co-located
block in reference frame into consideration during the processing of in-loop
filter. The architecture of STResNet consists of only four convolution layers,
which keeps both memory cost and coding complexity low. Moreover, to fully
adapt to the input content and improve the performance of the proposed in-loop
filter, a coding tree unit (CTU) level control flag is applied in the sense of
rate-distortion optimization. Extensive experimental results show that our
scheme provides up to 5.1% bit-rate reduction compared to the state-of-the-art
video coding standard.
Comment: 4 pages, 2 figures, accepted by VCIP201
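The CTU-level control flag amounts to a per-block Lagrangian rate-distortion
test: the filtered reconstruction is signaled only where it lowers the RD cost.
A minimal pure-Python sketch of that decision rule (not the paper's
implementation; the function names, lambda value, and per-CTU distortions are
illustrative):

```python
def rd_cost(distortion, rate_bits, lmbda):
    """Standard Lagrangian rate-distortion cost: J = D + lambda * R."""
    return distortion + lmbda * rate_bits

def choose_ctu_flags(dist_unfiltered, dist_filtered, lmbda=10.0, flag_bits=1):
    """For each CTU, enable the in-loop filter only if it lowers RD cost.

    dist_unfiltered / dist_filtered: per-CTU distortions (e.g. SSD) of the
    reconstruction without / with the filter. The one-bit flag is signaled
    either way. Returns a list of on/off flags (1 = apply filter).
    """
    flags = []
    for d_off, d_on in zip(dist_unfiltered, dist_filtered):
        j_off = rd_cost(d_off, flag_bits, lmbda)
        j_on = rd_cost(d_on, flag_bits, lmbda)
        flags.append(1 if j_on < j_off else 0)
    return flags

flags = choose_ctu_flags([100.0, 80.0, 50.0], [70.0, 85.0, 49.0])
print(flags)  # prints [1, 0, 1]: the filter is skipped where it hurts
```

In a real codec the flag decision would also account for the filtered block's
effect on later frames; this sketch only shows the per-CTU comparison.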
Texture Segmentation Based Video Compression Using Convolutional Neural Networks
There has been a growing interest in using different approaches to improve
the coding efficiency of modern video codecs in recent years as demand for
web-based video consumption increases. In this paper, we propose a model-based
approach that uses texture analysis/synthesis to reconstruct blocks in texture
regions of a video to achieve potential coding gains using the AV1 codec
developed by the Alliance for Open Media (AOM). The proposed method uses
convolutional neural networks to extract texture regions in a frame, which are
then reconstructed using a global motion model. Our preliminary results show an
increase in coding efficiency while maintaining satisfactory visual quality.
Deep Reference Generation with Multi-Domain Hierarchical Constraints for Inter Prediction
Inter prediction is an important module in video coding for temporal
redundancy removal, where similar reference blocks are searched from previously
coded frames and employed to predict the block to be coded. Although
traditional video codecs can estimate and compensate for block-level motions,
their inter prediction performance is still heavily affected by the remaining
inconsistent pixel-wise displacement caused by irregular rotation and
deformation. In this paper, we address the problem by proposing a deep frame
interpolation network to generate additional reference frames in coding
scenarios. First, we summarize the previous adaptive convolutions used for
frame interpolation and propose a factorized kernel convolutional network to
improve the modeling capacity and simultaneously keep its compact form. Second,
to better train this network, multi-domain hierarchical constraints are
introduced to regularize the training of our factorized kernel convolutional
network. For spatial domain, we use a gradually down-sampled and up-sampled
auto-encoder to generate the factorized kernels for frame interpolation at
different scales. For quality domain, considering the inconsistent quality of
the input frames, the factorized kernel convolution is modulated with
quality-related features to learn to exploit more information from high quality
frames. For frequency domain, a sum of absolute transformed difference loss
that performs frequency transformation is utilized to facilitate network
optimization from the view of coding performance. With the well-designed frame
interpolation network regularized by multi-domain hierarchical constraints, our
method surpasses HEVC, achieving 6.1% BD-rate saving on average and up to
11.0% BD-rate saving for the luma component under the random access
configuration.
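The frequency-domain constraint above uses a sum of absolute transformed
differences (SATD) loss. The classic building block for SATD is the 4x4
Hadamard transform; the following stand-alone sketch (a generic illustration,
not the paper's network loss) computes it in pure Python:

```python
# 4x4 Hadamard matrix (symmetric, so H == H^T).
H4 = [[1,  1,  1,  1],
      [1, -1,  1, -1],
      [1,  1, -1, -1],
      [1, -1, -1,  1]]

def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd_4x4(block_a, block_b):
    """SATD of one 4x4 block pair: Hadamard-transform the difference
    and sum the absolute values of the transform coefficients."""
    diff = [[block_a[i][j] - block_b[i][j] for j in range(4)]
            for i in range(4)]
    t = matmul4(matmul4(H4, diff), H4)  # H * D * H^T
    return sum(abs(t[i][j]) for i in range(4) for j in range(4))
```

A constant difference of 1 lands entirely in the DC coefficient, giving
SATD 16; identical blocks give 0. Encoders favor SATD over plain SAD because
it better tracks the bit cost after the coding transform, which is why it also
makes a coding-oriented training loss.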
Deep Learning-Based Video Coding: A Review and A Case Study
The past decade has witnessed great success of deep learning technology in
many disciplines, especially in computer vision and image processing. However,
deep learning-based video coding remains in its infancy. This paper reviews the
representative works about using deep learning for image/video coding, which
has been an actively developing research area since 2015. We divide
the related works into two categories: new coding schemes that are built
primarily upon deep networks (deep schemes), and deep network-based coding
tools (deep tools) that are used within traditional coding schemes or
together with traditional coding tools. For deep schemes, pixel probability
modeling and auto-encoder are the two main approaches, which can be viewed as
predictive coding and transform coding schemes, respectively. For deep
tools, there have been several proposed techniques using deep learning to
perform intra-picture prediction, inter-picture prediction, cross-channel
prediction, probability distribution prediction, transform, post- or in-loop
filtering, down- and up-sampling, as well as encoding optimizations. In the
hope of advocating the research of deep learning-based video coding, we present
a case study of our developed prototype video codec, namely Deep Learning Video
Coding (DLVC). DLVC features two deep tools that are both based on
convolutional neural network (CNN), namely CNN-based in-loop filter (CNN-ILF)
and CNN-based block adaptive resolution coding (CNN-BARC). Both tools help
improve the compression efficiency by a significant margin. With the two deep
tools as well as other non-deep coding tools, DLVC is able to achieve on
average 39.6% and 33.0% bit-rate saving compared with HEVC, under
random-access and low-delay configurations, respectively. The source code of
DLVC has been released for future research.
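The "bit-rate saving" figures quoted throughout these abstracts are
Bjontegaard delta (BD) rate numbers: the average bitrate difference between
two rate-distortion curves over their overlapping quality range. The sketch
below is an illustrative simplification that uses piecewise-linear
interpolation of log-rate rather than the standard cubic fit, so it only
approximates the official metric:

```python
import math

def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be ascending."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside curve range")

def bd_rate_linear(anchor, test, samples=100):
    """Approximate BD-rate between two RD curves.

    anchor, test: lists of (bitrate, psnr) points, psnr ascending.
    Returns the average percent bitrate change of `test` relative to
    `anchor` over the common PSNR range (negative = bitrate saving).
    """
    pa = [p for _, p in anchor]
    ra = [math.log(r) for r, _ in anchor]
    pt = [p for _, p in test]
    rt = [math.log(r) for r, _ in test]
    lo, hi = max(pa[0], pt[0]), min(pa[-1], pt[-1])
    acc = 0.0
    for k in range(samples + 1):
        p = lo + (hi - lo) * k / samples
        acc += _interp(p, pt, rt) - _interp(p, pa, ra)
    return (math.exp(acc / (samples + 1)) - 1.0) * 100.0
```

As a sanity check, a codec that halves the bitrate at every quality point
comes out at exactly -50%.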
Deep Predictive Video Compression with Bi-directional Prediction
Recently, deep image compression has shown great progress in terms of coding
efficiency and image quality. However, relatively little attention has been
paid to video compression using deep learning networks. In this paper,
we first propose a deep learning based bi-predictive coding network, called
BP-DVC Net, for video compression. Drawing on the lessons of conventional
video coding, a B-frame coding structure is incorporated in our BP-DVC Net.
While bi-predictive coding in conventional video codecs requires transmitting
the motion vectors for block motion and the prediction residues to the
decoder, our BP-DVC Net incorporates optical flow estimation networks at both
the encoder and decoder sides so that no motion information needs to be
transmitted, improving coding efficiency. Also, a bi-prediction
network in the BP-DVC Net is proposed and used to precisely predict the current
frame and to keep the resulting residues as small as possible. Furthermore,
our BP-DVC Net allows for the compressive feature maps to be entropy-coded
using the temporal context among the feature maps of adjacent frames. The
BP-DVC Net has an end-to-end video compression architecture with newly designed
flow and prediction losses. Experimental results show that the compression
performance of our proposed method is comparable to those of H.264 and HEVC in
terms of PSNR and MS-SSIM.
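The key idea of not transmitting motion is that the decoder can estimate
motion itself from frames it has already decoded. The toy below illustrates
this principle on 1-D signals with simple block matching standing in for the
paper's learned optical flow (all names and the half-shift warping are
illustrative assumptions):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length windows."""
    return sum(abs(x - y) for x, y in zip(a, b))

def estimate_shift(past, future, max_shift=3):
    """Decoder-side motion estimate: the integer shift that best aligns
    the two already-decoded reference frames (1-D for brevity)."""
    n = len(past)
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: sad(past[max_shift:n - max_shift],
                                 future[max_shift + s:n - max_shift + s]))

def bi_predict(past, future, shift):
    """Predict the middle frame by averaging the two references, each
    warped by half the estimated displacement (clipped at borders)."""
    h = shift // 2
    n = len(past)
    return [(past[max(0, min(n - 1, i - h))] +
             future[max(0, min(n - 1, i + shift - h))]) / 2
            for i in range(n)]

past = [0, 0, 0, 9, 0, 0, 0, 0]      # object at position 3
future = [0, 0, 0, 0, 0, 9, 0, 0]    # object at position 5
s = estimate_shift(past, future)      # total displacement: 2
pred = bi_predict(past, future, s)    # object predicted at position 4
```

Since both encoder and decoder run the same estimator on the same decoded
references, the motion field never needs to be coded into the bitstream.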
Combining Progressive Rethinking and Collaborative Learning: A Deep Framework for In-Loop Filtering
In this paper, we aim to address issues of (1) joint spatial-temporal
modeling and (2) side information injection for deep learning-based in-loop
filter. For (1), we design a deep network with both progressive rethinking and
collaborative learning mechanisms to improve quality of the reconstructed
intra-frames and inter-frames, respectively. For intra coding, a Progressive
Rethinking Network (PRN) is designed to simulate the human decision mechanism
for effective spatial modeling. Our designed block introduces an additional
inter-block connection that bypasses a high-dimensional informative feature
around the bottleneck module, allowing subsequent blocks to review the
complete past memorized experience and rethink progressively. For inter
coding, the current
reconstructed frame interacts with reference frames (peak quality frame and the
nearest adjacent frame) collaboratively at the feature level. For (2), we
extract both intra-frame and inter-frame side information for better context
modeling. A coarse-to-fine partition map based on HEVC partition trees is built
as the intra-frame side information. Furthermore, the warped features of the
reference frames are offered as the inter-frame side information. Our PRN with
intra-frame side information provides 9.0% BD-rate reduction on average
compared to the HEVC baseline under the All-Intra (AI) configuration, while
under the Low-Delay B (LDB), Low-Delay P (LDP) and Random Access (RA)
configurations, our PRN with inter-frame side information provides 9.0%, 10.6%
and 8.0% BD-rate reduction on average, respectively. Our project webpage is
https://dezhao-wang.github.io/PRN-v2/.
Comment: Accepted for publication in IEEE Transactions on Image Processing
(TIP). Website available at https://dezhao-wang.github.io/PRN-v2
Towards Modality Transferable Visual Information Representation with Optimal Model Compression
Compactly representing the visual signals is of fundamental importance in
various image/video-centered applications. Although numerous approaches have
been developed to improve image and video coding performance by removing the
redundancies within visual signals, much less work has been dedicated to the
transformation of the visual signals to another well-established modality for
better representation capability. In this paper, we propose a new scheme for
visual signal representation that leverages the philosophy of transferable
modality. In particular, the deep learning model, which characterizes and
absorbs the statistics of the input scene with online training, could be
efficiently represented in the sense of rate-utility optimization to serve as
the enhancement layer in the bitstream. As such, the overall performance can be
further guaranteed by optimizing the new modality incorporated. The proposed
framework is implemented on the state-of-the-art video coding standard (i.e.,
versatile video coding), and significantly better representation capability has
been observed based on extensive evaluations.
Comment: Accepted in ACM Multimedia 202
Residue guided loop filter for HEVC post processing
The block-based coding structure in the hybrid coding framework gives rise to
obvious artifacts such as blocking and ringing. Recently, some
Convolutional Neural Network (CNN) based works use the reconstruction as the
only input to reduce these artifacts. Although these works, relying on
powerful learning ability, surpass traditional loop-filter based methods in
the High Efficiency Video Coding (HEVC) standard, how to enhance the high
frequency signal is still not addressed. In addition to the reconstruction, we
first propose using the residue as the other input of our CNN-based loop
filter. In essence, the residual signal as a high frequency indicator guides
the CNN to augment the high frequency signal such as sharp shape and edge
information. Second, we find out that the reconstruction and residue signals
have different characteristics and should be handled with different network
structures. For the reconstruction, we develop an All Frequency
(reconstruction) CNN (AF-CNN) adopting down-sampling and up-sampling pairs
to learn the all-frequency signal with global information. For the residue, we
devise a High Frequency (residual) CNN (HF-CNN) customizing the Residual Blocks
to adapt to the high frequency signal information. To the best of our
knowledge, this is the first work that employs the residual signal as a vital
independent high frequency input to direct the learning of CNN-based loop
filtering. We implement the proposed algorithms in the HEVC reference software.
The experimental results show that our proposed approach of dual inputs of
Residual and Reconstruction with HF-CNN and AF-CNN respectively (RRHA) presents
significant BD-rate savings compared with the current CNN-based scheme.
MFQE 2.0: A New Approach for Multi-frame Quality Enhancement on Compressed Video
The past few years have witnessed great success in applying deep learning to
enhance the quality of compressed image/video. The existing approaches mainly
focus on enhancing the quality of a single frame, not considering the
similarity between consecutive frames. Since heavy fluctuation exists across
compressed video frames as investigated in this paper, frame similarity can be
utilized for quality enhancement of low-quality frames given their neighboring
high-quality frames. This task is Multi-Frame Quality Enhancement (MFQE).
Accordingly, this paper proposes an MFQE approach for compressed video, as the
first attempt in this direction. In our approach, we first develop a
Bidirectional Long Short-Term Memory (BiLSTM) based detector to locate Peak
Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame
Convolutional Neural Network (MF-CNN) is designed to enhance the quality of
compressed video, in which the non-PQF and its nearest two PQFs are the input.
In MF-CNN, motion between the non-PQF and PQFs is compensated by a motion
compensation subnet. Subsequently, a quality enhancement subnet fuses the
non-PQF and compensated PQFs, and then reduces the compression artifacts of the
non-PQF. Also, PQF quality is enhanced in the same way. Finally, experiments
validate the effectiveness and generalization ability of our MFQE approach in
advancing the state-of-the-art quality enhancement of compressed video. The
code is available at https://github.com/RyanXingQL/MFQEv2.0.git.
Comment: Accepted to TPAMI in September, 2019. v6 updates: correct units in
Fig. 11; correct author info; delete bio photos. arXiv admin note: text
overlap with arXiv:1803.0468
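The MFQE pipeline first locates Peak Quality Frames, then enhances each
non-PQF from its two nearest PQFs. The paper trains a BiLSTM detector for the
first step; as a simple stand-in, the sketch below marks frames whose PSNR is
no lower than both neighbours (function names and the local-maximum rule are
illustrative assumptions, not the paper's detector):

```python
def detect_pqfs(psnr):
    """Naive PQF detector: indices that are local maxima of the
    per-frame PSNR sequence (boundary frames compare one side only)."""
    n = len(psnr)
    return [i for i in range(n)
            if (i == 0 or psnr[i] >= psnr[i - 1])
            and (i == n - 1 or psnr[i] >= psnr[i + 1])]

def nearest_pqfs(i, pqfs):
    """For a non-PQF index i, pick the two PQFs that MF-CNN would use:
    the nearest preceding and the nearest following peak."""
    prev = max((p for p in pqfs if p < i), default=None)
    nxt = min((p for p in pqfs if p > i), default=None)
    return prev, nxt

psnr = [30, 33, 31, 29, 34, 32]
pqfs = detect_pqfs(psnr)          # frames 1 and 4 are peaks
print(nearest_pqfs(2, pqfs))      # prints (1, 4)
```

The quality fluctuation across frames is exactly what makes this pairing
useful: low-quality frames 2 and 3 both borrow detail from peaks 1 and 4.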
A Convolutional Neural Network-Based Low Complexity Filter
Convolutional Neural Network (CNN)-based filters have achieved significant
performance in video artifact reduction. However, the high complexity of
existing methods makes them difficult to apply in practice. In this paper,
a CNN-based low complexity filter is proposed. We utilize depthwise separable
convolution (DSC) merged with batch normalization (BN) as the backbone of
our proposed CNN-based network. Besides, a weight initialization method is
proposed to enhance training performance. To solve the well-known
over-smoothing problem for inter frames, a frame-level residual mapping (RM) is
presented. We analyze some of the mainstream methods like frame-level and
block-level based filters quantitatively and build our CNN-based filter with
frame-level control to avoid the extra complexity and artificial boundaries
caused by block-level control. In addition, a novel module called RM is
designed to restore the distortion from the learned residuals. As a result, we
can effectively improve the generalization ability of the learning-based filter
and reach an adaptive filtering effect. Moreover, this module is flexible and
can be combined with other learning-based filters. The experimental results
show that our proposed method achieves significant BD-rate reduction compared
with H.265/HEVC. It achieves about 1.2% BD-rate reduction and a 79.1% decrease
in FLOPs compared with VR-CNN. Finally, measurements on H.266/VVC and ablation
studies are also conducted to verify the effectiveness of the proposed method.
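The FLOP reduction from depthwise separable convolution follows directly from
its factorization: a depthwise k x k pass plus a pointwise 1 x 1 pass replaces
one dense k x k convolution, shrinking the multiply-accumulate count by a
factor of roughly 1/cout + 1/k^2. A quick arithmetic check (the layer shape is
an arbitrary example, not taken from the paper):

```python
def conv_flops(h, w, cin, cout, k):
    """Multiply-accumulates of a standard k x k convolution layer
    on an h x w feature map with cin input / cout output channels."""
    return h * w * cin * cout * k * k

def dsc_flops(h, w, cin, cout, k):
    """Depthwise separable convolution: a depthwise k x k pass (one
    filter per input channel) plus a pointwise 1 x 1 pass across
    channels."""
    depthwise = h * w * cin * k * k
    pointwise = h * w * cin * cout
    return depthwise + pointwise

std = conv_flops(64, 64, 32, 32, 3)
dsc = dsc_flops(64, 64, 32, 32, 3)
print(f"DSC uses {dsc / std:.1%} of the standard conv FLOPs")
# prints "DSC uses 14.2% of the standard conv FLOPs"
```

For a 3 x 3 kernel with 32 output channels the ratio is 1/32 + 1/9, about
14%, which is why DSC backbones like this one can cut FLOPs so sharply while
keeping the receptive field.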