A Convolutional Neural Network Approach for Half-Pel Interpolation in Video Coding
Motion compensation is a fundamental technology in video coding to remove the
temporal redundancy between video frames. To further improve the coding
efficiency, sub-pel motion compensation has been utilized, which requires
interpolation of fractional samples. The video coding standards usually adopt
fixed interpolation filters that are derived from the signal processing theory.
However, as video signals are not stationary, the fixed interpolation filters may
turn out to be less efficient. Inspired by the great success of convolutional
neural networks (CNNs) in computer vision, we propose to design a CNN-based
interpolation filter (CNNIF) for video coding. Different from previous studies,
one difficulty in training the CNNIF is the lack of ground truth, since the
fractional samples are not actually available. Our solution to this problem is
to derive the "ground-truth" of fractional samples by smoothing high-resolution
images, which is verified to be effective by the conducted experiments.
Compared to the fixed half-pel interpolation filter for luma in High Efficiency
Video Coding (HEVC), our proposed CNNIF achieves up to 3.2% and on average 0.9%
BD-rate reduction under the low-delay P configuration.
Comment: International Symposium on Circuits and Systems (ISCAS) 201
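For illustration only, a minimal Python sketch of how such smoothing-derived training pairs might be generated; the Gaussian kernel, the 2x sampling grid, and the helper make_half_pel_pairs are assumptions, not the authors' exact pipeline:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def make_half_pel_pairs(hires: np.ndarray, sigma: float = 0.8):
        # Smooth the high-resolution image, then split it into integer-pel
        # inputs and three half-pel targets. The Gaussian kernel and sigma
        # are assumptions, not the authors' exact smoothing filter.
        smoothed = gaussian_filter(hires.astype(np.float32), sigma=sigma)
        integer = smoothed[0::2, 0::2]   # samples treated as integer positions
        half_h  = smoothed[0::2, 1::2]   # half-pel shift in x
        half_v  = smoothed[1::2, 0::2]   # half-pel shift in y
        half_hv = smoothed[1::2, 1::2]   # diagonal half-pel shift
        return integer, (half_h, half_v, half_hv)

Each (integer, target) pair can then train a CNN that maps integer-pel samples to one fractional position.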
Interpreting CNN for Low Complexity Learned Sub-pixel Motion Compensation in Video Coding
Deep learning has shown great potential in image and video compression tasks.
However, it brings bit savings at the cost of significant increases in coding
complexity, which limits its potential for implementation within practical
applications. In this paper, a novel neural network-based tool is presented
which improves the interpolation of reference samples needed for fractional
precision motion compensation. Contrary to previous efforts, the proposed
approach focuses on complexity reduction achieved by interpreting the
interpolation filters learned by the networks. When the approach is implemented
in the Versatile Video Coding (VVC) test model, up to 4.5% BD-rate saving for
individual sequences is achieved compared with the baseline VVC, while the
complexity of learned interpolation is significantly reduced compared to the
application of the full neural network.
Comment: 27th IEEE International Conference on Image Processing, 25-28 Oct
2020, Abu Dhabi, United Arab Emirates
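A minimal sketch of the interpretation idea, under the assumption of a purely linear network: convolution layers without nonlinearities compose into a single FIR filter, so inference reduces to one small convolution at conventional-codec cost. The kernels below are random stand-ins, not the learned VVC filters.

    import numpy as np
    from scipy.signal import convolve2d

    layer1 = np.random.randn(3, 3)   # stand-in for a learned conv kernel
    layer2 = np.random.randn(3, 3)   # stand-in for the next conv layer

    # Composing two convolutions equals convolving their kernels once, offline.
    equivalent_filter = convolve2d(layer1, layer2, mode="full")   # 5x5 kernel

    def interpolate(ref: np.ndarray) -> np.ndarray:
        # One pass with the collapsed filter replaces the full network forward.
        return convolve2d(ref, equivalent_filter, mode="same", boundary="symm")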
Improved CNN-based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding
The versatility of recent machine learning approaches makes them ideal for
improving next-generation video compression solutions. Unfortunately,
these approaches typically bring significant increases in computational
complexity and are difficult to interpret as explainable models, affecting
their potential for implementation within practical video coding applications.
This paper introduces a novel explainable neural network-based inter-prediction
scheme, to improve the interpolation of reference samples needed for fractional
precision motion compensation. The approach requires a single neural network to
be trained from which a full quarter-pixel interpolation filter set is derived,
as the network is easily interpretable due to its linear structure. A novel
training framework enables each network branch to resemble a specific
fractional shift. This practical solution makes it very efficient to use
alongside conventional video coding schemes. When implemented in the context of
the state-of-the-art Versatile Video Coding (VVC) test model, 0.77%, 1.27% and
2.25% BD-rate savings can be achieved on average for lower resolution sequences
under the random access, low-delay B and low-delay P configurations,
respectively, while the complexity of the learned interpolation schemes is
significantly reduced compared to interpolation with full CNNs.
Comment: IEEE Open Journal of Signal Processing Special Issue on Applied AI
and Machine Learning for Video Coding and Streaming, June 202
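A minimal sketch of how such a quarter-pel filter set might be read off a trained network, assuming one linear branch per fractional shift as the abstract describes; impulse_response and the branch callables are hypothetical:

    import numpy as np

    def impulse_response(branch, size: int = 9) -> np.ndarray:
        # Recover the FIR filter realised by a linear branch by probing it
        # with a unit impulse; for a linear, shift-invariant branch the
        # response equals the filter kernel (up to a spatial flip).
        impulse = np.zeros((size, size), dtype=np.float32)
        impulse[size // 2, size // 2] = 1.0
        return branch(impulse)

    # Hypothetical: one branch per quarter-pel offset (dx, dy), skipping (0, 0).
    # filter_set = {(dx, dy): impulse_response(branches[(dx, dy)])
    #               for dx in range(4) for dy in range(4) if (dx, dy) != (0, 0)}

The extracted kernels can then be applied by a conventional codec's interpolation stage with no network inference at decode time.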
PEA265: Perceptual Assessment of Video Compression Artifacts
The most widely used video encoders share a common hybrid coding framework
that includes block-based motion estimation/compensation and block-based
transform coding. Despite their high coding efficiency, the encoded videos
often exhibit visually annoying artifacts, denoted as Perceivable Encoding
Artifacts (PEAs), which significantly degrade the visual Quality-of-Experience
(QoE) of end users. To monitor and improve visual QoE, it is crucial to develop
subjective and objective measures that can identify and quantify various types
of PEAs. In this work, we make the first attempt to build a large-scale
subject-labelled database composed of H.265/HEVC compressed videos containing
various PEAs. The database, namely the PEA265 database, includes 4 types of
spatial PEAs (i.e. blurring, blocking, ringing and color bleeding) and 2 types
of temporal PEAs (i.e. flickering and floating), each containing at least
60,000 image or video patches with positive and negative labels. To objectively
identify these PEAs, we train Convolutional Neural Networks (CNNs) using the
PEA265 database. It appears that the state-of-the-art ResNeXt is capable of
identifying each type of PEA with high accuracy. Furthermore, we define PEA
pattern and PEA intensity measures to quantify the PEA levels of compressed video
sequences. We believe that the PEA265 database and our findings will benefit the
future development of video quality assessment methods and perceptually
motivated video encoders.
Comment: 10 pages, 15 figures, 4 tables
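A minimal sketch of such a per-PEA-type patch classifier, using torchvision's ResNeXt-50 as the backbone; the binary head, optimiser, learning rate and patch handling are assumptions, not the paper's configuration:

    import torch
    import torch.nn as nn
    from torchvision.models import resnext50_32x4d

    model = resnext50_32x4d(weights=None)           # one classifier per PEA type
    model.fc = nn.Linear(model.fc.in_features, 2)   # positive/negative label

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    def train_step(patches: torch.Tensor, labels: torch.Tensor) -> float:
        # One optimisation step on a batch of patches cropped from decoded frames.
        optimizer.zero_grad()
        loss = criterion(model(patches), labels)
        loss.backward()
        optimizer.step()
        return loss.item()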