Reducing Complexity of HEVC: A Deep Learning Approach
High Efficiency Video Coding (HEVC) significantly reduces bit-rates over the
preceding H.264 standard, but at the expense of extremely high encoding
complexity. In HEVC, the quad-tree partition of the coding unit (CU) consumes a
large proportion of the encoding complexity, due to the brute-force search
for rate-distortion optimization (RDO). Therefore, this paper proposes a deep
learning approach to predict the CU partition for reducing the HEVC complexity
at both intra- and inter-modes, based on a convolutional neural network
(CNN) and a long short-term memory (LSTM) network. First, we establish a
large-scale database including substantial CU partition data for HEVC intra-
and inter-modes. This enables deep learning on the CU partition. Second, we
represent the CU partition of an entire coding tree unit (CTU) in the form of a
hierarchical CU partition map (HCPM). Then, we propose an early-terminated
hierarchical CNN (ETH-CNN) for learning to predict the HCPM. Consequently, the
encoding complexity of intra-mode HEVC can be drastically reduced by replacing
the brute-force search with ETH-CNN to decide the CU partition. Third, an
early-terminated hierarchical LSTM (ETH-LSTM) is proposed to learn the temporal
correlation of the CU partition. Then, we combine ETH-LSTM and ETH-CNN to
predict the CU partition for reducing the HEVC complexity for inter-mode.
Finally, experimental results show that our approach outperforms other
state-of-the-art approaches in reducing the HEVC complexity at both intra- and
inter-modes. Comment: 17 pages, with 12 figures and 7 tables
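The hierarchical CU partition map (HCPM) idea above can be sketched as follows: binary split decisions at depths 0-2 determine the quad-tree partition of a 64x64 CTU, and a confident "no split" at a shallow depth lets the deeper checks be skipped entirely. The function name, probability layout, and threshold below are illustrative assumptions, not the paper's actual ETH-CNN implementation.

```python
def decide_partition(split_probs, threshold=0.5):
    """split_probs: dict with keys 'd0' (scalar), 'd1' (4 values), and
    'd2' (4 lists of 4 values), giving predicted split probabilities for
    the 64x64 CU, its four 32x32 sub-CUs, and sixteen 16x16 sub-CUs."""
    hcpm = {"d0": split_probs["d0"] >= threshold, "d1": [], "d2": []}
    if not hcpm["d0"]:                 # early termination: CTU kept whole as 64x64
        return hcpm
    for i in range(4):                 # depth-1 decision for each 32x32 CU
        split_32 = split_probs["d1"][i] >= threshold
        hcpm["d1"].append(split_32)
        if split_32:                   # only examine 16x16 CUs under a split 32x32
            hcpm["d2"].append([p >= threshold for p in split_probs["d2"][i]])
        else:
            hcpm["d2"].append(None)    # deeper level skipped (early exit)
    return hcpm

probs = {"d0": 0.9, "d1": [0.2, 0.8, 0.1, 0.6],
         "d2": [[0.7, 0.3, 0.9, 0.1]] * 4}
print(decide_partition(probs)["d1"])   # [False, True, False, True]
```

Replacing the recursive RDO search with one forward pass producing all of these decisions at once is what yields the complexity reduction.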
Deep Learning-Based Video Coding: A Review and A Case Study
The past decade has witnessed great success of deep learning technology in
many disciplines, especially in computer vision and image processing. However,
deep learning-based video coding remains in its infancy. This paper reviews the
representative works on using deep learning for image/video coding, which
has been an actively developing research area since 2015. We divide
the related works into two categories: new coding schemes built
primarily upon deep networks (deep schemes), and deep network-based coding
tools (deep tools) that are used within traditional coding schemes or
together with traditional coding tools. For deep schemes, pixel probability
modeling and auto-encoders are the two main approaches, which can be viewed as
predictive coding and transform coding schemes, respectively. For deep
tools, there have been several proposed techniques using deep learning to
perform intra-picture prediction, inter-picture prediction, cross-channel
prediction, probability distribution prediction, transform, post- or in-loop
filtering, down- and up-sampling, as well as encoding optimizations. In the
hope of advocating the research of deep learning-based video coding, we present
a case study of our developed prototype video codec, namely Deep Learning Video
Coding (DLVC). DLVC features two deep tools that are both based on
convolutional neural network (CNN), namely CNN-based in-loop filter (CNN-ILF)
and CNN-based block adaptive resolution coding (CNN-BARC). Both tools help
improve the compression efficiency by a significant margin. With the two deep
tools as well as other non-deep coding tools, DLVC achieves on
average 39.6\% and 33.0\% bit savings over HEVC, under random-access and
low-delay configurations, respectively. The source code of DLVC has been
released for future research.
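The connection between pixel probability modeling and predictive coding mentioned above can be made concrete with a toy example: an autoregressive model assigns each symbol a probability given the previous symbols, and an ideal arithmetic coder spends about -log2(p) bits on a symbol of probability p. The "predict the previous pixel" model and its parameters below are invented purely for illustration.

```python
import math

def bits_for_sequence(pixels, p_match=0.8, levels=4):
    """Ideal code length if the model predicts each pixel equals the
    previous one with probability p_match, spreading the remaining
    probability uniformly over the other levels."""
    total = 0.0
    prev = None
    for x in pixels:
        if prev is None:
            p = 1.0 / levels                  # no context yet: uniform prior
        elif x == prev:
            p = p_match                       # model's prediction was right
        else:
            p = (1.0 - p_match) / (levels - 1)
        total += -math.log2(p)                # arithmetic-coding cost in bits
        prev = x
    return total

smooth = [1, 1, 1, 1, 1, 1]
noisy = [1, 3, 0, 2, 1, 3]
print(bits_for_sequence(smooth) < bits_for_sequence(noisy))  # True
```

A deep pixel-probability model plays the same role as this hand-made predictor, but learns a far sharper conditional distribution, which is exactly what shortens the code.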
Convolutional Neural Networks based Intra Prediction for HEVC
Traditional intra prediction methods for HEVC rely on the nearest
reference lines to predict a block, ignoring the much richer context
between the current block and its neighboring blocks, and therefore
produce inaccurate predictions, especially when spatial correlation between
the current block and the reference lines is weak. To overcome this problem, in this
paper, an intra prediction convolutional neural network (IPCNN) is proposed for
intra prediction, which exploits the rich context of the current block and
therefore is capable of improving the accuracy of predicting the current block.
Meanwhile, the predictions of the three nearest blocks can also be refined. To
the best of our knowledge, this is the first paper that directly applies CNNs
to intra prediction for HEVC. Experimental results validate the effectiveness
of applying CNNs to intra prediction, achieving significant performance
improvement over traditional intra prediction methods. Comment: 10 pages. This is the extended edition of a poster paper accepted by
DCC 201
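One way to picture the "rich context" input such a network consumes: the reconstructed left, above, and above-left neighbor blocks plus a zero-filled slot for the current block are stacked into one square patch that a CNN would map to a prediction of the missing quadrant. The shapes and the zero-fill convention below are assumptions for illustration, not the exact IPCNN input format.

```python
import numpy as np

def build_context(above_left, above, left, block_size=8):
    """Assemble a 2Nx2N context patch from three NxN reconstructed
    neighbors; the bottom-right quadrant (the current block) stays zero
    because it is what the network is asked to predict."""
    ctx = np.zeros((2 * block_size, 2 * block_size), dtype=np.float32)
    ctx[:block_size, :block_size] = above_left   # above-left neighbor
    ctx[:block_size, block_size:] = above        # above neighbor
    ctx[block_size:, :block_size] = left         # left neighbor
    return ctx

b = 8
ctx = build_context(np.full((b, b), 100.0), np.full((b, b), 120.0),
                    np.full((b, b), 90.0), b)
print(ctx.shape, float(ctx[b:, b:].sum()))  # (16, 16) 0.0
```

Compared with a single reference line, this patch carries full 2-D texture from three neighbors, which is the extra context the abstract argues improves prediction accuracy.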
Deep Reference Generation with Multi-Domain Hierarchical Constraints for Inter Prediction
Inter prediction is an important module in video coding for temporal
redundancy removal, where similar reference blocks are searched from previously
coded frames and employed to predict the block to be coded. Although
traditional video codecs can estimate and compensate for block-level motions,
their inter prediction performance is still heavily affected by the remaining
inconsistent pixel-wise displacement caused by irregular rotation and
deformation. In this paper, we address the problem by proposing a deep frame
interpolation network to generate additional reference frames in coding
scenarios. First, we summarize the previous adaptive convolutions used for
frame interpolation and propose a factorized kernel convolutional network to
improve the modeling capacity and simultaneously keep its compact form. Second,
to better train this network, multi-domain hierarchical constraints are
introduced to regularize the training of our factorized kernel convolutional
network. For the spatial domain, we use a gradually down-sampled and up-sampled
auto-encoder to generate the factorized kernels for frame interpolation at
different scales. For the quality domain, considering the inconsistent quality of
the input frames, the factorized kernel convolution is modulated with
quality-related features to learn to exploit more information from high-quality
frames. For the frequency domain, a sum of absolute transformed differences
(SATD) loss that performs frequency transformation is utilized to facilitate network
optimization from the view of coding performance. With the well-designed frame
interpolation network regularized by multi-domain hierarchical constraints, our
method surpasses HEVC with an average 6.1% BD-rate saving, and up to 11.0%,
for the luma component under the random-access configuration.
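A minimal version of the sum-of-absolute-transformed-differences computation behind the frequency-domain loss mentioned above: prediction residuals are passed through a small Hadamard transform before taking absolute values, so the measure tracks coding cost better than a plain sum of absolute differences. The 4x4 block size and lack of scaling are simplifications, not the paper's exact loss.

```python
import numpy as np

# 4x4 Hadamard matrix used by the transform.
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]], dtype=np.float64)

def satd4x4(pred, target):
    """SATD of one 4x4 block: 2-D Hadamard transform of the residual,
    then the sum of absolute transform coefficients."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    transformed = H4 @ diff @ H4.T
    return float(np.abs(transformed).sum())

a = np.zeros((4, 4))
b = np.ones((4, 4))
print(satd4x4(a, b))  # 16.0: a constant offset lands in one DC coefficient
```

Because the transform concentrates smooth residuals into few coefficients, a loss built on SATD penalizes exactly the error patterns that are expensive for the codec.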
Neural network-based arithmetic coding of intra prediction modes in HEVC
In both H.264 and HEVC, context-adaptive binary arithmetic coding (CABAC) is
adopted as the entropy coding method. CABAC relies on manually designed
binarization processes as well as handcrafted context models, which may
restrict the compression efficiency. In this paper, we propose an arithmetic
coding strategy by training neural networks, and make preliminary studies on
coding of the intra prediction modes in HEVC. Instead of binarization, we
propose to directly estimate the probability distribution of the 35 intra
prediction modes with the adoption of a multi-level arithmetic codec. Instead
of handcrafted context models, we utilize a convolutional neural network (CNN) to
perform the probability estimation. Simulation results show that our proposed
arithmetic coding achieves as much as 9.9% bit savings compared with CABAC. Comment: VCIP 201
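The intuition for why a learned probability estimate saves bits can be stated in one line: an arithmetic coder spends about -log2(p) bits on a symbol with estimated probability p. The concrete probabilities below are made up for illustration; the paper's CNN outputs the full 35-way distribution over HEVC intra modes.

```python
import math

def code_length_bits(prob):
    """Ideal arithmetic-coding cost of a symbol with probability prob."""
    return -math.log2(prob)

uniform = 1.0 / 35   # no model: every intra mode equally likely
learned = 0.40       # hypothetical confident CNN estimate for the true mode

print(round(code_length_bits(uniform), 2))  # 5.13
print(round(code_length_bits(learned), 2))  # 1.32
```

A sharper estimate of the true mode's probability therefore translates directly into a shorter codeword, which is the mechanism behind the reported savings over CABAC's handcrafted contexts.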
Near-Lossless Deep Feature Compression for Collaborative Intelligence
Collaborative intelligence is a new paradigm for efficient deployment of deep
neural networks across the mobile-cloud infrastructure. By dividing the network
between the mobile and the cloud, it is possible to distribute the
computational workload such that the overall energy and/or latency of the
system is minimized. However, this necessitates sending deep feature data from
the mobile to the cloud in order to perform inference. In this work, we examine
the differences between the deep feature data and natural image data, and
propose a simple and effective near-lossless deep feature compressor. The
proposed method achieves up to 5% bit rate reduction compared to HEVC-Intra and
even more against other popular image codecs. Finally, we suggest an approach
for reconstructing the input image from compressed deep features in the cloud,
which could serve to supplement the inference performed by the deep model.
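A minimal near-lossless quantizer in the spirit of the feature compressor described above: min-max scale a floating-point feature tensor to 8 bits, so the per-element reconstruction error is bounded by half a quantization step. The actual method's tiling and codec details are not modeled here; this only shows the bounded-error property that "near-lossless" refers to.

```python
import numpy as np

def quantize(x, bits=8):
    """Uniform min-max quantization of a float tensor to `bits` bits."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2 ** bits - 1)
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize(q, lo, scale):
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 16)).astype(np.float32)
q, lo, scale = quantize(feat)
err = float(np.abs(dequantize(q, lo, scale) - feat).max())
print(err <= scale / 2 + 1e-6)  # True: error bounded by half a step
```

Unlike natural images, feature tensors tolerate this kind of bounded perturbation well, which is why a simple scheme plus a standard codec suffices for the task.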
Can you find a face in a HEVC bitstream?
Finding faces in images is one of the most important tasks in computer
vision, with applications in biometrics, surveillance, human-computer
interaction, and other areas. In our earlier work, we demonstrated that it is
possible to tell whether or not an image contains a face by only examining the
HEVC syntax, without fully reconstructing the image. In the present work we
move further in this direction by showing how to localize faces in HEVC-coded
images, without full reconstruction. We also demonstrate the benefits that such
an approach can have in privacy-friendly face localization.
Accelerate CU Partition in HEVC using Large-Scale Convolutional Neural Network
High Efficiency Video Coding (HEVC) suffers from high encoding computational
complexity, partly attributed to the rate-distortion optimization quad-tree
search in the CU partition decision. Therefore, we propose a novel two-stage CU
partition decision approach for HEVC intra-mode. In the proposed approach, a
CNN-based algorithm is designed to decide the CU partition mode precisely at three
depths. To further alleviate computational complexity, an auxiliary
early-termination mechanism is also proposed to filter obviously homogeneous CUs
out of the subsequent CNN-based algorithm. Experimental results show that the
proposed approach achieves about 37% encoding time savings on average with an
insignificant BD-rate increase compared with the original HEVC encoder. Comment: 4 pages, 2 figures
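The early-termination filter described above can be sketched as a simple homogeneity test: CUs whose pixel variance falls below a threshold are treated as obviously homogeneous and kept unsplit without ever invoking the CNN stage. The threshold value and function name are arbitrary placeholders, not the paper's actual criterion.

```python
import numpy as np

def needs_cnn_check(cu, var_threshold=25.0):
    """Return False for obviously homogeneous CUs: these skip the CNN
    stage entirely and are left unsplit."""
    return float(np.var(cu)) >= var_threshold

flat = np.full((32, 32), 128.0)              # homogeneous region: variance 0
textured = np.tile([0.0, 255.0], (32, 16))   # strongly textured 32x32 block
print(needs_cnn_check(flat), needs_cnn_check(textured))  # False True
```

Filtering the easy cases first is what lets the (comparatively expensive) CNN run only on blocks where the partition decision is genuinely ambiguous.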
Enhanced Intra Prediction for Video Coding by Using Multiple Neural Networks
This paper enhances the intra prediction by using multiple neural network
modes (NM). Each NM serves as an end-to-end mapping from the neighboring
reference blocks to the current coding block. For the provided NMs, we present
two schemes (appending and substitution) to integrate the NMs with the
traditional modes (TM) defined in high efficiency video coding (HEVC). For the
appending scheme, each NM corresponds to a certain range of TMs. The
categorization of TMs is based on the expected prediction errors. After
determining the relevant TMs for each NM, we present a probability-aware mode
signaling scheme. The NMs with higher probabilities of being the best mode are
signaled with fewer bits. For the substitution scheme, we propose to replace
the most and least probable TMs. A new most probable mode (MPM) generation
method is also employed when substituting the least probable TMs. Experimental
results demonstrate that using multiple NMs noticeably improves coding efficiency
compared with a single NM. Specifically, the proposed appending scheme
with seven NMs saves 2.6%, 3.8%, and 3.1% BD-rate for the Y, U, and V components,
respectively, compared with using a single NM in state-of-the-art works. Comment: Accepted to IEEE Transactions on Multimedia
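The probability-aware signaling idea above ("more probable modes get fewer bits") can be illustrated with a truncated-unary codeword assignment. The mode probabilities and the specific code construction below are invented for illustration; the paper's actual binarization is not reproduced here.

```python
def assign_codewords(mode_probs):
    """Map modes to truncated-unary codewords, most probable first."""
    ranked = sorted(mode_probs, key=mode_probs.get, reverse=True)
    codes = {}
    for rank, mode in enumerate(ranked):
        if rank < len(ranked) - 1:
            codes[mode] = "1" * rank + "0"   # codewords 0, 10, 110, ...
        else:
            codes[mode] = "1" * rank         # last code can drop the terminator
    return codes

probs = {"NM1": 0.5, "NM2": 0.3, "NM3": 0.2}  # hypothetical NM probabilities
codes = assign_codewords(probs)
print(codes["NM1"], codes["NM3"])  # 0 11
```

With such an assignment the expected signaling cost is lowest when the ranking matches the true mode statistics, which is why ranking NMs by expected prediction error matters.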
DeepQTMT: A Deep Learning Approach for Fast QTMT-based CU Partition of Intra-mode VVC
Versatile Video Coding (VVC), as the latest standard, significantly improves
coding efficiency over its predecessor, High Efficiency Video Coding
(HEVC), but at the expense of sharply increased complexity. In VVC, the
quad-tree plus multi-type tree (QTMT) structure of coding unit (CU) partition
accounts for over 97% of the encoding time, due to the brute-force search for
recursive rate-distortion (RD) optimization. Instead of the brute-force QTMT
search, this paper proposes a deep learning approach to predict the QTMT-based
CU partition, for drastically accelerating the encoding process of intra-mode
VVC. First, we establish a large-scale database containing sufficient CU
partition patterns with diverse video content, which can facilitate the
data-driven VVC complexity reduction. Next, we propose a multi-stage exit CNN
(MSE-CNN) model with an early-exit mechanism to determine the CU partition, in
accord with the flexible QTMT structure at multiple stages. Then, we design an
adaptive loss function for training the MSE-CNN model, synthesizing both the
uncertain number of split modes and the target of minimizing RD cost. Finally, a
multi-threshold decision scheme is developed, achieving a desirable trade-off
between complexity and RD performance. Experimental results demonstrate that
our approach reduces the encoding time of VVC by 44.65%-66.88% with a
negligible Bj{\o}ntegaard delta bit-rate (BD-BR) increase of 1.322%-3.188%, which
significantly outperforms other state-of-the-art approaches. Comment: 14 pages, 10 figures, 7 tables. Published in IEEE Transactions on
Image Processing (TIP), 202
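The multi-threshold decision scheme described above can be sketched in a few lines: a predicted split probability well above an upper threshold commits to splitting, well below a lower threshold commits to not splitting, and only the uncertain band falls back to a full RD check. The threshold values here are illustrative; narrowing the uncertain band trades RD performance for speed.

```python
def decide(split_prob, t_low=0.2, t_high=0.8):
    """Three-way decision from a predicted split probability."""
    if split_prob >= t_high:
        return "split"      # confident: skip RD evaluation of the no-split option
    if split_prob <= t_low:
        return "no-split"   # confident: skip RD evaluation of the split option
    return "rd-check"       # uncertain: fall back to the full RD comparison

print([decide(p) for p in (0.05, 0.5, 0.95)])
# ['no-split', 'rd-check', 'split']
```

Exposing t_low and t_high as tunable knobs is what gives the reported range of operating points between encoding time and BD-BR loss.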