Algorithms and Hardware Co-Design of HEVC Intra Encoders
Digital video has grown greatly in importance over the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is becoming stronger. However, the most prevalent video compression standard, H.264/AVC, released in 2003, is inefficient when it comes to UHD videos. The increasing desire for compression efficiency superior to H.264/AVC led to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers double the compression ratio at the same level of video quality, or a substantial improvement in video quality at the same video bitrate. Although HEVC/H.265 possesses superior compression efficiency, its complexity is several times that of H.264/AVC, which impedes its high-throughput implementation. Currently, most researchers have focused merely on algorithm-level adaptations of the HEVC/H.265 standard to reduce computational intensity, without considering hardware feasibility. Moreover, the exploration of efficient hardware architecture design is not exhaustive: only a few research works have explored efficient hardware architectures for the HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders. We also explore the deep learning approach in mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations: mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce the mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming. 
Fast CU cost estimation is applied to reduce the complexity of the rate-distortion (RD) calculation for each CU. Group-based CABAC rate estimation is proposed to parallelize syntax element processing and greatly improve rate estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of the HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PEs), and each PE independently performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, and rate-distortion estimation. PU blocks of different sizes are processed by different PEs simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of the deep learning approach, we also propose a fully connected layer based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes of luma prediction blocks. All angular prediction modes are classified into 7 prediction groups. Each group contains 3-5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm is designed to determine the prediction direction of the current block, then a small scale FCLNN is exploited to refine the mode prediction
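
The mode-reduction idea described in this abstract can be illustrated with a minimal Python sketch. It assumes the standard Lagrangian RD cost J = D + lambda * R and the HEVC numbering of angular intra modes (2 to 34). The even split into group sizes 5, 5, 5, 5, 5, 4, 4, the function names, and the toy cost model are assumptions made for illustration only; the abstract states just that there are 7 groups of 3-5 modes, and the FCLNN refinement step is not modelled here.

```python
def make_angular_groups(first_mode=2, last_mode=34, num_groups=7):
    """Split the angular modes into contiguous groups of similar angle.

    Assumption: an even contiguous split; the dissertation's actual
    grouping is not given in the abstract.
    """
    modes = list(range(first_mode, last_mode + 1))
    base, extra = divmod(len(modes), num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)
        groups.append(modes[start:start + size])
        start += size
    return groups


def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def select_mode(candidates, evaluate, lam):
    """Run RDO over a reduced candidate list only.

    evaluate(mode) must return (distortion, rate_bits) for that mode.
    """
    best_mode, best_cost = None, float("inf")
    for mode in candidates:
        distortion, rate_bits = evaluate(mode)
        cost = rd_cost(distortion, rate_bits, lam)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


def toy_evaluate(mode):
    """Hypothetical cost model: distortion is smallest at mode 10."""
    return 10.0 * (mode - 10) ** 2, 6


groups = make_angular_groups()
# Pretend a rough angle detector picked group 1 (modes 7..11);
# planar (0) and DC (1) are always kept as candidates.
candidates = [0, 1] + groups[1]
best_mode, best_cost = select_mode(candidates, toy_evaluate, lam=2.0)
print(best_mode, best_cost)
```

With this reduced list, only 7 of the 35 intra modes go through the full cost evaluation, which is the kind of saving mode reduction targets.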
Towards Hybrid-Optimization Video Coding
Video coding is essentially a mathematical optimization problem of rate and
distortion. To solve this complex optimization problem, two popular video
coding frameworks have been developed: block-based hybrid video coding and
end-to-end learned video coding. If we rethink video coding from the
perspective of optimization, we find that the existing two frameworks represent
two directions of optimization solutions. Block-based hybrid coding represents
the discrete optimization solution because those irrelevant coding modes are
discrete in mathematics. It searches for the best one among multiple starting
points (i.e. modes). However, the search is not efficient enough. On the other
hand, end-to-end learned coding represents the continuous optimization solution
because gradient descent operates on a continuous function. It optimizes a
group of model parameters efficiently with a numerical algorithm. However,
limited to a single starting point, it easily falls into a local optimum.
To better solve the optimization problem, we propose to regard video coding as
a hybrid of discrete and continuous optimization problems, and to use both
search and numerical algorithms to solve it. Our idea is to provide multiple
discrete starting points in the global space and to efficiently find the local
optimum around each point with a numerical algorithm. Finally, we search for
the global optimum among those local optima. Guided by the hybrid
optimization idea, we design a hybrid optimization video coding framework,
which is built on continuous deep networks entirely and also contains some
discrete modes. We conduct a comprehensive set of experiments. Compared to the
continuous optimization framework, our method outperforms pure learned video
coding methods. Meanwhile, compared to the discrete optimization framework, our
method achieves comparable performance to HEVC reference software HM16.10 in
PSNR
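
The hybrid idea in this abstract (several discrete starting points, a continuous local optimization from each, then a search over the resulting local optima) can be sketched on a toy problem. The one-dimensional function, the step size, and the function names below are assumptions chosen for illustration; this is not the paper's video coding framework.

```python
def f(x):
    """Toy non-convex objective with two local minima (an assumption)."""
    return x ** 4 - 3 * x ** 2 + x


def grad(x):
    """Analytic derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def descend(x, lr=0.01, steps=2000):
    """Continuous part: plain gradient descent from one starting point."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def hybrid_optimize(starts):
    """Discrete part: refine every start, then search the local optima."""
    local_optima = [descend(x0) for x0 in starts]
    return min(local_optima, key=f)


single = descend(2.0)                        # one start: stuck near x = 1.13
best = hybrid_optimize([-2.0, 0.0, 2.0])     # several starts: near x = -1.30
print(single, f(single))
print(best, f(best))
```

A single start at x = 2 settles in the shallower minimum, while the multi-start search also reaches the deeper one and keeps it, which mirrors the local-optimum argument made in the abstract.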
Learned-based Intra Coding Tools for Video Compression.
PhD Theses. The increase in demand for video rendering on displays of 4K resolution and
beyond, as well as immersive video formats, requires the use of efficient compression
techniques. In this thesis novel methods for enhancing the efficiency of current and
next generation video codecs are investigated. Several aspects that influence the way
conventional video coding methods work are considered. The methods proposed in this
thesis utilise Neural Networks (NNs) trained for regression tasks in order to predict data.
In particular, Convolutional Neural Networks (CNNs) are used to predict Rate-Distortion
(RD) data for intra-coded frames. Moreover, novel intra-prediction methods are proposed
with the aim of providing new ways to exploit redundancies overlooked by traditional
intra-prediction tools. Additionally, it is shown how such methods can be simplified in
order to derive less resource-demanding tools
Designs and Implementations in Neural Network-based Video Coding
The past decade has witnessed the huge success of deep learning in well-known
artificial intelligence applications such as face recognition, autonomous
driving, and large language models like ChatGPT. Recently, the application of
deep learning has been extended to a much wider range, with neural
network-based video coding being one of them. Neural network-based video coding
can be performed at two different levels: embedding neural network-based
(NN-based) coding tools into a classical video compression framework or
building the entire compression framework upon neural networks. This paper
elaborates on some of the recent exploration efforts of JVET (Joint Video Experts
Team of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC29) under the name of neural
network-based video coding (NNVC), which fall into the former category.
Specifically, this paper discusses two major NN-based video coding
technologies, i.e. neural network-based intra prediction and neural
network-based in-loop filtering, which have been investigated for several
meeting cycles in JVET and finally adopted into the reference software of NNVC.
Extensive experiments on top of the NNVC have been conducted to evaluate the
effectiveness of the proposed techniques. Compared with VTM-11.0_nnvc, the
proposed NN-based coding tools in NNVC-4.0 could achieve {11.94%, 21.86%,
22.59%}, {9.18%, 19.76%, 20.92%}, and {10.63%, 21.56%, 23.02%} BD-rate
reductions on average for {Y, Cb, Cr} under random-access, low-delay, and
all-intra configurations respectively
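
For readers unfamiliar with the BD-rate figures quoted above, the following Python sketch shows roughly how such a number is obtained: the average difference between two rate-distortion curves in the log-rate domain, taken over their shared PSNR range. The usual Bjøntegaard computation fits a cubic polynomial to each curve; this sketch swaps in piecewise-linear interpolation to stay short, so it is a simplification and not the procedure used in the paper. The sample points are made up for illustration.

```python
import math


def _integrate_log_rate(points, lo, hi):
    """Integrate piecewise-linear log10(rate) over PSNR in [lo, hi].

    points is a list of (rate, psnr) pairs with distinct PSNR values.
    """
    pts = sorted((psnr, math.log10(rate)) for rate, psnr in points)
    total = 0.0
    for (p0, r0), (p1, r1) in zip(pts, pts[1:]):
        a, b = max(p0, lo), min(p1, hi)
        if a >= b:
            continue  # segment lies outside the shared PSNR range
        slope = (r1 - r0) / (p1 - p0)
        ra = r0 + slope * (a - p0)
        rb = r0 + slope * (b - p0)
        total += 0.5 * (ra + rb) * (b - a)  # trapezoid area
    return total


def bd_rate(anchor, test):
    """Average bitrate change of test vs. anchor, in percent.

    Negative values mean the test codec needs fewer bits for the
    same PSNR (i.e. a bitrate reduction).
    """
    lo = max(min(p for _, p in anchor), min(p for _, p in test))
    hi = min(max(p for _, p in anchor), max(p for _, p in test))
    diff = _integrate_log_rate(test, lo, hi) - _integrate_log_rate(anchor, lo, hi)
    return (10 ** (diff / (hi - lo)) - 1) * 100


# Made-up (rate in kbps, PSNR in dB) points; the test codec uses 10% fewer bits.
anchor = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test = [(900, 32.0), (1800, 35.0), (3600, 38.0), (7200, 41.0)]
print(bd_rate(anchor, test))
```

In this convention a bitrate saving shows up as a negative number, whereas the abstract reports the same kind of quantity as a positive "reduction".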