On the use of deep learning and parallelism techniques to significantly reduce the HEVC intra‑coding time
It is well known that each new video coding standard significantly increases in computational complexity with respect to previous standards, and this is particularly true
for the HEVC and VVC video coding standards. The development of techniques for
reducing the required complexity without affecting the rate/distortion (R/D) performance is therefore always a topic of intense research interest. In this paper, we
propose a combination of two powerful techniques, deep learning and parallel computing, to significantly reduce the complexity of the HEVC encoding engine. Our
experimental results show that a combination of deep learning to reduce the CTU
partitioning complexity with parallel strategies based on frame partitioning is able
to achieve speedups of up to 26× when 16 threads are used. The R/D penalty in
terms of the BD-BR metric depends on the video content, the compression rate and
the number of OpenMP threads, and was consistently between 0.35% and 10% for the
video sequence test set used in our experiment.
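As a rough reading of the headline figure, the short sketch below splits a combined speedup into a thread-level part and an algorithmic part. It assumes, purely for illustration, that threading alone cannot yield more than a linear speedup (no super-linear cache effects); the function name is hypothetical and nothing here is taken from the paper's code.

```python
def min_algorithmic_factor(total_speedup: float, threads: int) -> float:
    """Lower bound on the speedup that must come from sources other than threading.

    Assumption (not from the paper): thread-level speedup is at most `threads`,
    so the remaining factor is at least total_speedup / threads.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return total_speedup / threads


# Figures quoted in the abstract: up to 26x with 16 threads.
factor = min_algorithmic_factor(26.0, 16)
print(f"non-threading factor is at least {factor:.3f}x")
```

Under that assumption, a factor of at least 26/16 would have to come from the reduced CTU partitioning work rather than from the threads themselves.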
A Multi-Threaded Full-feature HEVC Encoder Based on Wavefront Parallel Processing
The High Efficiency Video Coding (HEVC) standard was finalized in early 2013. It provides far better coding efficiency than any preceding standard, but it also has significantly higher complexity. To cope with the high processing demands, the standard includes several parallelization schemes that make multi-core encoding and decoding possible. However, the effective realization of these methods is left to the respective codec developers. We propose a multi-threaded encoder implementation, based on HEVC’s reference test model HM11, that makes full use of the Wavefront Parallel Processing (WPP) mechanism and runs on regular consumer hardware. Furthermore, our software produces output bitstreams identical to those of HM11 and supports all of its features that are allowable in combination with WPP. Experimental results show that our prototype is up to 5.5 times faster than HM11 when running on a machine with 6 physical processing cores.
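The WPP dependency pattern can be illustrated with a small scheduling sketch. In HEVC's WPP, a CTU row may begin once the first two CTUs of the row above are finished, so each CTU waits for its left neighbour and for the CTU above and to the right. The code below is not the encoder described in the abstract: it assumes unit processing time per CTU and an unlimited number of threads, and the function names are hypothetical.

```python
def wpp_schedule(rows: int, cols: int) -> list[list[int]]:
    """Earliest start step of every CTU under the WPP dependency rule.

    Assumptions (illustrative only): each CTU takes one time step and there
    are enough threads to run every ready CTU immediately.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Left neighbour in the same row must be finished.
            left_done = start[r][c - 1] + 1 if c > 0 else 0
            # CTU above-right must be finished (clamped at the row end).
            if r > 0:
                dep_col = min(c + 1, cols - 1)
                above_done = start[r - 1][dep_col] + 1
            else:
                above_done = 0
            start[r][c] = max(left_done, above_done)
    return start


def ideal_wpp_speedup(rows: int, cols: int) -> float:
    """Sequential CTU count divided by the wavefront makespan."""
    makespan = wpp_schedule(rows, cols)[-1][-1] + 1
    return rows * cols / makespan


if __name__ == "__main__":
    for row in wpp_schedule(3, 4):
        print(row)
    print(ideal_wpp_speedup(3, 4))
```

In this idealized model each row starts two steps after the one above it, so the attainable speedup is bounded by the picture dimensions in CTUs as well as by the core count. Real encoders see lower figures because CTU times vary and threads are limited.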