Interpreting CNN for Low Complexity Learned Sub-pixel Motion Compensation in Video Coding
Deep learning has shown great potential in image and video compression tasks.
However, it brings bit savings at the cost of significant increases in coding
complexity, which limits its potential for implementation within practical
applications. In this paper, a novel neural network-based tool is presented
which improves the interpolation of reference samples needed for fractional
precision motion compensation. Contrary to previous efforts, the proposed
approach focuses on complexity reduction achieved by interpreting the
interpolation filters learned by the networks. When the approach is implemented
in the Versatile Video Coding (VVC) test model, up to 4.5% BD-rate saving for
individual sequences is achieved compared with the baseline VVC, while the
complexity of learned interpolation is significantly reduced compared to the
application of the full neural network.
Comment: 27th IEEE International Conference on Image Processing, 25-28 Oct 2020, Abu Dhabi, United Arab Emirates
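The interpretation step the abstract describes can be sketched in a few lines. The key observation is that a network without non-linearities collapses to a single FIR filter, so probing it with a unit impulse recovers equivalent interpolation taps that run at conventional-filter cost. This is a minimal illustrative sketch, not the authors' code; the layer weights are placeholders.

```python
def conv1d(signal, taps):
    """Full 1-D convolution (pure Python, no padding tricks)."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, t in enumerate(taps):
            out[i + j] += s * t
    return out

def toy_linear_network(signal):
    """Two stacked convolution layers with no activation functions,
    standing in for a trained linear CNN (hypothetical weights)."""
    signal = conv1d(signal, [0.25, 0.5, 0.25])
    return conv1d(signal, [-0.125, 1.25, -0.125])

def equivalent_filter(network):
    """A linear network's response to a unit impulse is exactly the
    combined FIR filter it implements."""
    return network([1.0])

taps = equivalent_filter(toy_linear_network)
samples = [3.0, 1.0, 4.0, 1.0, 5.0]
# One pass with the derived taps matches the full network's output.
assert all(
    abs(a - b) < 1e-12
    for a, b in zip(toy_linear_network(samples), conv1d(samples, taps))
)
```

Once the taps are extracted offline, the encoder only ever applies the short filter, which is where the complexity reduction over running the full network comes from.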
Improved CNN-based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding
The versatility of recent machine learning approaches makes them ideal for
improvement of next generation video compression solutions. Unfortunately,
these approaches typically bring significant increases in computational
complexity and are difficult to interpret as explainable models, affecting
their potential for implementation within practical video coding applications.
This paper introduces a novel explainable neural network-based inter-prediction
scheme to improve the interpolation of reference samples needed for fractional
precision motion compensation. The approach requires a single neural network to
be trained from which a full quarter-pixel interpolation filter set is derived,
as the network is easily interpretable due to its linear structure. A novel
training framework enables each network branch to resemble a specific
fractional shift. This practical solution makes it very efficient to use
alongside conventional video coding schemes. When implemented in the context of
the state-of-the-art Versatile Video Coding (VVC) test model, 0.77%, 1.27% and
2.25% BD-rate savings can be achieved on average for lower-resolution sequences
under the random access, low-delay B and low-delay P configurations,
respectively, while the complexity of the learned interpolation schemes is
significantly reduced compared to interpolation with full CNNs.
Comment: IEEE Open Journal of Signal Processing Special Issue on Applied AI and Machine Learning for Video Coding and Streaming, June 2021
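After each linear branch of the shared network has been read out as float taps for its fractional shift, the resulting filter set still has to fit a codec's fixed-point interpolation path. The sketch below shows one plausible final step, assuming VVC-style 6-bit filters whose taps sum to 64; the float taps and the error-absorbing normalisation are illustrative assumptions, not taken from the paper.

```python
def to_fixed_point(taps, scale=64):
    """Round float taps to integers, then absorb the rounding error
    into the largest tap so the DC gain stays exactly `scale`
    (an assumed normalisation, common for fixed-point filters)."""
    q = [round(t * scale) for t in taps]
    k = max(range(len(q)), key=lambda i: abs(q[i]))
    q[k] += scale - sum(q)
    return q

# Hypothetical float taps read out of three branches, one per
# quarter-pixel shift (placeholder values for illustration).
derived = {
    "1/4": [-0.047, 0.896, 0.188, -0.037],
    "1/2": [-0.062, 0.562, 0.562, -0.062],
    "3/4": [-0.037, 0.188, 0.896, -0.047],
}
filters = {shift: to_fixed_point(t) for shift, t in derived.items()}
# Every quantized filter preserves unit DC gain at 1/64 precision.
assert all(sum(f) == 64 for f in filters.values())
```

Integer taps with a power-of-two gain let the interpolation run with the same shift-and-add arithmetic as the codec's conventional filters.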
Intra Picture Prediction for Video Coding with Neural Networks
We train a neural network to perform intra picture prediction for block-based video coding. Our network has multiple prediction modes which co-adapt during training to minimize a loss function. By applying the l1-norm and a sigmoid function to the prediction residual in the DCT domain, our loss function reflects properties of the residual quantization and coding stages present in the typical hybrid video coding architecture. We simplify the resulting predictors by pruning them in the frequency domain, thus greatly reducing the number of multiplications otherwise needed for the dense matrix-vector multiplications. Also, by quantizing the network weights and using fixed-point arithmetic, we allow for a hardware-friendly implementation. We demonstrate significant coding gains over state-of-the-art intra prediction.
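The DCT-domain loss can be sketched under one plausible reading of the abstract: transform the prediction residual with a DCT, then pass each coefficient magnitude through a sigmoid so that small coefficients (cheap to code after quantization) contribute little while large ones saturate toward a constant cost. This is an illustrative sketch only; the slope `alpha` and the exact saturation form are assumptions, not the paper's formulation.

```python
import math

def dct2(x):
    """Unnormalised DCT-II of a 1-D residual (pure Python)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def rate_aware_loss(residual, alpha=4.0):
    """l1-style cost on DCT coefficients, saturated by a sigmoid:
    exactly zero for a zero residual, bounded by 0.5 per coefficient."""
    return sum(sigmoid(alpha * abs(c)) - 0.5 for c in dct2(residual))

# A zero residual costs nothing; larger residuals cost more, but the
# per-coefficient cost saturates rather than growing without bound.
assert rate_aware_loss([0.0] * 8) == 0.0
assert rate_aware_loss([1.0] * 8) > rate_aware_loss([0.1] * 8) > 0.0
```

The saturation is what makes the loss behave like a coding-cost proxy: once a coefficient is large enough to survive quantization, making it larger barely changes its bit cost.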