63 research outputs found
Motion Scalability for Video Coding with Flexible Spatio-Temporal Decompositions
PhD. The research presented in this thesis aims to extend the scalability range of
wavelet-based video coding systems in order to achieve fully scalable coding with a
wide range of available decoding points. Since the temporal redundancy regularly
comprises the main portion of the global video sequence redundancy, the techniques
that can be generally termed motion decorrelation techniques have a central role in
the overall compression performance. For this reason, scalable motion modelling
and coding are of utmost importance, and this thesis identifies and analyses
possible solutions.
The main contributions of the presented research are grouped into two
interrelated and complementary topics. Firstly, a flexible motion model with a
rate-optimised estimation technique is introduced. The proposed motion model is based
on tree structures and provides the high adaptability needed for layered motion coding. The
flexible structure for motion compensation allows for optimisation at different stages
of the adaptive spatio-temporal decomposition, which is crucial for scalable coding
that targets decoding at different resolutions. By utilising an adaptive choice of
wavelet filterbank, the model enables high compression based on efficient mode
selection. Secondly, solutions for scalable motion modelling and coding are
developed. These solutions are based on precision limiting of motion vectors and
creation of a layered motion structure that describes hierarchically coded motion.
The solution based on precision limiting relies on layered bit-plane coding of motion
vector values. The second solution builds on recently established techniques that
impose scalability on a motion structure. The new approach is based on two major
improvements: the evaluation of distortion in temporal subbands, and a motion search
in temporal subbands that finds the optimal motion vectors for the layered motion
structure.
Exhaustive tests of the rate-distortion performance in demanding scalable video
coding scenarios show the benefits of applying both the developed flexible motion
model and the various solutions for scalable motion coding.
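The precision-limiting idea in the abstract above (layered bit-plane coding of motion vector values) can be illustrated with a minimal sketch. It assumes a sign-magnitude integer representation of one motion-vector component in the finest precision units; the function names `split_into_layers` and `reconstruct` and the bit widths are hypothetical and are not taken from the thesis.

```python
def split_into_layers(mv, total_bits, base_bits):
    """Split one motion-vector component into a base layer and refinement bits.

    mv is a signed integer in the finest precision units (an assumption of
    this sketch). The base layer keeps the most significant bit-planes of the
    magnitude; each refinement layer adds one further bit-plane.
    """
    sign = -1 if mv < 0 else 1
    mag = abs(mv)
    if mag >= (1 << total_bits):
        raise ValueError("magnitude does not fit in total_bits")
    dropped = total_bits - base_bits
    base = mag >> dropped
    # Remaining bit-planes, most significant first.
    refinements = [(mag >> b) & 1 for b in range(dropped - 1, -1, -1)]
    return sign, base, refinements


def reconstruct(sign, base, refinements, total_bits, base_bits, layers_received):
    """Rebuild the component from the base layer plus the first few refinements.

    Bit-planes that were not received are treated as zero, so the decoded
    vector has limited precision.
    """
    dropped = total_bits - base_bits
    layers_received = min(layers_received, dropped)
    mag = base
    for bit in refinements[:layers_received]:
        mag = (mag << 1) | bit
    mag <<= dropped - layers_received
    return sign * mag


sign, base, refs = split_into_layers(-13, total_bits=5, base_bits=2)
for n in range(4):
    print(n, reconstruct(sign, base, refs, 5, 2, n))
```

With each additional refinement layer the decoded value moves closer to the original, which is the behaviour a layered motion bitstream relies on.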
Multi-loop quality scalability based on high efficiency video coding
Scalable video coding performance largely depends on the underlying single layer coding efficiency. In this paper, the quality scalability capabilities are evaluated on the basis of the new High Efficiency Video Coding (HEVC) standard under development. To enable the evaluation, a multi-loop codec has been designed using HEVC. Adaptive inter-layer prediction is realized by including the lower layer in the reference list of the enhancement layer. As a result, adaptive scalability on frame level and on prediction unit level is accomplished. Compared to single layer coding, a 19.4% Bjontegaard Delta bitrate increase is measured over a PSNR range of approximately 30 dB to 40 dB. When compared to simulcast, a 20.6% bitrate reduction can be achieved. Under equivalent conditions, the presented technique achieves a 43.8% bitrate reduction over Coarse Grain Scalability of the H.264/AVC-based SVC standard.
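For readers unfamiliar with the Bjontegaard Delta bitrate metric quoted above, the following is a minimal sketch of the classic four-point calculation: a cubic is fitted to log10(bitrate) as a function of PSNR for each codec, and the average gap between the two curves over the shared PSNR range is converted to a percentage. The rate and PSNR values are made up for illustration and are not results from the paper; the helper names are hypothetical.

```python
import math


def lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def simpson(f, a, b):
    """Single-panel Simpson's rule, which is exact for cubic polynomials."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(anchor, test):
    """Return the BD-rate in percent for two lists of four (bitrate, psnr) points.

    A positive value means the test codec needs more bitrate than the anchor
    for the same PSNR; a negative value means a saving.
    """
    psnr_a = [p for _, p in anchor]
    psnr_t = [p for _, p in test]
    log_a = [math.log10(r) for r, _ in anchor]
    log_t = [math.log10(r) for r, _ in test]
    lo = max(min(psnr_a), min(psnr_t))
    hi = min(max(psnr_a), max(psnr_t))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    int_a = simpson(lambda x: lagrange(psnr_a, log_a, x), lo, hi)
    int_t = simpson(lambda x: lagrange(psnr_t, log_t, x), lo, hi)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0


# Hypothetical rate points (kbit/s, dB); the test codec uses 20% more bitrate.
anchor = [(1000, 30.0), (2000, 33.0), (4000, 36.0), (8000, 39.0)]
test = [(1200, 30.0), (2400, 33.0), (4800, 36.0), (9600, 39.0)]
print(round(bd_rate(anchor, test), 2))
```

Because the test curve here is the anchor scaled by a constant factor of 1.2, the sketch reports an increase of 20%, which makes it easy to sanity-check.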
Adaptive Quantisation in HEVC for Contouring Artefacts Removal in UHD Content
Contouring artefacts affect the visual experience of some particular types of compressed Ultra High Definition (UHD) sequences characterised by smoothly textured areas and gradual transitions in pixel values. This paper proposes a technique to adjust the quantisation process at the encoder so that contouring artefacts are avoided. The devised method does not require any change at the decoder side and introduces a negligible coding rate increment (up to 3.4% for the same objective quality). This result compares favourably with the average 11.2% bit-rate penalty introduced by a method where the quantisation step is reduced in contour-prone areas.
Efficient Convolution and Transformer-Based Network for Video Frame Interpolation
Video frame interpolation is an increasingly important research task with
several key industrial applications in the video coding, broadcast and
production sectors. Recently, transformers have been introduced to the field
resulting in substantial performance gains. However, this comes at a cost of
greatly increased memory usage, training and inference time. In this paper, a
novel method integrating a transformer encoder and convolutional features is
proposed. This network reduces the memory burden by close to 50% and runs up to
four times faster during inference time compared to existing transformer-based
interpolation methods. A dual-encoder architecture is introduced which combines
the strengths of convolutions in modelling local correlations with those of the
transformer for long-range dependencies. Quantitative evaluations are conducted
on various benchmarks with complex motion to showcase the robustness of the
proposed method, achieving competitive performance compared to state-of-the-art
interpolation networks.
Comment: Paper accepted in IEEE ICIP 2023: International Conference on Image
Processing 202
Improved CNN-based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding
The versatility of recent machine learning approaches makes them ideal for
improvement of next generation video compression solutions. Unfortunately,
these approaches typically bring significant increases in computational
complexity and are difficult to turn into explainable models, which limits
their potential for implementation within practical video coding applications.
This paper introduces a novel explainable neural network-based inter-prediction
scheme, to improve the interpolation of reference samples needed for fractional
precision motion compensation. The approach requires a single neural network to
be trained from which a full quarter-pixel interpolation filter set is derived,
as the network is easily interpretable due to its linear structure. A novel
training framework enables each network branch to resemble a specific
fractional shift. This practical solution makes it very efficient to use
alongside conventional video coding schemes. When implemented in the context of
the state-of-the-art Versatile Video Coding (VVC) test model, 0.77%, 1.27% and
2.25% BD-rate savings can be achieved on average for lower resolution sequences
under the random access, low-delay B and low-delay P configurations,
respectively, while the complexity of the learned interpolation schemes is
significantly reduced compared to the interpolation with full CNNs.
Comment: IEEE Open Journal of Signal Processing Special Issue on Applied AI
and Machine Learning for Video Coding and Streaming, June 202
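To make the role of a quarter-pixel interpolation filter set concrete, here is a minimal sketch of how a conventional codec applies one fixed filter per fractional shift to reference samples. The taps are the fixed 8-tap HEVC luma filters, used here only as an example of a conventional filter set; the learned filters described in the abstract are not reproduced. The function name `interpolate_row`, the edge-replication padding, and the 8-bit clipping are assumptions of this sketch.

```python
# One 8-tap filter per quarter-sample shift (taps sum to 64).
# These are the fixed HEVC luma taps, shown for illustration only.
FILTERS = {
    0: [0, 0, 0, 64, 0, 0, 0, 0],          # integer position
    1: [-1, 4, -10, 58, 17, -5, 1, 0],     # quarter-sample shift
    2: [-1, 4, -11, 40, 40, -11, 4, -1],   # half-sample shift
    3: [0, 1, -5, 17, 58, -10, 4, -1],     # three-quarter-sample shift
}


def interpolate_row(samples, frac):
    """Interpolate a row of 8-bit samples at a shift of frac/4 to the right.

    Tap k is applied to the sample at offset k - 3 from the current position.
    Positions outside the row are padded by repeating the edge sample.
    """
    taps = FILTERS[frac]
    n = len(samples)
    out = []
    for x in range(n):
        acc = 0
        for k, c in enumerate(taps):
            idx = min(max(x + k - 3, 0), n - 1)
            acc += c * samples[idx]
        # Round, normalise by 64, and clip to the 8-bit range.
        out.append(min(max((acc + 32) >> 6, 0), 255))
    return out


ramp = list(range(0, 80, 10))
print(interpolate_row(ramp, 2))
```

On a linear ramp, interior half-sample outputs fall midway between neighbouring samples, while values near the row ends are affected by the edge padding.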