67 research outputs found
Error resilience and concealment techniques for high-efficiency video coding
This thesis investigates the problem of robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study about error robustness, revealed that the HEVC has weak protection against network losses with significant impact on video quality degradation. Based on this evidence, the first contribution of this work is a new method to reduce the temporal dependencies between motion vectors, by improving the decoded video quality without compromising the compression efficiency. The second contribution of this thesis is a two-stage approach for reducing the mismatch of temporal predictions in case of video streams received with errors or lost data. At the encoding stage, the reference pictures are dynamically distributed based on a constrained Lagrangian rate-distortion optimization to reduce the number of predictions from a single reference. At the streaming stage, a prioritization algorithm, based on spatial dependencies, selects a reduced set of motion vectors to be transmitted, as side information, to reduce mismatched motion predictions at the decoder. The problem of error concealment-aware video coding is also investigated to enhance the overall error robustness. A new approach based on scalable coding and optimally error concealment selection is proposed, where the optimal error concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimisation. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder to be used in case of data loss. Finally, an adaptive error resilience scheme is proposed to dynamically predict the video stream that achieves the highest decoded quality for a particular loss case. A neural network selects among the various video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream and the transmission network. Overall, the new robust video coding methods investigated in this thesis yield consistent quality gains in comparison with other existing methods and also the ones implemented in the HEVC reference software. Furthermore, the trade-off between coding efficiency and error robustness is also better in the proposed methods
Reference picture selection using checkerboard pattern for resilient video coding
The improved compression efficiency achieved by
the High Efficiency Video Coding (HEVC) standard has the
counter-effect of decreasing error resilience in transmission over
error-prone channels. To increase the error resilience of HEVC
streams, this paper proposes a checkerboard reference picture
selection method in order to reduce the prediction mismatch at
the decoder in case of frame losses. The proposed approach not
only allows to reduce the error propagation at the decoder, but
also enhances the quality of reconstructed frames by selectively
constraining the choice of reference pictures used for temporal
prediction. The underlying approach is to increase the amount of
accurate temporal information at the decoder when transmission
errors occur, to improve the video quality by using an efficient
combination of diverse motion fields. The proposed method
compensates for the small loss of coding efficiency at frame loss
rates as low as 3%. For a single frame-loss event the proposed
method can achieve up to 2 dB of gain in the affected frames
and an average quality gain of 0:84 dB for different error prone
conditions
Light field image coding with jointly estimated self-similarity bi-prediction
This paper proposes an efficient light field image coding (LFC) solution based on High Efficiency Video Coding (HEVC) and a novel Bi-prediction Self-Similarity (Bi-SS) estimation and compensation approach to efficiently explore the inherent non-local spatial correlation of this type of content, where two predictor blocks are jointly estimated from the same search window by using a locally optimal rate constrained algorithm. Moreover, a theoretical analysis of the proposed Bi-SS prediction is also presented, which shows that other non-local spatial prediction schemes proposed in literature are suboptimal in terms of Rate-Distortion (RD) performance and, for this reason, can be considered as restricted cases of the jointly estimated Bi-SS solution proposed here. These theoretical insights are shown to be consistent with the presented experimental results, and demonstrate that the proposed LFC scheme is able to outperform the benchmark solutions with significant gains with respect to HEVC (with up to 61.1% of bit savings) and other state-of-the-art LFC solutions in the literature (with up 16.9% of bit savings).info:eu-repo/semantics/acceptedVersio
Speeding up VP9 Intra Encoder with Hierarchical Deep Learning Based Partition Prediction
In VP9 video codec, the sizes of blocks are decided during encoding by
recursively partitioning 6464 superblocks using rate-distortion
optimization (RDO). This process is computationally intensive because of the
combinatorial search space of possible partitions of a superblock. Here, we
propose a deep learning based alternative framework to predict the intra-mode
superblock partitions in the form of a four-level partition tree, using a
hierarchical fully convolutional network (H-FCN). We created a large database
of VP9 superblocks and the corresponding partitions to train an H-FCN model,
which was subsequently integrated with the VP9 encoder to reduce the intra-mode
encoding time. The experimental results establish that our approach speeds up
intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in
the Bjontegaard-Delta bitrate (BD-rate). While VP9 provides several built-in
speed levels which are designed to provide faster encoding at the expense of
decreased rate-distortion performance, we find that our model is able to
outperform the fastest recommended speed level of the reference VP9 encoder for
the good quality intra encoding configuration, in terms of both speedup and
BD-rate
The impact of Tiles on video coding performance: a case study on HEVC and AV1 video coding standards
Improvement of Decision on Coding Unit Split Mode and Intra-Picture Prediction by Machine Learning
High efficiency Video Coding (HEVC) has been deemed as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The reference software (i.e., HM) have included the implementations of the guidelines in appliance with the new standard. The software includes both encoder and decoder functionality.
Machine learning (ML) works with data and processes it to discover patterns that can be later used to analyze new trends. ML can play a key role in a wide range of critical applications, such as data mining, natural language processing, image recognition, and expert systems.
In this research project, in compliance with H.265 standard, we are focused on improvement of the performance of encode/decode by optimizing the partition of prediction block in coding unit with the help of supervised machine learning. We used Keras library as the main tool to implement the experiments. Key parameters were tuned for the model in our convolution neuron network. The coding tree unit mode decision time produced in the model was compared with that produced in HM software, and it was proved to have improved significantly. The intra-picture prediction mode decision was also investigated with modified model and yielded satisfactory results
Efficient algorithms for scalable video coding
A scalable video bitstream specifically designed for the needs of various client terminals,
network conditions, and user demands is much desired in current and future video transmission
and storage systems. The scalable extension of the H.264/AVC standard (SVC) has
been developed to satisfy the new challenges posed by heterogeneous environments, as
it permits a single video stream to be decoded fully or partially with variable quality, resolution,
and frame rate in order to adapt to a specific application. This thesis presents
novel improved algorithms for SVC, including: 1) a fast inter-frame and inter-layer coding
mode selection algorithm based on motion activity; 2) a hierarchical fast mode selection
algorithm; 3) a two-part Rate Distortion (RD) model targeting the properties of different
prediction modes for the SVC rate control scheme; and 4) an optimised Mean Absolute
Difference (MAD) prediction model.
The proposed fast inter-frame and inter-layer mode selection algorithm is based on the
empirical observation that a macroblock (MB) with slow movement is more likely to be
best matched by one in the same resolution layer. However, for a macroblock with fast
movement, motion estimation between layers is required. Simulation results show that
the algorithm can reduce the encoding time by up to 40%, with negligible degradation in
RD performance.
The proposed hierarchical fast mode selection scheme comprises four levels and makes
full use of inter-layer, temporal and spatial correlation aswell as the texture information of
each macroblock. Overall, the new technique demonstrates the same coding performance
in terms of picture quality and compression ratio as that of the SVC standard, yet produces
a saving in encoding time of up to 84%. Compared with state-of-the-art SVC fast mode
selection algorithms, the proposed algorithm achieves a superior computational time reduction
under very similar RD performance conditions.
The existing SVC rate distortion model cannot accurately represent the RD properties of
the prediction modes, because it is influenced by the use of inter-layer prediction. A separate
RD model for inter-layer prediction coding in the enhancement layer(s) is therefore
introduced. Overall, the proposed algorithms improve the average PSNR by up to 0.34dB
or produce an average saving in bit rate of up to 7.78%. Furthermore, the control accuracy
is maintained to within 0.07% on average.
As aMADprediction error always exists and cannot be avoided, an optimisedMADprediction
model for the spatial enhancement layers is proposed that considers the MAD from
previous temporal frames and previous spatial frames together, to achieve a more accurateMADprediction.
Simulation results indicate that the proposedMADprediction model
reduces the MAD prediction error by up to 79% compared with the JVT-W043 implementation
Dense light field coding: a survey
Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems.
Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.info:eu-repo/semantics/publishedVersio
- …