135 research outputs found
Image and Video Coding Techniques for Ultra-low Latency
The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, or autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their tradeoffs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches like still image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we identify inter-device memory transfers and the lack of sub-frame coding as limitations of current full-system and software-programmable implementations.
Machine Learning based Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders
The next-generation Versatile Video Coding (VVC) standard introduces a new
Multi-Type Tree (MTT) block partitioning structure that supports Binary-Tree
(BT) and Ternary-Tree (TT) splits in both vertical and horizontal directions.
This new approach leads to five possible splits at each block depth and thereby
improves the coding efficiency of VVC over that of the preceding High
Efficiency Video Coding (HEVC) standard, which only supports Quad-Tree (QT)
partitioning with a single split per block depth. However, MTT has also
considerably increased encoder computational complexity. In this paper, a
two-stage learning-based technique is proposed to tackle the complexity
overhead of MTT in VVC intra encoders. In our scheme, the input block is first
processed by a Convolutional Neural Network (CNN) to predict its spatial
features through a vector of probabilities describing the partition at each 4x4
edge. Subsequently, a Decision Tree (DT) model leverages this vector of spatial
features to predict the most likely splits at each block. Finally, based on
this prediction, only the N most likely splits are processed by the
Rate-Distortion (RD) process of the encoder. In order to train our CNN and DT
models on a wide range of image contents, we also propose a public VVC frame
partitioning dataset based on an existing image dataset encoded with the VVC
reference software encoder. Our proposal relying on the top-3 configuration
reaches 46.6% complexity reduction for a negligible bitrate increase of 0.86%.
A top-2 configuration enables a higher complexity reduction of 69.8% for 2.57%
bitrate loss. These results demonstrate a better trade-off between VTM intra
coding efficiency and complexity reduction than state-of-the-art solutions.
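The final stage described above, shortlisting the N most likely splits before the costly rate-distortion search, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the split-type names follow VVC conventions, but the probability vector (normally produced by the CNN+DT stages) and the rd_cost callback are toy stand-ins.

```python
# Sketch of top-N split shortlisting for a VVC intra encoder.
# The probabilities below stand in for the CNN + Decision Tree output.

VVC_SPLITS = ("NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V")

def top_n_splits(split_probs, n):
    """Return the n split types the model rated most likely."""
    return sorted(split_probs, key=split_probs.get, reverse=True)[:n]

def rd_search(block, candidates, rd_cost):
    """Run the full rate-distortion evaluation only on the shortlist."""
    return min(candidates, key=lambda split: rd_cost(block, split))

# Toy usage: with top-3, the encoder skips RD evaluation of 3 of 6 splits.
probs = {"NO_SPLIT": 0.41, "QT": 0.30, "BT_V": 0.15,
         "BT_H": 0.08, "TT_H": 0.04, "TT_V": 0.02}
shortlist = top_n_splits(probs, 3)  # ['NO_SPLIT', 'QT', 'BT_V']
```

The trade-off reported in the abstract falls out of the choice of N: a larger shortlist keeps the RD search closer to exhaustive (lower bitrate loss), a smaller one saves more encoding time.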
DCT and DST Filtering with Sparse Graph Operators
Graph filtering is a fundamental tool in graph signal processing. Polynomial
graph filters (PGFs), defined as polynomials of a fundamental graph operator,
can be implemented in the vertex domain, and usually have a lower complexity
than frequency domain filter implementations. In this paper, we focus on the
design of filters for graphs with graph Fourier transform (GFT) corresponding
to a discrete trigonometric transform (DTT), i.e., one of 8 types of discrete
cosine transforms (DCT) and 8 discrete sine transforms (DST). In this case, we
show that multiple sparse graph operators can be identified, which allows us to
propose a generalization of PGF design: multivariate polynomial graph filter
(MPGF). First, for the widely used DCT-II (type-2 DCT), we characterize a set
of sparse graph operators that share the DCT-II matrix as their common
eigenvector matrix. This set contains the well-known connected line graph.
These sparse operators can be viewed as graph filters operating in the DCT
domain, which allows us to approximate any DCT graph filter by an MPGF, leading
to a design with more degrees of freedom than the conventional PGF approach.
Then, we extend those results to all of the 16 DTTs as well as their 2D
versions, and show how their associated sets of multiple graph operators can be
determined. We demonstrate experimentally that ideal low-pass and exponential
DCT/DST filters can be approximated with higher accuracy at similar runtime
complexity. Finally, we apply our method to transform-type selection in a video
codec, AV1, where we demonstrate significant encoding time savings, with a
negligible compression loss.
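The key facts the abstract builds on can be checked numerically: the orthonormal DCT-II basis diagonalizes the Laplacian of the length-N line (path) graph, and a polynomial graph filter applied in the vertex domain equals per-frequency scaling in the DCT domain. The sketch below, assuming NumPy, verifies both; the filter coefficients are arbitrary.

```python
import numpy as np

N = 8
# Laplacian of the length-N line (path) graph.
L = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1

# Orthonormal DCT-II basis; column k is the k-th graph Fourier mode.
n = np.arange(N)
U = np.array([np.cos(np.pi * k * (n + 0.5) / N) for k in range(N)]).T
U *= np.sqrt(2.0 / N)
U[:, 0] /= np.sqrt(2)

# U diagonalizes L with eigenvalues 2 - 2*cos(pi*k/N).
lam = 2 - 2 * np.cos(np.pi * np.arange(N) / N)
assert np.allclose(U.T @ L @ U, np.diag(lam), atol=1e-10)

# A polynomial graph filter h(L) = a0*I + a1*L + a2*L^2 applied in the
# vertex domain equals applying h(lambda_k) per frequency in the DCT domain.
a = [1.0, -0.5, 0.1]
x = np.random.default_rng(0).standard_normal(N)
y_vertex = a[0] * x + a[1] * (L @ x) + a[2] * (L @ (L @ x))
y_freq = U @ (np.polyval(a[::-1], lam) * (U.T @ x))
assert np.allclose(y_vertex, y_freq)
```

The MPGF idea in the paper generalizes this: instead of polynomials of the single operator L, it uses multivariate polynomials of several sparse operators sharing the same eigenvector matrix U, gaining degrees of freedom at comparable cost.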
MPAI-EEV: Standardization Efforts of Artificial Intelligence based End-to-End Video Coding
The rapid advancement of artificial intelligence (AI) technology has led to
the prioritization of standardizing the processing, coding, and transmission of
video using neural networks. To address this priority area, the Moving Picture,
Audio, and Data Coding by Artificial Intelligence (MPAI) group is developing a
suite of standards called MPAI-EEV for "end-to-end optimized neural video
coding." The aim of this AI-based video standard project is to compress the
number of bits required to represent high-fidelity video data by utilizing
data-trained neural coding technologies. This approach is not constrained by
how data coding has traditionally been applied in the context of a hybrid
framework. This paper presents an overview of recent and ongoing
standardization efforts in this area and highlights the key technologies and
design philosophy of EEV. It also reports on primary efforts such as the
coding efficiency of the reference model.
Additionally, it discusses emerging activities such as learned Unmanned
Aerial Vehicle (UAV) video coding, which are currently planned, under
development, or in the exploration phase. With a focus on UAV video signals,
this paper addresses the current status of these preliminary efforts. It also
indicates development timelines, summarizes the main technical details, and
provides pointers to further points of reference. The exploration experiment
shows that the EEV model performs better than the state-of-the-art video coding
standard H.266/VVC in terms of perceptual evaluation metrics.
CTU Depth Decision Algorithms for HEVC: A Survey
High-Efficiency Video Coding (HEVC) surpasses its predecessors in encoding efficiency by introducing new coding tools at the cost of increased encoding time complexity. The Coding Tree Unit (CTU) is the main building block used in HEVC. In the HEVC standard, frames are divided into CTUs with a predetermined size of up to 64x64 pixels. Each CTU is then divided recursively into a number of equally sized square areas, known as Coding Units (CUs). Although this diversity of frame partitioning increases encoding efficiency, it also increases time complexity due to the larger number of ways to find the optimal partitioning. To address this complexity, numerous algorithms have been proposed to eliminate unnecessary searches during CTU partitioning by exploiting correlations in the video. In this paper, existing CTU depth decision algorithms for HEVC are surveyed. These algorithms are categorized into two groups, namely statistics and machine learning approaches. Statistics approaches are further subdivided into neighboring and inherent approaches. Neighboring approaches exploit the similarity between adjacent CTUs to limit the depth range of the current CTU, while inherent approaches use only the information available within the current CTU. Machine learning approaches try to extract and exploit similarities implicitly. Traditional methods like support vector machines or random forests use manually selected features, while recently proposed deep learning methods extract features during training. Finally, this paper discusses extending these methods to more recent video coding formats such as Versatile Video Coding (VVC) and AOMedia Video 1 (AV1).
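The "neighboring" statistics approach described above can be sketched as follows. This is an illustrative assumption, not a rule from any specific surveyed paper: the function name and the +/-1 depth margin are hypothetical, and only the idea of bounding the current CTU's depth search by its neighbors' chosen depths is taken from the text.

```python
def depth_range_from_neighbors(left_depth, above_depth, max_depth=3):
    """Restrict the CU depth search of the current CTU to a window around
    the depths chosen by its left and above neighbor CTUs (sketch).
    A neighbor depth of None means that neighbor is unavailable
    (e.g. at a frame border)."""
    depths = [d for d in (left_depth, above_depth) if d is not None]
    if not depths:
        return 0, max_depth  # no context available: full search
    lo = max(0, min(depths) - 1)          # +/-1 margin: illustrative choice
    hi = min(max_depth, max(depths) + 1)
    return lo, hi
```

With left and above neighbors encoded at depths 2 and 3, the current CTU would only be evaluated at depths 1 through 3, skipping the RD search at depth 0 (the full 64x64 CU).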
Towards Hybrid-Optimization Video Coding
Video coding is essentially a rate-distortion optimization problem. To solve
this complex optimization problem, two popular video
coding frameworks have been developed: block-based hybrid video coding and
end-to-end learned video coding. If we rethink video coding from the
perspective of optimization, we find that the existing two frameworks represent
two directions of optimization solutions. Block-based hybrid coding represents
the discrete optimization solution because the candidate coding modes are
mathematically discrete. It searches for the best one among multiple starting
points (i.e. modes). However, the search is not efficient enough. On the other
hand, end-to-end learned coding represents the continuous optimization solution
because the gradient descent is based on a continuous function. It efficiently
optimizes a group of model parameters with a numerical algorithm. However,
limited to a single starting point, it easily falls into a local optimum.
To better solve the optimization problem, we propose to regard video coding as
a hybrid of the discrete and continuous optimization problem, and use both
search and numerical algorithms to solve it. Our idea is to provide multiple
discrete starting points in the global space and efficiently refine the local
optimum around each point with a numerical algorithm. Finally, we search for
the global optimum among these local optima. Guided by the hybrid
optimization idea, we design a hybrid optimization video coding framework,
which is built on continuous deep networks entirely and also contains some
discrete modes. We conduct a comprehensive set of experiments. Compared to the
continuous optimization framework, our method outperforms pure learned video
coding methods. Meanwhile, compared to the discrete optimization framework, our
method achieves performance comparable to the HEVC reference software HM16.10
in PSNR.
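The hybrid scheme sketched in this abstract, refining each discrete starting point by continuous optimization and then picking the best local optimum, can be illustrated on a toy objective. Everything here is an assumption for illustration: the function names, the 1-D objective with two basins, and the plain gradient-descent refiner stand in for coding modes and learned-codec training.

```python
def grad_descent(df, x0, lr=0.01, steps=500):
    """Continuous refinement from one starting point (one discrete 'mode')."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x

def hybrid_optimize(f, df, starts):
    """Refine each discrete start continuously, then pick the best
    local optimum globally."""
    local_optima = [grad_descent(df, s) for s in starts]
    return min(local_optima, key=f)

# Toy objective with two basins: minima near x = -1.01 and x = +0.99;
# the tilt +0.1*x makes the left basin the global one.
f = lambda x: (x * x - 1) ** 2 + 0.1 * x
df = lambda x: 4 * x * (x * x - 1) + 0.1

best = hybrid_optimize(f, df, starts=[-2.0, 2.0])  # converges near -1.01
```

A pure continuous optimizer started at x = 2.0 would settle in the right-hand basin; providing both discrete starts and comparing the refined results recovers the global optimum, which is the intuition behind combining mode search with gradient-based refinement.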