Complexity Analysis Of Next-Generation VVC Encoding and Decoding
While the next-generation video compression standard, Versatile Video Coding
(VVC), provides superior compression efficiency, its computational complexity
increases dramatically. This paper thoroughly analyzes this complexity for both
the encoder and decoder of VVC Test Model 6, by quantifying the complexity
breakdown of each coding tool and measuring the complexity and memory
requirements of VVC encoding/decoding. These extensive analyses are performed
for six video sequences of 720p, 1080p, and 2160p, under Low-Delay (LD),
Random-Access (RA), and All-Intra (AI) conditions (a total of 320
encodings/decodings). Results indicate that the VVC encoder and decoder are 5x
and 1.5x more complex than HEVC in LD, and 31x and 1.8x in AI,
respectively. A detailed analysis of coding tools reveals that, in LD, on
average, motion estimation tools (53%), transform and quantization (22%), and
entropy coding (7%) dominate the encoding complexity. In decoding, loop
filters (30%), motion compensation (20%), and entropy decoding (16%)
are the most complex modules. Moreover, the memory bandwidth required for VVC
encoding/decoding is measured through memory profiling and found to be 30x and
3x that of HEVC, respectively. The reported results and insights are a guide
for future research on, and implementations of, energy-efficient VVC encoders
and decoders.
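The per-tool percentages above come from instrumented builds, but the overall run-time and peak-memory side of such a study is straightforward to reproduce. Below is a minimal measurement harness in Python, assuming a compiled VTM EncoderApp binary and its common-test-condition configuration files on Linux; the binary path, config names, sequence, and QP set are illustrative placeholders, not the paper's exact setup.

```python
# Minimal encode-profiling harness (a sketch, not the paper's tooling).
import resource
import subprocess
import time

ENCODER = "./EncoderAppStatic"                 # hypothetical VTM binary path
CONFIGS = {"LD": "encoder_lowdelay_vtm.cfg",   # Low-Delay
           "RA": "encoder_randomaccess_vtm.cfg",
           "AI": "encoder_intra_vtm.cfg"}      # All-Intra

def profile_encode(sequence_cfg, condition, qp):
    """Run one encode; return wall-clock seconds and peak child RSS in MB."""
    t0 = time.perf_counter()
    subprocess.run([ENCODER,
                    "-c", CONFIGS[condition],  # coding condition (LD/RA/AI)
                    "-c", sequence_cfg,        # per-sequence cfg (size, fps, frames)
                    "-q", str(qp),
                    "-b", "out.bin"], check=True)
    wall = time.perf_counter() - t0
    # ru_maxrss is reported in kilobytes on Linux.
    peak_mb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024
    return wall, peak_mb

for cond in CONFIGS:
    for qp in (22, 27, 32, 37):                # a typical CTC QP set
        wall, peak = profile_encode("BasketballDrive.cfg", cond, qp)
        print(f"{cond} QP{qp}: {wall:.1f} s, peak RSS {peak:.0f} MB")
```

Per-module shares and memory-bandwidth figures additionally require profilers such as Valgrind or VTune on an instrumented build; the harness above only captures the end-to-end cost.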
Leveraging progressive model and overfitting for efficient learned image compression
Deep learning has been overwhelmingly dominant in computer vision and
image/video processing for the last decade. However, for image and video
compression, it still lags behind traditional techniques based on the discrete
cosine transform (DCT) and linear filters. Built on top of an autoencoder
architecture, learned image compression (LIC) systems have drawn enormous
attention in recent years. Nevertheless, the proposed LIC systems are still
inferior to the state-of-the-art traditional techniques, for example, the
Versatile Video Coding (VVC/H.266) standard, due to either their compression
performance or decoding complexity. Although claimed to outperform VVC/H.266
over a limited bit-rate range, some proposed LIC systems take over 40 seconds
to decode a 2K image even on a GPU system. In this paper, we introduce a
powerful and flexible LIC framework with a multi-scale progressive (MSP)
probability model and a latent representation overfitting (LOF) technique. With
different predefined profiles, the proposed framework can achieve various
balance points between compression efficiency and computational complexity.
Experiments show that the proposed framework achieves 2.5%, 1.0%, and 1.3%
Bjontegaard delta bit rate (BD-rate) reductions over the VVC/H.266 standard on
three benchmark datasets over a wide bit-rate range. More importantly, the
decoding complexity is reduced from O(n) to O(1) compared to many other LIC
systems, resulting in an over 20x speedup when decoding 2K images.
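To make the latent representation overfitting (LOF) idea concrete, the sketch below refines the latent of a single image by gradient descent on a rate-distortion objective while the decoder and entropy model stay frozen, so the resulting bitstream remains decodable by the unmodified decoder. The `model.analysis`, `model.synthesis`, and `model.entropy_model` attributes are hypothetical stand-ins for a pretrained LIC network's components, not the actual API of this framework.

```python
# Sketch of encode-time latent overfitting for one image (hypothetical model API).
import torch

def overfit_latent(model, x, lmbda=0.01, steps=100, lr=1e-3):
    """Refine latent y to minimize rate + lambda * distortion; model stays frozen."""
    with torch.no_grad():
        y = model.analysis(x)                       # initial latent from the encoder
    y = y.clone().requires_grad_(True)
    opt = torch.optim.Adam([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        y_hat = y + (torch.round(y) - y).detach()   # straight-through rounding
        p = model.entropy_model(y_hat)              # per-element likelihoods in (0, 1]
        rate = -torch.log2(p).sum()                 # estimated bits to code y_hat
        distortion = torch.mean((model.synthesis(y_hat) - x) ** 2)
        (rate + lmbda * distortion).backward()      # classic R + lambda*D objective
        opt.step()
    return torch.round(y.detach())                  # quantized latent for entropy coding
```

Because only the latent changes, the extra optimization cost is paid entirely at encode time, which is consistent with the decode-side speedups reported above.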
Comprehensive Complexity Assessment of Emerging Learned Image Compression on CPU and GPU
Learned Compression (LC) is an emerging technology for compressing image and
video content using deep neural networks. Despite being new, LC methods have
already achieved compression efficiency comparable to state-of-the-art codecs
such as HEVC or even VVC. However, existing solutions often incur a huge
computational complexity, which discourages their adoption in international
standards or products. This paper provides a comprehensive complexity
assessment of several notable methods that sheds light on the matter and
guides the future development of this field by presenting key findings. To
this end, six existing methods have been evaluated for both encoding
and decoding, on CPU and GPU platforms. Various aspects of complexity such as
the overall complexity, share of each coding module, number of operations,
number of parameters, most demanding GPU kernels, and memory requirements have
been measured and compared on the Kodak dataset. The reported results (1)
quantify the complexity of LC methods, (2) fairly compare different methods,
and (3), as a major contribution of this work, identify and quantify the key
factors affecting that complexity.
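For PyTorch-based LC models, several of the quantities listed (parameter count, wall time, peak GPU memory) can be gathered with a short generic routine; this is a sketch of the measurement idea, not the paper's benchmarking code, and `model` is assumed to be any `nn.Module` taking a batched image tensor.

```python
# Generic complexity probe for a learned-compression model (sketch).
import time
import torch

def complexity_report(model, device="cpu", shape=(1, 3, 512, 768)):
    """Report parameter count, inference wall time, and peak GPU memory."""
    model = model.to(device).eval()
    x = torch.randn(*shape, device=device)          # Kodak images are 768x512
    n_params = sum(p.numel() for p in model.parameters())
    if device == "cuda":
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()                    # exclude pending async work
    t0 = time.perf_counter()
    with torch.no_grad():
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()                    # wait for kernels to finish
    wall = time.perf_counter() - t0
    msg = f"{device}: {n_params / 1e6:.1f}M params, {wall * 1000:.0f} ms"
    if device == "cuda":
        msg += f", peak {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB"
    print(msg)
```

Per-kernel GPU shares, as reported in the paper, would come from a profiler such as Nsight Systems rather than from a routine like this.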
Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation
Deep learning-based image compression has made great progress recently.
However, many leading schemes use a serial context-adaptive entropy model to
improve the rate-distortion (R-D) performance, which is very slow. In addition,
the complexities of the encoding and decoding networks are quite high and not
suitable for many practical applications. In this paper, we introduce four
techniques to balance the trade-off between complexity and performance. We are
the first to introduce a deformable convolutional module into a compression
framework; it removes more redundancy from the input image, thereby
enhancing compression performance. Second, we design a checkerboard context
model with two separate distribution parameter estimation networks and
different probability models, which enables parallel decoding without
sacrificing performance compared to the sequential context-adaptive model.
Third, we develop an improved three-step knowledge distillation and training
scheme to achieve different trade-offs between the complexity and the
performance of the decoder network, which transfers both the final and
intermediate results of the teacher network to the student network to help its
training. Fourth, we introduce regularization to make the latent
representation sparser, and then encode only the non-zero channels during
encoding and decoding, which greatly reduces the encoding and decoding time.
Experiments show that, compared to the
state-of-the-art learned image coding scheme, our method can be about 20 times
faster in encoding and 70-90 times faster in decoding, and our R-D performance
is also higher. Our method outperforms the traditional H.266/VVC-intra (4:4:4)
approach and some leading learned schemes in terms of PSNR and MS-SSIM metrics
when tested on the Kodak and Tecnick-40 datasets.
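To illustrate why a checkerboard context model decodes in parallel, here is a minimal two-pass decoding sketch. The two distribution-parameter estimation networks and the context network are hypothetical stand-ins for the two separate networks described above, and the arithmetic decoder is stubbed out, so this shows the data flow rather than a working codec.

```python
# Sketch of two-pass checkerboard decoding (hypothetical component names).
import torch

def checkerboard_mask(h, w, device="cpu"):
    """True on 'anchor' positions; anchors and non-anchors tile a checkerboard."""
    ii, jj = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    return ((ii + jj) % 2 == 0).to(device)

def entropy_decode(mu, sigma):
    # Stub for the arithmetic decoder: a real codec reads symbols from the
    # bitstream under N(mu, sigma); here we just sample for illustration.
    return mu + sigma * torch.randn_like(mu)

def decode_latents(hyper_net, param_net, context_net, z, shape):
    """Pass 1 decodes every anchor in parallel from the hyperprior alone;
    pass 2 decodes every non-anchor in parallel, conditioned on the anchors."""
    b, c, h, w = shape
    anchor = checkerboard_mask(h, w, z.device)
    y_hat = torch.zeros(shape, device=z.device)
    mu1, sigma1 = hyper_net(z)                      # first estimation network
    y_hat[..., anchor] = entropy_decode(mu1[..., anchor], sigma1[..., anchor])
    ctx = context_net(y_hat * anchor)               # sees decoded anchors only
    mu2, sigma2 = param_net(z, ctx)                 # second estimation network
    y_hat[..., ~anchor] = entropy_decode(mu2[..., ~anchor], sigma2[..., ~anchor])
    return y_hat
```

A serial autoregressive model needs one network evaluation per latent position, whereas this scheme needs exactly two passes regardless of resolution, which is where the decoding speedup comes from.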
Improved CNN-based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding
The versatility of recent machine learning approaches makes them ideal for
improving next-generation video compression solutions. Unfortunately, these
approaches typically bring significant increases in computational complexity
and are difficult to translate into explainable models, limiting their
potential for implementation within practical video coding applications.
This paper introduces a novel explainable neural network-based inter-prediction
scheme to improve the interpolation of reference samples needed for
fractional-precision motion compensation. The approach requires a single neural
network to
be trained from which a full quarter-pixel interpolation filter set is derived,
as the network is easily interpretable due to its linear structure. A novel
training framework enables each network branch to resemble a specific
fractional shift. This practical solution makes it very efficient to use
alongside conventional video coding schemes. When implemented in the context of
the state-of-the-art Versatile Video Coding (VVC) test model, 0.77%, 1.27% and
2.25% BD-rate savings can be achieved on average for lower resolution sequences
under the random access, low-delay B and low-delay P configurations,
respectively, while the complexity of the learned interpolation schemes is
significantly reduced compared to interpolation with full CNNs.
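The interpretability claim rests on linearity: a network branch with no nonlinear activations is mathematically equivalent to a fixed FIR filter, so its taps can be read off by probing it with unit impulses. The toy 1-D sketch below demonstrates the extraction; the 8-tap support and the stand-in weights (an HEVC-like half-pel filter) are illustrative, not the trained VVC filters.

```python
# Recovering the FIR taps of a purely linear 'network branch' by impulse probing.
import numpy as np

def extract_taps(branch, support=8):
    """Because the branch is linear, its response to unit impulses IS the filter."""
    taps = np.empty(support)
    for k in range(support):
        impulse = np.zeros(support)
        impulse[k] = 1.0                    # probe the k-th input position
        taps[k] = branch(impulse)
    return taps

# Toy linear branch standing in for a trained network (illustrative weights):
w = np.array([-1, 4, -11, 40, 40, -11, 4, -1]) / 64.0
branch = lambda x: float(w @ x)

print(extract_taps(branch))                 # recovers w exactly
```

Once extracted, such taps can be dropped into a conventional codec's interpolation stage, which is why the scheme adds almost no decoding complexity compared to running a full CNN.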
Sequence-Level Reference Frames In Video Coding
The proliferation of low-cost DRAM chipsets now begins to allow for the consideration of substantially increased decoded picture buffers in advanced video coding standards such as HEVC, VVC, and Google VP9. At the same time, the increasing demand for rapid scene changes and multiple scene repetitions in entertainment or broadcast content indicates that extending the frame referencing interval to tens of minutes, or even the entire video sequence, may offer coding gains, as long as one is able to identify frame similarity in a computationally- and memory-efficient manner. Motivated by these observations, we propose a “stitching” method that defines a reference buffer and a reference frame selection algorithm. Our proposal extends the referencing interval of inter-frame video coding to the entire length of video sequences. Our reference frame selection algorithm uses well-established feature descriptor methods that describe frame structural elements in a compact and semantically-rich manner. We propose to combine such compact descriptors with a similarity scoring mechanism in order to select the frames to be “stitched” to the reference picture buffers of advanced inter-frame encoders like HEVC, VVC, and VP9 without breaking standard compliance. Our evaluation on synthetic and real-world video sequences with the HEVC and VVC reference encoders shows that our method offers significant rate gains, with complexity and memory requirements that remain manageable for practical encoders and decoders.
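As an example of the compact-descriptor approach, the sketch below scores frame similarity with ORB features and cross-checked Hamming matching; the descriptor choice, distance threshold, and scoring rule are illustrative assumptions rather than the exact method evaluated in the paper.

```python
# Descriptor-based frame similarity for reference-frame selection (sketch).
import cv2

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def frame_descriptors(frame_bgr):
    """Compact binary descriptors capturing the frame's structural elements."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = orb.detectAndCompute(gray, None)
    return desc

def similarity(desc_a, desc_b, dist_thresh=40):
    """Fraction of descriptors with a good cross-checked match."""
    if desc_a is None or desc_b is None:
        return 0.0
    matches = matcher.match(desc_a, desc_b)
    good = [m for m in matches if m.distance < dist_thresh]
    return len(good) / max(len(desc_a), len(desc_b))
```

Candidate frames whose descriptors best match the frame being encoded would be the ones “stitched” into the encoder's reference picture buffer; since the bitstream only signals ordinary reference indices, standard compliance is preserved.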
MPAI-EEV: Standardization Efforts of Artificial Intelligence based End-to-End Video Coding
The rapid advancement of artificial intelligence (AI) technology has led to
the prioritization of standardizing the processing, coding, and transmission of
video using neural networks. To address this priority area, the Moving Picture,
Audio, and Data Coding by Artificial Intelligence (MPAI) group is developing a
suite of standards called MPAI-EEV for "end-to-end optimized neural video
coding." The aim of this AI-based video standard project is to compress the
number of bits required to represent high-fidelity video data by utilizing
data-trained neural coding technologies. This approach is not constrained by
how data coding has traditionally been applied in the context of a hybrid
framework. This paper presents an overview of recent and ongoing
standardization efforts in this area and highlights the key technologies and
design philosophy of EEV. It also reports on and compares some primary
results, such as the coding efficiency of the reference model.
Additionally, it discusses emerging activities, such as learned Unmanned
Aerial Vehicle (UAV) video coding, that are currently planned, under
development, or in the exploration phase. With a focus on UAV video signals,
this paper addresses the current status of these preliminary efforts. It also
indicates development timelines, summarizes the main technical details, and
provides pointers to further points of reference. The exploration experiment
shows that the EEV model performs better than the state-of-the-art video coding
standard H.266/VVC in terms of perceptual evaluation metrics.
Overview of the Low Complexity Enhancement Video Coding (LCEVC) Standard
The Low Complexity Enhancement Video Coding (LCEVC) specification is a recent standard approved by ISO/IEC JTC 1/SC 29/WG 04 (MPEG Video Coding). The main goal of LCEVC is to provide a standalone toolset for the enhancement of any other existing codec. It works on top of other coding schemes, resulting in a multi-layer video coding technology, but unlike existing scalable video codecs, it adds enhancement layers that are completely independent of the base video. The LCEVC technology takes as input the decoded video at a lower resolution and adds up to two enhancement sub-layers of residuals encoded with specialized low-complexity coding tools, such as simple temporal prediction, frequency transform, quantization, and entropy encoding. This paper provides an overview of the main features of the LCEVC standard: high compression efficiency, low complexity, and minimal memory and processing power requirements.
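The enhancement-layer principle can be sketched in a few lines: encode at a lower resolution with any base codec, upsample the base reconstruction, then transform and quantize only the residual. In the sketch below the base codec is stubbed with a resampling round trip and a full-frame DCT stands in for LCEVC's small-block transforms, so this is a simplified illustration of the layering, not the standard itself.

```python
# Simplified enhancement-layer round trip in the spirit of LCEVC (sketch).
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import zoom

def fake_base_codec(frame, scale=0.5):
    """Stand-in for any base codec: downscale, then upscale the 'decoded' base."""
    low = zoom(frame, scale, order=1)
    return zoom(low, 1 / scale, order=1)[: frame.shape[0], : frame.shape[1]]

def enhance(frame, q_step=8.0):
    base_rec = fake_base_codec(frame)
    residual = frame - base_rec                        # enhancement sub-layer input
    coeffs = dctn(residual, norm="ortho")              # frequency transform
    q = np.round(coeffs / q_step)                      # quantization (then entropy coded)
    return base_rec + idctn(q * q_step, norm="ortho")  # decoder-side reconstruction

frame = np.random.rand(64, 64) * 255
print(np.abs(enhance(frame) - frame).mean())           # error bounded by q_step
```

Because the enhancement layer only ever sees the decoded base pictures, the base codec can be anything (AVC, HEVC, VVC, AV1), which is the sense in which LCEVC is codec-agnostic.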