Fusion-Based Versatile Video Coding Intra Prediction Algorithm with Template Matching and Linear Prediction
The new-generation video coding standard Versatile Video Coding (VVC) has adopted many novel technologies to improve compression performance, and consequently, remarkable results have been achieved. In practical applications, less data, in terms of bitrate, would reduce the burden on the sensors and improve their performance. Hence, to further enhance the intra compression performance of VVC, we propose a fusion-based intra prediction algorithm in this paper. Specifically, to better predict areas with similar texture information, we propose a fusion-based adaptive template matching method, which directly takes the error between reference and objective templates into account. Furthermore, to better utilize the correlation between reference pixels and the pixels to be predicted, we propose a fusion-based linear prediction method, which can compensate for the deficiency of single linear prediction. We implemented our algorithm on top of the VVC Test Model (VTM) 9.1. Compared with VVC, our proposed fusion-based algorithm reduces bitrate by 0.89%, 0.84%, and 0.90% on average for the Y, Cb, and Cr components, respectively. In addition, compared with some other existing works, our algorithm showed superior performance in bitrate savings.
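The abstract does not give the fusion rule, so the following is only a minimal sketch of one plausible reading of "takes the error between reference and objective templates into account": several candidate blocks are blended with weights inversely proportional to the sum of absolute differences (SAD) between each candidate's template and the target template. The function names, the inverse-SAD weighting, and the `eps` parameter are assumptions made for illustration, not the authors' method.

```python
def sad(a, b):
    """Sum of absolute differences between two flattened sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def fuse_candidates(target_template, candidates, eps=1.0):
    """Blend candidate blocks, trusting low-template-error candidates more.

    candidates: list of (template, block) pairs, each a flat list of samples.
    eps: assumed regulariser that avoids division by zero for a perfect match.
    """
    # Hypothetical weighting: inverse of the template matching error.
    weights = [1.0 / (sad(target_template, tpl) + eps) for tpl, _ in candidates]
    total = sum(weights)
    block_len = len(candidates[0][1])
    return [
        sum(w * blk[i] for w, (_, blk) in zip(weights, candidates)) / total
        for i in range(block_len)
    ]


# Illustrative values only: candidate A matches the template exactly,
# candidate B is off by 2 per sample, so A dominates the fused prediction.
target = [10, 10]
cands = [([10, 10], [50, 50]), ([12, 12], [70, 70])]
fused = fuse_candidates(target, cands)
```

With equal template errors the sketch reduces to a plain average of the candidates, which is one way to see why weighting by error can help over a single match.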
Designs and Implementations in Neural Network-based Video Coding
The past decade has witnessed the huge success of deep learning in well-known
artificial intelligence applications such as face recognition, autonomous
driving, and large language models like ChatGPT. Recently, the application of
deep learning has been extended to a much wider range, with neural
network-based video coding being one of them. Neural network-based video coding
can be performed at two different levels: embedding neural network-based
(NN-based) coding tools into a classical video compression framework or
building the entire compression framework upon neural networks. This paper
elaborates on some of the recent exploration efforts of JVET (Joint Video Experts
Team of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC29) in the name of neural
network-based video coding (NNVC), falling in the former category.
Specifically, this paper discusses two major NN-based video coding
technologies, i.e. neural network-based intra prediction and neural
network-based in-loop filtering, which have been investigated for several
meeting cycles in JVET and finally adopted into the reference software of NNVC.
Extensive experiments on top of the NNVC have been conducted to evaluate the
effectiveness of the proposed techniques. Compared with VTM-11.0_nnvc, the
proposed NN-based coding tools in NNVC-4.0 could achieve {11.94%, 21.86%,
22.59%}, {9.18%, 19.76%, 20.92%}, and {10.63%, 21.56%, 23.02%} BD-rate
reductions on average for {Y, Cb, Cr} under random-access, low-delay, and
all-intra configurations, respectively.
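For readers unfamiliar with the BD-rate figures quoted above: BD-rate is the average bitrate difference between two rate-distortion curves at equal quality, with a negative value meaning a saving. It is commonly computed with cubic fitting of log-rate against PSNR. The sketch below is a simplified variant that uses piecewise-linear interpolation instead, so it will not reproduce reference-tool numbers exactly. The function names and the rate/PSNR points are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the curve")


def _integrate(xs, ys, lo, hi):
    """Trapezoidal integral of the piecewise-linear curve over [lo, hi]."""
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    vals = [_interp(k, xs, ys) for k in knots]
    return sum(
        0.5 * (vals[i] + vals[i + 1]) * (knots[i + 1] - knots[i])
        for i in range(len(knots) - 1)
    )


def bd_rate(anchor, test):
    """Average % bitrate change of `test` vs `anchor` at equal PSNR.

    Each curve is a list of (bitrate_kbps, psnr_db) with distinct PSNR values.
    """
    a = sorted(anchor, key=lambda p: p[1])
    t = sorted(test, key=lambda p: p[1])
    a_q, a_r = [p[1] for p in a], [math.log10(p[0]) for p in a]
    t_q, t_r = [p[1] for p in t], [math.log10(p[0]) for p in t]
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in quality")
    avg_diff = (_integrate(t_q, t_r, lo, hi) - _integrate(a_q, a_r, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100


# Hypothetical curves: the test codec needs 10% less rate at every PSNR point.
anchor_curve = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test_curve = [(0.9 * r, q) for r, q in anchor_curve]
saving = bd_rate(anchor_curve, test_curve)  # approximately -10.0
```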
Rate-Accuracy Trade-Off In Video Classification With Deep Convolutional Neural Networks
Advanced video classification systems decode video frames to derive the
necessary texture and motion representations for ingestion and analysis by
spatio-temporal deep convolutional neural networks (CNNs). However, when
considering visual Internet-of-Things applications, surveillance systems and
semantic crawlers of large video repositories, the video capture and the
CNN-based semantic analysis parts do not tend to be co-located. This
necessitates the transport of compressed video over networks and incurs
significant overhead in bandwidth and energy consumption, thereby significantly
undermining the deployment potential of such systems. In this paper, we
investigate the trade-off between the encoding bitrate and the achievable
accuracy of CNN-based video classification models that directly ingest
AVC/H.264 and HEVC encoded videos. Instead of retaining entire compressed video
bitstreams and applying complex optical flow calculations prior to CNN
processing, we only retain motion vector and select texture information at
significantly-reduced bitrates and apply no additional processing prior to CNN
ingestion. Based on three CNN architectures and two action recognition
datasets, we achieve 11%-94% saving in bitrate with marginal effect on
classification accuracy. A model-based selection between multiple CNNs
increases these savings further, to the point where, if up to 7% loss of
accuracy can be tolerated, video classification can take place with as little
as 3 kbps for the transport of the required compressed video information to the
system implementing the CNN models.
On the Effectiveness of Video Recolouring as an Uplink-model Video Coding Technique
For decades, conventional video compression formats have advanced via incremental improvements with
each subsequent standard achieving better rate-distortion (RD) efficiency at the cost of increased encoder
complexity compared to its predecessors. Design efforts have been driven by common multi-media use cases
such as video-on-demand, teleconferencing, and video streaming, where the most important requirements are
low bandwidth and low video playback latency. Meeting these requirements involves the use of computationally
expensive block-matching algorithms, which produce excellent compression rates and quick decoding
times.
However, emerging use cases such as Wireless Video Sensor Networks, remote surveillance, and mobile
video present new technical challenges in video compression. In these scenarios, the video capture and
encoding devices are often power-constrained and have limited computational resources available, while the
decoder devices have abundant resources and access to a dedicated power source. To address these use cases,
codecs must be power-aware and offer a reasonable trade-off between video quality, bitrate, and encoder
complexity. Balancing these constraints requires a complete rethinking of video compression technology.
The uplink video-coding model represents a new paradigm to address these low-power use cases, providing
the ability to redistribute computational complexity by offloading the motion estimation and compensation
steps from encoder to decoder. Distributed Video Coding (DVC) follows this uplink model of video codec
design, and maintains high quality video reconstruction through innovative channel coding techniques. The
field of DVC is still early in its development, with many open problems waiting to be solved, and no defined
video compression or distribution standards. Due to the experimental nature of the field, most DVC codecs
to date have focused on encoding and decoding the Luma plane only, which produces grayscale reconstructed
videos.
In this thesis, a technique called “video recolouring” is examined as an alternative to DVC. Video
recolouring exploits the temporal redundancies between colour planes, reducing video bitrate by removing Chroma
information from specific frames and then recolouring them at the decoder.
A novel video recolouring algorithm called Motion-Compensated Recolouring (MCR) is proposed, which
uses block motion estimation and bi-directional weighted motion-compensation to reconstruct Chroma planes
at the decoder. MCR is used to enhance a conventional base-layer codec, and is shown to reduce bitrate by
up to 16% with only a slight decrease in objective quality. MCR also outperforms other video recolouring
algorithms in terms of objective video quality, demonstrating up to 2 dB PSNR improvement in some cases
- …
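The bi-directional weighted motion compensation mentioned for MCR can be sketched for a single block as follows. This is a minimal illustration, assuming that motion vectors towards the previous and next coloured frames are already available (for example, estimated on the Luma plane) and that the blend weights are inversely proportional to temporal distance. The function names, parameters, and weighting rule are assumptions, not the thesis's actual design.

```python
def fetch_block(plane, top, left, size):
    """Extract a size x size block from a 2-D list of samples."""
    return [row[left:left + size] for row in plane[top:top + size]]


def recolour_block(prev_chroma, next_chroma, top, left, size,
                   mv_prev, mv_next, d_prev, d_next):
    """Reconstruct one Chroma block from two neighbouring coloured frames.

    mv_prev / mv_next: (dy, dx) motion vectors into the previous / next frame.
    d_prev / d_next: temporal distances (in frames) to those frames.
    """
    # Assumed rule: the temporally closer frame gets the larger weight.
    w_prev = d_next / (d_prev + d_next)
    w_next = 1.0 - w_prev
    p = fetch_block(prev_chroma, top + mv_prev[0], left + mv_prev[1], size)
    n = fetch_block(next_chroma, top + mv_next[0], left + mv_next[1], size)
    return [
        [round(w_prev * a + w_next * b) for a, b in zip(row_p, row_n)]
        for row_p, row_n in zip(p, n)
    ]


# Toy 4x4 Chroma planes; the frame to recolour sits 1 frame after the
# previous coloured frame and 3 frames before the next one.
prev_plane = [[100] * 4 for _ in range(4)]
next_plane = [[120] * 4 for _ in range(4)]
block = recolour_block(prev_plane, next_plane, top=1, left=1, size=2,
                       mv_prev=(0, 1), mv_next=(-1, 0), d_prev=1, d_next=3)
```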