Machine Learning based Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders
The next-generation Versatile Video Coding (VVC) standard introduces a new
Multi-Type Tree (MTT) block partitioning structure that supports Binary-Tree
(BT) and Ternary-Tree (TT) splits in both vertical and horizontal directions.
This new approach leads to five possible splits at each block depth and thereby
improves the coding efficiency of VVC over that of the preceding High
Efficiency Video Coding (HEVC) standard, which only supports Quad-Tree (QT)
partitioning with a single split per block depth. However, MTT has also
considerably increased encoder computational complexity. In this paper, a
two-stage learning-based technique is proposed to tackle the complexity
overhead of MTT in VVC intra encoders. In our scheme, the input block is first
processed by a Convolutional Neural Network (CNN) to predict its spatial
features through a vector of probabilities describing the partition at each 4x4
edge. Subsequently, a Decision Tree (DT) model leverages this vector of spatial
features to predict the most likely splits at each block. Finally, based on
this prediction, only the N most likely splits are processed by the
Rate-Distortion (RD) process of the encoder. In order to train our CNN and DT
models on a wide range of image contents, we also propose a public VVC frame
partitioning dataset based on existing image datasets encoded with the VVC
reference software encoder. Our proposal relying on the top-3 configuration
reaches 46.6% complexity reduction for a negligible bitrate increase of 0.86%.
A top-2 configuration enables a higher complexity reduction of 69.8% for 2.57%
bitrate loss. These results show a better trade-off between VTM intra
coding efficiency and complexity reduction than state-of-the-art
solutions.
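The final pruning step described in this abstract (only the N most likely splits go through the RD process) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the split labels, the `top_n_splits` and `rd_search` helpers, and the probability and cost values are all hypothetical, and the per-split probabilities are assumed to come from the CNN and DT stages.

```python
from typing import Callable, Dict, List

# Hypothetical labels for the five split types named in the abstract
# (one quad-tree split plus binary and ternary splits in two directions).
SPLITS = ("QT", "BT_H", "BT_V", "TT_H", "TT_V")


def top_n_splits(split_probs: Dict[str, float], n: int) -> List[str]:
    """Return the n split types with the highest predicted probability."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(split_probs, key=lambda s: split_probs[s], reverse=True)
    return ranked[:n]


def rd_search(split_probs: Dict[str, float], n: int,
              rd_cost: Callable[[str], float]) -> str:
    """Evaluate the (expensive) RD cost only for the top-n candidates."""
    candidates = top_n_splits(split_probs, n)
    return min(candidates, key=rd_cost)


# Illustrative values only; a real encoder would supply both.
probs = {"QT": 0.10, "BT_H": 0.40, "BT_V": 0.05, "TT_H": 0.30, "TT_V": 0.15}
costs = {"QT": 9.0, "BT_H": 5.0, "BT_V": 7.0, "TT_H": 4.0, "TT_V": 6.0}

top3 = top_n_splits(probs, 3)
best = rd_search(probs, 3, lambda s: costs[s])
```

With these made-up numbers, the top-3 configuration skips the RD check for two of the five splits. A smaller N skips more checks, at the risk of pruning the split the full search would have chosen, which is the complexity versus bitrate trade-off the abstract reports.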
A Spatial-Temporal Dual-Mode Mixed Flow Network for Panoramic Video Salient Object Detection
Salient object detection (SOD) in panoramic video is still in the initial
exploration stage. The indirect application of 2D video SOD methods to the
detection of salient objects in panoramic video faces many unmet challenges, such
as low detection accuracy, high model complexity, and poor generalization
performance. To overcome these hurdles, we design an Inter-Layer Attention
(ILA) module, an Inter-Layer Weight (ILW) module, and a Bi-Modal Attention
(BMA) module. Based on these modules, we propose a Spatial-Temporal Dual-Mode
Mixed Flow Network (STDMMF-Net) that exploits the spatial flow of panoramic
video and the corresponding optical flow for SOD. First, the ILA module
calculates the attention between adjacent level features of consecutive frames
of panoramic video to improve the accuracy of extracting salient object
features from the spatial flow. Then, the ILW module quantifies the salient
object information contained in the features of each level to improve the
fusion efficiency of the features of each level in the mixed flow. Finally, the
BMA module improves the detection accuracy of STDMMF-Net. Extensive
subjective and objective experimental results show that the proposed method
achieves better detection accuracy than the state-of-the-art (SOTA)
methods. Moreover, the comprehensive performance of the proposed method is
better in terms of memory required for model inference, testing time,
complexity, and generalization performance.
Machine Learning for Multimedia Communications
Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, some impressive efficiency/accuracy improvements have been made all over the transmission pipeline. For example, the high model capacity of the learning-based architectures enables us to accurately model the image and video behavior such that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategy or even user perception modeling have widely benefited from the recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even though only a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed all across the transmission chain, and we discuss their potential impact and the research challenges that they raise.
Employing Information and Communications Technologies in Homes and Cities for the Health and Well-Being of Older People
He X and Sheriff RE (Eds.) Employing ICT in Homes and Cities for the Health and Well-Being of Older People. Workshop Proceedings of ICT4HOP’16. 15-17 Aug 2016. Sichuan University, Chengdu, China. British Council, Researcher Links, Newton Fund, NSF
Image and Video Coding Techniques for Ultra-low Latency
The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, or autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their tradeoffs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches like still image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we found inter-device memory transfers and the lack of sub-frame coding to be limitations of current full-system and software-programmable implementations.
Prioritizing Content of Interest in Multimedia Data Compression
Image and video compression techniques make data transmission and storage in digital multimedia systems more efficient and feasible for the system's limited storage and bandwidth. Many generic image and video compression techniques such as JPEG and H.264/AVC have been standardized and are now widely adopted. Despite their great success, we observe that these standard compression techniques are not the best solution for data compression in special types of multimedia systems such as microscopy videos and low-power wireless broadcast systems. In these application-specific systems where the content of interest in the multimedia data is known and well-defined, we should re-think the design of a data compression pipeline. We hypothesize that by identifying and prioritizing multimedia data's content of interest, new compression methods can be invented that are far more effective than standard techniques. In this dissertation, a set of new data compression methods based on the idea of prioritizing the content of interest has been proposed for three different kinds of multimedia systems. I will show that the key to designing efficient compression techniques in these three cases is to prioritize the content of interest in the data. The definition of the content of interest of multimedia data depends on the application. First, I show that for microscopy videos, the content of interest is defined as the spatial regions in the video frame whose pixels contain more than just noise. Keeping data in those regions at high quality and discarding other information yields a novel microscopy video compression technique. Second, I show that for a Bluetooth low energy beacon based system, practical multimedia data storage and transmission is possible by prioritizing content of interest. I designed custom image compression techniques that preserve edges in a binary image, or foreground regions of a color image of indoor or outdoor objects.
Last, I present a new indoor Bluetooth low energy beacon based augmented reality system that integrates a 3D moving object compression method that prioritizes the content of interest.