GRACE: Loss-Resilient Real-Time Video through Neural Codecs
In real-time video communication, retransmitting lost packets over
high-latency networks is not viable due to strict latency requirements. To
counter packet losses without retransmission, two primary strategies are
employed -- encoder-based forward error correction (FEC) and decoder-based
error concealment. The former encodes data with redundancy before transmission,
yet determining the optimal redundancy level in advance proves challenging. The
latter reconstructs video from partially received frames, but dividing a frame
into independently coded partitions inherently compromises compression
efficiency, and the lost information cannot be effectively recovered by the
decoder without adapting the encoder.
We present a loss-resilient real-time video system called GRACE, which
preserves the user's quality of experience (QoE) across a wide range of packet
losses through a new neural video codec. Central to GRACE's enhanced loss
resilience is its joint training of the neural encoder and decoder under a
spectrum of simulated packet losses. In lossless scenarios, GRACE achieves
video quality on par with conventional codecs (e.g., H.265). As the loss rate
escalates, GRACE exhibits a more graceful, less pronounced decline in quality,
consistently outperforming other loss-resilient schemes. Through extensive
evaluation on various videos and real network traces, we demonstrate that GRACE
reduces undecodable frames by 95% and stall duration by 90% compared with FEC,
while markedly boosting video quality over error concealment methods. In a user
study with 240 crowdsourced participants and 960 subjective ratings, GRACE
registers a 38% higher mean opinion score (MOS) than other baselines.
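The idea of training under simulated packet losses can be illustrated with a minimal sketch. It assumes that the encoder output is a flat list of values, that elements are interleaved across packets by index, and that a lost packet is modeled by zeroing its elements; none of these details are stated in the abstract, and the name `simulate_packet_loss` is hypothetical.

```python
import random


def simulate_packet_loss(latent, num_packets, loss_rate, rng):
    """Zero out the latent elements that belong to randomly dropped packets.

    Assumption: element i is carried by packet (i % num_packets), and each
    packet is dropped independently with probability loss_rate.
    """
    dropped = {p for p in range(num_packets) if rng.random() < loss_rate}
    return [0.0 if (i % num_packets) in dropped else v
            for i, v in enumerate(latent)]


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [float(i + 1) for i in range(12)]
    # During training, a different loss rate could be drawn for every batch
    # so that the decoder sees a spectrum of loss levels.
    for rate in (0.0, 0.3, 0.8):
        print(rate, simulate_packet_loss(latent, 4, rate, rng))
```

In a training loop, the masked latent would be passed to the decoder in place of the clean one, so that both networks are optimized for reconstruction quality under loss.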
Neural Video Recovery for Cloud Gaming
Cloud gaming is a multi-billion dollar industry. A client in cloud gaming
sends its movement to the game server on the Internet, which renders and
transmits the resulting video back. In order to provide a good gaming
experience, a latency below 80 ms is required. This means that video rendering,
encoding, transmission, decoding, and display have to finish within that time
frame, which is especially challenging to achieve due to server overload,
network congestion, and losses. In this paper, we propose a new method for
recovering lost or corrupted video frames in cloud gaming. Unlike traditional
video frame recovery, our approach uses game states to significantly enhance
recovery accuracy and utilizes partially decoded frames to recover lost
portions. We develop a holistic system that consists of (i) efficiently
extracting game states, (ii) modifying the H.264 video decoder to generate a mask
to indicate which portions of video frames need recovery, and (iii) designing a
novel neural network to recover either complete or partial video frames. Our
approach is extensively evaluated using iPhone 12 and laptop implementations,
and we demonstrate the utility of game states in game video recovery and
the effectiveness of our overall design.
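One way to picture the role of the recovery mask is the sketch below. It assumes frames are 2-D lists of pixel values and that the mask holds 1 where a pixel needs recovery and 0 where the decoded pixel is valid; the abstract does not say how the network output is merged with the partially decoded frame, so `composite_frame` is a hypothetical helper, not the authors' method.

```python
def composite_frame(decoded, predicted, mask):
    """Keep decoded pixels where mask is 0 and predicted pixels where mask is 1."""
    return [
        [p if m else d for d, p, m in zip(d_row, p_row, m_row)]
        for d_row, p_row, m_row in zip(decoded, predicted, mask)
    ]


if __name__ == "__main__":
    decoded = [[10, 11], [0, 0]]      # bottom row was lost in transit
    predicted = [[9, 12], [20, 21]]   # stand-in for the network's output
    mask = [[0, 0], [1, 1]]           # 1 marks pixels that need recovery
    print(composite_frame(decoded, predicted, mask))
```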
Real-Time Neural Video Recovery and Enhancement on Mobile Devices
As mobile devices become increasingly popular for video streaming, it's
crucial to optimize the streaming experience for these devices. Although deep
learning-based video enhancement techniques are gaining attention, most of them
cannot support real-time enhancement on mobile devices. Additionally, many of
these techniques are focused solely on super-resolution and cannot handle
partial or complete loss or corruption of video frames, which is common on the
Internet and wireless networks.
To overcome these challenges, we present a novel approach in this paper. Our
approach consists of (i) a novel video frame recovery scheme, (ii) a new
super-resolution algorithm, and (iii) a receiver enhancement-aware video bit
rate adaptation algorithm. We have implemented our approach on an iPhone 12,
and it can support 30 frames per second (FPS). We have evaluated our approach
in various networks such as WiFi, 3G, 4G, and 5G networks. Our evaluation shows
that our approach enables real-time enhancement and results in a significant
increase in video QoE (Quality of Experience) of 24% to 82% in our video
streaming system.
Mathematical Approaches for Image Enhancement Problems
This thesis develops novel techniques for several image enhancement problems using mathematical tools that are theoretically well founded and widely used in image processing, such as wavelet transforms, partial differential equations, and variational models. Three subtopics are covered. First, a color image denoising framework is introduced that achieves high-quality denoising results by considering correlations between color components, and into which existing denoising approaches can be plugged flexibly. Second, a new and efficient framework for image contrast and color enhancement in the compressed wavelet domain is proposed. The proposed approach enhances both global and local contrast and brightness while preserving color consistency. The framework does not require an inverse transform for image enhancement, since linear scale factors are applied directly to both scaling and wavelet coefficients in the compressed domain, which results in high computational efficiency. Noise contaminating the image can also be reduced efficiently by introducing wavelet shrinkage terms adaptively at different scales. The proposed method can thus enhance a wavelet-coded image efficiently, with high image quality and less noise and fewer other artifacts. The experimental results show that the proposed method produces encouraging results both visually and numerically compared to some existing approaches. Finally, the image inpainting problem is discussed. A literature review, psychological analysis, and the challenges of image inpainting and related topics are described. An inpainting algorithm using energy minimization and texture mapping is proposed. The Mumford-Shah energy minimization model detects and preserves edges in the inpainting domain by detecting both the main structure and the detailed edges. This approach uses a faster hierarchical level set method and guarantees convergence independent of initial conditions. 
The estimated segmentation results in the inpainting domain are stored in a segmentation map, which is referenced by a texture mapping algorithm for filling textured regions. We also propose an inpainting algorithm using the wavelet transform, which is expected to give better global structure estimation of the unknown region, in addition to shape and texture properties, since wavelet transforms have been used for various image analysis problems owing to their multi-resolution properties and decoupling characteristics.
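The wavelet-domain enhancement step described above can be sketched as follows. The sketch assumes a one-level Haar transform on a 1-D signal of even length, fixed scale factors, and soft thresholding as the shrinkage rule; the thesis abstract does not specify the wavelet, how the factors are chosen, or the shrinkage function, so all function names and parameters here are hypothetical. The inverse transform is included only to inspect the result, since the enhancement itself operates on coefficients.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """One-level orthonormal Haar transform of an even-length signal."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward (used here only to view the enhanced signal)."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_shrink(coeff, threshold):
    """Soft thresholding: pull the coefficient toward zero by threshold."""
    return math.copysign(max(abs(coeff) - threshold, 0.0), coeff)


def enhance_coefficients(approx, detail, k_approx, k_detail, threshold):
    """Scale the scaling coefficients and the shrunken wavelet coefficients."""
    new_approx = [k_approx * a for a in approx]
    new_detail = [k_detail * soft_shrink(d, threshold) for d in detail]
    return new_approx, new_detail


if __name__ == "__main__":
    signal = [2.0, 2.1, 4.0, 6.0]
    a, d = haar_forward(signal)
    a2, d2 = enhance_coefficients(a, d, k_approx=1.2, k_detail=1.5, threshold=0.1)
    print(haar_inverse(a2, d2))
```

Scaling the approximation coefficients changes overall brightness, scaling the detail coefficients changes local contrast, and the threshold suppresses small detail coefficients that are likely to be noise.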
Image and Video Forensics
Nowadays, images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia content is generated in many different ways through the use of consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated instant distribution and sharing of digital images on digital social platforms, resulting in a large volume of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos with the use of deep learning techniques. In response to these threats, the multimedia forensics community has produced major research efforts regarding the identification of the source and the detection of manipulation. In all cases where images and videos serve as critical evidence (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks), forensic technologies that help to determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book aims to collect a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics to tackle new and serious challenges to ensure media authenticity.