MFQE 2.0: A New Approach for Multi-frame Quality Enhancement on Compressed Video
The past few years have witnessed great success in applying deep learning to
enhance the quality of compressed image/video. The existing approaches mainly
focus on enhancing the quality of a single frame, not considering the
similarity between consecutive frames. Since heavy quality fluctuation exists
across compressed video frames, as investigated in this paper, frame similarity
can be utilized to enhance low-quality frames given their neighboring
high-quality frames. We refer to this task as Multi-Frame Quality Enhancement (MFQE).
Accordingly, this paper proposes an MFQE approach for compressed video, as the
first attempt in this direction. In our approach, we firstly develop a
Bidirectional Long Short-Term Memory (BiLSTM) based detector to locate Peak
Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame
Convolutional Neural Network (MF-CNN) is designed to enhance the quality of
compressed video, in which the non-PQF and its nearest two PQFs are the input.
In MF-CNN, motion between the non-PQF and PQFs is compensated by a motion
compensation subnet. Subsequently, a quality enhancement subnet fuses the
non-PQF and compensated PQFs, and then reduces the compression artifacts of the
non-PQF. Also, PQF quality is enhanced in the same way. Finally, experiments
validate the effectiveness and generalization ability of our MFQE approach in
advancing the state-of-the-art quality enhancement of compressed video. The
code is available at https://github.com/RyanXingQL/MFQEv2.0.git.
Comment: Accepted to TPAMI in September 2019. v6 updates: correct units in Fig. 11; correct author info; delete bio photos. arXiv admin note: text overlap with arXiv:1803.0468
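The core pipeline lends itself to a compact sketch. Below is a minimal PyTorch illustration, not the authors' code, of the two-stage idea: backward-warp the two nearest PQFs toward the non-PQF using dense flows (which would come from the motion compensation subnet; zero placeholders here), then fuse the three frames in a small residual CNN standing in for the quality enhancement subnet. All module names, layer sizes, and the single-channel (luma) input are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp a (B, 1, H, W) frame with a dense flow field (B, 2, H, W)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(frame.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow                             # (B, 2, H, W)
    # Normalize coordinates to [-1, 1] for grid_sample.
    cx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    cy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(frame, torch.stack((cx, cy), dim=-1), align_corners=True)

class QESubnet(nn.Module):
    """Fuses the non-PQF with two motion-compensated PQFs; predicts a residual."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )
    def forward(self, non_pqf, pqf_prev_mc, pqf_next_mc):
        x = torch.cat((pqf_prev_mc, non_pqf, pqf_next_mc), dim=1)
        return non_pqf + self.body(x)  # residual learning

# Flows would come from the motion-compensation subnet; zeros for shape checks.
non_pqf = torch.rand(1, 1, 64, 64)
pqf_prev, pqf_next = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
flow_p, flow_n = torch.zeros(1, 2, 64, 64), torch.zeros(1, 2, 64, 64)
enhanced = QESubnet()(non_pqf, warp(pqf_prev, flow_p), warp(pqf_next, flow_n))
```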
Multi-Frame Quality Enhancement for Compressed Video
The past few years have witnessed great success in applying deep learning to
enhance the quality of compressed image/video. The existing approaches mainly
focus on enhancing the quality of a single frame, ignoring the similarity
between consecutive frames. In this paper, we show that heavy quality
fluctuation exists across compressed video frames, and thus low-quality frames
can be enhanced using their neighboring high-quality frames, a task we call
Multi-Frame Quality Enhancement (MFQE). Accordingly, this paper proposes an
MFQE approach for compressed video, as a first attempt in this direction. In our approach, we
firstly develop a Support Vector Machine (SVM) based detector to locate Peak
Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame
Convolutional Neural Network (MF-CNN) is designed to enhance the quality of
compressed video, in which the non-PQF and its nearest two PQFs serve as the
input. The MF-CNN compensates motion between the non-PQF and PQFs through the
Motion Compensation subnet (MC-subnet). Subsequently, the Quality Enhancement
subnet (QE-subnet) reduces compression artifacts of the non-PQF with the help
of its nearest PQFs. Finally, the experiments validate the effectiveness and
generality of our MFQE approach in advancing the state-of-the-art quality
enhancement of compressed video. The code of our MFQE approach is available at
https://github.com/ryangBUAA/MFQE.git
Comment: to appear in CVPR 2018
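The PQF detector admits a simple sketch. Below is a hedged scikit-learn illustration of the SVM-based idea: label local quality peaks as PQFs and classify each frame from the quality scores of itself and its neighbors. The feature choice (a window of per-frame quality scores) and the synthetic data are our assumptions, not the paper's exact features.

```python
import numpy as np
from sklearn.svm import SVC

def frame_features(quality, t, radius=2):
    """Quality scores of frame t and its neighbors, clamped at sequence edges."""
    idx = np.clip(np.arange(t - radius, t + radius + 1), 0, len(quality) - 1)
    return quality[idx]

quality = np.random.rand(300)  # stand-in per-frame quality scores (e.g., PSNR)
# Stand-in labels: a PQF is a local quality peak relative to its neighbors.
labels = (quality > np.roll(quality, 1)) & (quality > np.roll(quality, -1))

X = np.stack([frame_features(quality, t) for t in range(len(quality))])
y = labels.astype(int)

detector = SVC(kernel="rbf").fit(X, y)  # PQF vs. non-PQF classifier
is_pqf = detector.predict(X)
```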
Optimized Pre-Compensating Compression
In imaging systems, following acquisition, an image/video is transmitted or
stored and eventually presented to human observers using different and often
imperfect display devices. While the resulting quality of the output image may
be severely affected by the display, this degradation is usually ignored in the
preceding compression. In this paper we model the sub-optimality of the display
device as a known degradation operator applied on the decompressed image/video.
We assume the use of a standard compression path, and augment it with a
suitable pre-processing procedure, providing a compressed signal intended to
compensate the degradation without any post-filtering. Our approach originates
from an intricate rate-distortion problem, optimizing the modifications to the
input image/video for reaching best end-to-end performance. We address this
seemingly computationally intractable problem using the alternating direction
method of multipliers (ADMM) approach, leading to a procedure in which a
standard compression technique is iteratively applied. We demonstrate the
proposed method for adjusting HEVC image/video compression to compensate
post-decompression visual effects due to a common type of display.
Particularly, we use our method to reduce motion-blur perceived while viewing
video on LCD devices. The experiments establish our method as a leading
approach for preprocessing high bit-rate compression to counterbalance a
post-decompression degradation.
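The optimization pattern can be sketched generically. Below is a hedged NumPy illustration of the ADMM structure described above: the display degradation is a known linear operator H, and a black-box compress-decompress routine is applied inside each iteration in place of a proximal step for the rate term. The splitting, the toy blur operator, and the quantizer standing in for the codec are all our assumptions; the paper's actual formulation is more elaborate.

```python
import numpy as np

def precompensate_admm(x_in, H, codec, rho=0.5, iters=20):
    """Generic ADMM sketch: find a signal z whose compressed-then-displayed
    version H @ decode(encode(z)) approximates the input x_in."""
    n = x_in.size
    z = x_in.copy()
    v = codec(z)                 # estimate of the codec output
    u = np.zeros(n)              # scaled dual variable
    A = H.T @ H + rho * np.eye(n)
    for _ in range(iters):
        # z-step: least squares trading display fidelity vs. closeness to v.
        z = np.linalg.solve(A, H.T @ x_in + rho * (v - u))
        # v-step: the standard codec stands in for the proximal operator
        # of the rate term, applied to the shifted variable.
        v = codec(z + u)
        u = u + z - v            # dual update
    return z

# Toy example: blur as the display degradation, coarse quantization as "codec".
n = 64
H = np.eye(n) * 0.6 + np.eye(n, k=1) * 0.2 + np.eye(n, k=-1) * 0.2
codec = lambda s: np.round(s * 8) / 8
x_pre = precompensate_admm(np.random.rand(n), H, codec)
```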
Steered mixture-of-experts for light field images and video: representation and coding
Research in light field (LF) processing has increased heavily over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, these 2-D regular grids are less suited for high-dimensional data such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays at any angle arriving at a certain region. The global model thus consists of a set of kernels defining a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application for 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art for low-to-mid-range bitrates with respect to subjective visual quality of 4-D LF images. In the case of 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4x in bitrate for the same quality. At least equally important is the fact that our method inherently has desired functionality for LF rendering which is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) light-weight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
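The kernel representation can be made concrete with a small sketch. Below is a hedged NumPy illustration of evaluating an SMoE-style model for the 2-D image case: Gaussian kernels provide soft gating weights over pixel coordinates, and each kernel's affine expert contributes a gated value, yielding a continuous approximation of the signal. The Gaussian gating, affine experts, and all parameter values are illustrative assumptions.

```python
import numpy as np

def smoe_reconstruct(coords, centers, covs_inv, experts):
    """Evaluate an SMoE-style model at pixel coordinates.
    coords:   (N, 2) query positions
    centers:  (K, 2) kernel centers
    covs_inv: (K, 2, 2) inverse steering covariances
    experts:  (K, 3) affine experts m_k(x, y) = w0 + w1*x + w2*y
    """
    d = coords[:, None, :] - centers[None, :, :]              # (N, K, 2)
    mahal = np.einsum("nki,kij,nkj->nk", d, covs_inv, d)      # squared distances
    gates = np.exp(-0.5 * mahal)
    gates /= gates.sum(axis=1, keepdims=True)                 # soft gating weights
    ones = np.ones((coords.shape[0], 1))
    vals = np.concatenate([ones, coords], axis=1) @ experts.T # (N, K) expert outputs
    return (gates * vals).sum(axis=1)                         # gated combination

# Evaluate a 2-kernel model on a 4x4 pixel grid.
xs, ys = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)
centers = np.array([[0.25, 0.25], [0.75, 0.75]])
covs_inv = np.stack([np.eye(2) * 40.0] * 2)
experts = np.array([[0.2, 0.5, 0.0], [0.8, 0.0, -0.5]])
pixels = smoe_reconstruct(coords, centers, covs_inv, experts)
```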
Deep motion-compensation enhancement in video compression
This work introduces the multiframe motion-compensation enhancement network (MMCE-Net), a deep-learning tool aimed at improving the performance of current video coding standards based on motion compensation, such as H.265/HEVC. The proposed method improves the inter-prediction coding efficiency by enhancing the accuracy of the motion-compensated frame and thereby improving the rate-distortion performance. MMCE-Net is a neural network that jointly exploits the predicted coding unit and two co-located coding units from previous reference frames to improve the estimation of the temporal evolution of the scene. This letter describes the architecture of MMCE-Net, how it is integrated into H.265/HEVC, and the corresponding performance.
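The input/output contract of such a refiner is easy to sketch. Below is a hedged PyTorch stand-in, reusing the residual-fusion pattern from the MFQE sketch above: the motion-compensated prediction and two co-located blocks from earlier reference frames are concatenated, and a small CNN predicts a correction. The architecture and block sizes are placeholders, not the MMCE-Net design.

```python
import torch
import torch.nn as nn

class MCRefineNet(nn.Module):
    """Toy stand-in for an MMCE-Net-style refiner: combines the predicted
    coding unit with two co-located units from previous reference frames."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )
    def forward(self, pred, coloc0, coloc1):
        x = torch.cat((pred, coloc0, coloc1), dim=1)
        return pred + self.net(x)  # corrected motion-compensated prediction

pred, c0, c1 = (torch.rand(1, 1, 64, 64) for _ in range(3))
refined = MCRefineNet()(pred, c0, c1)
```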
Multi-level Wavelet-based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video
The past few years have witnessed fast development in video quality
enhancement via deep learning. Existing methods mainly focus on enhancing the
objective quality of compressed video while ignoring its perceptual quality. In
this paper, we focus on enhancing the perceptual quality of compressed video.
Our main observation is that enhancing the perceptual quality mostly relies on
recovering high-frequency sub-bands in wavelet domain. Accordingly, we propose
a novel generative adversarial network (GAN) based on multi-level wavelet
packet transform (WPT) to enhance the perceptual quality of compressed video,
which is called multi-level wavelet-based GAN (MW-GAN). In MW-GAN, we first
apply motion compensation with a pyramid architecture to obtain temporal
information. Then, we propose a wavelet reconstruction network with
wavelet-dense residual blocks (WDRB) to recover the high-frequency details. In
addition, the adversarial loss of MW-GAN is added via WPT to further encourage
high-frequency details recovery for video frames. Experimental results
demonstrate the superiority of our method.
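The wavelet-domain emphasis can be sketched directly. Below is a hedged PyTorch illustration of measuring loss only on high-frequency sub-bands: a one-level Haar split via strided convolution, with an L1 penalty on the LH/HL/HH bands. MW-GAN uses a multi-level wavelet packet transform and an adversarial loss; the single-level Haar filters and plain L1 here are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def haar_subbands(x):
    """One-level Haar split of a (B, 1, H, W) tensor into the four
    sub-bands (LL, LH, HL, HH), each at half resolution."""
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    k = torch.stack([ll, lh, hl, hh]).unsqueeze(1)  # (4, 1, 2, 2)
    return F.conv2d(x, k, stride=2)                 # (B, 4, H/2, W/2)

def high_freq_loss(enhanced, target):
    """L1 distance restricted to the three high-frequency sub-bands,
    the components the paper ties to perceptual quality."""
    e, t = haar_subbands(enhanced), haar_subbands(target)
    return (e[:, 1:] - t[:, 1:]).abs().mean()

loss = high_freq_loss(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))
```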
Non-Local ConvLSTM for Video Compression Artifact Reduction
Video compression artifact reduction aims to recover high-quality videos from
low-quality compressed videos. Most existing approaches use a single
neighboring frame or a pair of neighboring frames (preceding and/or following
the target frame) for this task. Furthermore, as frames of high quality overall
may contain low-quality patches, and high-quality patches may exist in frames
of low quality overall, current methods focusing on nearby peak-quality frames
(PQFs) may miss high-quality details in low-quality frames. To remedy these
shortcomings, in this paper we propose a novel end-to-end deep neural network
called non-local ConvLSTM (NL-ConvLSTM in short) that exploits multiple
consecutive frames. An approximate non-local strategy is introduced in
NL-ConvLSTM to capture global motion patterns and trace the spatiotemporal
dependency in a video sequence. This approximate strategy makes the non-local
module work in a fast and low space-cost way. Our method uses the preceding and
following frames of the target frame to generate a residual, from which a
higher quality frame is reconstructed. Experiments on two datasets show that
NL-ConvLSTM outperforms the existing methods.
Comment: ICCV 2019
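The approximate non-local step can be sketched in a few lines. Below is a hedged PyTorch illustration: attention between the current and previous frame's features is computed on spatially downsampled maps, one way to keep the non-local module fast and memory-light, and the aggregated features are then upsampled back. The pooling-based approximation and scale factor are our assumptions, not the paper's exact scheme; the ConvLSTM recurrence is omitted.

```python
import torch
import torch.nn.functional as F

def approx_nonlocal(feat_t, feat_prev, scale=4):
    """Approximate non-local aggregation: attention is computed on
    downsampled features, then values from the previous frame are
    aggregated for every position of the current frame."""
    b, c, h, w = feat_t.shape
    q = F.avg_pool2d(feat_t, scale).flatten(2).transpose(1, 2)     # (B, N, C)
    k = F.avg_pool2d(feat_prev, scale).flatten(2)                  # (B, C, N)
    v = F.avg_pool2d(feat_prev, scale).flatten(2).transpose(1, 2)  # (B, N, C)
    attn = torch.softmax(q @ k / (c ** 0.5), dim=-1)               # (B, N, N)
    out = (attn @ v).transpose(1, 2).reshape(b, c, h // scale, w // scale)
    return F.interpolate(out, size=(h, w), mode="bilinear", align_corners=False)

agg = approx_nonlocal(torch.rand(1, 16, 64, 64), torch.rand(1, 16, 64, 64))
```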
A Research on Enhancing Reconstructed Frames in Video Codecs
A series of video codecs, combining encoder and decoder, have been developed to improve the human experience of video-on-demand: higher-quality videos at lower bitrates. Despite being at the leading edge of the compression race, High Efficiency Video Coding (HEVC, or H.265), the latest Versatile Video Coding (VVC) standard, and compressive sensing (CS) still suffer from lossy compression. Lossy compression algorithms approximate input signals with smaller file sizes but degrade the reconstructed data, leaving room for further improvement. This work aims to develop hybrid codecs that take advantage of both state-of-the-art video coding technologies and deep learning techniques: traditional non-learning components will either be replaced by or combined with various deep learning models. Since related studies have not made the most of the available coding information, this work studies and utilizes more of the potential resources in both the encoder and the decoder to further improve different codecs.
In the encoder, motion-compensated prediction (MCP) is one of the key components that bring high compression ratios to video codecs. To enhance MCP performance, modern video codecs offer interpolation filters for fractional motion. However, these handcrafted fractional interpolation filters are designed for ideal signals, which limits the codecs when dealing with real-world video data. This work introduces a deep learning approach for all luma and chroma fractional pixels, aiming for more accurate motion compensation and higher coding efficiency.
One extraordinary feature of CS compared to other codecs is that CS can recover multiple images at the decoder by applying various algorithms to one and the same coded data. Since related works have not made use of this property, this work enables a deep learning-based compressive sensing image enhancement framework using multiple reconstructed signals. Learning to enhance from multiple reconstructed images delivers a valuable mechanism for training deep neural networks while requiring no additional transmitted data.
In the encoder and decoder of modern video coding standards, in-loop filters (ILF) play the most important role in determining the final reconstructed image quality and compression rate. This work introduces a deep learning approach to improving the handcrafted ILF of modern video coding standards. We first utilize various coding resources and present a novel deep learning-based ILF. Related works perform rate-distortion-based ILF mode selection at the coding-tree-unit (CTU) level to further enhance the deep learning-based ILF, and the corresponding bits are encoded and transmitted to the decoder. In this work, we move towards a deeper approach: a reinforcement-learning-based autonomous ILF mode selection scheme is presented, able to adapt to different coding unit (CU) levels. Using this approach, we require no additional bits while ensuring the best image quality at local levels beyond the CTU level.
While this research mainly targets improving the recent video coding standard VVC and sparse-based CS, it is also flexibly designed to adapt to previous and future video coding standards with minor modifications.
Doctor of Engineering, Hosei University.
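As one concrete example of the fractional-pel direction, below is a hedged PyTorch sketch of a learned interpolation filter: a small CNN maps an integer-pel luma block to the 15 quarter-pel sub-positions that HEVC/VVC otherwise derive with fixed filters. The architecture, channel counts, and one-shot multi-position output are illustrative assumptions, not the thesis design.

```python
import torch
import torch.nn as nn

class FracInterpNet(nn.Module):
    """Toy stand-in for a learned fractional-pel interpolation filter:
    maps an integer-pel luma block to the 15 quarter-pel sub-positions."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 15, 3, padding=1),  # one map per sub-pel position
        )
    def forward(self, integer_pel):
        return self.net(integer_pel)

subpel = FracInterpNet()(torch.rand(1, 1, 32, 32))  # (1, 15, 32, 32)
```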
Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model
The past few years have witnessed increasing interest in applying deep
learning to video compression. However, the existing approaches compress a
video frame with only a small number of reference frames, which limits their
ability to fully exploit the temporal correlation among video frames. To
overcome this shortcoming, this paper proposes a Recurrent Learned Video
Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent
Probability Model (RPM). Specifically, the RAE employs recurrent cells in both
the encoder and decoder. As such, the temporal information in a large range of
frames can be used for generating latent representations and reconstructing
compressed outputs. Furthermore, the proposed RPM network recurrently estimates
the Probability Mass Function (PMF) of the latent representation, conditioned
on the distribution of previous latent representations. Due to the correlation
among consecutive frames, the conditional cross entropy can be lower than the
independent cross entropy, thus reducing the bit-rate. The experiments show
that our approach achieves the state-of-the-art learned video compression
performance in terms of both PSNR and MS-SSIM. Moreover, our approach
outperforms the default Low-Delay P (LDP) setting of x265 on PSNR, and also has
better performance on MS-SSIM than the SSIM-tuned x265 and the slowest setting
of x265. The code is available at https://github.com/RenYang-home/RLVC.git.
Comment: Accepted for publication in IEEE Journal of Selected Topics in Signal Processing (J-STSP).
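The conditional-entropy argument can be made concrete. Below is a hedged PyTorch sketch of an RPM-style step: a recurrent cell consumes the previous latent, a head predicts a per-channel mean and scale, and the bits of the current (rounded) latent are measured as the negative log2 of the PMF obtained by differencing the CDF at half-integer offsets. The GRU cell, Gaussian conditional, and tensor shapes are our assumptions; RLVC's actual RPM is convolutional and recurrent over frames.

```python
import torch
import torch.nn as nn

class RPMSketch(nn.Module):
    """Toy recurrent probability model: predicts the conditional PMF of the
    current latent from the previous latent and a recurrent state."""
    def __init__(self, c=8):
        super().__init__()
        self.rnn = nn.GRUCell(c, c)
        self.head = nn.Linear(c, 2 * c)  # per-channel mean and log-scale

    def bits(self, y_prev, y_cur, h):
        h = self.rnn(y_prev, h)
        mean, log_scale = self.head(h).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_scale.exp())
        # PMF of the integer symbol: CDF(y + 0.5) - CDF(y - 0.5).
        pmf = dist.cdf(y_cur + 0.5) - dist.cdf(y_cur - 0.5)
        return -(pmf.clamp_min(1e-9)).log2().sum(), h

model = RPMSketch()
h = torch.zeros(1, 8)
y_prev, y_cur = torch.randn(1, 8).round(), torch.randn(1, 8).round()
n_bits, h = model.bits(y_prev, y_cur, h)  # conditional bit cost of y_cur
```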