15 research outputs found
DMCNN: Dual-Domain Multi-Scale Convolutional Neural Network for Compression Artifacts Removal
JPEG is one of the most commonly used standards among lossy image compression
methods. However, JPEG compression inevitably introduces various kinds of
artifacts, especially at high compression rates, which could greatly affect the
Quality of Experience (QoE). Recently, convolutional neural network (CNN) based
methods have shown excellent performance for removing the JPEG artifacts. Lots
of efforts have been made to deepen the CNNs and extract deeper features, while
relatively few works pay attention to the receptive field of the network. In
this paper, we show that enlarging the receptive field can significantly
improve output image quality in many cases. Going one step
further, we propose a Dual-domain Multi-scale CNN (DMCNN) to take full
advantage of redundancies on both the pixel and DCT domains. Experiments show
that DMCNN sets a new state-of-the-art for the task of JPEG artifact removal.
Comment: To appear in IEEE ICIP 201
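The dual-domain idea rests on JPEG's 8x8 block DCT: artifacts that are hard to localize in the pixel domain have regular structure in the DCT domain, so a network can process both representations. A minimal NumPy sketch of the blockwise transform such a DCT-domain branch would operate on (a toy helper, not the paper's network; function names are illustrative):

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix, the transform JPEG applies per 8x8 block.
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def blockwise_dct(img, inverse=False):
    # Apply the 8x8 DCT (or its inverse) to every block of a 2-D image,
    # moving the image between the pixel domain and the DCT domain.
    d = dct_matrix()
    h, w = img.shape
    out = np.empty_like(img, dtype=float)
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            b = img[i:i + 8, j:j + 8]
            out[i:i + 8, j:j + 8] = d.T @ b @ d if inverse else d @ b @ d.T
    return out
```

A dual-domain model would run learned branches on both `img` and `blockwise_dct(img)` and fuse the results; the orthonormal transform guarantees no information is lost moving between domains.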
Learning Binary Residual Representations for Domain-specific Video Streaming
We study domain-specific video streaming. Specifically, we target a streaming
setting where the videos to be streamed from a server to a client are all in
the same domain and they have to be compressed to a small size for low-latency
transmission. Several popular video streaming services, such as the video game
streaming services of GeForce Now and Twitch, fall in this category. While
conventional video compression standards such as H.264 are commonly used for
this task, we hypothesize that one can leverage the property that the videos
are all in the same domain to achieve better video quality. Based on this
hypothesis, we propose a novel video compression pipeline. Specifically, we
first apply H.264 to compress domain-specific videos. We then train a novel
binary autoencoder to encode the leftover domain-specific residual information
frame-by-frame into binary representations. These binary representations are
then compressed and sent to the client together with the H.264 stream. In our
experiments, we show that our pipeline yields consistent gains over standard
H.264 compression across several benchmark datasets while using the same
channel bandwidth.
Comment: Accepted in AAAI'18. Project website at
https://research.nvidia.com/publication/2018-02_Learning-Binary-Residua
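The step that makes the residual transmittable is binarization: latent features are quantized to {-1, +1} and bit-packed into a stream sent alongside the H.264 video. A hedged sketch of that inference-time step only (the trained autoencoder is omitted; helper names are made up):

```python
import numpy as np

def binarize(latent):
    # Hard sign binarization used at inference time; training would use a
    # straight-through or stochastic estimator so gradients can flow.
    return np.where(latent >= 0, 1.0, -1.0)

def pack_bits(codes):
    # {-1,+1} codes -> packed bytes, the payload sent with the H.264 stream.
    bits = (codes.ravel() > 0).astype(np.uint8)
    return np.packbits(bits)

def unpack_bits(data, shape):
    # Client side: recover the {-1,+1} codes for the residual decoder.
    bits = np.unpackbits(data)[: int(np.prod(shape))]
    return np.where(bits > 0, 1.0, -1.0).reshape(shape)
```

The packed bytes would additionally go through a lossless entropy coder before transmission; the round trip above only shows that the binary representation survives packing exactly.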
D3: Deep Dual-Domain Based Fast Restoration of JPEG-Compressed Images
In this paper, we design a Deep Dual-Domain (D3) based fast
restoration model to remove artifacts of JPEG compressed images. It leverages
the large learning capacity of deep networks, as well as the problem-specific
expertise that was hardly incorporated in the past design of deep
architectures. For the latter, we take into consideration both the prior
knowledge of the JPEG compression scheme, and the successful practice of the
sparsity-based dual-domain approach. We further design the One-Step Sparse
Inference (1-SI) module, as an efficient and light-weighted feed-forward
approximation of sparse coding. Extensive experiments verify the superiority of
the proposed model over several state-of-the-art methods. Specifically,
our best model outperforms the latest deep model by around 1 dB in PSNR
while being 30 times faster.
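The One-Step Sparse Inference idea can be read as iterative sparse coding (ISTA/LISTA-style) truncated to a single learned feed-forward step: one linear map followed by a shrinkage nonlinearity. A toy sketch under that reading (W and theta here are placeholders, not learned weights):

```python
import numpy as np

def soft_threshold(x, theta):
    # Elementwise shrinkage: the proximal operator of the L1 norm, which
    # zeroes out small entries and shrinks the rest toward zero.
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def one_step_sparse_inference(x, W, theta):
    # A single feed-forward approximation of sparse coding: in a trained
    # model, W and theta would be learned end-to-end so that one step
    # mimics many iterations of an iterative solver.
    return soft_threshold(W @ x, theta)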
Image Restoration by Estimating Frequency Distribution of Local Patches
In this paper, we propose a method to solve the image restoration problem,
which tries to restore the details of a corrupted image, especially due to the
loss caused by JPEG compression. We have treated an image in the frequency
domain to explicitly restore the frequency components lost during image
compression. In doing so, the distribution in the frequency domain is learned
using the cross entropy loss. Unlike recent approaches, we have reconstructed
the details of an image without using the scheme of adversarial training.
Rather, the image restoration problem is treated as a classification problem to
determine the frequency coefficient for each frequency band in an image patch.
In this paper, we show that the proposed method effectively restores a
JPEG-compressed image with more detailed high frequency components, making the
restored image more vivid.
Comment: 9 pages, 5 figures, Accepted as a poster in CVPR 201
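Framing restoration as classification means each frequency band gets a softmax over a discrete set of candidate coefficient values, trained with cross-entropy, and the restored coefficient is the winning class. A small sketch of that decision rule (candidate grids and array shapes are invented for illustration):

```python
import numpy as np

def cross_entropy(logits, target_idx):
    # Standard cross-entropy over the per-band candidate classes,
    # computed with a numerically stable log-softmax.
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(target_idx)), target_idx].mean()

def coeffs_from_logits(logits, candidate_values):
    # Restored coefficient per frequency band = the argmax candidate class.
    return candidate_values[logits.argmax(axis=-1)]
```

Unlike a regression loss, which tends to average plausible coefficient values and blur high frequencies, the classification view lets the network commit to one coefficient per band.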
MemNet: A Persistent Memory Network for Image Restoration
Recently, very deep convolutional neural networks (CNNs) have been attracting
considerable attention in image restoration. However, as the depth grows, the
long-term dependency problem is rarely realized for these very deep models,
which results in the prior states/layers having little influence on the
subsequent ones. Motivated by the fact that human thoughts have persistency, we
propose a very deep persistent memory network (MemNet) that introduces a memory
block, consisting of a recursive unit and a gate unit, to explicitly mine
persistent memory through an adaptive learning process. The recursive unit
learns multi-level representations of the current state under different
receptive fields. The representations and the outputs from the previous memory
blocks are concatenated and sent to the gate unit, which adaptively controls
how much of the previous states should be reserved, and decides how much of the
current state should be stored. We apply MemNet to three image restoration
tasks, i.e., image denoising, super-resolution and JPEG deblocking.
Comprehensive experiments demonstrate the necessity of the MemNet and its
unanimous superiority on all three tasks over the state of the arts. Code is
available at https://github.com/tyshiwo/MemNet.
Comment: Accepted by ICCV 2017 (Spotlight presentation).
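The memory block pairs a recursive unit (one set of weights applied repeatedly, yielding multi-level representations at growing receptive fields) with a gate unit that learns how much of the current and previous-block states to keep. A scalar toy sketch of the data flow only (real MemNet uses residual convolutional layers; all names and shapes here are illustrative):

```python
import numpy as np

def recursive_unit(x, w, steps=3):
    # Apply one shared weight recursively, collecting one representation
    # per step; each step corresponds to a larger effective receptive field.
    reps, h = [], x
    for _ in range(steps):
        h = np.tanh(w * h)
        reps.append(h)
    return reps

def gate_unit(reps, long_term, gate_w):
    # 1x1-conv-like gate: a learned weighted sum over the current multi-level
    # representations and the outputs of previous memory blocks (long_term),
    # deciding how much of each state is reserved vs. stored.
    stack = np.stack(reps + long_term)
    return np.tensordot(gate_w, stack, axes=1)
```

Chaining such blocks, with each block's output appended to `long_term` for all later blocks, is what gives early layers a persistent, adaptively weighted influence on deep layers.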
End-to-End JPEG Decoding and Artifacts Suppression Using Heterogeneous Residual Convolutional Neural Network
Existing deep learning models separate JPEG artifacts suppression from the
decoding protocol as an independent task. In this work, we take one step forward
to design a true end-to-end heterogeneous residual convolutional neural network
(HR-CNN) with spectrum decomposition and heterogeneous reconstruction
mechanism. Benefitting from the full CNN architecture and GPU acceleration, the
proposed model considerably improves the reconstruction efficiency. Numerical
experiments show that the overall reconstruction speed reaches the same order
of magnitude as the standard CPU JPEG decoding protocol, while both decoding and
artifacts suppression are completed together. We formulate the JPEG artifacts
suppression task as an interactive process of decoding and image detail
reconstructions. A heterogeneous, fully convolutional, mechanism is proposed to
particularly address the uncorrelated nature of different spectral channels.
Directly starting from the JPEG code in k-space, the network first extracts the
spectral samples channel by channel, and restores the spectral snapshots with
expanded throughput. These intermediate snapshots are then heterogeneously
decoded and merged into the pixel space image. A cascaded residual learning
segment is designed to further enhance the image details. Experiments verify
that the model achieves outstanding performance in JPEG artifacts suppression,
while its fully convolutional operations and elegant network structure offer
higher computational efficiency for practical online usage than other
deep learning models on this topic.
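The channel-by-channel spectral extraction can be pictured as regrouping each 8x8 coefficient block into 64 per-frequency channels, so each channel collects one frequency across the whole image and can be restored independently. A NumPy sketch of that rearrangement (the layout is chosen for illustration; the paper's exact tensor format may differ):

```python
import numpy as np

def blocks_to_spectral_channels(coeffs):
    # Rearrange an H x W grid of 8x8 DCT blocks into 64 spectral channels
    # of size (H/8, W/8), one channel per frequency (u, v).
    h, w = coeffs.shape
    c = coeffs.reshape(h // 8, 8, w // 8, 8)
    return c.transpose(1, 3, 0, 2).reshape(64, h // 8, w // 8)

def spectral_channels_to_blocks(ch):
    # Inverse rearrangement: merge the 64 spectral snapshots back into the
    # original blockwise coefficient layout for pixel-space decoding.
    n, hb, wb = ch.shape
    return ch.reshape(8, 8, hb, wb).transpose(2, 0, 3, 1).reshape(hb * 8, wb * 8)
```

Because different spectral channels are largely uncorrelated, processing each channel with its own filters (the "heterogeneous" mechanism) is a natural fit for this layout.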
Implicit Dual-domain Convolutional Network for Robust Color Image Compression Artifact Reduction
Several dual-domain convolutional neural network-based methods show
outstanding performance in reducing image compression artifacts. However, they
struggle to handle color images because the compression processes for
gray-scale and color images are completely different. Moreover, these methods
train a specific model for each compression quality and require multiple models
to achieve different compression qualities. To address these problems, we
proposed an implicit dual-domain convolutional network (IDCN) with the pixel
position labeling map and the quantization tables as inputs. Specifically, we
proposed an extractor-corrector framework-based dual-domain correction unit
(DCU) as the basic component to formulate the IDCN. A dense block was
introduced to improve the performance of the extractor in the DCU. The implicit
dual-domain translation allows the IDCN to handle color images with the
discrete cosine transform (DCT)-domain priors. A flexible version of IDCN
(IDCN-f) was developed to handle a wide range of compression qualities.
Experiments for both objective and subjective evaluations on benchmark datasets
show that IDCN is superior to the state-of-the-art methods and IDCN-f exhibits
excellent abilities to handle a wide range of compression qualities with little
performance sacrifice and demonstrates great potential for practical
applications.
Comment: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT).
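Two of the extra inputs that make one model cover many qualities are easy to sketch: a pixel position labeling map (each pixel's coordinates inside its 8x8 JPEG block) and the quantization table tiled over the image, both stacked with the image as additional channels. A hedged sketch (channel layout and normalization are invented for illustration):

```python
import numpy as np

def position_label_map(h, w):
    # Pixel position labeling map: each pixel's (row, col) index inside its
    # 8x8 JPEG block, normalized to [0, 1], as two extra input channels.
    rows = np.tile(np.arange(h)[:, None] % 8, (1, w))
    cols = np.tile(np.arange(w)[None, :] % 8, (h, 1))
    return np.stack([rows, cols]).astype(float) / 7.0

def network_input(img, qtable):
    # Stack image, position map, and the tiled quantization table so a single
    # model is conditioned on the compression quality instead of being
    # retrained per quality factor.
    h, w = img.shape
    qmap = np.tile(qtable, (h // 8, w // 8))[None]
    return np.concatenate([img[None], position_label_map(h, w), qmap])
```

Feeding the quantization table explicitly is what the "flexible" IDCN-f variant exploits: the same weights see which DCT-domain distortion to expect at each quality.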
Quality Adaptive Low-Rank Based JPEG Decoding with Applications
Small compression noises, despite being transparent to human eyes, can
adversely affect the results of many image restoration processes, if left
unaccounted for. Especially, compression noises are highly detrimental to
inverse operators of high-boosting (sharpening) nature, such as deblurring and
superresolution against a convolution kernel. By incorporating the non-linear
DCT quantization mechanism into the formulation for image restoration, we
propose a new sparsity-based convex programming approach for joint compression
noise removal and image restoration. Experimental results demonstrate
significant performance gains of the new approach over existing image
restoration methods.
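The DCT quantization mechanism typically enters such formulations as a hard feasibility constraint: each original coefficient must lie inside the quantization bin it was rounded from, so any restored estimate can be projected back into its bin. A minimal sketch of that projection step (one building block of such a solver, not the full convex program):

```python
import numpy as np

def project_to_bin(coeffs, quantized_idx, q):
    # A coefficient dequantized to q*k can only have come from the interval
    # [q*(k - 0.5), q*(k + 0.5)]; clipping projects any restored estimate
    # back into its feasible quantization bin.
    lo = q * (quantized_idx - 0.5)
    hi = q * (quantized_idx + 0.5)
    return np.clip(coeffs, lo, hi)
```

Alternating a restoration step (e.g. a low-rank or sparsity update) with this projection keeps every iterate consistent with the observed JPEG bitstream.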
Non-Local ConvLSTM for Video Compression Artifact Reduction
Video compression artifact reduction aims to recover high-quality videos from
low-quality compressed videos. Most existing approaches use a single
neighboring frame or a pair of neighboring frames (preceding and/or following
the target frame) for this task. Furthermore, as frames of high quality overall
may contain low-quality patches, and high-quality patches may exist in frames
of low quality overall, current methods focusing on nearby peak-quality frames
(PQFs) may miss high-quality details in low-quality frames. To remedy these
shortcomings, in this paper we propose a novel end-to-end deep neural network
called non-local ConvLSTM (NL-ConvLSTM in short) that exploits multiple
consecutive frames. An approximate non-local strategy is introduced in
NL-ConvLSTM to capture global motion patterns and trace the spatiotemporal
dependency in a video sequence. This approximate strategy makes the non-local
module work in a fast and low space-cost way. Our method uses the preceding and
following frames of the target frame to generate a residual, from which a
higher quality frame is reconstructed. Experiments on two datasets show that
NL-ConvLSTM outperforms the existing methods.
Comment: ICCV 201
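The approximate non-local strategy replaces exhaustive pixel-pair matching between frames with matching between coarse, block-averaged descriptors, which shrinks the similarity matrix quadratically in the pooling scale. A toy single-channel sketch of that idea (pooling scale and similarity kernel are invented; the ConvLSTM that NL-ConvLSTM wraps around this is omitted):

```python
import numpy as np

def non_local_aggregate(query, reference, scale=2):
    # Compare block-averaged descriptors of two frames instead of all pixel
    # pairs, then mix the reference by softmax similarity: a cheap stand-in
    # for full non-local attention across neighboring frames.
    def pool(x):
        h, w = x.shape
        return x.reshape(h // scale, scale, w // scale, scale).mean((1, 3))
    q, r = pool(query).ravel(), pool(reference).ravel()
    sim = -(q[:, None] - r[None, :]) ** 2        # coarse-patch similarity
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    out = w @ r                                   # similarity-weighted mix
    return out.reshape(query.shape[0] // scale, -1)
```

For an S-times coarser grid the similarity matrix has S^4 fewer entries than pixel-level matching, which is what makes the non-local module "fast and low space-cost".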
Random Walk Graph Laplacian based Smoothness Prior for Soft Decoding of JPEG Images
Given the prevalence of JPEG compressed images, optimizing image
reconstruction from the compressed format remains an important problem. Instead
of simply reconstructing a pixel block from the centers of indexed DCT
coefficient quantization bins (hard decoding), soft decoding reconstructs a
block by selecting appropriate coefficient values within the indexed bins with
the help of signal priors. The challenge thus lies in how to define suitable
priors and apply them effectively.
In this paper, we combine three image priors---Laplacian prior for DCT
coefficients, sparsity prior and graph-signal smoothness prior for image
patches---to construct an efficient JPEG soft decoding algorithm. Specifically,
we first use the Laplacian prior to compute a minimum mean square error (MMSE)
initial solution for each code block. Next, we show that while the sparsity
prior can reduce block artifacts, limiting the size of the over-complete
dictionary (to lower computation) would lead to poor recovery of high DCT
frequencies. To alleviate this problem, we design a new graph-signal smoothness
prior (desired signal has mainly low graph frequencies) based on the left
eigenvectors of the random walk graph Laplacian matrix (LERaG). Compared to
previous graph-signal smoothness priors, LERaG has desirable image filtering
properties with low computation overhead. We demonstrate how LERaG can
facilitate recovery of high DCT frequencies of a piecewise smooth (PWS) signal
via an interpretation of low graph frequency components as relaxed solutions to
normalized cut in spectral clustering. Finally, we construct a soft decoding
algorithm using the three signal priors with appropriate prior weights.
Experimental results show that our proposal noticeably outperforms
state-of-the-art soft decoding algorithms in both objective and subjective
evaluations.
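The graph-signal smoothness prior is the quadratic form x^T L x under a graph Laplacian: near zero for signals that vary slowly along graph edges (mostly low graph frequencies), large for oscillating ones. A toy sketch using the random walk Laplacian of a 1-D path graph (LERaG itself builds image-adaptive edge weights and works with the left eigenvectors of the random walk Laplacian; only the smoothness functional is shown):

```python
import numpy as np

def path_graph_rw_laplacian(n):
    # Random walk Laplacian L = I - D^{-1} W of a path graph with uniform
    # edge weights; a real image patch would use an adaptive, weighted graph.
    W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    D = W.sum(axis=1)
    return np.eye(n) - W / D[:, None]

def smoothness(x, L):
    # Graph-signal smoothness prior x^T L x: zero for a constant signal,
    # growing as the signal oscillates across graph edges.
    return float(x @ L @ x)
```

Minimizing this quadratic alongside the quantization-bin constraints is what steers soft decoding toward piecewise smooth reconstructions without erasing genuine edges, since edge weights (here uniform) would be small across true discontinuities.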