18 research outputs found
Learning to Inpaint for Image Compression
We study the design of deep architectures for lossy image compression. We
present two architectural recipes in the context of multi-stage progressive
encoders and empirically demonstrate their importance for compression
performance. Specifically, we show that: (a) predicting the original image data
from residuals in a multi-stage progressive architecture facilitates learning
and leads to improved performance at approximating the original content and (b)
learning to inpaint (from neighboring image pixels) before performing
compression reduces the amount of information that must be stored to achieve a
high-quality approximation. Incorporating these design choices in a baseline
progressive encoder yields a considerable average reduction in file size at similar quality compared to the original residual encoder.
Comment: Published in Advances in Neural Information Processing Systems (NIPS 2017).
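The paper's recipe (a) can be sketched numerically. In the toy below, a coarse uniform quantizer stands in for each learned stage (an assumption; the paper uses deep encoder/decoder stages): every stage stores only a residual, yet the running sum of stages is always a prediction of the original image, so later stages refine earlier ones.

```python
import numpy as np

def quantize(x, step):
    """Stand-in for one lossy stage: coarse uniform quantization."""
    return np.round(x / step) * step

def progressive_encode(image, steps=(64.0, 16.0, 4.0)):
    """Run the stages; each stores only the quantized residual."""
    reconstruction = np.zeros_like(image, dtype=float)
    stored = []
    for step in steps:
        residual = image - reconstruction          # what is still missing
        coded = quantize(residual, step)           # this stage's stored bits
        stored.append(coded)
        reconstruction = reconstruction + coded    # prediction of the ORIGINAL
    return reconstruction, stored

img = np.array([[10.0, 200.0], [37.0, 90.0]])
recon, stages = progressive_encode(img)
# The error after all stages is far smaller than after the first stage alone.
```

The inpainting component (recipe b) is not shown here; it would replace the zero initialization with a prediction from neighboring, already-decoded pixels.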
Generative Memorize-Then-Recall framework for low bit-rate Surveillance Video Compression
Applications of surveillance video, which often involve detecting and recognizing objects in video sequences, have developed rapidly in recent years to protect public safety and daily life. Traditional coding frameworks remove temporal redundancy in surveillance video by block-wise motion compensation, without extracting and exploiting the video's inherent structural information. In this paper, we address this issue by disentangling surveillance video into a global spatio-temporal feature (memory) for each Group of Pictures (GoP) and a skeleton (clue) for each frame. The memory is obtained by sequentially feeding the frames of a GoP into a recurrent neural network, and describes the appearance of the objects that appear within the GoP. The skeleton, computed by a pose estimator, is regarded as a clue to recall the memory. Furthermore, an attention mechanism is introduced to capture the relation between appearance and skeletons. Finally, we employ a generative adversarial network to reconstruct each frame. Experimental results indicate that our method effectively generates realistic reconstructions from appearance and skeleton, achieving much higher compression performance on surveillance video than the latest video compression standard, H.265.
Comment: 11 pages, 8 figures
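The "memorize" step can be sketched as follows: per-frame feature vectors of a GoP are fed sequentially through a recurrent cell, and the final hidden state serves as the GoP-level appearance memory. A plain tanh RNN cell stands in for the recurrent network here; the abstract does not commit to a particular cell, so shapes and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_cell(h, x, W_h, W_x, b):
    """One step of a vanilla tanh recurrent cell."""
    return np.tanh(h @ W_h + x @ W_x + b)

def memorize(frames, hidden=8):
    """Fold a GoP's per-frame features into one memory vector."""
    feat_dim = frames.shape[1]
    W_h = rng.normal(scale=0.1, size=(hidden, hidden))
    W_x = rng.normal(scale=0.1, size=(feat_dim, hidden))
    b = np.zeros(hidden)
    h = np.zeros(hidden)
    for x in frames:          # one feature vector per frame, in order
        h = rnn_cell(h, x, W_h, W_x, b)
    return h                  # the GoP-level "memory"

gop = rng.normal(size=(12, 16))   # 12 frames, 16-dim features each
memory = memorize(gop)
```

At the decoder, the per-frame skeleton would then act as the query that recalls frame-specific appearance from this memory via the attention mechanism.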
Learned Scalable Image Compression with Bidirectional Context Disentanglement Network
In this paper, we propose a learned scalable/progressive image compression
scheme based on deep neural networks (DNN), named Bidirectional Context
Disentanglement Network (BCD-Net). For learning hierarchical representations,
we first adopt bit-plane decomposition to decompose the information coarsely
before the deep-learning-based transformation. However, the information carried
by different bit-planes is not only unequal in entropy but also of different
importance for reconstruction. We thus take the hidden features corresponding
to different bit-planes as the context and design a network topology with
bidirectional flows to disentangle the contextual information for more
effective compressed representations. Our proposed scheme enables us to obtain compressed codes with scalable rates via one-pass encoding-decoding. Experimental results demonstrate that our proposed model outperforms state-of-the-art DNN-based scalable image compression methods in both PSNR and MS-SSIM. In addition, our proposed model achieves higher MS-SSIM performance than conventional scalable image codecs. The effectiveness of our technical components is also verified through extensive ablation experiments.
Comment: IEEE International Conference on Multimedia and Expo (ICME 2019).
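The bit-plane decomposition that precedes the learned transform is easy to make concrete: plane k holds bit k of every 8-bit pixel, the decomposition is lossless, and the higher planes carry most of the reconstruction-critical information, which motivates treating the planes unequally.

```python
import numpy as np

def to_bitplanes(img_u8):
    """Split an 8-bit image into 8 binary planes, LSB first."""
    planes = [(img_u8 >> k) & 1 for k in range(8)]
    return np.stack(planes)

def from_bitplanes(planes):
    """Recombine the 8 binary planes into the original 8-bit image."""
    total = sum(planes[k].astype(np.uint16) << k for k in range(8))
    return total.astype(np.uint8)

img = np.array([[0, 255], [129, 64]], dtype=np.uint8)
planes = to_bitplanes(img)
assert np.array_equal(from_bitplanes(planes), img)   # lossless round trip
```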
Learning based Facial Image Compression with Semantic Fidelity Metric
Surveillance and security scenarios usually require a highly efficient facial image compression scheme for face recognition and identification. However, both traditional general-purpose image codecs and dedicated facial image compression schemes only heuristically refine the codec according to a face verification accuracy metric. We propose a Learning based Facial Image Compression (LFIC) framework with a novel Regionally Adaptive Pooling (RAP) module whose parameters can be automatically optimized according to gradient feedback from an integrated hybrid semantic fidelity metric, including a successful exploration of applying a Generative Adversarial Network (GAN) directly as a metric in an image compression scheme. The experimental results verify the framework's efficiency, demonstrating bitrate savings of 71.41%, 48.28% and 52.67% over JPEG2000, WebP and neural-network-based codecs, respectively, under the same face verification accuracy distortion metric. We also evaluate LFIC's superior performance gain compared with the latest dedicated facial image codecs. Visual experiments also offer some interesting insight into how LFIC automatically captures the information in critical areas based on semantic distortion metrics for optimized compression, which is quite different from the heuristic optimization in traditional image compression algorithms.
Comment: Accepted by Neurocomputing.
BlockCNN: A Deep Network for Artifact Removal and Image Compression
We present a general technique that performs both artifact removal and image
compression. For artifact removal, we input a JPEG image and try to remove its
compression artifacts. For compression, we input an image and process its 8 by
8 blocks in a sequence. For each block, we first try to predict its intensities
based on previous blocks; then, we store a residual with respect to the input
image. Our technique reuses JPEG's legacy compression and decompression
routines. Both our artifact removal and our image compression techniques use
the same deep network, but with different training weights. Our technique is
simple and fast, and it significantly improves the performance of artifact removal and image compression.
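The block-wise loop described above can be sketched as follows. A mean-of-context predictor stands in for the paper's deep network (an assumption; the real predictor is a CNN), and the residual is kept exact here so the round trip is lossless for clarity.

```python
import numpy as np

B = 8  # block size, matching JPEG's 8x8 grid

def predict_block(decoded, r, c):
    """Predict a block from already-decoded left and top neighbor blocks."""
    ctx = []
    if c >= B:
        ctx.append(decoded[r:r + B, c - B:c])
    if r >= B:
        ctx.append(decoded[r - B:r, c:c + B])
    return np.full((B, B), np.mean(ctx)) if ctx else np.zeros((B, B))

def encode(image):
    """Raster-scan the blocks; store only the residual for each."""
    h, w = image.shape
    decoded = np.zeros_like(image, dtype=float)
    residuals = {}
    for r in range(0, h, B):
        for c in range(0, w, B):
            pred = predict_block(decoded, r, c)
            res = image[r:r + B, c:c + B] - pred   # what gets stored
            residuals[(r, c)] = res
            decoded[r:r + B, c:c + B] = pred + res
    return residuals, decoded

img = np.tile(np.arange(16, dtype=float), (16, 1))  # toy 16x16 gradient
res, dec = encode(img)
```

In the paper the residuals would then be coded with JPEG's legacy routines; the better the block predictor, the smaller the residuals and the fewer bits JPEG needs.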
Joint Autoregressive and Hierarchical Priors for Learned Image Compression
Recent models for learned image compression are based on autoencoders,
learning approximately invertible mappings from pixels to a quantized latent
representation. These are combined with an entropy model, a prior on the latent
representation that can be used with standard arithmetic coding algorithms to
yield a compressed bitstream. Recently, hierarchical entropy models have been
introduced as a way to exploit more structure in the latents than simple fully
factorized priors, improving compression performance while maintaining
end-to-end optimization. Inspired by the success of autoregressive priors in
probabilistic generative models, we examine autoregressive, hierarchical, as
well as combined priors as alternatives, weighing their costs and benefits in
the context of image compression. While it is well known that autoregressive
models come with a significant computational penalty, we find that in terms of
compression performance, autoregressive and hierarchical priors are
complementary and, together, exploit the probabilistic structure in the latents
better than all previous learned models. The combined model yields
state-of-the-art rate--distortion performance, providing a 15.8% average
reduction in file size over the previous state-of-the-art method based on deep
learning, which corresponds to a 59.8% size reduction over JPEG, more than 35%
reduction compared to WebP and JPEG2000, and bitstreams 8.4% smaller than BPG,
the current state-of-the-art image codec. To the best of our knowledge, our
model is the first learning-based method to outperform BPG on both PSNR and
MS-SSIM distortion metrics.
Comment: Accepted at the 32nd Conference on Neural Information Processing Systems (NIPS 2018).
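The role of the prior can be illustrated with a toy code-length computation: the cost of coding a quantized latent y under a Gaussian entropy model is -log2 P(y | mu, sigma), so per-symbol predictions that track the latents (as the hyperprior and autoregressive context models do) yield shorter codes than a single fixed factorized prior. The running-mean predictor below is only a crude, causal stand-in for the learned context model.

```python
from math import erf, log2, sqrt

def gaussian_bits(y, mu, sigma):
    """Bits to code integer y under N(mu, sigma) with unit-width bins."""
    cdf = lambda x: 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))
    return -log2(cdf(y + 0.5) - cdf(y - 0.5))

latents = [4, 5, 4, 6, 5]

# Fixed factorized prior: one (mu, sigma) shared by every symbol.
fixed_bits = sum(gaussian_bits(y, 0.0, 10.0) for y in latents)

# Conditioned prior: mean predicted from previously decoded symbols only,
# so the decoder can reproduce the same predictions.
cond_bits, mu = 0.0, 0.0
for i, y in enumerate(latents):
    cond_bits += gaussian_bits(y, mu, 2.0)
    mu = (mu * i + y) / (i + 1)     # causal running-mean update

assert cond_bits < fixed_bits       # conditioning shortens the code
```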
A Unified End-to-End Framework for Efficient Deep Image Compression
Image compression is a widely used technique to reduce the spatial redundancy
in images. Recently, learning based image compression has achieved significant
progress by using the powerful representation ability from neural networks.
However, the current state-of-the-art learning based image compression methods
suffer from huge computational cost, which limits their applicability in practice. In this paper, we propose a unified framework called
Efficient Deep Image Compression (EDIC) based on three new technologies,
including a channel attention module, a Gaussian mixture model and a
decoder-side enhancement module. Specifically, we design an auto-encoder style
network for learning based image compression. To improve the coding efficiency,
we exploit the channel relationship between latent representations by using the
channel attention module. In addition, a Gaussian mixture model is introduced as the entropy model, improving the accuracy of bitrate estimation.
Furthermore, we introduce the decoder-side enhancement module to further
improve image compression performance. Our EDIC method can also be readily
incorporated with the Deep Video Compression (DVC) framework to further improve
the video compression performance. At the same time, our EDIC method boosts coding performance significantly while adding only a slight computational cost. More importantly, experimental results demonstrate that the
proposed approach outperforms the current state-of-the-art image compression
methods and is up to more than 150 times faster in terms of decoding speed when
compared with Minnen's method. The proposed framework also successfully
improves the performance of the recent deep video compression system DVC. Our
code will be released at https://github.com/liujiaheng/compression.
Comment: We will release our code and training data.
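A channel attention module of the squeeze-and-excitation style alluded to above can be sketched as below (the exact EDIC module may differ; shapes and the bottleneck ratio are illustrative assumptions): each latent channel is globally pooled, the pooled vector passes through a small bottleneck, and the resulting weights rescale the channels.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, reduction=4):
    """Rescale C x H x W latents by learned per-channel weights in (0, 1)."""
    c = feat.shape[0]
    w1 = rng.normal(scale=0.1, size=(c, c // reduction))  # squeeze
    w2 = rng.normal(scale=0.1, size=(c // reduction, c))  # excite
    squeezed = feat.mean(axis=(1, 2))                     # global average pool
    weights = sigmoid(np.maximum(squeezed @ w1, 0.0) @ w2)
    return feat * weights[:, None, None]                  # channel rescaling

latent = rng.normal(size=(8, 4, 4))    # C x H x W latent representation
out = channel_attention(latent)
```

Because the weights lie in (0, 1), the module can only attenuate channels, letting the entropy model spend fewer bits on less informative ones.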
CocoNet: A deep neural network for mapping pixel coordinates to color values
In this paper, we propose a deep neural network approach for mapping the 2D
pixel coordinates in an image to the corresponding Red-Green-Blue (RGB) color
values. The neural network is termed CocoNet, i.e. coordinates-to-color
network. During the training process, the neural network learns to encode the
input image within its layers. More specifically, the network learns a
continuous function that approximates the discrete RGB values sampled over the
discrete 2D pixel locations. At test time, given a 2D pixel coordinate, the
neural network will output the approximate RGB values of the corresponding
pixel. By considering every 2D pixel location, the network can actually
reconstruct the entire learned image. It is important to note that we have to
train an individual neural network for each input image, i.e. one network
encodes a single image only. To the best of our knowledge, we are the first to
propose a neural approach for encoding images individually, by learning a
mapping from the 2D pixel coordinate space to the RGB color space. Our neural
image encoding approach has various low-level image processing applications
ranging from image encoding, image compression and image denoising to image
resampling and image completion. We conduct experiments that include both
quantitative and qualitative results, demonstrating the utility of our approach
and its superiority over standard baselines, e.g. bilateral filtering or
bicubic interpolation. Our code is available at
https://github.com/paubric/python-fuse-coconet.
Comment: Accepted at the International Conference on Neural Information Processing 2018.
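The core idea above, one network overfitted per image so that its weights become the image's encoding, fits in a short numpy sketch. The input is a normalized (row, col) coordinate and the output is the pixel's gray value (the paper predicts RGB; one channel keeps the sketch short); the architecture, sizes and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 8
image = rng.random((H, W))                        # the image to memorize
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
coords = np.stack([ys.ravel() / H, xs.ravel() / W], axis=1)  # N x 2 inputs
targets = image.ravel()[:, None]                  # N x 1 target values

# One hidden layer, trained by plain gradient descent with manual backprop.
W1 = rng.normal(scale=0.5, size=(2, 64)); b1 = np.zeros(64)
W2 = rng.normal(scale=0.5, size=(64, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

_, pred0 = forward(coords)
loss0 = float(np.mean((pred0 - targets) ** 2))    # loss at initialization
lr = 0.05
for _ in range(2000):
    h, pred = forward(coords)
    err = 2.0 * (pred - targets) / len(coords)    # dLoss/dPred
    dh = (err @ W2.T) * (1.0 - h ** 2)            # backprop through tanh
    W2 -= lr * (h.T @ err); b2 -= lr * err.sum(axis=0)
    W1 -= lr * (coords.T @ dh); b1 -= lr * dh.sum(axis=0)

_, recon = forward(coords)
loss1 = float(np.mean((recon - targets) ** 2))    # loss after "encoding"
```

Querying `forward` at every coordinate reconstructs the learned image; the compression rate is then governed by the number of network parameters.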
Efficient in-situ image and video compression through probabilistic image representation
Fast and effective image compression for multi-dimensional images has become
increasingly important for efficient storage and transfer of massive amounts of
high-resolution images and videos. Desirable properties in compression methods
include (1) high reconstruction quality at a wide range of compression rates
while preserving key local details, (2) computational scalability, (3)
applicability to a variety of different image/video types and of different
dimensions, (4) progressive transmission, and (5) ease of tuning. We present
such a method for multi-dimensional image compression called Compression via
Adaptive Recursive Partitioning (CARP). CARP uses an optimal permutation of the
image pixels inferred from a Bayesian probabilistic model on recursive
partitions of the image to reduce its effective dimensionality, achieving a
parsimonious representation that preserves information. CARP uses a multi-layer
Bayesian hierarchical model to achieve in-situ compression along with
self-tuning and regularization, with just one single parameter to be specified
by the user to achieve the desired compression rate. Extensive numerical
experiments using a variety of datasets including 2D still images, real-life
YouTube videos, and surveillance videos show that CARP outperforms state-of-the-art image/video compression approaches, including JPEG, JPEG2000, BPG, MPEG4, HEVC and a neural-network-based method, for all of these image types and on nearly all of the individual images and videos.
Comment: 20 pages, 11 figures
EEG Channel Interpolation Using Deep Encoder-Decoder Networks
Electrode "pop" artifacts originate from the spontaneous loss of connectivity
between a surface and an electrode. Electroencephalography (EEG) uses a dense
array of electrodes, hence "popped" segments are among the most pervasive type
of artifact seen during the collection of EEG data. In many cases, the
continuity of EEG data is critical for downstream applications (e.g. brain
machine interface) and requires that popped segments be accurately
interpolated. In this paper, we frame the interpolation problem as a
self-learning task using a deep encoder-decoder network. We compare our
approach against contemporary interpolation methods on a publicly available EEG
data set. Our approach exhibited a minimum of ~15% improvement over
contemporary approaches when tested on subjects and tasks not used during model
training. We demonstrate how our model's performance can be enhanced further on
novel subjects and tasks using transfer learning. All code and data associated with this study are open-source to enable ease of extension and practical use. To our knowledge, this work is the first solution to the EEG interpolation problem that uses deep learning.
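The self-supervised framing above can be made concrete by simulating a "popped" electrode: zero one channel of clean multichannel EEG and keep the clean channel as the reconstruction target. The encoder-decoder network itself is omitted here; only the training-pair construction and a naive neighbor-averaging baseline (an assumption, not the paper's comparison method) are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pop_pair(eeg, channel):
    """Build one training pair from clean EEG (channels x samples)."""
    corrupted = eeg.copy()
    corrupted[channel] = 0.0          # simulated electrode "pop"
    target = eeg[channel]             # what the decoder must recover
    return corrupted, target

clean = rng.normal(size=(32, 256))    # 32 channels, 256 samples of clean EEG
x, y = make_pop_pair(clean, channel=5)

# Naive interpolation baseline: average the two adjacent channel indices.
baseline = (clean[4] + clean[6]) / 2.0
```

A deep encoder-decoder trained on many such (corrupted, target) pairs across subjects would replace the baseline, which is the setting in which the reported ~15% improvement is measured.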