RR-DnCNN v2.0: Enhanced Restoration-Reconstruction Deep Neural Network for Down-Sampling Based Video Coding
Integrating deep learning techniques into the video coding framework yields
significant improvements over standard compression techniques, especially when
super-resolution (up-sampling) is applied as post-processing in down-sampling
based video coding. However, besides up-sampling degradation, the various
artifacts introduced by compression make the super-resolution problem more
difficult to solve. A straightforward solution is to apply artifact removal
techniques before super-resolution; however, some helpful features may be
removed along with the artifacts, degrading super-resolution performance. To
address this problem, we propose an end-to-end restoration-reconstruction deep
neural network (RR-DnCNN) using a degradation-aware technique, which handles
degradation from both compression and sub-sampling. Moreover, we show that the
compression degradation produced by the Random Access configuration is rich
enough to cover other degradation types, such as Low Delay P and All Intra, for
training. Since the straightforward RR-DnCNN, a long chain of layers, has poor
learning capability due to the vanishing gradient problem, we redesign the
architecture so that reconstruction leverages the features captured by
restoration through up-sampling skip connections. We call this novel
architecture the restoration-reconstruction u-shaped deep neural network
(RR-DnCNN v2.0). As a result, RR-DnCNN v2.0 outperforms previous works and
attains a 17.02% BD-rate reduction on UHD resolution for all-intra, anchored
by the standard H.265/HEVC. The source code is available at
https://minhmanho.github.io/rrdncnn/.
Comment: Published in TIP (Open Access). Check our source code at
https://minhmanho.github.io/rrdncnn
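The down-sampling based coding pipeline this abstract builds on can be sketched in a few lines of numpy. This is an illustrative toy, not the authors' implementation: the averaging down-sampler, the uniform-quantization "codec" stub, and the nearest-neighbour up-sampler below stand in for, respectively, the encoder-side down-sampler, HEVC, and the learned SR network.

```python
import numpy as np

def downsample(frame, factor=2):
    """Average-pool the frame by `factor` (stand-in for the encoder-side down-sampler)."""
    h, w = frame.shape
    return frame[:h - h % factor, :w - w % factor] \
        .reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(frame, factor=2):
    """Nearest-neighbour up-sampling (a learned SR network would replace this)."""
    return np.repeat(np.repeat(frame, factor, axis=0), factor, axis=1)

def codec_stub(frame, step=8.0):
    """Uniform quantization as a crude stand-in for compression artifacts."""
    return np.round(frame / step) * step

def downsampling_based_coding(frame, factor=2):
    low = downsample(frame, factor)      # encoder: down-sample before coding
    decoded = codec_stub(low)            # lossy codec (placeholder)
    return upsample(decoded, factor)     # decoder: SR/up-sampling post-processing

frame = np.random.default_rng(0).uniform(0, 255, (64, 64))
out = downsampling_based_coding(frame)
```

The point of the paper is precisely that the last step must undo two entangled degradations at once: the quantization error of the codec stub and the information lost by the down-sampler.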
A Framework for Super-Resolution of Scalable Video via Sparse Reconstruction of Residual Frames
This paper introduces a framework for super-resolution of scalable video
based on compressive sensing and sparse representation of residual frames in
reconnaissance and surveillance applications. We exploit efficient compressive
sampling and sparse reconstruction algorithms to super-resolve the video
sequence at different compression rates, using the sparsity of the information
in residual frames as the key point of our framework. Moreover, a controlling
factor, the compressibility threshold, is defined to control the
complexity-performance trade-off. Numerical experiments confirm the efficiency
of the proposed framework in terms of compression rate as well as the quality
of the reconstructed video sequence measured by PSNR. The framework achieves a
more efficient compression rate and higher video quality than other
state-of-the-art algorithms when performance-complexity trade-offs are
considered.
Comment: IEEE Military Communications Conference, MILCOM, 201
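The key premise, that residual frames are sparse and therefore cheap to sense and reconstruct, can be illustrated with a toy numpy example. The top-k hard-thresholding below is an idealized stand-in for the paper's actual compressive sampling and sparse recovery algorithms, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two consecutive frames that differ only in a small moving region,
# so the residual frame is sparse (the property the framework relies on).
prev_frame = rng.uniform(0, 255, (32, 32))
curr_frame = prev_frame.copy()
curr_frame[10:14, 10:14] += 40.0          # small local change: 16 pixels

residual = curr_frame - prev_frame        # sparse: 16 nonzeros out of 1024

def sparse_approx(x, k):
    """Keep only the k largest-magnitude entries (idealized sparse reconstruction)."""
    flat = x.ravel().copy()
    idx = np.argsort(np.abs(flat))[:-k]   # indices of everything but the top k
    flat[idx] = 0.0
    return flat.reshape(x.shape)

# Reconstruct the current frame from the base frame plus the sparse residual.
recon = prev_frame + sparse_approx(residual, k=16)
```

Because the residual's support is tiny, a k-term approximation recovers the frame exactly here; real sensing noise and motion make the recovery approximate rather than exact.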
A Group Variational Transformation Neural Network for Fractional Interpolation of Video Coding
Motion compensation is an important technology in video coding that removes
the temporal redundancy between coded video frames. In motion compensation,
fractional interpolation is used to obtain more reference blocks at the
sub-pixel level. Existing video coding standards commonly use fixed
interpolation filters for fractional interpolation, which are not flexible
enough to handle diverse video signals well. In this paper, we design a group
variational transformation convolutional neural network (GVTCNN) to improve
the fractional interpolation performance of the luma component in motion
compensation. GVTCNN infers samples at different sub-pixel positions from the
input integer-position sample: it first extracts a shared feature map from the
integer-position sample, and then a group variational transformation technique
transforms a group of copies of this shared feature map into samples at
different sub-pixel positions. Experimental results demonstrate the
interpolation efficiency of our GVTCNN. Compared with the interpolation method
of High Efficiency Video Coding, our method achieves 1.9% bit saving on average
and up to 5.6% bit saving under the low-delay P configuration.
Comment: DCC 201
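For context, the fixed filter that GVTCNN competes with can be written down directly: HEVC's half-pel luma interpolation uses the 8-tap filter below (coefficients from the HEVC specification, summing to 64). The edge-replication border handling is a simplification for this sketch.

```python
import numpy as np

# HEVC's fixed 8-tap half-pel luma interpolation filter; taps sum to 64.
HALF_PEL = np.array([-1, 4, -11, 40, 40, -11, 4, -1], dtype=np.float64)

def half_pel_interp(row):
    """Interpolate the half-pixel position between samples i and i+1 of a 1-D row."""
    padded = np.pad(row, (3, 4), mode="edge")   # simplified border replication
    return np.array([padded[i:i + 8] @ HALF_PEL / 64.0 for i in range(len(row))])

flat = half_pel_interp(np.full(16, 128.0))               # constant stays constant
ramp = half_pel_interp(np.arange(16, dtype=np.float64))  # interior values land at i + 0.5
```

Being a single fixed kernel, this filter treats all content identically; GVTCNN's claim is that inferring sub-pixel samples from learned, shared features adapts better to diverse signals.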
Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach
Emotion recognition from facial expressions is tremendously useful,
especially when coupled with smart devices and wireless multimedia
applications. However, inadequate network bandwidth often limits the spatial
resolution of the transmitted video, which heavily degrades recognition
reliability. We develop a novel framework to achieve robust emotion
recognition from low bit rate video. While video frames are downsampled at the
encoder side, the decoder is embedded with a deep network model for joint
super-resolution (SR) and recognition. Notably, we propose a novel max-mix
training strategy, leading to a single "One-for-All" model that is remarkably
robust to a vast range of downsampling factors. This makes our framework well
suited to the varying bandwidths of real transmission scenarios, without
hampering scalability or efficiency. The proposed framework is evaluated on
the AVEC 2016 benchmark and demonstrates significantly improved stand-alone
recognition performance, as well as rate-distortion (R-D) performance, over
either recognizing directly from LR frames or performing SR and recognition
separately.
Comment: Accepted by the Seventh International Conference on Affective
Computing and Intelligent Interaction (ACII2017)
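The core idea behind a single "One-for-All" model is to expose it to many downsampling factors during training. The abstract does not spell out the max-mix strategy's details, so the sketch below only illustrates the mixing ingredient: each training sample in a batch gets its own randomly drawn factor. Function names, factor set, and batch construction are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
FACTORS = [2, 4, 8]          # assumed range of factors the single model must cover

def downsample(frame, f):
    """Average-pool a square frame by factor f."""
    h, w = frame.shape
    return frame[:h - h % f, :w - w % f].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def make_mixed_batch(frames, batch_size=4):
    """Mix downsampling factors inside every batch so one model sees them all."""
    batch = []
    for _ in range(batch_size):
        hr = frames[rng.integers(len(frames))]
        f = FACTORS[rng.integers(len(FACTORS))]   # factor drawn per sample, not per batch
        batch.append((downsample(hr, f), hr, f))  # (LR input, HR target, factor)
    return batch

frames = [rng.uniform(0, 255, (32, 32)) for _ in range(8)]
batch = make_mixed_batch(frames)
```

A model trained only on one factor tends to fail on others; mixing factors per sample is what makes a single decoder-side network usable across varying channel bandwidths.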
A Novel Super-Resolution Reconstruction of Low-Resolution Images Progressively Using DCT and Zonal Filter Based Denoising
Due to factors like processing power limitations and channel capabilities,
images are often down-sampled and transmitted at low bit rates, resulting in a
low resolution compressed image. High resolution images can be reconstructed
from several blurred, noisy and down-sampled low resolution images using a
computational process known as super-resolution reconstruction: the process of
combining multiple aliased low-quality images to produce a high resolution,
high-quality image. We consider the problem of recovering a high resolution
image progressively from a sequence of low resolution compressed images. In
this paper we propose a novel DCT based progressive image display algorithm,
with emphasis on the encoding and decoding process. At the encoder we consider
a set of low resolution images corrupted by additive white Gaussian noise and
motion blur. The low resolution images are compressed using an 8x8 block DCT,
and the noise is filtered using our proposed novel zonal filter. Multiframe
fusion is performed to obtain a single noise-free image. At the decoder the
image is reconstructed progressively by transmitting the coarser image first,
followed by the detail image. Finally, a super-resolution image is
reconstructed by applying our proposed novel adaptive interpolation technique.
We have performed both objective and subjective analysis of the reconstructed
image; the resultant image has a better super-resolution factor and higher
ISNR and PSNR. A comparative study with Iterative Back Projection (IBP),
Projection onto Convex Sets (POCS), Papoulis-Gerchberg, and FFT-based
super-resolution reconstruction shows that our method outperforms these
previous contributions.
Comment: 20 pages, 11 figures
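The 8x8 block DCT plus zonal filtering step can be sketched concretely. The abstract does not specify the zone's shape or size, so the triangular low-frequency zone below (keep coefficients with u + v < 4) is an illustrative choice, not the paper's exact filter.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)
    mat = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    mat[0] *= 1 / np.sqrt(2)
    return mat * np.sqrt(2 / n)

D = dct_matrix(8)

def zonal_filter(block, zone=4):
    """2-D DCT, keep the low-frequency zone u + v < zone, inverse DCT."""
    coeffs = D @ block @ D.T
    u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
    coeffs[u + v >= zone] = 0.0     # discard high-frequency (mostly noise) coefficients
    return D.T @ coeffs @ D

noisy = np.full((8, 8), 100.0) + np.random.default_rng(2).normal(0, 5, (8, 8))
denoised = zonal_filter(noisy)
```

Zonal filtering trades detail for noise suppression: image energy concentrates in the low-frequency zone while white noise spreads evenly over all 64 coefficients, so zeroing the high-frequency zone removes mostly noise.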
Kernel based low-rank sparse model for single image super-resolution
Self-similarity learning has been recognized in recent years as a promising
method for single image super-resolution (SR) to produce a high-resolution
(HR) image. The performance of learning-based SR reconstruction, however,
highly depends on the learned representation coefficients. Due to the
degradation of the input image, conventional sparse coding is prone to
producing unfaithful representation coefficients. To this end, we propose a
novel kernel-based low-rank sparse model with self-similarity learning for
single image SR, which incorporates a nonlocal-similarity prior to enforce
that similar patches have similar representation weights. We perform a gradual
magnification scheme, using self-examples extracted from the degraded input
image and its up-scaled versions. To exploit nonlocal similarity, we
concatenate the vectorized input patch and its nonlocal neighbors at different
locations into a data matrix consisting of similar components. We then map the
nonlocal data matrix into a high-dimensional feature space by the kernel
method to capture its nonlinear structures. Under the assumption that the
sparse coefficients for the nonlocal data in the kernel space should be
low-rank, we impose a low-rank constraint on the sparse coding to share
similarities among representation coefficients and remove outliers, so that
stable weights for SR reconstruction can be obtained. Experimental results
demonstrate the advantage of our proposed method in both visual quality and
reconstruction error.
Comment: 27 pages, Keywords: low-rank, sparse representation, kernel method,
self-similarity learning, super-resolution
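Two building blocks of this model, the nonlocal data matrix and the kernel mapping, can be sketched in isolation. The snippet below only forms the matrix of a patch with its most similar neighbors and computes its RBF-kernel Gram matrix (the kernel trick); the low-rank sparse coding solver itself is beyond a short sketch, and all names and parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def nonlocal_data_matrix(image, y, x, size=5, k=4):
    """Stack the vectorized patch at (y, x) with its k most similar patches as columns."""
    ph = image[y:y + size, x:x + size].ravel()
    candidates = []
    for yy in range(image.shape[0] - size):
        for xx in range(image.shape[1] - size):
            if (yy, xx) != (y, x):
                q = image[yy:yy + size, xx:xx + size].ravel()
                candidates.append((np.linalg.norm(ph - q), q))
    candidates.sort(key=lambda t: t[0])             # nearest patches first
    return np.stack([ph] + [q for _, q in candidates[:k]], axis=1)

def rbf_gram(X, gamma=1e-4):
    """Kernel trick: Gram matrix of the columns of X under an RBF kernel."""
    sq = (X * X).sum(axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X  # pairwise squared distances
    return np.exp(-gamma * d2)

image = rng.uniform(0, 255, (16, 16))
X = nonlocal_data_matrix(image, 4, 4)               # 25-dim patches, 5 columns
K = rbf_gram(X)
```

The Gram matrix K is all the downstream kernel-space sparse coding needs; the patches' nonlinear structure enters only through these inner products, never through an explicit high-dimensional embedding.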
Sparse Coding Approach for Multi-Frame Image Super Resolution
An image super-resolution method from multiple observations of low-resolution
images is proposed. The method is based on sub-pixel-accuracy block matching
for estimating the relative displacements of the observed images, and on
sparse signal representation for estimating the corresponding high-resolution
image. Relative displacements of small patches of the observed low-resolution
images are accurately estimated by a computationally efficient block matching
method. Since the estimated displacements are also regarded as the warping
component of the image degradation process, the matching results are directly
utilized to generate the low-resolution dictionary for sparse image
representation. The matching scores of the block matching are used to select a
subset of low-resolution patches for reconstructing a high-resolution patch;
that is, an adaptive selection of informative low-resolution images is
realized. When there is only one low-resolution image, the proposed method
works as a single-frame super-resolution method. The proposed method is shown
to perform comparably or superiorly to conventional single- and multi-frame
super-resolution methods through experiments using various real-world
datasets.
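The displacement-estimation step can be sketched with an exhaustive integer-pel SAD (sum of absolute differences) search. This toy omits the sub-pixel refinement the paper relies on, and the search layout below is an illustrative convention, not the authors' code.

```python
import numpy as np

def block_match(ref, patch, search=4):
    """Find the integer displacement of `patch` inside `ref` minimizing SAD.

    Displacement (0, 0) corresponds to the patch sitting at (search, search)."""
    ph, pw = patch.shape
    best_sad, best_d = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = search + dy, search + dx
            sad = np.abs(ref[y:y + ph, x:x + pw] - patch).sum()
            if sad < best_sad:
                best_sad, best_d = sad, (dy, dx)
    return best_d, best_sad

rng = np.random.default_rng(3)
ref = rng.uniform(0, 255, (24, 24))
patch = ref[6:14, 5:13]                 # 8x8 patch at known displacement (2, 1)
d, best = block_match(ref, patch)
```

The same SAD scores double as the matching scores the paper uses to decide which low-resolution patches are informative enough to enter the dictionary.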
Improving Low Bit-Rate Video Coding using Spatio-Temporal Down-Scaling
Good quality video coding for low bit-rate applications is important for
transmission over narrow-bandwidth channels and for storage with limited
memory capacity. In this work, we adapt a previous analysis of image
compression at low bit-rates to video signals, examining how down-scaling in
the spatial and temporal dimensions improves compression. We show, both
theoretically and experimentally, that at low bit-rates we benefit from
applying spatio-temporal scaling. The proposed method performs down-scaling
before compression and a corresponding up-scaling afterwards, while the codec
itself is left unmodified. We propose analytic models for low bit-rate
compression and for the spatio-temporal scaling operations. Specifically, we
use theoretical models of motion-compensated prediction of available and
absent frames, as in coding and frame-rate up-conversion (FRUC) applications,
respectively. The proposed models are designed for multi-resolution analysis.
In addition, we formulate a bit-allocation procedure and propose a method for
estimating good down-scaling factors for a given video based on its
second-order statistics and the given bit-budget. We validate our model with
experimental results of H.264 compression.
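The codec-agnostic wrapper described above, down-scale, code, up-scale, can be sketched as a pair of numpy functions. The averaging/frame-dropping down-scaler and the nearest-neighbour/frame-repetition up-scaler below are crude stand-ins; in the paper the temporal up-scaling is a FRUC-style interpolation and the codec in the middle is H.264.

```python
import numpy as np

def spatio_temporal_downscale(video, sf=2, tf=2):
    """Down-scale a (frames, H, W) clip: average-pool in space, drop frames in time."""
    t, h, w = video.shape
    spatial = video[:, :h - h % sf, :w - w % sf] \
        .reshape(t, h // sf, sf, w // sf, sf).mean(axis=(2, 4))
    return spatial[::tf]                       # keep every tf-th frame

def spatio_temporal_upscale(video, sf=2, tf=2):
    """Invert the scaling with nearest-neighbour in space and frame repetition
    in time (a FRUC method would replace the naive frame repetition)."""
    up = np.repeat(np.repeat(video, sf, axis=1), sf, axis=2)
    return np.repeat(up, tf, axis=0)

video = np.random.default_rng(4).uniform(0, 255, (8, 16, 16))
low = spatio_temporal_downscale(video)         # codec would run here, unmodified
rec = spatio_temporal_upscale(low)
```

The unmodified codec sits between the two calls; the paper's contribution is the analysis that predicts when this wrapper wins at low bit-rates and which scaling factors to pick.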
Stereo on a budget
We propose an algorithm for recovering depth using less than two images.
Instead of having both cameras send their entire image to the host computer,
the left camera sends its image to the host while the right camera sends only a
fraction of its image. The key aspect is that the cameras send the
information without communicating at all. Hence, the required communication
bandwidth is significantly reduced.
While standard image compression techniques can reduce the communication
bandwidth, they require additional computational resources on the part of the
encoder (camera). We aim to design a lightweight encoder that only touches a
fraction of the pixels. The burden of decoding is placed on the decoder
(host).
We show that it is enough for the encoder to transmit a sparse set of pixels.
With as little as 2% of the image, the decoder can compute a depth map whose
accuracy is comparable to that of traditional stereo matching algorithms
requiring both images as input. Using the depth map and the left image, the
right image can be synthesized. No computations are required at the encoder,
and the decoder's runtime is linear in the images' size.
Comment: update flowchart in Fig.
Deep Reference Generation with Multi-Domain Hierarchical Constraints for Inter Prediction
Inter prediction is an important module in video coding for temporal
redundancy removal, where similar reference blocks are searched from previously
coded frames and employed to predict the block to be coded. Although
traditional video codecs can estimate and compensate for block-level motions,
their inter prediction performance is still heavily affected by the remaining
inconsistent pixel-wise displacement caused by irregular rotation and
deformation. In this paper, we address the problem by proposing a deep frame
interpolation network to generate additional reference frames in coding
scenarios. First, we summarize the previous adaptive convolutions used for
frame interpolation and propose a factorized kernel convolutional network to
improve the modeling capacity and simultaneously keep its compact form. Second,
to better train this network, multi-domain hierarchical constraints are
introduced to regularize the training of our factorized kernel convolutional
network. For spatial domain, we use a gradually down-sampled and up-sampled
auto-encoder to generate the factorized kernels for frame interpolation at
different scales. For quality domain, considering the inconsistent quality of
the input frames, the factorized kernel convolution is modulated with
quality-related features to learn to exploit more information from high quality
frames. For frequency domain, a sum of absolute transformed difference loss
that performs frequency transformation is utilized to facilitate network
optimization from the view of coding performance. With the well-designed frame
interpolation network regularized by multi-domain hierarchical constraints, our
method surpasses HEVC on average 6.1% BD-rate saving and up to 11.0% BD-rate
saving for the luma component under the random access configuration
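The frequency-domain constraint can be made concrete: a sum of absolute transformed differences (SATD) applies a Hadamard transform to the prediction error before summing magnitudes, which weights errors more like a codec's own rate-distortion cost than plain L1 does. The 4x4 block size below is an illustrative choice, not necessarily the one used in the paper.

```python
import numpy as np

# 4x4 Hadamard matrix used for SATD (sum of absolute transformed differences).
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]], dtype=np.float64)

def satd(pred, target):
    """SATD over non-overlapping 4x4 blocks of the prediction error."""
    diff = pred - target
    total = 0.0
    for y in range(0, diff.shape[0], 4):
        for x in range(0, diff.shape[1], 4):
            block = diff[y:y + 4, x:x + 4]
            total += np.abs(H4 @ block @ H4.T).sum()   # transform, then L1
    return total

a = np.random.default_rng(5).uniform(0, 255, (8, 8))
```

Used as a training loss, satd(pred, target) pushes the interpolation network to minimize exactly the kind of transformed residual the encoder will later have to pay bits for.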