Pyramid Attention Networks for Image Restoration
Self-similarity is an image prior widely used in image restoration algorithms:
small but similar patterns tend to recur at different locations and scales.
However, recent deep convolutional neural network based methods for image
restoration do not take full advantage of self-similarities, because they rely
on self-attention modules that only process information at a single scale. To
solve this problem, we present a novel Pyramid Attention
module for image restoration, which captures long-range feature correspondences
from a multi-scale feature pyramid. Inspired by the fact that corruptions, such
as noise or compression artifacts, drop drastically at coarser image scales,
our attention module is designed to be able to borrow clean signals from their
"clean" correspondences at the coarser levels. The proposed pyramid attention
module is a generic building block that can be flexibly integrated into various
neural architectures. Its effectiveness is validated through extensive
experiments on multiple image restoration tasks: image denoising, demosaicing,
compression artifact reduction, and super resolution. Without any bells and
whistles, our PANet (pyramid attention module with simple network backbones)
can produce state-of-the-art results with superior accuracy and visual quality.
Our code will be available at
https://github.com/SHI-Labs/Pyramid-Attention-Network
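The multi-scale matching idea in this abstract can be illustrated with a minimal 1D sketch. Everything here is an assumption made for illustration: the average-pooling pyramid, the negative squared-distance similarity, the `temperature` value, and all function names are hypothetical and are not taken from the PANet code.

```python
import math

def downsample(signal, factor):
    """Average-pool a 1D signal by an integer factor (a crude coarser scale)."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]

def patches(signal, size):
    """All contiguous patches of the given size."""
    return [signal[i:i + size] for i in range(len(signal) - size + 1)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pyramid_attention(signal, patch_size=3, factors=(1, 2), temperature=0.1):
    """Replace each patch centre by an attention-weighted average of patch
    centres gathered from every pyramid level (hypothetical simplification)."""
    candidates = []
    for f in factors:
        level = signal if f == 1 else downsample(signal, f)
        candidates.extend(patches(level, patch_size))
    centre = patch_size // 2
    output = []
    for query in patches(signal, patch_size):
        # Similarity: negative squared distance between query and candidate.
        scores = [-sum((q - c) ** 2 for q, c in zip(query, cand)) / temperature
                  for cand in candidates]
        weights = softmax(scores)
        output.append(sum(w * cand[centre]
                          for w, cand in zip(weights, candidates)))
    return output

restored = pyramid_attention([1.0] * 8)
```

Because the coarser level is an average of neighbouring samples, its patches carry less noise, which is the intuition the abstract gives for borrowing "clean" correspondences.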
SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification
Video person re-identification has attracted much attention in recent years. It
aims to match image sequences of pedestrians from different camera views.
Previous approaches usually improve this task from three aspects, including a)
selecting more discriminative frames, b) generating more informative temporal
representations, and c) developing more effective distance metrics. To address
the above issues, we present a novel and practical deep architecture for video
person re-identification termed Self-and-Collaborative Attention Network
(SCAN). It has several appealing properties. First, SCAN adopts a non-parametric
attention mechanism to refine the intra-sequence and inter-sequence feature
representation of videos, and outputs a self-and-collaborative feature
representation for each video, aligning the discriminative frames between
the probe and gallery sequences. Second, beyond existing models, a generalized
pairwise similarity measurement is proposed to calculate the similarity feature
representations of video pairs, so that matching scores can be computed by a
binary classifier. Third, a dense clip segmentation strategy is also introduced
to generate rich probe-gallery pairs to optimize the model. Extensive
experiments demonstrate the effectiveness of SCAN, which outperforms the
best-performing baselines on the iLIDS-VID, PRID2011 and MARS datasets.
Comment: 10 pages, 5 figures
Generic 3D Convolutional Fusion for image restoration
Recently, exciting strides forward have been made in the area of image
restoration, particularly for image denoising and single image
super-resolution. Deep learning techniques contributed to this significantly.
The top methods differ in their formulations and assumptions, so even if their
average performance may be similar, some work better on certain image types and
image regions than others. This complementarity motivated us to propose a novel
3D convolutional fusion (3DCF) method. Unlike other methods adapted to
different tasks, our method uses the exact same convolutional network
architecture to address both image denoising and single image
super-resolution. As a result, our 3DCF method achieves substantial
improvements (0.1 dB to 0.4 dB PSNR) over the state-of-the-art methods that it
fuses, on standard benchmarks for both tasks. At the same time, the method
remains computationally efficient.
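To put the reported 0.1 dB to 0.4 dB PSNR gains in perspective, the sketch below uses the standard PSNR definition to convert a gain in decibels into the corresponding reduction in mean squared error. The function names and the 8-bit peak value of 255 are assumptions for illustration.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)

low = mse_ratio_for_gain(0.1)   # MSE ratio for a 0.1 dB gain
high = mse_ratio_for_gain(0.4)  # MSE ratio for a 0.4 dB gain
```

Under this definition a 0.4 dB gain corresponds to roughly a 9% lower mean squared error, and a 0.1 dB gain to roughly 2%.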
Ensemble Super-Resolution with A Reference Dataset
By developing sophisticated image priors or designing deep(er) architectures,
a variety of image Super-Resolution (SR) approaches have been proposed recently
and achieved very promising performance. A natural question is whether these
methods can be reformulated into a unifying framework and whether this
framework assists in SR reconstruction. In this paper, we present a simple
but effective single image SR method based on ensemble learning, which can
produce better performance than any of the SR methods being ensembled (called
component super-resolvers). Based on the assumption that a better component
super-resolver should have a larger ensemble weight when performing SR
reconstruction, we present a Maximum A Posteriori (MAP) estimation framework
for the inference of optimal ensemble weights. Specifically,
we introduce a reference dataset, which is composed of High-Resolution (HR) and
Low-Resolution (LR) image pairs, to measure the super-resolution abilities
(prior knowledge) of different component super-resolvers. To obtain the optimal
ensemble weights, we propose to incorporate the reconstruction constraint,
which states that the degraded HR image should equal the LR observation, as
well as the prior knowledge of ensemble weights into the MAP estimation
framework. Moreover, the proposed optimization problem has an analytical
solution. We study the performance of the proposed method by comparing it
with different competitive approaches, including four state-of-the-art
non-deep-learning methods, the four latest deep learning based methods and
one ensemble learning based method, and demonstrate its effectiveness and
superiority on three public datasets.
Comment: 14 pages, 11 figures
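The assumption that a better component super-resolver deserves a larger weight can be sketched as follows. This is not the MAP estimator from the paper: it is a deliberately simplified stand-in that sets each weight proportional to the inverse of that component's error on a reference dataset. The function names and the example error values are hypothetical.

```python
def ensemble_weights(reference_mse):
    """Weights proportional to inverse reference-set MSE, normalised to sum to 1.
    (Simplified stand-in for the MAP-estimated weights described above.)"""
    inverse = [1.0 / m for m in reference_mse]
    total = sum(inverse)
    return [v / total for v in inverse]

def fuse(outputs, weights):
    """Pixel-wise weighted sum of the component super-resolvers' outputs."""
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(outputs[0]))]

# Two hypothetical components; the first is three times more accurate.
weights = ensemble_weights([1.0, 3.0])
fused = fuse([[4.0, 8.0], [0.0, 0.0]], weights)
```

The paper additionally enforces the reconstruction constraint inside the MAP framework, which this sketch omits.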
Unified Single-Image and Video Super-Resolution via Denoising Algorithms
Single Image Super-Resolution (SISR) aims to recover a high-resolution image
from a given low-resolution version of it. Video Super Resolution (VSR) targets
a series of given images, aiming to fuse them to create a higher resolution
outcome. Although SISR and VSR seem to have a lot in common, most SISR
algorithms do not have a simple and direct extension to VSR. VSR is considered
a more challenging inverse problem, mainly due to its reliance on sub-pixel
accurate motion estimation, which has no parallel in SISR. Another complication
is the dynamics of the video, often addressed by simply generating a single
frame instead of a complete output sequence.
In this work we suggest a simple and robust super-resolution framework that
can be applied to single images and easily extended to video. Our work relies
on the observation that denoising of images and videos is well-managed and very
effectively treated by a variety of methods. We exploit the Plug-and-Play-Prior
framework and the Regularization-by-Denoising (RED) approach that extends it,
and show how to use such denoisers in order to handle the SISR and the VSR
problems using a unified formulation and framework. This way, we benefit from
the effectiveness and efficiency of existing image/video denoising algorithms,
while solving much more challenging problems. More specifically, harnessing the
VBM3D video denoiser, we obtain a strongly competitive motion-estimation-free
VSR algorithm that tends toward high-quality output and fast processing.
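A minimal 1D sketch of using a denoiser as a regulariser for super-resolution, in the spirit of RED, is given below. It runs gradient descent with the update x <- x - step * (H^T(Hx - y) + lam * (x - f(x))), where H is block averaging and f is a denoiser. The three-tap box filter standing in for the denoiser, the step size, `lam`, and all function names are assumptions for illustration; the paper uses far stronger denoisers such as VBM3D.

```python
def downsample(x, s):
    """H: average each block of s samples."""
    return [sum(x[i:i + s]) / s for i in range(0, len(x), s)]

def upsample_adjoint(r, s):
    """H^T: spread each low-resolution residual evenly over its block."""
    return [v / s for v in r for _ in range(s)]

def box_denoiser(x):
    """Stand-in denoiser f: three-tap moving average with replicated borders."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def red_super_resolve(y, s=2, lam=0.1, step=0.5, iters=200):
    """Gradient descent on a data term plus a denoiser-based prior term."""
    x = [v for v in y for _ in range(s)]  # nearest-neighbour initialisation
    for _ in range(iters):
        residual = [a - b for a, b in zip(downsample(x, s), y)]
        data_grad = upsample_adjoint(residual, s)
        prior_grad = [lam * (xi - di) for xi, di in zip(x, box_denoiser(x))]
        x = [xi - step * (g + p) for xi, g, p in zip(x, data_grad, prior_grad)]
    return x

high_res = red_super_resolve([2.0, 2.0, 2.0, 2.0])
```

Swapping `box_denoiser` for an image or video denoiser is what lets the same formulation cover both SISR and VSR.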
Block-Matching Convolutional Neural Network for Image Denoising
There are two main streams in up-to-date image denoising algorithms:
non-local self similarity (NSS) prior based methods and convolutional neural
network (CNN) based methods. The NSS based methods are favorable on images with
regular and repetitive patterns while the CNN based methods perform better on
irregular structures. In this paper, we propose a block-matching convolutional
neural network (BMCNN) method that combines NSS prior and CNN. Initially,
similar local patches in the input image are integrated into a 3D block. In
order to prevent the noise from corrupting the block matching, we first apply
an existing denoising algorithm to the noisy image. The denoised image is
employed as a pilot signal for the block matching, and then a denoising function
for the block is learned by a CNN structure. Experimental results show that the
proposed BMCNN algorithm achieves state-of-the-art performance. In detail,
BMCNN can restore both repetitive and irregular structures.
Comment: 11 pages, 9 figures
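The block-matching step described in this abstract can be sketched as follows: patches are compared in the pre-denoised pilot image, and the patches found at the best-matching positions are stacked into a 3D block. The patch size, the number of matches `k`, the squared-distance criterion, the choice to stack patches from the noisy image only, and all function names are assumptions for illustration.

```python
def extract_patch(image, top, left, size):
    """Square patch of a 2D list-of-lists image."""
    return [row[left:left + size] for row in image[top:top + size]]

def patch_distance(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block_match(pilot, noisy, ref_pos, size=2, k=3):
    """Find the k patches most similar to the reference patch in the pilot
    image, then stack the noisy patches at those positions into a 3D block."""
    height, width = len(pilot), len(pilot[0])
    ref = extract_patch(pilot, ref_pos[0], ref_pos[1], size)
    scored = []
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            candidate = extract_patch(pilot, top, left, size)
            scored.append((patch_distance(ref, candidate), (top, left)))
    scored.sort()  # smallest distance first; ties broken by position
    positions = [pos for _, pos in scored[:k]]
    block = [extract_patch(noisy, t, l, size) for t, l in positions]
    return positions, block

# Checkerboard example: identical patches repeat across the image.
pilot = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
positions, block = block_match(pilot, pilot, (0, 0))
```

In BMCNN the resulting block would then be passed to a CNN that learns the denoising function; that part is not sketched here.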
Chaining Identity Mapping Modules for Image Denoising
We propose to learn a fully-convolutional network model that consists of a
Chain of Identity Mapping Modules (CIMM) for image denoising. The CIMM
structure possesses two distinctive features that are important for the noise
removal task. Firstly, each residual unit employs identity mappings as the skip
connections and receives pre-activated input in order to preserve the gradient
magnitude propagated in both the forward and backward directions. Secondly, by
utilizing dilated kernels for the convolution layers in the residual branch, in
other words within an identity mapping module, each neuron in the last
convolution layer can observe the full receptive field of the first layer.
After being trained on the BSD400 dataset, the proposed network produces
remarkably higher numerical accuracy and better visual image quality than the
state-of-the-art when evaluated on conventional benchmark images and the
BSD68 dataset.
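The effect of dilated kernels on the receptive field can be checked with a short calculation. For stride-1 convolutions, each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d. The dilation rates in the example are hypothetical and are not the rates used in CIMM.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (along one axis) of stacked stride-1 convolutions,
    one layer per entry in `dilations`."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field

plain = receptive_field(3, [1, 1, 1])    # three ordinary 3x3 layers
dilated = receptive_field(3, [1, 2, 4])  # same depth, hypothetical dilations
```

With the same number of 3x3 layers, the dilated stack covers 15 pixels per axis instead of 7, without adding parameters.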
Weighted Low-rank Tensor Recovery for Hyperspectral Image Restoration
Hyperspectral imaging, providing abundant spatial and spectral information
simultaneously, has attracted a lot of interest in recent years. Unfortunately,
due to hardware limitations, the hyperspectral image (HSI) is vulnerable to
various degradations, such as noise (random noise, HSI denoising), blur
(Gaussian and uniform blur, HSI deblurring), and down-sampling (both spectral
and spatial down-sampling, HSI super-resolution). Previous HSI restoration methods
are designed for one specific task only. Besides, most of them start from the
1-D vector or 2-D matrix models and cannot fully exploit the structural
spectral-spatial correlation in 3-D HSI. To overcome these limitations, in this
work, we propose a unified low-rank tensor recovery model for comprehensive HSI
restoration tasks, in which non-local similarity between spectral-spatial cubes
and spectral correlation are simultaneously captured by third-order tensors.
Further, to improve the capability and flexibility, we formulate it as a
weighted low-rank tensor recovery (WLRTR) model by treating the singular values
differently, and study its analytical solution. We also consider the exclusive
stripe noise in HSI as the gross error by extending WLRTR to robust principal
component analysis (WLRTR-RPCA). Extensive experiments demonstrate that the
proposed WLRTR models consistently outperform state-of-the-art methods in
typical low-level vision HSI tasks, including denoising, destriping, deblurring
and super-resolution.
Comment: 22 pages, 22 figures
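Treating singular values differently can be illustrated with a weighted shrinkage rule applied to a list of singular values. The specific weighting used here, a weight inversely proportional to the singular value so that large values are shrunk less, is an assumption borrowed from common weighted low-rank methods; the abstract does not state the exact rule, and the tensor decomposition itself is not sketched. The constant `c` and the function name are hypothetical.

```python
def weighted_shrink(singular_values, c=0.5, eps=1e-8):
    """Soft-threshold each singular value by its own weight c / (value + eps).
    Large (informative) values are barely changed; small ones are zeroed."""
    shrunk = []
    for value in singular_values:
        weight = c / (value + eps)
        shrunk.append(max(value - weight, 0.0))
    return shrunk

result = weighted_shrink([10.0, 1.0, 0.1])
```

A uniform threshold would shrink every singular value by the same amount, which penalises the dominant components as heavily as the noise-dominated ones.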
Real Image Denoising with Feature Attention
Deep convolutional neural networks perform better on images containing
spatially invariant noise (synthetic noise); however, their performance is
limited on real noisy photographs, and they require multi-stage network modeling.
To advance the practicability of denoising algorithms, this paper proposes a
novel single-stage blind real image denoising network (RIDNet) by employing a
modular architecture. We use a residual-on-the-residual structure to ease the
flow of low-frequency information and apply feature attention to exploit the
channel dependencies. Furthermore, the evaluation in terms of quantitative
metrics and visual quality on three synthetic and four real noisy datasets
against 19 state-of-the-art algorithms demonstrates the superiority of our
RIDNet.
Comment: Accepted in ICCV (Oral), 201
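One common way to exploit channel dependencies is a squeeze-and-excitation style gate: pool each channel to a scalar, pass the pooled vector through a small bottleneck, and rescale the channels by the resulting sigmoid gates. The sketch below assumes this form; the abstract does not spell out the exact layer design, and the weight matrices `w_down` and `w_up` and the function names are hypothetical.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def channel_attention(features, w_down, w_up):
    """Rescale each channel by a gate computed from all channels.
    features: list of channels, each a flat list of activations.
    w_down:   bottleneck weights, one row per hidden unit.
    w_up:     expansion weights, one row per channel."""
    pooled = [sum(channel) / len(channel) for channel in features]
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled)))  # ReLU
              for row in w_down]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)))
             for row in w_up]
    return [[gate * v for v in channel]
            for gate, channel in zip(gates, features)]

# Zero bottleneck weights give a gate of sigmoid(0) = 0.5 for every channel.
scaled = channel_attention([[1.0, 3.0], [2.0, 2.0]],
                           w_down=[[0.0, 0.0]],
                           w_up=[[1.0], [1.0]])
```

In a trained network the weights would be learned, so that informative channels receive gates near 1 and less useful ones are suppressed.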
Learning Hybrid Sparsity Prior for Image Restoration: Where Deep Learning Meets Sparse Coding
State-of-the-art approaches toward image restoration can be classified into
model-based and learning-based. The former, best represented by sparse coding
techniques, strives to exploit intrinsic prior knowledge about the unknown
high-resolution images, while the latter, popularized by recently developed
deep learning techniques, leverages an external image prior from some training
dataset. It is natural to explore their middle ground and pursue a hybrid image
prior capable of achieving the best in both worlds. In this paper, we propose a
systematic approach of achieving this goal called Structured Analysis Sparse
Coding (SASC). Specifically, a structured sparse prior is learned from
extrinsic training data via a deep convolutional neural network (in a similar
way to previous learning-based approaches); meanwhile, another structured sparse
prior is internally estimated from the input observation image (similar to
previous model-based approaches). The two structured sparse priors are then
combined to produce a hybrid prior incorporating the knowledge from both
domains. To manage the computational complexity, we have developed a novel
framework of implementing hybrid structured sparse coding processes by deep
convolutional neural networks. Experimental results show that the proposed
hybrid image restoration method performs comparably with and often better than
the current state-of-the-art techniques.