NTIRE 2020 Challenge on NonHomogeneous Dehazing
This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of
images (restoration of rich details in hazy images). We focus on the proposed
solutions and their results evaluated on NH-Haze, a novel dataset consisting of
55 pairs of real haze-free and nonhomogeneous hazy images recorded outdoors.
NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground
truth images. The nonhomogeneous haze has been produced using a professional
haze generator that imitates the real conditions of haze scenes. 168
participants registered in the challenge and 27 teams competed in the final
testing phase. The proposed solutions gauge the state-of-the-art in image
dehazing.
Comment: CVPR Workshops Proceedings 2020
A Deep Journey into Super-resolution: A survey
Deep convolutional networks based super-resolution is a fast-growing field
with numerous practical applications. In this exposition, we extensively
compare 30+ state-of-the-art super-resolution Convolutional Neural Networks
(CNNs) over three classical and three recently introduced challenging datasets
to benchmark single image super-resolution. We introduce a taxonomy for
deep-learning based super-resolution networks that groups existing methods into
nine categories including linear, residual, multi-branch, recursive,
progressive, attention-based and adversarial designs. We also provide
comparisons between the models in terms of network complexity, memory
footprint, model input and output, learning details, the type of network losses
and important architectural differences (e.g., depth, skip-connections,
filters). The extensive evaluation performed shows consistent and rapid
growth in accuracy over the past few years, along with a corresponding boost
in model complexity and the availability of large-scale datasets. It is also
observed that the pioneering methods identified as the benchmark have been
significantly outperformed by the current contenders. Despite the progress in
recent years, we identify several shortcomings of existing techniques and
provide future research directions towards the solution of these open problems.
Comment: Accepted in ACM Computing Surveys
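To make one of the survey's taxonomy categories concrete, the following is a minimal sketch of a residual single-image SR network in PyTorch. The module names, block count, and hyper-parameters are illustrative assumptions, not taken from any specific surveyed method.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection (the 'residual' design category)."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class TinyResidualSR(nn.Module):
    """Feature extraction -> residual blocks -> pixel-shuffle upsampling."""
    def __init__(self, scale=2, channels=64, num_blocks=4):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])
        self.tail = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        feats = self.head(lr)
        return self.tail(feats + self.body(feats))

sr = TinyResidualSR()(torch.rand(1, 3, 32, 32))  # -> (1, 3, 64, 64)
```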
A Matrix-in-matrix Neural Network for Image Super Resolution
In recent years, deep learning methods have achieved impressive results with
higher peak signal-to-noise ratio in single image super-resolution (SISR) tasks
by utilizing deeper layers. However, their application is quite limited since
they require high computing power. In addition, most of the existing methods
rarely take full advantage of the intermediate features which are helpful for
restoration. To address these issues, we propose a moderate-size SISR network
named matrixed channel attention network (MCAN) by constructing a matrix
ensemble of multi-connected channel attention blocks (MCAB). Several models of
different sizes are released to meet various practical requirements.
Extensive benchmark experiments show that the proposed models achieve better
performance with far fewer multiply-adds and parameters. Our models will be
made publicly available.
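The building block behind MCAN is channel attention arranged in a matrix of blocks. The sketch below shows a generic channel-attention residual block in PyTorch; it illustrates the mechanism only and is not the authors' exact MCAB, and the channel and reduction sizes are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global average pooling
    followed by a small bottleneck that rescales each channel."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))

class AttentionBlock(nn.Module):
    """Residual block with channel attention; several such blocks could be
    wired into a matrix/grid with extra skip connections as MCAN describes."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.ca = ChannelAttention(channels)

    def forward(self, x):
        return x + self.ca(self.conv(x))

y = AttentionBlock()(torch.rand(1, 64, 24, 24))  # same shape out
```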
Learning for Video Super-Resolution through HR Optical Flow Estimation
Video super-resolution (SR) aims to generate a sequence of high-resolution
(HR) frames with plausible and temporally consistent details from their
low-resolution (LR) counterparts. The generation of accurate correspondence
plays a significant role in video SR. It is demonstrated by traditional video
SR methods that simultaneous SR of both images and optical flows can provide
accurate correspondences and better SR results. However, existing deep learning
based methods use LR optical flows for correspondence generation. In
this paper, we propose an end-to-end trainable video SR framework to
super-resolve both images and optical flows. Specifically, we first propose an
optical flow reconstruction network (OFRnet) to infer HR optical flows in a
coarse-to-fine manner. Then, motion compensation is performed according to the
HR optical flows. Finally, compensated LR inputs are fed to a super-resolution
network (SRnet) to generate the SR results. Extensive experiments demonstrate
that HR optical flows provide more accurate correspondences than their LR
counterparts and improve both accuracy and consistency performance. Comparative
results on the Vid4 and DAVIS-10 datasets show that our framework achieves the
state-of-the-art performance.
Comment: To appear in ACCV 201
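The motion-compensation step in such a pipeline amounts to warping a neighboring frame by the estimated flow. A minimal sketch in PyTorch follows; it assumes a dense flow field in pixel units is already available (the OFRnet-style flow estimation itself is omitted) and uses bilinear sampling via `grid_sample`.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Warp `frame` (N,C,H,W) towards a reference view using dense optical
    flow (N,2,H,W) given in pixels; this is the motion-compensation step
    applied once the (HR) flow has been estimated."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(frame)  # (1,2,H,W)
    coords = grid + flow
    # normalize sampling coordinates to [-1, 1] for grid_sample
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)  # (N,H,W,2)
    return F.grid_sample(frame, sample_grid, align_corners=True)

neighbor = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)      # zero flow -> identity warp
compensated = warp(neighbor, flow)
```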
Rain O'er Me: Synthesizing real rain to derain with data distillation
We present a supervised technique for learning to remove rain from images
without using synthetic rain software. The method is based on a two-stage data
distillation approach: 1) A rainy image is first paired with a coarsely
derained version using a simple filtering technique ("rain-to-clean"). 2)
Then a clean image is randomly matched with the rainy soft-labeled pair.
Through a shared deep neural network, the rain that is removed from the first
image is then added to the clean image to generate a second pair
("clean-to-rain"). The neural network simultaneously learns to map both images
such that high resolution structure in the clean images can inform the
deraining of the rainy images. Demonstrations show that this approach can
address visual characteristics of rain that are not easily synthesized by
rain-simulation software in the usual way.
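The two-stage pairing can be illustrated with a few lines of NumPy. The median filter below is only a stand-in for whichever simple filtering the method actually uses for coarse deraining; the pairing logic, thresholds, and sizes are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import median_filter

def make_training_pairs(rainy, clean, filter_size=7):
    """Illustrative two-stage pairing: (1) coarsely derain the rainy image
    with a simple filter ("rain-to-clean"); (2) transplant the removed rain
    residual onto an unrelated clean image ("clean-to-rain")."""
    coarse_derained = median_filter(rainy, size=(filter_size, filter_size, 1))
    rain_residual = rainy - coarse_derained                      # approximate rain layer
    synthetic_rainy = np.clip(clean + rain_residual, 0.0, 1.0)   # second, clean-to-rain pair
    return (rainy, coarse_derained), (synthetic_rainy, clean)

rainy = np.random.rand(128, 128, 3).astype(np.float32)
clean = np.random.rand(128, 128, 3).astype(np.float32)
rain_to_clean, clean_to_rain = make_training_pairs(rainy, clean)
```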
Learning monocular depth estimation infusing traditional stereo knowledge
Depth estimation from a single image represents a fascinating, yet
challenging problem with countless applications. Recent works proved that this
task can be learned without direct supervision from ground truth labels by
leveraging image synthesis on sequences or stereo pairs. Focusing on this
second case, in this paper we leverage stereo matching in order to improve
monocular depth estimation. To this aim we propose monoResMatch, a novel deep
architecture designed to infer depth from a single input image by synthesizing
features from a different point of view, horizontally aligned with the input
image, and performing stereo matching between the two cues. In contrast to previous
works sharing this rationale, our network is the first trained end-to-end from
scratch. Moreover, we show how obtaining proxy ground truth annotation through
traditional stereo algorithms, such as Semi-Global Matching, enables more
accurate monocular depth estimation while avoiding the need for expensive
depth labels, keeping the approach self-supervised. Exhaustive experimental
results prove how the synergy between i) the proposed monoResMatch architecture
and ii) proxy-supervision attains state-of-the-art for self-supervised
monocular depth estimation. The code is publicly available at
https://github.com/fabiotosi92/monoResMatch-Tensorflow.
Comment: Accepted at CVPR 2019. Code available at
https://github.com/fabiotosi92/monoResMatch-Tensorflow
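The proxy-supervision idea can be sketched as a masked regression towards disparities produced offline by a traditional stereo algorithm such as Semi-Global Matching. The loss form, validity threshold, and tensor shapes below are hypothetical and may differ from the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def proxy_supervision_loss(pred_disp, sgm_disp, valid_mask):
    """Regress the disparity predicted from a single image towards proxy
    labels computed by a classical stereo method (e.g. SGM), only where the
    classical method produced a valid match."""
    return F.smooth_l1_loss(pred_disp[valid_mask], sgm_disp[valid_mask])

pred = torch.rand(1, 1, 96, 320, requires_grad=True)   # network output (illustrative)
sgm = torch.rand(1, 1, 96, 320)                        # offline SGM disparities (illustrative)
mask = sgm > 0.05          # treat near-zero disparities as invalid SGM output (assumption)
loss = proxy_supervision_loss(pred, sgm, mask)
loss.backward()
```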
Knowledge Adaptation for Efficient Semantic Segmentation
Both accuracy and efficiency are of significant importance to the task of
semantic segmentation. Existing deep FCNs suffer from heavy computation because
they maintain a series of high-resolution feature maps to preserve the detailed knowledge
in dense estimation. Although reducing the feature map resolution (i.e.,
applying a large overall stride) via subsampling operations (e.g., pooling and
convolution striding) can instantly increase the efficiency, it dramatically
decreases the estimation accuracy. To tackle this dilemma, we propose a
knowledge distillation method tailored for semantic segmentation to improve the
performance of the compact FCNs with large overall stride. To handle the
inconsistency between the features of the student and teacher network, we
optimize the feature similarity in a transferred latent domain formulated by
utilizing a pre-trained autoencoder. Moreover, an affinity distillation module
is proposed to capture the long-range dependency by calculating the non-local
interactions across the whole image. To validate the effectiveness of our
proposed method, extensive experiments have been conducted on three popular
benchmarks: Pascal VOC, Cityscapes and Pascal Context. Built upon a highly
competitive baseline, our proposed method can improve the performance of a
student network by 2.5% (mIoU boosts from 70.2 to 72.7 on the Cityscapes test
set) and can train a better compact model with only 8% of the floating-point
operations (FLOPs) of a model that achieves comparable performance.
Comment: Accepted to IEEE Conf. Computer Vision and Pattern Recognition, 201
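The affinity distillation idea, matching long-range pairwise relations between student and teacher features, can be sketched as follows. The cosine normalization, loss choice, and feature layer are assumptions for illustration, not the paper's verbatim formulation.

```python
import torch
import torch.nn.functional as F

def affinity_matrix(feat):
    """Pairwise (non-local) affinities between all spatial positions of a
    feature map: (N, C, H, W) -> (N, HW, HW) of normalized dot products."""
    n, c, h, w = feat.shape
    flat = F.normalize(feat.view(n, c, h * w), dim=1)   # cosine-style affinities
    return torch.bmm(flat.transpose(1, 2), flat)

def affinity_distillation_loss(student_feat, teacher_feat):
    """Match the student's long-range pairwise relations to the teacher's.
    Because affinities are HW x HW, student and teacher may have different
    channel counts."""
    return F.mse_loss(affinity_matrix(student_feat),
                      affinity_matrix(teacher_feat).detach())

student = torch.rand(2, 64, 32, 32, requires_grad=True)
teacher = torch.rand(2, 128, 32, 32)
loss = affinity_distillation_loss(student, teacher)
loss.backward()
```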
MDCN: Multi-scale Dense Cross Network for Image Super-Resolution
Convolutional neural networks have been proven to be of great benefit for
single-image super-resolution (SISR). However, previous works do not make full
use of multi-scale features and ignore the inter-scale correlation between
different upsampling factors, resulting in sub-optimal performance. Instead of
blindly increasing the depth of the network, we are committed to mining image
features and learning the inter-scale correlation between different upsampling
factors. To achieve this, we propose a Multi-scale Dense Cross Network (MDCN),
which achieves great performance with fewer parameters and less execution time.
MDCN consists of multi-scale dense cross blocks (MDCBs), hierarchical feature
distillation block (HFDB), and dynamic reconstruction block (DRB). Among them,
MDCB aims to detect multi-scale features and maximize the use of image feature
flow at different scales, HFDB focuses on adaptively recalibrating channel-wise
feature responses to achieve feature distillation, and DRB attempts to
reconstruct SR images with different upsampling factors in a single model. It
is worth noting that all these modules can run independently, which means they
can be selectively plugged into any CNN model to improve model
performance. Extensive experiments show that MDCN achieves competitive results
in SISR, especially in the reconstruction task with multiple upsampling
factors. The code will be provided at https://github.com/MIVRC/MDCN-PyTorch.
Comment: 15 pages, 15 figures
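A loose sketch of a multi-scale block with cross connections is shown below. The real MDCB wiring may well differ; the two kernel sizes, the concatenation-based exchange, and the channel counts are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class MultiScaleCrossBlock(nn.Module):
    """Two parallel branches with 3x3 and 5x5 kernels exchange ("cross")
    their features before a residual fusion; illustrates the general idea of
    mixing information across scales within one block."""
    def __init__(self, channels=64):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.cross3 = nn.Conv2d(channels * 2, channels, 3, padding=1)
        self.cross5 = nn.Conv2d(channels * 2, channels, 5, padding=2)
        self.fuse = nn.Conv2d(channels * 2, channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        a = self.act(self.branch3(x))
        b = self.act(self.branch5(x))
        mixed = torch.cat((a, b), dim=1)          # exchange information across scales
        a2 = self.act(self.cross3(mixed))
        b2 = self.act(self.cross5(mixed))
        return x + self.fuse(torch.cat((a2, b2), dim=1))

out = MultiScaleCrossBlock()(torch.rand(1, 64, 40, 40))
```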
Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution
Convolutional neural networks (CNNs) have recently achieved great success in
single-image super-resolution (SISR). However, these methods tend to produce
over-smoothed outputs and miss some textural details. To solve these problems,
we propose the Super-Resolution CliqueNet (SRCliqueNet) to reconstruct the high
resolution (HR) image with better textural details in the wavelet domain. The
proposed SRCliqueNet firstly extracts a set of feature maps from the low
resolution (LR) image by the clique blocks group. Then we send the set of
feature maps to the clique up-sampling module to reconstruct the HR image. The
clique up-sampling module consists of four sub-nets which predict the high
resolution wavelet coefficients of four sub-bands. Since we consider the edge
feature properties of the four sub-bands, each sub-net is connected to the
others so that they can learn the coefficients of the four sub-bands jointly.
Finally we apply inverse discrete wavelet transform (IDWT) to the output of
four sub-nets at the end of the clique up-sampling module to increase the
resolution and reconstruct the HR image. Extensive quantitative and qualitative
experiments on benchmark datasets show that our method achieves superior
performance over the state-of-the-art methods.
Comment: Accepted in NIPS 201
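The final reconstruction step, applying the inverse discrete wavelet transform to the four predicted sub-bands, can be reproduced with PyWavelets. The Haar wavelet and the random sub-band arrays below are illustrative stand-ins; the paper's wavelet basis and the sub-nets that produce the coefficients are not shown.

```python
import numpy as np
import pywt

# Toy stand-ins for the four sub-band predictions (LL, LH, HL, HH) that the
# four sub-nets would output, at half the target resolution.
h, w = 64, 64
ll, lh, hl, hh = (np.random.rand(h, w).astype(np.float32) for _ in range(4))

# IDWT assembles the four sub-bands into an image at twice the resolution,
# which is the reconstruction step described for the clique up-sampling module.
hr = pywt.idwt2((ll, (lh, hl, hh)), wavelet="haar")
print(hr.shape)  # (128, 128)
```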
Difficulty-aware Image Super Resolution via Deep Adaptive Dual-Network
Recently, deep learning based single image super-resolution (SR) approaches
have achieved great progress. The state-of-the-art SR methods usually adopt
a feed-forward pipeline to establish a non-linear mapping between low-res (LR)
and high-res (HR) images. However, because they treat all image regions equally
without considering differences in difficulty, these approaches meet an upper
bound for optimization. To address this issue, we propose a novel SR approach
that discriminately processes each region within an image according to its
difficulty. Specifically, we propose a dual-way SR network in which one way is
trained to focus on easy image regions and the other is trained to handle hard
image regions. To identify whether a region is easy or hard, we propose a novel
image difficulty recognition network based on PSNR prior. Our SR approach that
uses the region mask to adaptively enforce the dual-way SR network yields
superior results. Extensive experiments on several standard benchmarks (e.g.,
Set5, Set14, BSD100, and Urban100) show that our approach achieves
state-of-the-art performance.
Comment: ICME 2019 (Oral). Code and results are available at:
https://github.com/xzwlx/Difficulty-S
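The difficulty-aware routing can be sketched as blending a lightweight "easy" branch and a heavier "hard" branch with a region mask. In the sketch below the mask is simply passed in, standing in for the output of the PSNR-prior difficulty-recognition network; branch depths, channel counts, and the blending scheme are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualWaySR(nn.Module):
    """A shallow branch handles easy regions and a deeper branch handles hard
    ones; their outputs are blended by an upsampled region mask."""
    def __init__(self, scale=2, channels=32):
        super().__init__()
        def branch(num_convs):
            layers = [nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True)]
            for _ in range(num_convs):
                layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
            layers += [nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1), nn.PixelShuffle(scale)]
            return nn.Sequential(*layers)
        self.easy, self.hard = branch(2), branch(8)
        self.scale = scale

    def forward(self, lr, hard_mask):
        mask = F.interpolate(hard_mask, scale_factor=self.scale, mode="nearest")
        return mask * self.hard(lr) + (1 - mask) * self.easy(lr)

lr = torch.rand(1, 3, 32, 32)
hard_mask = (torch.rand(1, 1, 32, 32) > 0.5).float()   # stand-in difficulty mask
sr = DualWaySR()(lr, hard_mask)                        # -> (1, 3, 64, 64)
```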