Single Image Super-Resolution Using Multi-Scale Convolutional Neural Network
Methods based on convolutional neural networks (CNNs) have demonstrated
tremendous improvements on single image super-resolution. However, previous
methods mainly restore images from a single area of the low-resolution (LR)
input, which limits the flexibility of the models to infer details at various
scales for the high-resolution (HR) output. Moreover, most of them train a
specific model for each up-scale factor. In this paper, we propose a
multi-scale super-resolution (MSSR) network. Our network consists of
multi-scale paths for HR inference, which learn to synthesize features from
different scales. This property helps reconstruct various kinds of regions in
HR images. In addition, only one single model is needed for multiple up-scale
factors, which is more efficient without loss of restoration quality.
Experiments on four public datasets demonstrate that the proposed method
achieves state-of-the-art performance at fast speed.
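The abstract gives no implementation details; as a loose PyTorch sketch of the multi-scale-paths idea (layer widths, kernel sizes, and the fusion scheme below are our own assumptions, not the authors' architecture):

```python
# Minimal sketch of the multi-scale-paths idea (not the authors' code).
# Channel counts, kernel sizes, and the fusion conv are illustrative assumptions.
import torch
import torch.nn as nn

class MultiScalePathSR(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        # Parallel convolutions with different receptive fields stand in
        # for the paper's "multi-scale paths".
        self.paths = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2)
            for k in (3, 5, 7)
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, 1)
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: the LR image already upsampled to the target HR size.
        h = torch.relu(self.head(x))
        feats = [torch.relu(p(h)) for p in self.paths]
        h = torch.relu(self.fuse(torch.cat(feats, dim=1)))
        return x + self.tail(h)  # residual refinement of the upsampled input

# Usage (assumed strategy for one model across up-scale factors):
# up = nn.functional.interpolate(lr, scale_factor=2, mode="bicubic")
# hr = MultiScalePathSR()(up)
```

One common way a single model can serve multiple up-scale factors, assumed in this sketch, is to upsample the LR input to the target size first and let the network refine it residually.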
Non-local Neural Networks
Both convolutional and recurrent operations are building blocks that process
one local neighborhood at a time. In this paper, we present non-local
operations as a generic family of building blocks for capturing long-range
dependencies. Inspired by the classical non-local means method in computer
vision, our non-local operation computes the response at a position as a
weighted sum of the features at all positions. This building block can be
plugged into many computer vision architectures. On the task of video
classification, even without any bells and whistles, our non-local models can
compete with or outperform current competition winners on both the Kinetics and Charades
datasets. In static image recognition, our non-local models improve object
detection/segmentation and pose estimation on the COCO suite of tasks. Code is
available at https://github.com/facebookresearch/video-nonlocal-net (CVPR 2018).
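In the paper, the non-local operation takes the form y_i = (1/C(x)) Σ_j f(x_i, x_j) g(x_j): the response at position i is a normalized weighted sum over all positions j. A minimal 2D block in that spirit (the embedded-Gaussian variant; the exact layer choices below are assumptions, not the reference code):

```python
# Minimal 2D non-local block, embedded-Gaussian variant.
# Bottleneck width and layer choices are assumptions, not the reference code.
import torch
import torch.nn as nn

class NonLocalBlock2d(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        inter = channels // 2
        self.theta = nn.Conv2d(channels, inter, 1)
        self.phi = nn.Conv2d(channels, inter, 1)
        self.g = nn.Conv2d(channels, inter, 1)
        self.out = nn.Conv2d(inter, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)  # (b, n, c')
        phi = self.phi(x).flatten(2)                      # (b, c', n)
        g = self.g(x).flatten(2).transpose(1, 2)          # (b, n, c')
        # Response at each position = softmax-weighted sum over all positions.
        attn = torch.softmax(theta @ phi, dim=-1)         # (b, n, n)
        y = (attn @ g).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)  # residual connection, as in the paper
```

Because the block is residual, it can be inserted into an existing architecture without disturbing its initial behavior, which is what makes it pluggable.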
Recurrent Scene Parsing with Perspective Understanding in the Loop
Objects may appear at arbitrary scales in perspective images of a scene,
posing a challenge for recognition systems that process images at a fixed
resolution. We propose a depth-aware gating module that adaptively selects the
pooling field size in a convolutional network architecture according to the
object scale (inversely proportional to the depth) so that small details are
preserved for distant objects while larger receptive fields are used for those
nearby. The depth gating signal is provided by stereo disparity or estimated
directly from monocular input. We integrate this depth-aware gating into a
recurrent convolutional neural network to perform semantic segmentation. Our
recurrent module iteratively refines the segmentation results, leveraging the
depth and semantic predictions from the previous iterations.
Through extensive experiments on four popular large-scale RGB-D datasets, we
demonstrate that this approach achieves competitive semantic segmentation
performance with a substantially more compact model. We carry out extensive
analysis of this architecture, including variants that operate on monocular
RGB but use depth as side information during training, unsupervised gating as
a generic attentional mechanism, and multi-resolution gating. We find that
gated pooling for joint semantic segmentation and depth estimation yields
state-of-the-art results for quantitative monocular depth estimation.
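As a rough illustration of depth-gated pooling as described above (the branch structure and gating head below are our assumptions, not the authors' recurrent architecture):

```python
# Sketch of depth-aware gated pooling: a per-pixel soft gate, computed from
# depth, selects among pooling field sizes. Branch and gate design are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthGatedPooling(nn.Module):
    def __init__(self, pool_sizes=(1, 3, 5, 7)):
        super().__init__()
        self.pool_sizes = pool_sizes
        # Per-pixel soft gate over candidate pooling scales, driven by depth.
        self.gate = nn.Conv2d(1, len(pool_sizes), 3, padding=1)

    def forward(self, feats: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # feats: (b, c, h, w); depth: (b, 1, h, w), from stereo disparity
        # or a monocular estimate.
        gates = torch.softmax(self.gate(depth), dim=1)    # (b, k, h, w)
        pooled = [
            F.avg_pool2d(feats, k, stride=1, padding=k // 2)
            for k in self.pool_sizes
        ]
        # Nearby pixels can be gated toward large pooling fields, while
        # distant objects keep small fields so fine detail is preserved.
        return sum(g.unsqueeze(1) * p
                   for g, p in zip(gates.unbind(dim=1), pooled))
```

The gate is differentiable, so the scale selection can be trained end to end with the segmentation loss rather than hand-tuned.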
The Unreasonable Effectiveness of Texture Transfer for Single Image Super-resolution
While implicit generative models such as GANs have shown impressive results
in high quality image reconstruction and manipulation using a combination of
various losses, we consider a simpler approach leading to surprisingly strong
results. We show that texture loss alone allows the generation of perceptually
high-quality images. We provide a better understanding of the
texture-constraining mechanism and develop a novel semantically guided
texture-constraining method
for further improvement. Using a recently developed perceptual metric employing
"deep features" and termed LPIPS, the method obtains state-of-the-art results.
Moreover, we show that a texture representation of those deep features
captures the perceptual quality of an image better than the original deep features do.
Using texture information, off-the-shelf deep classification networks (without
training) perform as well as the best performing (tuned and calibrated) LPIPS
metrics. The code is publicly available. (19 pages, 14 figures)
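A texture loss in this line of work is typically the Gram-matrix formulation of Gatys et al., matching feature correlations rather than the features themselves; a hedged sketch (the backbone and the choice of VGG-19 layers below are assumptions):

```python
# Gram-matrix texture loss on deep features, Gatys-style.
# The VGG-19 layer indices are an assumed choice, not the paper's exact setup.
import torch
import torch.nn as nn
from torchvision.models import vgg19

def gram(feat: torch.Tensor) -> torch.Tensor:
    b, c, h, w = feat.shape
    f = feat.flatten(2)                        # (b, c, h*w)
    return f @ f.transpose(1, 2) / (c * h * w)  # normalized correlations

class TextureLoss(nn.Module):
    def __init__(self, layer_indices=(3, 8, 17)):  # assumed early ReLU layers
        super().__init__()
        vgg = vgg19(pretrained=True).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.layers = set(layer_indices)

    def _features(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.layers:
                feats.append(x)
        return feats

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Matching Gram matrices constrains texture statistics rather than
        # exact spatial layout, which is the point of a texture loss.
        return sum(torch.mean((gram(a) - gram(b)) ** 2)
                   for a, b in zip(self._features(pred), self._features(target)))
```

Matching correlations instead of raw activations is also what lets the abstract's "texture representation of deep features" act as a perceptual quality measure without any network fine-tuning.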