Image Super-Resolution via Deterministic-Stochastic Synthesis and Local Statistical Rectification
Single image superresolution has been a popular research topic in the last
two decades and has recently received a new wave of interest due to deep neural
networks. In this paper, we approach this problem from a different perspective.
With respect to a downsampled low resolution image, we model a high resolution
image as a combination of two components, a deterministic component and a
stochastic component. The deterministic component can be recovered from the
low-frequency signals in the downsampled image. The stochastic component, on
the other hand, contains the signals that have little correlation with the low
resolution image. We adopt two complementary methods for generating these two
components. While generative adversarial networks are used for the stochastic
component, deterministic component reconstruction is formulated as a regression
problem solved using deep neural networks. Since the deterministic component
exhibits clearer local orientations, we design novel loss functions tailored
for such properties for training the deep regression network. These two methods
are first applied to the entire input image to produce two distinct
high-resolution images. Afterwards, these two images are fused together using
another deep neural network that also performs local statistical rectification,
which tries to make the local statistics of the fused image match the same
local statistics of the groundtruth image. Quantitative results and a user
study indicate that the proposed method outperforms existing state-of-the-art
algorithms by a clear margin. Comment: to appear in SIGGRAPH Asia 201
Generic 3D Convolutional Fusion for image restoration
Recently, exciting strides forward have been made in the area of image
restoration, particularly for image denoising and single image
super-resolution. Deep learning techniques contributed to this significantly.
The top methods differ in their formulations and assumptions, so even if their
average performance may be similar, some work better on certain image types and
image regions than others. This complementarity motivated us to propose a novel
3D convolutional fusion (3DCF) method. Unlike other methods adapted to
different tasks, our method uses the exact same convolutional network
architecture to address both image denoising and single image
super-resolution. As a result, our 3DCF method achieves substantial
improvements (0.1dB-0.4dB PSNR) over the state-of-the-art methods that it
fuses, on standard benchmarks for both tasks. At the same time, the
method remains computationally efficient.
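To put the reported 0.1dB-0.4dB PSNR gains in context, the following is a minimal sketch of the standard PSNR definition, 10 log10(peak^2 / MSE). The function names and the 8-bit peak value of 255 are assumptions made for illustration; nothing here is taken from the 3DCF implementation.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)
```

Under this definition, a 0.1 dB gain corresponds to roughly 2.3% lower mean squared error, and a 0.4 dB gain to roughly 8.8% lower.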
NTIRE 2020 Challenge on Spectral Reconstruction from an RGB Image
This paper reviews the second challenge on spectral reconstruction from RGB
images, i.e., the recovery of whole-scene hyperspectral (HS) information from a
3-channel RGB image. As in the previous challenge, two tracks were provided:
(i) a "Clean" track where HS images are estimated from noise-free RGBs, with the RGB
images themselves calculated numerically using the ground-truth HS images
and supplied spectral sensitivity functions; (ii) a "Real World" track,
simulating capture by an uncalibrated and unknown camera, where the HS images
are recovered from noisy JPEG-compressed RGB images. A new, larger-than-ever,
natural hyperspectral image data set is presented, containing a total of 510 HS
images. The Clean and Real World tracks had 103 and 78 registered participants
respectively, with 14 teams competing in the final testing phase. A description
of the proposed methods, alongside their challenge scores and an extensive
evaluation of top-performing methods, is also provided. Together, these gauge the
state of the art in spectral reconstruction from an RGB image.
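The "Clean" track description says the RGB images are calculated numerically from the ground-truth HS images and spectral sensitivity functions. A minimal sketch of that kind of projection follows, assuming the HS image is an (H, W, B) array and the sensitivities are a (B, 3) matrix. The function name, array layout, and band count are illustrative assumptions, not the challenge's actual pipeline, and the noise and JPEG compression of the "Real World" track are not modeled.

```python
import numpy as np


def render_rgb(hs_cube, sensitivities):
    """Project an (H, W, B) hyperspectral cube to an (H, W, 3) RGB image.

    Each output channel is a weighted sum over the B spectral bands, with
    weights given by one column of the (B, 3) sensitivity matrix.
    """
    hs_cube = np.asarray(hs_cube, dtype=np.float64)
    sensitivities = np.asarray(sensitivities, dtype=np.float64)
    if hs_cube.shape[-1] != sensitivities.shape[0]:
        raise ValueError("band count of cube and sensitivities must match")
    return hs_cube @ sensitivities


# Toy example with a hypothetical 4-band cube.
cube = np.ones((2, 3, 4))
ssf = np.array([[1.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0]])
rgb = render_rgb(cube, ssf)
```

Spectral reconstruction is the inverse of this projection: recovering B band values per pixel from only three measurements, which is why the problem is ill-posed without a learned prior.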
Deep Inception Generative Network for Cognitive Image Inpainting
Recent advances in deep learning have shown exciting promise in filling large
holes and have led to a new direction for image inpainting. However, existing
learning-based methods often create artifacts and fallacious textures because
of insufficient cognitive understanding. Previous generative networks are
limited to a single receptive field type and give up pooling to preserve
detail sharpness. Human cognition is constant regardless of the target
attribute. Because multiple receptive fields improve abstract image
characterization and pooling keeps features invariant, deep
inception learning is adopted to promote high-level feature representation and
enhance model learning capacity for local patches. Moreover, approaches for
generating diverse mask images are introduced and a random mask dataset is
created. We benchmark our methods on the ImageNet, Places2, and CelebA-HQ datasets.
Experiments for regular, irregular, and custom region completion are all
performed and free-style image inpainting is also presented. Quantitative
comparisons with previous state-of-the-art methods show that our method obtains much
more natural image completions.
A Deep Journey into Super-resolution: A survey
Deep convolutional network-based super-resolution is a fast-growing field
with numerous practical applications. In this exposition, we extensively
compare 30+ state-of-the-art super-resolution Convolutional Neural Networks
(CNNs) over three classical and three recently introduced challenging datasets
to benchmark single image super-resolution. We introduce a taxonomy for
deep-learning based super-resolution networks that groups existing methods into
nine categories including linear, residual, multi-branch, recursive,
progressive, attention-based and adversarial designs. We also provide
comparisons between the models in terms of network complexity, memory
footprint, model input and output, learning details, the type of network losses
and important architectural differences (e.g., depth, skip-connections,
filters). The extensive evaluation shows consistent and rapid
growth in accuracy over the past few years, along with a corresponding boost
in model complexity and the availability of large-scale datasets. It is also
observed that the pioneering methods identified as the benchmark have been
significantly outperformed by the current contenders. Despite the progress in
recent years, we identify several shortcomings of existing techniques and
provide future research directions towards the solution of these open problems. Comment: Accepted in ACM Computing Survey
Deep feature fusion for self-supervised monocular depth prediction
Recent advances in end-to-end unsupervised learning have significantly
improved the performance of monocular depth prediction and alleviated the
requirement of ground truth depth. Although a plethora of work has been done in
enforcing various structural constraints by incorporating multiple losses
utilising smoothness, left-right consistency, regularisation and matching
surface normals, few of them take into consideration the multi-scale structures
present in real-world images. Most works utilise a VGG16 or ResNet50 model
pre-trained on ImageNet for predicting depth. We propose a deep feature
fusion method utilising features at multiple scales for learning
self-supervised depth from scratch. Our fusion network selects features from
both upper and lower levels at every level in the encoder network, thereby
creating multiple feature pyramid sub-networks that are fed to the decoder
after applying the CoordConv solution. We also propose a refinement module
learning higher scale residual depth from a combination of higher level deep
features and lower level residual depth using a pixel shuffling framework that
super-resolves lower level residual depth. We select the KITTI dataset for
evaluation and show that our proposed architecture can produce better or
comparable results in depth prediction. Comment: 4 pages, 2 Tables, 2 Figure
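The refinement module is described as using pixel shuffling to super-resolve lower-level residual depth. A minimal NumPy sketch of the generic pixel-shuffle (sub-pixel) rearrangement follows, assuming a channels-first (C*r*r, H, W) layout. The function name and layout are assumptions, and this shows only the rearrangement step, not the authors' refinement module.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Each group of r*r channels is spread over an r-by-r spatial neighbourhood,
    so resolution increases without interpolation.
    """
    x = np.asarray(x)
    c_rr, h, w = x.shape
    if c_rr % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)        # (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)   # output row h*r+i, column w*r+j


# Four 1x1 channels become one 2x2 map.
low = np.arange(4).reshape(4, 1, 1)
high = pixel_shuffle(low, 2)
```

In this layout, the value at output position (h*r + i, w*r + j) of channel c comes from input channel c*r*r + i*r + j at position (h, w).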
IEGAN: Multi-purpose Perceptual Quality Image Enhancement Using Generative Adversarial Network
Despite the breakthroughs in quality of image enhancement, an end-to-end
solution for simultaneous recovery of the finer texture details and sharpness
for degraded images with low resolution is still lacking. Some existing
approaches focus on minimizing the pixel-wise reconstruction error, which
results in a high peak signal-to-noise ratio. However, the enhanced images fail to
provide high-frequency details and are perceptually unsatisfying, i.e., they
fail to match the quality expected in a photo-realistic image. In this paper,
we present Image Enhancement Generative Adversarial Network (IEGAN), a
versatile framework capable of inferring photo-realistic natural images for
both artifact removal and super-resolution simultaneously. Moreover, we propose
a new loss function consisting of a combination of reconstruction loss, feature
loss and an edge loss counterpart. The feature loss helps to push the output
image to the natural image manifold and the edge loss preserves the sharpness
of the output image. The reconstruction loss provides low-level semantic
information to the generator regarding the quality of the generated images
compared to the original. Our approach has been experimentally proven to
recover photo-realistic textures from heavily compressed low-resolution images
on public benchmarks and our proposed high-resolution World100 dataset. Comment: Accepted at IEEE WACV 201
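The loss described above combines a reconstruction term, a feature term, and an edge term. A rough sketch of such a weighted sum for 2D grayscale arrays follows. Everything specific here is an assumption made for illustration: the weights, the use of finite-difference gradient magnitude as the edge operator, and the `downsample_features` stand-in for a real feature extractor are hypothetical and are not taken from IEGAN.

```python
import numpy as np


def gradient_magnitude(img):
    """Finite-difference gradient magnitude of a 2D image (assumed edge map)."""
    gy, gx = np.gradient(np.asarray(img, dtype=np.float64))
    return np.hypot(gx, gy)


def downsample_features(img):
    """Hypothetical stand-in for a learned feature extractor."""
    return np.asarray(img, dtype=np.float64)[::2, ::2]


def combined_loss(output, target, features=downsample_features,
                  w_rec=1.0, w_feat=1.0, w_edge=1.0):
    """Weighted sum of reconstruction, feature and edge terms (weights assumed)."""
    output = np.asarray(output, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    rec = np.mean((output - target) ** 2)
    feat = np.mean((features(output) - features(target)) ** 2)
    edge = np.mean(np.abs(gradient_magnitude(output) - gradient_magnitude(target)))
    return w_rec * rec + w_feat * feat + w_edge * edge


flat = np.zeros((8, 8))
square = flat.copy()
square[2:6, 2:6] = 1.0
```

In practice the feature term would compare activations of a pretrained network rather than a downsampled copy of the image; the stand-in only keeps the sketch self-contained.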
Learned Spectral Super-Resolution
We describe a novel method for blind, single-image spectral super-resolution.
While conventional super-resolution aims to increase the spatial resolution of
an input image, our goal is to spectrally enhance the input, i.e., generate an
image with the same spatial resolution, but a greatly increased number of
narrow (hyper-spectral) wave-length bands. Just as the spatial statistics of
natural images have rich structure, which one can exploit as a prior to predict
high-frequency content from a low resolution image, the same is also true in
the spectral domain: the materials and lighting conditions of the observed
world induce structure in the spectrum of wavelengths observed at a given
pixel. Surprisingly, very little work exists that attempts to use this
observation and achieve blind spectral super-resolution from single images. We
start from the conjecture that, just like in the spatial domain, we can learn
the statistics of natural image spectra, and with their help generate finely
resolved hyper-spectral images from RGB input. Technically, we follow the
current best practice and implement a convolutional neural network (CNN), which
is trained to carry out the end-to-end mapping from an entire RGB image to the
corresponding hyperspectral image of equal size. We demonstrate spectral
super-resolution both for conventional RGB images and for multi-spectral
satellite data, outperforming the state-of-the-art. Comment: Submitted to ICCV 2017 (10 pages, 8 figures)
Deep Learned Frame Prediction for Video Compression
Motion compensation is one of the most essential methods for any video
compression algorithm. Video frame prediction is a task analogous to motion
compensation. In recent years, the task of frame prediction has been undertaken by
deep neural networks (DNNs). In this thesis we create a DNN to perform learned
frame prediction and additionally implement a codec that contains our DNN. We
train our network using two methods for two different goals. First, we train
our network based on mean square error (MSE) only, aiming to obtain the highest
PSNR values in frame prediction and video compression. Second, we use
adversarial training to produce visually more realistic frame predictions. For
frame prediction, we compare our method with the baseline methods of frame
difference and 16x16 block motion compensation. For video compression we
further include the x264 video codec in the comparison. We show that in frame
prediction, adversarial training produces frames that look sharper and more
realistic compared to MSE-based training, but in video compression it
consistently performs worse. This shows that even though adversarial training
is useful for generating video frames that are more pleasing to the human eye,
it should not be employed for video compression. Moreover, our network
trained with MSE produces accurate frame predictions, and in quantitative
results, for both tasks, it produces comparable results in all videos and
outperforms other methods on average. More specifically, learned frame
prediction outperforms other methods in terms of rate-distortion performance for
high-motion video, while the rate-distortion performance of our method
is competitive with that of x264 for low-motion video.
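The two baselines named above, frame difference and 16x16 block motion compensation, can be sketched as follows. This is an illustrative reading, not the thesis code: frame difference is taken to mean using the previous frame unchanged as the prediction, and block matching is done by exhaustive search with a sum-of-squared-differences criterion over a hypothetical search range of 4 pixels.

```python
import numpy as np


def frame_difference_prediction(prev):
    """Predict the current frame as a copy of the previous one (assumed reading)."""
    return np.asarray(prev, dtype=np.float64).copy()


def block_motion_compensate(prev, cur, block=16, search=4):
    """Predict `cur` from `prev` by exhaustive block matching.

    For each block of `cur`, every displacement within +/- `search` pixels is
    tried and the candidate from `prev` with the smallest sum of squared
    differences is copied into the prediction.
    """
    prev = np.asarray(prev, dtype=np.float64)
    cur = np.asarray(cur, dtype=np.float64)
    h, w = cur.shape
    pred = np.zeros_like(cur)
    for y in range(0, h, block):
        for x in range(0, w, block):
            target = cur[y:y + block, x:x + block]
            bh, bw = target.shape
            best_cost, best = np.inf, None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    # Skip candidates that fall outside the previous frame.
                    if yy < 0 or xx < 0 or yy + bh > h or xx + bw > w:
                        continue
                    cand = prev[yy:yy + bh, xx:xx + bw]
                    cost = np.sum((target - cand) ** 2)
                    if cost < best_cost:
                        best_cost, best = cost, cand
            pred[y:y + bh, x:x + bw] = best
    return pred


# Synthetic test pair: the current frame is the previous one shifted by (2, 1).
rng = np.random.default_rng(0)
prev_frame = rng.random((64, 64))
cur_frame = np.roll(prev_frame, shift=(2, 1), axis=(0, 1))
mc_pred = block_motion_compensate(prev_frame, cur_frame)
fd_pred = frame_difference_prediction(prev_frame)
```

Because zero displacement is always among the candidates, the block-matched prediction can never have a larger squared error than the frame-difference prediction under this criterion. A learned predictor replaces this search with a network that outputs the predicted frame directly.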
Learning Deep Convolutional Networks for Demosaicing
This paper presents a comprehensive study of applying the convolutional
neural network (CNN) to solving the demosaicing problem. The paper presents two
CNN models that learn end-to-end mappings between the mosaic samples and the
original image patches with full information. When the Bayer color
filter array (CFA) is used, an evaluation with ten competitive methods on
popular benchmarks confirms that the data-driven features learned
automatically by the CNN models are very effective. Experiments show that the
proposed CNN models can perform equally well in both the sRGB space and the
linear space. It is also demonstrated that the CNN model can perform joint
denoising and demosaicing. The CNN model is very flexible and can be easily
adopted for demosaicing with any CFA design. We train CNN models for
demosaicing with three different CFAs and obtain better results than existing
methods. With the great flexibility to be coupled with any CFA, we present the
first data-driven joint optimization of the CFA design and the demosaicing
method using CNN. Experiments show that the combination of the automatically
discovered CFA pattern and the automatically devised demosaicing method
significantly outperforms the current best demosaicing results. Visual
comparisons confirm that the proposed methods reduce more visual artifacts than
existing methods. Finally, we show that the CNN model is also effective for the
more general demosaicing problem with spatially varying exposure and color and
can be used for capturing images of higher dynamic range in a single shot. The
proposed models and the thorough experiments together demonstrate that the CNN is
an effective and versatile tool for solving the demosaicing problem.
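For readers unfamiliar with the input to a demosaicing network, the following sketch shows how a full-color image is reduced to mosaic samples under a Bayer CFA. The RGGB phase and the function name are assumptions chosen for illustration; the Bayer pattern has several phase variants, and the paper's own data preparation may differ.

```python
import numpy as np


def bayer_mosaic(rgb):
    """Sample an (H, W, 3) image through an RGGB Bayer pattern into (H, W).

    Each pixel keeps exactly one of its three color values; demosaicing is
    the task of estimating the two that were discarded.
    """
    rgb = np.asarray(rgb)
    mosaic = np.zeros(rgb.shape[:2], dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red on even rows, even columns
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green on even rows, odd columns
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green on odd rows, even columns
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue on odd rows, odd columns
    return mosaic


# Constant-color test image: R=10, G=20, B=30 everywhere.
image = np.zeros((4, 4, 3), dtype=np.int64)
image[..., 0], image[..., 1], image[..., 2] = 10, 20, 30
samples = bayer_mosaic(image)
```

Half of the samples are green and a quarter each are red and blue, so two thirds of the color information at every pixel has to be inferred.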