New Techniques for Preserving Global Structure and Denoising with Low Information Loss in Single-Image Super-Resolution
This work identifies and addresses two important technical challenges in
single-image super-resolution: (1) how to upsample an image without magnifying
noise and (2) how to preserve large scale structure when upsampling. We
summarize the techniques we developed for our second-place entry in Track 1
(Bicubic Downsampling), seventh-place entry in Track 2 (Realistic Adverse
Conditions), and seventh-place entry in Track 3 (Realistic Difficult) in the
2018 NTIRE Super-Resolution Challenge. Furthermore, we present new neural
network architectures that specifically address the two challenges listed
above: denoising and preservation of large-scale structure.
Comment: 8 pages, CVPR workshop 2018
Learning Digital Camera Pipeline for Extreme Low-Light Imaging
In low-light conditions, a conventional camera imaging pipeline produces
sub-optimal images that are usually dark and noisy due to a low photon count
and low signal-to-noise ratio (SNR). We present a data-driven approach that
learns the desired properties of well-exposed images and reflects them in
images that are captured in extremely low ambient light environments, thereby
significantly improving the visual quality of these low-light images. We
propose a new loss function that exploits the characteristics of both
pixel-wise and perceptual metrics, enabling our deep neural network to learn
the camera processing pipeline to transform the short-exposure, low-light RAW
sensor data to well-exposed sRGB images. The results show that our method
outperforms the state-of-the-art according to psychophysical tests as well as
pixel-wise standard metrics and recent learning-based perceptual image quality
measures.
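The loss described above combines a pixel-wise term with a perceptual (feature-space) term. A minimal numpy sketch, using simple image gradients as a hypothetical stand-in for the deep perceptual features (the paper's actual feature extractor and weighting are not specified here):

```python
import numpy as np

def feature_map(img):
    # Hypothetical stand-in for a deep perceptual feature extractor:
    # horizontal and vertical image gradients.
    gx = np.diff(img, axis=1)
    gy = np.diff(img, axis=0)
    return gx, gy

def combined_loss(pred, target, alpha=0.5):
    # Pixel-wise term: mean absolute error.
    pixel = np.mean(np.abs(pred - target))
    # Perceptual term: distance between feature maps.
    pgx, pgy = feature_map(pred)
    tgx, tgy = feature_map(target)
    perceptual = np.mean((pgx - tgx) ** 2) + np.mean((pgy - tgy) ** 2)
    return alpha * pixel + (1 - alpha) * perceptual
```

Note that a uniform brightness shift is penalized only by the pixel-wise term, since gradients cancel it out; this is one reason to keep both terms.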
Handheld Multi-Frame Super-Resolution
Compared to DSLR cameras, smartphone cameras have smaller sensors, which
limits their spatial resolution; smaller apertures, which limits their light
gathering ability; and smaller pixels, which reduces their signal-to-noise
ratio. The use of color filter arrays (CFAs) requires demosaicing, which
further degrades resolution. In this paper, we supplant the use of traditional
demosaicing in single-frame and burst photography pipelines with a multi-frame
super-resolution algorithm that creates a complete RGB image directly from a
burst of CFA raw images. We harness natural hand tremor, typical in handheld
photography, to acquire a burst of raw frames with small offsets. These frames
are then aligned and merged to form a single image with red, green, and blue
values at every pixel site. This approach, which includes no explicit
demosaicing step, serves to both increase image resolution and boost signal-to-
noise ratio. Our algorithm is robust to challenging scene conditions: local
motion, occlusion, or scene changes. It runs at 100 milliseconds per
12-megapixel RAW input burst frame on mass-produced mobile phones.
Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well
as the default merge method in Night Sight mode (whether zooming or not) on
Google's flagship phone.
Comment: 24 pages, accepted to SIGGRAPH 2019 Technical Papers program
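The align-and-merge step can be illustrated with a toy version. Here offsets are integer per-frame shifts; in the actual method the hand-tremor offsets are subpixel and the merge operates per CFA channel with robustness weights, none of which is modeled in this sketch:

```python
import numpy as np

def align_and_merge(frames, offsets):
    """Merge a burst of frames given per-frame (dy, dx) integer offsets.

    Toy stand-in for the paper's subpixel align-and-merge step.
    """
    acc = np.zeros_like(frames[0], dtype=np.float64)
    for frame, (dy, dx) in zip(frames, offsets):
        # Undo the offset by rolling the frame back into alignment.
        acc += np.roll(frame, shift=(-dy, -dx), axis=(0, 1))
    return acc / len(frames)
```

With perfectly known offsets and no scene motion, averaging the aligned frames recovers the base image while suppressing independent per-frame noise.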
Real Image Denoising with Feature Attention
Deep convolutional neural networks perform better on images containing
spatially invariant noise (synthetic noise); however, their performance is
limited on real noisy photographs, and they require multi-stage network modeling.
To advance the practicability of denoising algorithms, this paper proposes a
novel single-stage blind real image denoising network (RIDNet) by employing a
modular architecture. We use a residual on the residual structure to ease the
flow of low-frequency information and apply feature attention to exploit the
channel dependencies. Furthermore, the evaluation in terms of quantitative
metrics and visual quality on three synthetic and four real noisy datasets
against 19 state-of-the-art algorithms demonstrates the superiority of our
RIDNet.
Comment: Accepted in ICCV (Oral), 2019
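The feature-attention idea can be sketched as squeeze-and-excitation style channel gating. The weight matrices `w1` and `w2` below are hypothetical placeholders for the learned 1x1 convolutions, and the real RIDNet block additionally wraps this in residual-on-residual connections:

```python
import numpy as np

def feature_attention(x, w1, w2):
    """Channel attention over a (C, H, W) feature map.

    Minimal numpy sketch of squeeze-and-excitation style gating.
    """
    c = x.shape[0]
    squeeze = x.reshape(c, -1).mean(axis=1)        # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid -> per-channel weights
    return x * gate[:, None, None]                 # rescale each channel
```

Because the sigmoid gate lies strictly in (0, 1), the block can only attenuate channels, letting the network emphasize informative channels relative to the rest.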
A Deep Journey into Super-resolution: A survey
Deep convolutional network-based super-resolution is a fast-growing field
with numerous practical applications. In this exposition, we extensively
compare 30+ state-of-the-art super-resolution Convolutional Neural Networks
(CNNs) over three classical and three recently introduced challenging datasets
to benchmark single image super-resolution. We introduce a taxonomy for
deep-learning based super-resolution networks that groups existing methods into
nine categories including linear, residual, multi-branch, recursive,
progressive, attention-based and adversarial designs. We also provide
comparisons between the models in terms of network complexity, memory
footprint, model input and output, learning details, the type of network losses
and important architectural differences (e.g., depth, skip-connections,
filters). The extensive evaluation performed shows consistent and rapid
growth in accuracy over the past few years, along with a corresponding boost
in model complexity and the availability of large-scale datasets. It is also
observed that the pioneering methods identified as the benchmark have been
significantly outperformed by the current contenders. Despite the progress in
recent years, we identify several shortcomings of existing techniques and
provide future research directions towards the solution of these open problems.
Comment: Accepted in ACM Computing Surveys
LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering
Motivated by augmented and virtual reality applications such as telepresence,
there has been a recent focus in real-time performance capture of humans under
motion. However, given the real-time constraint, these systems often suffer
from artifacts in geometry and texture such as holes and noise in the final
rendering, poor lighting, and low-resolution textures. We take the novel
approach to augment such real-time performance capture systems with a deep
architecture that takes a rendering from an arbitrary viewpoint, and jointly
performs completion, super resolution, and denoising of the imagery in
real-time. We call this approach neural (re-)rendering, and our live system
"LookinGood". Our deep architecture is trained to produce high resolution and
high quality images from a coarse rendering in real-time. First, we propose a
self-supervised training method that does not require manual ground-truth
annotation. We contribute a specialized reconstruction error that uses semantic
information to focus on relevant parts of the subject, e.g. the face. We also
introduce a saliency reweighting scheme for the loss function that is able to
discard outliers. We specifically design the system for virtual and augmented
reality headsets where the consistency between the left and right eye plays a
crucial role in the final user experience. Finally, we generate temporally
stable results by explicitly minimizing the difference between two consecutive
frames. We tested the proposed system in two different scenarios: one
involving a single RGB-D sensor and upper-body reconstruction of an actor,
and the other consisting of full-body 360-degree capture. Through extensive
experimentation, we demonstrate how our system generalizes across unseen
sequences and subjects. The supplementary video is available at
http://youtu.be/Md3tdAKoLGU.
Comment: To be presented at SIGGRAPH Asia 2018
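The semantically reweighted reconstruction term and the temporal-stability term described above can be sketched as a single scalar loss. The per-pixel weight map (e.g., larger on the face) and the blending factor `beta` are hypothetical, and the paper's actual formulation (including outlier discarding) is richer:

```python
import numpy as np

def rerender_loss(pred_t, pred_prev, target_t, weight_map, beta=0.1):
    # Reconstruction error reweighted by per-pixel semantic importance
    # (e.g., higher weights on the face region).
    recon = np.mean(weight_map * np.abs(pred_t - target_t))
    # Temporal stability: penalize change between consecutive frames.
    temporal = np.mean((pred_t - pred_prev) ** 2)
    return recon + beta * temporal
```

The temporal term explicitly minimizes the difference between consecutive outputs, which is what suppresses frame-to-frame flicker in the live system.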
Deep Convolutional Framelet Denoising for Low-Dose CT via Wavelet Residual Network
Model based iterative reconstruction (MBIR) algorithms for low-dose X-ray CT
are computationally expensive. To address this problem, we recently proposed a
deep convolutional neural network (CNN) for low-dose X-ray CT and won the
second place in the 2016 AAPM Low-Dose CT Grand Challenge. However, some of
the texture was not fully recovered. To address this problem, here we propose a
novel framelet-based denoising algorithm using wavelet residual network which
synergistically combines the expressive power of deep learning and the
performance guarantee from the framelet-based denoising algorithms. The new
algorithms were inspired by the recent interpretation of the deep convolutional
neural network (CNN) as a cascaded convolution framelet signal representation.
Extensive experimental results confirm that the proposed networks have
significantly improved performance and preserve the detail texture of the
original images.
Comment: This will appear in IEEE Transactions on Medical Imaging, a special
issue on Machine Learning for Image Reconstruction
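The framelet-domain idea can be illustrated with a one-level Haar transform: a `shrink` function, standing in here for the learned wavelet residual network, operates on the detail band while the approximation band passes through untouched:

```python
import numpy as np

def haar_1level(x):
    # One level of a 1-D orthonormal Haar transform.
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation band
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail band
    return a, d

def haar_inverse(a, d):
    # Perfect-reconstruction inverse of haar_1level.
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def wavelet_residual_denoise(x, shrink):
    # Denoise by modifying detail coefficients only, mirroring the idea
    # of learning residuals in a wavelet (framelet) domain; `shrink`
    # stands in for the learned network.
    a, d = haar_1level(x)
    return haar_inverse(a, shrink(d))
```

With `shrink` as the identity, the round trip reconstructs the input exactly; this perfect-reconstruction property is what the framelet interpretation guarantees.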
Chaining Identity Mapping Modules for Image Denoising
We propose to learn a fully-convolutional network model that consists of a
Chain of Identity Mapping Modules (CIMM) for image denoising. The CIMM
structure possesses two distinctive features that are important for the noise
removal task. Firstly, each residual unit employs identity mappings as the skip
connections and receives pre-activated input in order to preserve the gradient
magnitude propagated in both the forward and backward directions. Secondly, by
utilizing dilated kernels for the convolution layers in the residual branch
(i.e., within an identity mapping module), each neuron in the last
convolution layer can observe the full receptive field of the first layer.
After being trained on the BSD400 dataset, the proposed network produces
remarkably higher numerical accuracy and better visual image quality than the
state-of-the-art when being evaluated on conventional benchmark images and the
BSD68 dataset.
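The receptive-field claim can be checked with a toy 1-D version: stacking 3-tap convolutions with dilations 1, 2 and 4 grows the receptive field to 2*(1+2+4)+1 = 15 samples (the actual CIMM uses 2-D convolutions inside pre-activated identity-mapping units, which this sketch omits):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    # 'Same'-padded 1-D convolution with a dilated kernel.
    out = np.zeros_like(x)
    r = dilation * (len(kernel) // 2)
    padded = np.pad(x, r)
    for i in range(len(x)):
        for k, w in enumerate(kernel):
            out[i] += w * padded[i + k * dilation]
    return out
```

Pushing an impulse through the three dilated layers and counting the nonzero outputs confirms the 15-sample receptive field.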
Face Hallucination using Linear Models of Coupled Sparse Support
Most face super-resolution methods assume that low-resolution and
high-resolution manifolds have similar local geometrical structure, hence learn
local models on the low-resolution manifolds (e.g., sparse or locally linear
embedding models), which are then applied on the high-resolution manifold.
However, the low-resolution manifold is distorted by the one-to-many
relationship between low- and high-resolution patches. This paper presents a
method which learns linear models based on the local geometrical structure on
the high-resolution manifold rather than on the low-resolution manifold. For
this, in a first step, the low-resolution patch is used to derive a globally
optimal estimate of the high-resolution patch. The approximated solution is
shown to be close in Euclidean space to the ground-truth but is generally
smooth and lacks the texture details needed by state-of-the-art face
recognizers. This first estimate allows us to find the support of the
high-resolution manifold using sparse coding (SC), which is then used as
the support for learning a local projection (or upscaling) model between the
low-resolution and the high-resolution manifolds using Multivariate Ridge
Regression (MRR). Experimental results show that the proposed method
outperforms six face super-resolution methods in terms of both recognition and
quality. These results also reveal that recognition and quality are
significantly affected by the method used for stitching the super-resolved
patches together; quilting was found to better preserve the texture details,
which helps to achieve higher recognition rates.
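The final projection step, Multivariate Ridge Regression from LR to HR patch vectors, has a closed form. A minimal sketch, under the assumption that corresponding patches are stacked as columns (the sparse-coding support selection from the paper is omitted):

```python
import numpy as np

def ridge_projection(X_lr, Y_hr, lam=0.1):
    """Learn a linear LR -> HR patch mapping with Multivariate Ridge Regression.

    Columns of X_lr / Y_hr are corresponding low/high-resolution patch
    vectors; returns W minimizing ||Y - W X||_F^2 + lam * ||W||_F^2,
    i.e. W = Y X^T (X X^T + lam I)^{-1}.
    """
    d = X_lr.shape[0]
    return Y_hr @ X_lr.T @ np.linalg.inv(X_lr @ X_lr.T + lam * np.eye(d))
```

When the true mapping is linear and `lam` is small, the learned `W` recovers it; in practice `lam` trades fidelity for stability on small, correlated patch supports.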
Structure-Preserving Image Super-resolution via Contextualized Multi-task Learning
Single image super-resolution (SR), which refers to reconstructing a
higher-resolution (HR) image from an observed low-resolution (LR) image, has
received substantial attention due to its tremendous application potential.
Despite the breakthroughs of recently proposed SR methods using convolutional
neural networks (CNNs), their generated results usually fail to preserve
structural (high-frequency) details. In this paper, regarding global boundary
context and residual context as complementary information for enhancing
structural details in image restoration, we develop a contextualized multi-task
learning framework to address the SR problem. Specifically, our method first
extracts convolutional features from the input LR image and applies one
deconvolutional module to interpolate the LR feature maps in a content-adaptive
way. Then, the resulting feature maps are fed into two branched sub-networks.
During the neural network training, one sub-network outputs salient image
boundaries and the HR image, and the other sub-network outputs the local
residual map, i.e., the residual difference between the generated HR image and
ground-truth image. On several standard benchmarks (i.e., Set5, Set14 and
BSD200), our extensive evaluations demonstrate the effectiveness of our SR
method on achieving both higher restoration quality and computational
efficiency compared with several state-of-the-art SR approaches. The source
code and some SR results can be found at:
http://hcp.sysu.edu.cn/structure-preserving-image-super-resolution/
Comment: To appear in Transactions on Multimedia 2017