UG^2+ Track 2: A Collective Benchmark Effort for Evaluating and Advancing Image Understanding in Poor Visibility Environments
The UG^2+ challenge in IEEE CVPR 2019 aims to evoke a comprehensive
discussion and exploration about how low-level vision techniques can benefit
the high-level automatic visual recognition in various scenarios. In its second
track, we focus on object or face detection in poor visibility environments
caused by bad weather (haze, rain) and low-light conditions. While existing
enhancement methods are empirically expected to help the high-level end task,
this is not always the case in practice. To provide a more
thorough examination and fair comparison, we introduce three benchmark sets
collected in real-world hazy, rainy, and low-light conditions, respectively,
with objects/faces annotated. To the best of our knowledge, this is the first
and currently the largest effort of its kind. Baseline results obtained by cascading
existing enhancement and detection models are reported, indicating the highly
challenging nature of our new data as well as the large room for further
technical innovations. We expect broad participation from the research
community to address these challenges together.
Comment: A summary paper on datasets, fact sheets, baseline results, challenge
results, and winning methods in the UG^2+ Challenge (Track 2). More materials
are provided at http://www.ug2challenge.org/index.htm
Bridging the Gap Between Computational Photography and Visual Recognition
What is the current state-of-the-art for image restoration and enhancement
applied to degraded images acquired under less than ideal circumstances? Can
the application of such algorithms as a pre-processing step improve image
interpretability for manual analysis or automatic visual recognition to
classify scene content? While there have been important advances in the area of
computational photography to restore or enhance the visual quality of an image,
the capabilities of such techniques have not always translated in a useful way
to visual recognition tasks. Consequently, there is a pressing need for the
development of algorithms that are designed for the joint problem of improving
visual appearance and recognition, which will be an enabling factor for the
deployment of visual recognition tools in many real-world scenarios. To address
this, we introduce the UG^2 dataset as a large-scale benchmark composed of
video imagery captured under challenging conditions, and two enhancement tasks
designed to test algorithmic impact on visual quality and automatic object
recognition. Furthermore, we propose a set of metrics to evaluate the joint
improvement of such tasks as well as individual algorithmic advances, including
a novel psychophysics-based evaluation regime for human assessment and a
realistic set of quantitative measures for object recognition performance. We
introduce six new algorithms for image restoration or enhancement, which were
created as part of the IARPA-sponsored UG^2 Challenge workshop held at CVPR
2018. Under the proposed evaluation regime, we present an in-depth analysis of
these algorithms and a host of deep learning-based and classic baseline
approaches. From the observed results, it is evident that we are in the early
days of building a bridge between computational photography and visual
recognition, leaving many opportunities for innovation in this area.
Comment: CVPR Prize Challenge: http://www.ug2challenge.org
Extreme Low-Light Imaging with Multi-granulation Cooperative Networks
Low-light imaging is challenging since images may appear dark and noisy due
to the low signal-to-noise ratio, complex image content, and the variety of
shooting scenes under extreme low-light conditions. Many methods have been
proposed to enhance the imaging quality under extreme low-light conditions, but
it remains difficult to obtain satisfactory results, especially when they
attempt to retain high dynamic range (HDR). In this paper, we propose a novel
method of multi-granulation cooperative networks (MCN) with bidirectional
information flow to enhance extreme low-light images, and design an
illumination map estimation function (IMEF) to preserve high dynamic range
(HDR). To facilitate this research, we also contribute a new benchmark
dataset of real-world Dark High Dynamic Range (DHDR) images to evaluate the
performance of high dynamic range preservation in low-light environments.
Experimental results show that the proposed method outperforms the
state-of-the-art approaches in terms of both visual effects and quantitative
analysis.
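As a generic point of reference for illumination-map based enhancement (not the paper's MCN or IMEF, whose details are not given in the abstract), the Python sketch below estimates a coarse illumination map as the per-pixel maximum over the RGB channels, gamma-compresses it, and re-lights the image; the function name and the gamma value are illustrative assumptions.

import numpy as np

def enhance_low_light(img: np.ndarray, gamma: float = 0.6, eps: float = 1e-3) -> np.ndarray:
    """img: float32 RGB image in [0, 1], shape (H, W, 3)."""
    # Coarse illumination estimate: per-pixel maximum over the RGB channels.
    illumination = img.max(axis=2, keepdims=True)
    # Compress the illumination's dynamic range with a gamma curve.
    adjusted = np.power(illumination, gamma)
    # Re-light the reflectance (img / illumination) with the adjusted map.
    out = img / (illumination + eps) * adjusted
    return np.clip(out, 0.0, 1.0)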
A Deep Journey into Super-resolution: A survey
Deep convolutional networks based super-resolution is a fast-growing field
with numerous practical applications. In this exposition, we extensively
compare 30+ state-of-the-art super-resolution Convolutional Neural Networks
(CNNs) over three classical and three recently introduced challenging datasets
to benchmark single image super-resolution. We introduce a taxonomy for
deep-learning based super-resolution networks that groups existing methods into
nine categories including linear, residual, multi-branch, recursive,
progressive, attention-based and adversarial designs. We also provide
comparisons between the models in terms of network complexity, memory
footprint, model input and output, learning details, the type of network losses
and important architectural differences (e.g., depth, skip-connections,
filters). The extensive evaluation shows consistent and rapid growth in
accuracy over the past few years, along with a corresponding boost in model
complexity and the availability of large-scale datasets. It is also
observed that the pioneering methods identified as the benchmark have been
significantly outperformed by the current contenders. Despite the progress in
recent years, we identify several shortcomings of existing techniques and
provide future research directions towards the solution of these open problems.
Comment: Accepted in ACM Computing Surveys
Face Hallucination by Attentive Sequence Optimization with Reinforcement Learning
Face hallucination is a domain-specific super-resolution problem that aims to
generate a high-resolution (HR) face image from a low-resolution (LR) input. In
contrast to the existing patch-wise super-resolution models that divide a face
image into regular patches and independently apply LR to HR mapping to each
patch, we implement deep reinforcement learning and develop a novel
attention-aware face hallucination (Attention-FH) framework, which recurrently
learns to attend a sequence of patches and performs facial part enhancement by
fully exploiting the global interdependency of the image. Specifically, our
proposed framework incorporates two components: a recurrent policy network for
dynamically specifying a new attended region at each time step based on the
status of the super-resolved image and the past attended region sequence, and a
local enhancement network for selected patch hallucination and global state
updating. The Attention-FH model jointly learns the recurrent policy network
and local enhancement network through maximizing a long-term reward that
reflects the hallucination result with respect to the whole HR image. Extensive
experiments demonstrate that our Attention-FH significantly outperforms the
state-of-the-art methods on in-the-wild face images with large pose and
illumination variations.
Comment: To be published in TPAMI
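To make the attend-then-enhance loop concrete, the toy Python sketch below pairs a policy network that scores the cells of a coarse grid with a small CNN that adds a residual enhancement to the sampled patch, repeating for a few steps. The ToyAttentionFH module, its layer sizes, and the grid-based action space are illustrative assumptions rather than the authors' implementation; training the sampling step would further require a policy-gradient objective (e.g. REINFORCE) driven by the long-term reward described in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAttentionFH(nn.Module):
    # Illustrative stand-in only, not the authors' code.
    def __init__(self, grid: int = 4):
        super().__init__()
        self.grid = grid
        # Policy network stand-in: scores each cell of a grid x grid layout.
        self.policy = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(grid), nn.Flatten(),
            nn.Linear(8 * grid * grid, grid * grid),
        )
        # Local enhancement network stand-in: refines one attended patch.
        self.enhance = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x: torch.Tensor, steps: int = 4) -> torch.Tensor:
        b, _, h, w = x.shape
        ph, pw = h // self.grid, w // self.grid
        for _ in range(steps):
            probs = F.softmax(self.policy(x), dim=1)
            idx = torch.multinomial(probs, 1).squeeze(1)  # attended cell per image
            out = x.clone()
            for i in range(b):
                r, c = divmod(int(idx[i]), self.grid)
                patch = x[i:i + 1, :, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
                out[i:i + 1, :, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = patch + self.enhance(patch)
            x = out  # updated global state feeds the next attention step
        return x

hallucinated = ToyAttentionFH()(torch.rand(2, 3, 64, 64))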
ResDepth: Learned Residual Stereo Reconstruction
We propose an embarrassingly simple but very effective scheme for
high-quality dense stereo reconstruction: (i) generate an approximate
reconstruction with your favourite stereo matcher; (ii) rewarp the input images
with that approximate model; (iii) with the initial reconstruction and the
warped images as input, train a deep network to enhance the reconstruction by
regressing a residual correction; and (iv) if desired, iterate the refinement
with the new, improved reconstruction. The strategy to only learn the residual
greatly simplifies the learning problem. A standard Unet without bells and
whistles is enough to reconstruct even small surface details, like dormers and
roof substructures in satellite images. We also investigate residual
reconstruction with less information and find that even a single image is
enough to greatly improve an approximate reconstruction. Our full model reduces
the mean absolute error of state-of-the-art stereo reconstruction systems by
>50%, both in our target domain of satellite stereo and on stereo pairs from
the ETH3D benchmark.
Comment: updated supplementary material
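Under the assumption that the inputs are an approximate depth map plus two views warped with it, the Python sketch below shows the residual-refinement recipe: predict a correction, add it back to the initial reconstruction, and optionally iterate. The ResidualRefiner module is a tiny convolutional stand-in for the paper's U-Net; names, channel widths, and the two-iteration loop are illustrative.

import torch
import torch.nn as nn

class ResidualRefiner(nn.Module):
    def __init__(self, in_ch: int = 3):  # depth + two warped grayscale views
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),  # predicted residual depth correction
        )

    def forward(self, depth, warped_left, warped_right):
        x = torch.cat([depth, warped_left, warped_right], dim=1)
        return depth + self.net(x)  # refined depth = initial depth + residual

refiner = ResidualRefiner()
depth = torch.rand(1, 1, 64, 64)  # approximate reconstruction from any stereo matcher
left, right = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
for _ in range(2):
    # In the full pipeline the views would be re-warped with the improved
    # depth between iterations; that step is omitted here.
    depth = refiner(depth, left, right)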
Baseline CNN structure analysis for facial expression recognition
We present a baseline convolutional neural network (CNN) structure and image
preprocessing methodology to improve facial expression recognition algorithms
based on CNNs. To identify the most efficient network structure, we investigated
four network structures that are known to show good performance in facial
expression recognition. Moreover, we also investigated the effect of input
image preprocessing methods. Five types of data input (raw, histogram
equalization, isotropic smoothing, diffusion-based normalization, difference of
Gaussian) were tested, and the accuracy was compared. We trained 20 different
CNN models (4 networks × 5 data input types) and verified the performance of
each network with test images from five different databases. The experiment
result showed that a three-layer structure consisting of a simple convolutional
and a max pooling layer with histogram equalization image input was the most
efficient. We describe the detailed training procedure and analyze the test
accuracy results based on extensive observations.
Comment: 6 pages, RO-MAN2016 Conference
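For reference, the Python sketch below pairs plain histogram equalization with a shallow convolution + max-pooling stack of the kind the abstract identifies as most efficient; the 48x48 input size, channel widths, and 7-class output are assumptions for illustration, not the paper's exact configuration.

import numpy as np
import torch
import torch.nn as nn

def histogram_equalize(gray: np.ndarray) -> np.ndarray:
    """gray: uint8 image of shape (H, W); returns an equalized uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize CDF to [0, 1]
    return (cdf[gray] * 255).astype(np.uint8)

# Shallow conv + max-pool stack for (assumed) 48x48 grayscale expression crops.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(128 * 6 * 6, 7),  # 7 basic expression classes (assumption)
)

img = histogram_equalize(np.random.randint(0, 256, (48, 48), dtype=np.uint8))
x = torch.from_numpy(img).float().div(255).view(1, 1, 48, 48)
logits = model(x)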
Attention-Aware Face Hallucination via Deep Reinforcement Learning
Face hallucination is a domain-specific super-resolution problem with the
goal to generate high-resolution (HR) faces from low-resolution (LR) input
images. In contrast to existing methods that often learn a single
patch-to-patch mapping from LR to HR images and disregard the
contextual interdependency between patches, we propose a novel Attention-aware
Face Hallucination (Attention-FH) framework which resorts to deep reinforcement
learning for sequentially discovering attended patches and then performing the
facial part enhancement by fully exploiting the global interdependency of the
image. Specifically, in each time step, the recurrent policy network is
proposed to dynamically specify a new attended region by incorporating what
happened in the past. The state (i.e., face hallucination result for the whole
image) can thus be exploited and updated by the local enhancement network on
the selected region. The Attention-FH approach jointly learns the recurrent
policy network and local enhancement network through maximizing the long-term
reward that reflects the hallucination performance over the whole image.
Therefore, our proposed Attention-FH is capable of adaptively personalizing an
optimal searching path for each face image according to its own characteristic.
Extensive experiments show our approach significantly surpasses the
state-of-the-arts on in-the-wild faces with large pose and illumination
variations.
PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report
This paper reviews the first challenge on efficient perceptual image
enhancement with the focus on deploying deep learning models on smartphones.
The challenge consisted of two tracks. In the first one, participants were
solving the classical image super-resolution problem with a bicubic downscaling
factor of 4. The second track was aimed at real-world photo enhancement, and
the goal was to map low-quality photos from the iPhone 3GS device to the same
photos captured with a DSLR camera. The target metric used in this challenge
combined the runtime, PSNR scores and solutions' perceptual results measured in
the user study. To ensure the efficiency of the submitted models, we
additionally measured their runtime and memory requirements on Android
smartphones. The proposed solutions significantly improved baseline results
defining the state-of-the-art for image enhancement on smartphones.
Face Image Reflection Removal
Face images captured through glass are usually contaminated by
reflections. The non-transmitted reflections make the reflection removal more
challenging than for general scenes, because important facial features are
completely occluded. In this paper, we propose and solve the face image
reflection removal problem. We remove non-transmitted reflections by
incorporating inpainting ideas into a guided reflection removal framework and
recover facial features by considering various face-specific priors. We use a
newly collected face reflection image dataset to train our model and compare
with state-of-the-art methods. The proposed method shows advantages in
estimating reflection-free face images for improving face recognition.