Learn to Model Motion from Blurry Footages
It is difficult to recover the motion field from real-world footage given a
mixture of camera shake and other photometric effects. In this paper we propose
a hybrid framework that interleaves a Convolutional Neural Network (CNN) with a
traditional optical flow energy. We first construct a CNN architecture with a
novel learnable directional filtering layer. This layer encodes the angle and
distance similarity matrix between blur and camera motion, which enhances the
blur features of camera-shake footage. The proposed CNNs are then integrated
into an iterative optical flow framework, which enables modelling and solving
both the blind deconvolution and optical flow estimation problems
simultaneously. Our framework is trained end-to-end on a synthetic dataset and
yields competitive precision and performance against state-of-the-art
approaches.
Comment: Preprint of our paper accepted by Pattern Recognition
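The abstract does not spell out the layer's construction, so the following is only a minimal sketch of one plausible reading: a fixed bank of oriented, motion-blur-like kernels applied depthwise, followed by a learnable 1x1 mixing convolution. The kernel parameterization and the `DirectionalFiltering` module are illustrative assumptions, not the authors' design.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def oriented_kernel(angle, length=9, size=9):
    """Line-shaped kernel approximating linear motion blur at `angle`."""
    k = torch.zeros(size, size)
    c = size // 2
    for t in torch.linspace(-length / 2, length / 2, steps=4 * size):
        x = int(round(c + float(t) * math.cos(angle)))
        y = int(round(c + float(t) * math.sin(angle)))
        if 0 <= x < size and 0 <= y < size:
            k[y, x] = 1.0
    return k / k.sum()

class DirectionalFiltering(nn.Module):
    """Illustrative directional filtering layer (an assumption, see above)."""
    def __init__(self, in_ch, n_angles=8, size=9):
        super().__init__()
        angles = torch.linspace(0, math.pi, n_angles + 1)[:-1]
        bank = torch.stack([oriented_kernel(a, size=size) for a in angles])
        self.register_buffer("bank", bank.unsqueeze(1))    # (A, 1, k, k)
        self.mix = nn.Conv2d(in_ch * n_angles, in_ch, 1)   # learnable mixing
        self.n_angles, self.size = n_angles, size

    def forward(self, x):
        b, c, h, w = x.shape
        # Apply every oriented kernel to every channel (depthwise conv).
        weight = self.bank.repeat(c, 1, 1, 1)              # (c*A, 1, k, k)
        y = F.conv2d(x.repeat_interleave(self.n_angles, dim=1),
                     weight, padding=self.size // 2,
                     groups=c * self.n_angles)
        return self.mix(y)

# Example: DirectionalFiltering(in_ch=3)(torch.randn(1, 3, 64, 64))
```

A layer of this kind emphasizes responses aligned with the dominant blur direction while remaining differentiable end-to-end.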
Bridging the Gap Between Computational Photography and Visual Recognition
What is the current state-of-the-art for image restoration and enhancement
applied to degraded images acquired under less than ideal circumstances? Can
the application of such algorithms as a pre-processing step improve image
interpretability for manual analysis or automatic visual recognition of scene
content? While there have been important advances in the area of
computational photography to restore or enhance the visual quality of an image,
the capabilities of such techniques have not always translated in a useful way
to visual recognition tasks. Consequently, there is a pressing need for the
development of algorithms that are designed for the joint problem of improving
visual appearance and recognition, which will be an enabling factor for the
deployment of visual recognition tools in many real-world scenarios. To address
this, we introduce the UG^2 dataset as a large-scale benchmark composed of
video imagery captured under challenging conditions, and two enhancement tasks
designed to test algorithmic impact on visual quality and automatic object
recognition. Furthermore, we propose a set of metrics to evaluate the joint
improvement of such tasks as well as individual algorithmic advances, including
a novel psychophysics-based evaluation regime for human assessment and a
realistic set of quantitative measures for object recognition performance. We
introduce six new algorithms for image restoration or enhancement, which were
created as part of the IARPA sponsored UG^2 Challenge workshop held at CVPR
2018. Under the proposed evaluation regime, we present an in-depth analysis of
these algorithms and a host of deep learning-based and classic baseline
approaches. From the observed results, it is evident that we are in the early
days of building a bridge between computational photography and visual
recognition, leaving many opportunities for innovation in this area.
Comment: CVPR Prize Challenge: http://www.ug2challenge.or
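As a concrete illustration of the joint evaluation the benchmark argues for, the sketch below scores an enhancement algorithm on both a visual-quality proxy (PSNR) and downstream top-k recognition accuracy. The `enhancer`/`classifier` interfaces are hypothetical; the paper's actual psychophysics-based regime is not reproduced here.

```python
import torch
import torch.nn.functional as F

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two images in [0, max_val]."""
    mse = F.mse_loss(pred, target)
    return 10.0 * torch.log10(max_val ** 2 / mse)

@torch.no_grad()
def joint_score(enhancer, classifier, frames, references, labels, k=5):
    """Returns (mean PSNR, top-k accuracy) of the enhanced frames."""
    enhanced = enhancer(frames)                       # (N, C, H, W)
    quality = torch.stack([psnr(e, r)
                           for e, r in zip(enhanced, references)]).mean()
    topk = classifier(enhanced).topk(k, dim=1).indices
    acc = (topk == labels.unsqueeze(1)).any(dim=1).float().mean()
    return quality.item(), acc.item()
```

Reporting the two numbers together, rather than either alone, is exactly the gap the benchmark is designed to expose: an enhancer can raise PSNR while hurting recognition, and vice versa.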
Identifying Most Walkable Direction for Navigation in an Outdoor Environment
We present an approach for identifying the most walkable direction for
navigation using a hand-held camera. Our approach extracts semantically rich
contextual information from the scene using a custom encoder-decoder
architecture for semantic segmentation and models the spatial and temporal
behavior of objects in the scene using a spatio-temporal graph. The system
learns to minimize a cost function over the spatial and temporal object
attributes to identify the most walkable direction. We construct a new
annotated navigation dataset collected using a hand-held mobile camera in an
unconstrained outdoor environment, which includes challenging settings such as
highly dynamic scenes, occlusion between objects, and distortions. Our system
achieves an accuracy of 84% on predicting a safe direction. We also show that
our custom segmentation network is both fast and accurate, achieving mIOU (mean
intersection over union) scores of 81 and 44.7 on the PASCAL VOC and the PASCAL
Context datasets, respectively, while running at about 21 frames per second.
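For reference, the mIoU score quoted above is the per-class intersection-over-union between the predicted and ground-truth label maps, averaged over the classes present; a minimal NumPy version (my own, not the paper's code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """pred, gt: integer label maps of the same shape."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:            # class absent from both maps: skip it
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))
```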
Physics-Based Generative Adversarial Models for Image Restoration and Beyond
We present an algorithm to directly solve numerous image restoration problems
(e.g., image deblurring, dehazing, and deraining). These problems
are highly ill-posed, and the common assumptions for existing methods are
usually based on heuristic image priors. In this paper, we find that these
problems can be solved by generative models with adversarial learning. However,
the basic formulation of generative adversarial networks (GANs) does not
generate realistic images, and some structures of the estimated images are
usually not preserved well. Motivated by the observation that the estimated
results should be consistent with the observed inputs under the corresponding
physics models, we propose a physics-model-constrained learning algorithm that
guides the estimation for the specific task within the conventional GAN
framework. The proposed algorithm is trained in an end-to-end fashion and can
be applied to a variety of image restoration and related low-level vision
problems. Extensive experiments demonstrate that our method performs favorably
against the state-of-the-art algorithms.
Comment: IEEE TPAMI
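One common instance of such a physics constraint, given here only as a hedged sketch, is dehazing under the standard atmospheric scattering model I = J·t + A·(1 − t): the generator's clean estimate is re-degraded and compared with the observed input. The paper covers several tasks, each with its own physics model, so the actual formulation may differ.

```python
import torch
import torch.nn.functional as F

def physics_consistency_loss(observed, clean_est, transmission, airlight):
    """L1 between the observed hazy image and the re-synthesized one.

    observed, clean_est: (B, 3, H, W); transmission: (B, 1, H, W) in [0, 1];
    airlight: (B, 3, 1, 1).
    """
    resynthesized = clean_est * transmission + airlight * (1.0 - transmission)
    return F.l1_loss(resynthesized, observed)

# Typical use (an assumption, not the paper's exact recipe):
#   generator loss = adversarial term + lambda * physics_consistency_loss,
# with transmission and airlight predicted by auxiliary network heads.
```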
Image and Depth from a Single Defocused Image Using Coded Aperture Photography
Depth from defocus and defocus deblurring from a single image are two
challenging problems that are derived from the finite depth of field in
conventional cameras. Coded aperture imaging is one of the techniques that is
used for improving the results of these two problems. Up to now, different
methods have been proposed for improving the results of either defocus
deblurring or depth estimation. In this paper, a multi-objective function is
proposed for evaluating and designing aperture patterns with the aim of
improving the results of both depth from defocus and defocus deblurring.
Pattern evaluation is performed by considering the scene illumination condition
and camera system specification. Based on the proposed criteria, a single
asymmetric pattern is designed that is used for restoring a sharp image and a
depth map from a single input. Since the designed pattern is asymmetric,
defocused objects on the two sides of the focal plane can be distinguished. Depth
estimation is performed by using a new algorithm, which is based on image
quality assessment criteria and can distinguish between blurred objects lying
in front or behind the focal plane. Extensive simulations as well as
experiments on a variety of real scenes are conducted to compare our aperture
with previously proposed ones.
Comment: 18 pages, 14 figures, submitted
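As a rough illustration of frequency-domain aperture evaluation (my own construction, not the paper's multi-objective criterion): a pattern whose defocus point spread function has no deep spectral zeros loses less information, so the worst-case Wiener gain is a common figure of merit. The noise level is an assumed parameter.

```python
import numpy as np

def deblurring_score(aperture, noise_sigma=0.005, size=64):
    """aperture: small binary 2-D array; higher score = easier deconvolution."""
    kernel = aperture / aperture.sum()        # defocus PSF ~ scaled aperture
    spectrum = np.abs(np.fft.fft2(kernel, s=(size, size)))
    # Wiener gain |K|^2 / (|K|^2 + sigma^2): reward spectra bounded away
    # from zero at every frequency (worst case over the spectrum).
    return float(np.min(spectrum ** 2 / (spectrum ** 2 + noise_sigma ** 2)))
```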
Structural and object detection for phosphene images
Prosthetic vision based on phosphenes is a promising way to provide visual
perception to some blind people. However, phosphene images are very limited in
terms of spatial resolution (e.g., a 32 x 32 phosphene array) and luminance
levels (e.g., 8 gray levels), which results in the subject receiving very
limited information about the scene. This calls for high-level processing to
extract more information from the scene and present it to the subject within
the limitations of phosphenes. In this work, we study the recognition of indoor
environments under simulated prosthetic vision. Most research in simulated
prosthetic vision relies on static images, while very few researchers have
addressed the problem of scene recognition through video
sequences. We propose a new approach to build a schematic representation of
indoor environments for phosphene images. Our schematic representation relies
on two parallel CNNs: one extracts structurally informative edges of the room
and the other extracts relevant object silhouettes via mask segmentation. We
performed a study with twelve normally sighted subjects to evaluate how well
our methods support room recognition when phosphene images and videos are
presented. We show that our method increases the user's recognition ability
from 75% with alternative methods to 90% with our approach.
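The phosphene constraints quoted above (a 32 x 32 array, 8 gray levels) can be simulated along the following lines; the Gaussian-dot rendering details are illustrative assumptions.

```python
import numpy as np

def simulate_phosphenes(gray, grid=32, levels=8, out=256):
    """gray: 2-D float image in [0, 1] -> phosphene rendering in [0, 1]."""
    h, w = gray.shape
    # Average-pool onto the phosphene grid, then quantize the luminance.
    ys = np.linspace(0, h, grid + 1).astype(int)
    xs = np.linspace(0, w, grid + 1).astype(int)
    cells = np.array([[gray[ys[i]:ys[i+1], xs[j]:xs[j+1]].mean()
                       for j in range(grid)] for i in range(grid)])
    cells = np.round(cells * (levels - 1)) / (levels - 1)
    # Render one Gaussian blob per phosphene on an out x out canvas.
    canvas = np.zeros((out, out))
    yy, xx = np.meshgrid(np.arange(out), np.arange(out), indexing="ij")
    step, sigma = out / grid, out / grid / 4
    for i in range(grid):
        for j in range(grid):
            cy, cx = (i + 0.5) * step, (j + 0.5) * step
            canvas += cells[i, j] * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2)
                                           / (2 * sigma ** 2))
    return np.clip(canvas, 0, 1)
```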
Single Image Non-uniform Blur Kernel Estimation via Adaptive Basis Decomposition
Characterizing and removing motion blur caused by camera shake or object
motion remains an important task for image restoration. In recent years,
removal of motion blur in photographs has seen impressive progress in the hands
of deep learning-based methods, trained to map directly from blurry to sharp
images. Characterization of motion blur, on the other hand, has received less
attention and progress in model-based methods for restoration lags behind that
of data-driven end-to-end approaches. In this paper, we propose a general,
non-parametric model for dense non-uniform motion blur estimation. Given a
blurry image, we estimate a set of adaptive basis kernels as well as the mixing
coefficients at pixel level, producing a per-pixel map of motion blur. This
rich but efficient forward model of the degradation process allows the
utilization of existing tools for solving inverse problems. We show that our
method overcomes the limitations of existing non-uniform motion blur estimation
and that it contributes to bridging the gap between model-based and data-driven
approaches for deblurring real photographs.
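The forward model described above can be written down compactly: the non-uniformly blurred image is a per-pixel mixture of the input convolved with each basis kernel. The sketch below assumes PyTorch tensors and softmax-normalized mixing maps; shapes and the API are mine, only the decomposition idea comes from the abstract.

```python
import torch
import torch.nn.functional as F

def nonuniform_blur(sharp, basis, mixing):
    """sharp: (B, C, H, W); basis: (K, 1, k, k) blur kernels;
    mixing: (B, K, H, W), softmax-normalized over K at each pixel."""
    b, c, h, w = sharp.shape
    k_count, _, ks, _ = basis.shape
    # Convolve the image with every basis kernel (depthwise, per channel).
    weight = basis.repeat(c, 1, 1, 1)                      # (C*K, 1, k, k)
    blurred = F.conv2d(sharp.repeat_interleave(k_count, dim=1),
                       weight, padding=ks // 2, groups=c * k_count)
    blurred = blurred.view(b, c, k_count, h, w)
    # Blend the K blurred versions with the per-pixel mixing coefficients.
    return (blurred * mixing.unsqueeze(1)).sum(dim=2)
```

Because the degradation is an explicit, differentiable forward model, standard tools for solving inverse problems (e.g., variational deconvolution with this operator) can be applied on top of the estimated kernels and mixing maps.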
Scene Text Detection via Holistic, Multi-Channel Prediction
Recently, scene text detection has become an active research topic in
computer vision and document analysis, because of its great importance and
significant challenge. However, the vast majority of existing methods detect
text within local regions, typically by extracting character-, word- or
line-level candidates followed by candidate aggregation and false positive
elimination, a strategy that potentially discards wide-scope, long-range
contextual cues in the scene. To take full advantage of the rich information
available in the whole natural image, we propose to localize text in a holistic
manner, by casting scene text detection as a semantic segmentation problem. The
proposed algorithm directly runs on full images and produces global, pixel-wise
prediction maps, in which detections are subsequently formed. To better make
use of the properties of text, three types of information regarding text
region, individual characters and their relationship are estimated, with a
single Fully Convolutional Network (FCN) model. With such predictions of text
properties, the proposed algorithm can simultaneously handle horizontal,
multi-oriented and curved text in real-world natural images. The experiments on
standard benchmarks, including ICDAR 2013, ICDAR 2015 and MSRA-TD500,
demonstrate that the proposed algorithm substantially outperforms previous
state-of-the-art approaches. Moreover, we report the first baseline result on
the recently released, large-scale COCO-Text dataset.
Comment: 10 pages, 9 figures, 5 tables
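A compact sketch of the holistic, multi-channel idea, with an intentionally tiny stand-in backbone (the paper's actual FCN is not reproduced here): one network emits three pixel-wise maps, for text regions, individual characters, and their relationship.

```python
import torch
import torch.nn as nn

class MultiChannelTextFCN(nn.Module):
    """Illustrative three-headed FCN for holistic scene text detection."""
    def __init__(self, ch=64):
        super().__init__()
        self.backbone = nn.Sequential(      # stand-in feature extractor
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Three parallel 1x1 heads, one per property of text.
        self.region_head = nn.Conv2d(ch, 1, 1)   # text region map
        self.char_head = nn.Conv2d(ch, 1, 1)     # individual character map
        self.link_head = nn.Conv2d(ch, 2, 1)     # character relationship map

    def forward(self, x):
        f = self.backbone(x)
        return (torch.sigmoid(self.region_head(f)),
                torch.sigmoid(self.char_head(f)),
                torch.tanh(self.link_head(f)))   # e.g., orientation vectors
```

Detections are then formed from these global maps (e.g., by thresholding and grouping), which is what lets a single pass over the full image handle horizontal, multi-oriented, and curved text alike.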
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
This paper presents the futuristic challenges discussed in the
cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers in
several conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Common Representation Learning Using Step-based Correlation Multi-Modal CNN
Deep learning techniques have been successfully used in learning a common
representation for multi-view data, wherein the different modalities are
projected onto a common subspace. From a broader perspective, the techniques
used to investigate common representation learning fall into two categories:
canonical correlation-based approaches and autoencoder-based approaches. In
this paper, we investigate the performance of deep autoencoder based methods on
multi-view data. We propose a novel step-based correlation multi-modal CNN
(CorrMCNN) which reconstructs one view of the data given the other while
increasing the interaction between the representations at each hidden layer or
every intermediate step. Finally, we evaluate the performance of the proposed
model on two benchmark datasets - MNIST and XRMB. Through extensive
experiments, we find that the proposed model achieves better performance than
the current state-of-the-art techniques on joint common representation learning
and transfer learning tasks.
Comment: Accepted at the Asian Conference on Pattern Recognition (ACPR-2017)
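A hedged sketch of the step-based correlation idea: while one view is reconstructed from the other, a correlation penalty ties the paired hidden activations at every intermediate layer. Layer sizes, the Pearson-style penalty, and the loss weighting are assumptions rather than the paper's exact specification.

```python
import torch
import torch.nn as nn

def correlation_loss(h1, h2, eps=1e-8):
    """Negative mean per-dimension Pearson correlation between two batches."""
    h1 = (h1 - h1.mean(0)) / (h1.std(0) + eps)
    h2 = (h2 - h2.mean(0)) / (h2.std(0) + eps)
    return -(h1 * h2).mean()

class TwoViewEncoder(nn.Module):
    """Small MLP encoder that also exposes every hidden layer's output."""
    def __init__(self, dims=(784, 512, 128)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(a, b), nn.ReLU())
            for a, b in zip(dims, dims[1:]))

    def forward(self, x):
        hidden = []
        for layer in self.layers:
            x = layer(x)
            hidden.append(x)
        return x, hidden

# Training-step sketch: reconstruct view 2 from view 1 while correlating
# the two encoders' hidden states layer by layer.
enc1, enc2 = TwoViewEncoder(), TwoViewEncoder()
decoder = nn.Linear(128, 784)
x1, x2 = torch.randn(32, 784), torch.randn(32, 784)
z1, hs1 = enc1(x1)
z2, hs2 = enc2(x2)
loss = nn.functional.mse_loss(decoder(z1), x2) \
     + sum(correlation_loss(a, b) for a, b in zip(hs1, hs2))
```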