DASC: Robust Dense Descriptor for Multi-modal and Multi-spectral Correspondence Estimation
Establishing dense correspondences between multiple images is a fundamental
task in many applications. However, finding a reliable correspondence in
multi-modal or multi-spectral images remains unsolved due to their
challenging photometric and geometric variations. In this paper, we propose a
novel dense descriptor, called dense adaptive self-correlation (DASC), to
estimate multi-modal and multi-spectral dense correspondences. Based on an
observation that self-similarity existing within images is robust to imaging
modality variations, we define the descriptor with a series of adaptive
self-correlation similarity measures between patches sampled by randomized
receptive field pooling, in which the sampling pattern is obtained through
discriminative learning. The computational redundancy of dense descriptors is
dramatically reduced by applying fast edge-aware filtering. Furthermore, in
order to address geometric variations including scale and rotation, we propose
a geometry-invariant DASC (GI-DASC) descriptor that effectively leverages the
DASC through a superpixel-based representation. For a quantitative evaluation
of the GI-DASC, we build a novel multi-modal benchmark with varying photometric
and geometric conditions. Experimental results demonstrate the outstanding
performance of the DASC and GI-DASC in many cases of multi-modal and
multi-spectral dense correspondence.
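As a rough sketch of the core idea (not the authors' implementation), the descriptor at a pixel can be read as a vector of correlations between patch pairs sampled at fixed offsets within a support window. The random sampling pattern, patch size, and function names below are illustrative stand-ins for the learned pattern and the edge-aware filtering described above:

```python
import numpy as np

def patch(img, y, x, r):
    """Extract a (2r+1) x (2r+1) patch centered at (y, x)."""
    return img[y - r:y + r + 1, x - r:x + r + 1]

def ncc(p, q, eps=1e-8):
    """Zero-mean normalized cross-correlation between two patches."""
    p = p - p.mean()
    q = q - q.mean()
    return float((p * q).sum() / (np.linalg.norm(p) * np.linalg.norm(q) + eps))

def dasc_like_descriptor(img, y, x, pattern, r=2):
    """Describe pixel (y, x) by self-correlations between patch pairs
    sampled at fixed offsets (a stand-in for the learned pooling pattern)."""
    return np.array([ncc(patch(img, y + dy1, x + dx1, r),
                         patch(img, y + dy2, x + dx2, r))
                     for (dy1, dx1, dy2, dx2) in pattern])

rng = np.random.default_rng(0)
pattern = rng.integers(-6, 7, size=(32, 4))   # 32 random patch-pair offsets
img = rng.random((64, 64)).astype(np.float32)
d = dasc_like_descriptor(img, 32, 32, pattern)
print(d.shape)  # (32,)
```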
Joint Estimation of Camera Pose, Depth, Deblurring, and Super-Resolution from a Blurred Image Sequence
The conventional methods for estimating camera poses and scene structures
from severely blurry or low resolution images often result in failure. The
off-the-shelf deblurring or super-resolution methods may produce visually
pleasing results. However, applying each technique independently before
matching is generally ineffective, because this naive sequence of procedures ignores the
consistency between images. In this paper, we propose a pioneering unified
framework that solves four problems simultaneously, namely, dense depth
reconstruction, camera pose estimation, super-resolution, and deblurring. By
reflecting a physical imaging process, we formulate a cost minimization problem
and solve it using an alternating optimization technique. The experimental
results on both synthetic and real videos show high-quality depth maps derived
from severely degraded images, in contrast to the failures of naive multi-view
stereo methods. Our proposed method also produces outstanding deblurred and
super-resolved images, unlike the independent application or combination of
conventional video deblurring and super-resolution methods.
Comment: accepted to ICCV 2017
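The alternating-optimization schedule the abstract refers to can be sketched on a toy quadratic cost with two coupled variable blocks; the paper's actual cost models the physical blur and downsampling process over camera pose, depth, and the latent sharp frames, which is not reproduced here:

```python
import numpy as np

# Toy alternating minimization over two coupled blocks u ("depth-like") and
# v ("pose-like") in the cost 0.5 * ||M u + N v - c||^2. Each step solves
# exactly for one block while holding the other fixed, so the cost is
# non-increasing; only the schedule, not the paper's model, is shown.
rng = np.random.default_rng(0)
M, N = rng.random((20, 5)), rng.random((20, 5))
c = rng.random(20)
u, v = np.zeros(5), np.zeros(5)

def cost(u, v):
    r = M @ u + N @ v - c
    return 0.5 * float(r @ r)

for it in range(10):
    u = np.linalg.lstsq(M, c - N @ v, rcond=None)[0]  # solve for u, v fixed
    v = np.linalg.lstsq(N, c - M @ u, rcond=None)[0]  # solve for v, u fixed
    print(f"iter {it}: cost {cost(u, v):.6f}")
```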
Fast Robust Monocular Depth Estimation for Obstacle Detection with Fully Convolutional Networks
Obstacle detection is a central problem for any robotic system, and critical
for autonomous systems that travel at high speeds in unpredictable
environments. This is often achieved through scene depth estimation, by
various means. When fast motion is considered, the detection range must be
long enough to allow for safe avoidance and path planning. Current solutions
often make assumptions about the motion of the vehicle that limit their
applicability, or work at very limited ranges due to intrinsic constraints. We
propose a novel appearance-based obstacle detection system that is able to detect obstacles at
very long range and at a very high speed (~300Hz), without making assumptions
on the type of motion. We achieve these results using a Deep Neural Network
approach trained on real and synthetic images and trading some depth accuracy
for fast, robust and consistent operation. We show how photo-realistic
synthetic images are able to solve the problem of training set dimension and
variety typical of machine learning approaches, and how our system is robust to
massive blurring of test images.
Comment: Accepted for publication in the Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016).
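A minimal sketch of the pattern (not the paper's network): a small fully convolutional model maps an RGB frame to a coarse depth map, which is then thresholded into an obstacle mask. The architecture, the PyTorch dependency, and the threshold are assumptions for illustration only:

```python
import torch
import torch.nn as nn

# Deliberately tiny fully convolutional network: maps an RGB frame to a
# single-channel coarse depth map at 1/4 resolution. Untrained here; it only
# illustrates the depth-then-threshold obstacle detection pattern.
class TinyDepthFCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),    # 1-channel depth prediction
        )

    def forward(self, x):
        return self.net(x)

model = TinyDepthFCN().eval()
frame = torch.rand(1, 3, 128, 256)             # dummy RGB frame
with torch.no_grad():
    depth = model(frame)                       # coarse depth, 1/4 resolution
obstacles = depth < depth.mean()               # hypothetical nearness threshold
print(depth.shape, obstacles.float().mean().item())
```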
An analysis of the factors affecting keypoint stability in scale-space
The most popular image matching algorithm SIFT, introduced by D. Lowe a
decade ago, has proven to be sufficiently scale invariant to be used in
numerous applications. In practice, however, scale invariance may be weakened
by various sources of error inherent to the SIFT implementation affecting the
stability and accuracy of keypoint detection. The density of the sampling of
the Gaussian scale-space and the level of blur in the input image are two of
these sources. This article presents a numerical analysis of their impact on
the stability of the extracted keypoints. Such an analysis has both methodological and
practical implications, on how to compare feature detectors and on how to
improve SIFT. We show that, even with a significantly oversampled scale-space,
numerical errors prevent perfect stability from being achieved. The usual
strategies for filtering out unstable detections are shown to be inefficient.
We also prove that the effect of an error in the assumed initial blur is
asymmetric, and that the method is strongly degraded in the presence of
aliasing or without a correct assumption on the camera blur.
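The scale-space sampling density in question can be made concrete with a standard Gaussian octave construction, where each level is blurred incrementally using the semigroup relation sigma_total^2 = sigma_prev^2 + sigma_step^2. The parameter values, and the assumption that the input already carries blur sigma0 (the camera-blur assumption the article analyzes), are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_octave(img, sigma0=0.8, scales_per_octave=3):
    """Build one octave of a Gaussian scale-space. Increasing
    scales_per_octave oversamples the scale axis. The input is assumed to
    already carry blur sigma0; since Gaussians compose as
    sigma_total^2 = sigma_prev^2 + sigma_step^2, each level is obtained by
    blurring the previous one with the incremental step only."""
    k = 2.0 ** (1.0 / scales_per_octave)
    levels, sigma = [img.astype(np.float64)], sigma0
    for _ in range(scales_per_octave + 2):     # extra levels for DoG extrema
        sigma_next = k * sigma
        step = np.sqrt(sigma_next**2 - sigma**2)
        levels.append(gaussian_filter(levels[-1], step))
        sigma = sigma_next
    return levels

octave = gaussian_octave(np.random.default_rng(0).random((64, 64)))
print(len(octave))  # scales_per_octave + 3 images
```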
Deep Self-Convolutional Activations Descriptor for Dense Cross-Modal Correspondence
We present a novel descriptor, called deep self-convolutional activations
(DeSCA), designed for establishing dense correspondences between images taken
under different imaging modalities, such as different spectral ranges or
lighting conditions. Motivated by descriptors based on local self-similarity
(LSS), we formulate a novel descriptor by leveraging LSS in a deep
architecture, leading to better discriminative power and greater robustness to
non-rigid image deformations than state-of-the-art cross-modality descriptors.
The DeSCA first computes self-convolutions over a local support window for
randomly sampled patches, and then builds self-convolution activations by
performing average pooling through a hierarchical formulation within a deep
convolutional architecture. Finally, the feature responses on the
self-convolution activations are encoded through spatial pyramid pooling in a
circular configuration. In contrast to existing convolutional neural networks
(CNNs) based descriptors, the DeSCA is training-free (i.e., randomly sampled
patches are utilized as the convolution kernels), is robust to cross-modal
imaging, and can be densely computed in an efficient manner that significantly
reduces computational redundancy. The state-of-the-art performance of DeSCA on
challenging cases of cross-modal image pairs is demonstrated through extensive
experiments.
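A minimal sketch of the training-free self-convolution step, assuming randomly sampled patches from the support window itself act as the convolution kernels; the hierarchical formulation and the circular spatial pyramid pooling are reduced to a single average-pooling level here, and all names are illustrative:

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
window = rng.random((21, 21))                       # local support window

def self_convolutions(window, n_kernels=4, k=5):
    """Correlate the window with patches sampled from the window itself
    (training-free 'self-convolution' kernels)."""
    ys = rng.integers(0, window.shape[0] - k, n_kernels)
    xs = rng.integers(0, window.shape[1] - k, n_kernels)
    kernels = [window[y:y + k, x:x + k] for y, x in zip(ys, xs)]
    return [correlate2d(window, ker, mode="valid") for ker in kernels]

def avg_pool(a, s=2):
    """Non-overlapping s x s average pooling."""
    h, w = a.shape[0] // s * s, a.shape[1] // s * s
    return a[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

acts = self_convolutions(window)                    # self-convolution maps
pooled = [avg_pool(a) for a in acts]                # one pooling level
desc = np.concatenate([p.ravel() for p in pooled])  # flattened descriptor
print(desc.shape)
```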
Object Detection Using Keygraphs
We propose a new framework for object detection based on a generalization of
the keypoint correspondence framework. This framework is based on replacing
keypoints with keygraphs, i.e., isomorphic directed graphs whose vertices are
keypoints, in order to explore relative and structural information. Unlike
similar works in the literature, we deal directly with graphs in the entire
pipeline: we search for graph correspondences instead of searching for
individual point correspondences and then building graph correspondences from
them afterwards. We also estimate the pose from graph correspondences instead
of falling back to point correspondences through a voting table. The
contributions of this paper are the proposed framework and an implementation
that properly handles its inherent issues of loss of locality and combinatorial
explosion, showing its viability for real-time applications. In particular, we
introduce the novel concept of keytuples to solve a running time issue. The
accuracy of the implementation is shown by results of over 800 experiments with
a well-known database of images. The speed is illustrated by real-time tracking
with two different cameras on ordinary hardware.
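A toy sketch of the keygraph idea under strong simplifying assumptions: keypoints carry quantized descriptor labels, keygraphs are reduced to directed triangles, and correspondence is a signature match on vertex labels plus triangle orientation. The paper's keytuples and pose estimation are not reproduced:

```python
import numpy as np
from itertools import permutations

def orientation(p, q, r):
    """Sign of the 2-D cross product: +1/-1 for the triangle's winding."""
    u, v = q - p, r - p
    return float(np.sign(u[0] * v[1] - u[1] * v[0]))

def keygraphs(points, labels):
    """Index all directed keypoint triangles by a structural signature."""
    graphs = {}
    for i, j, k in permutations(range(len(points)), 3):
        key = (labels[i], labels[j], labels[k],
               orientation(points[i], points[j], points[k]))
        graphs.setdefault(key, []).append((i, j, k))
    return graphs

rng = np.random.default_rng(0)
pts_a = rng.random((6, 2))
labels = rng.integers(0, 50, 6)                  # quantized descriptor labels
pts_b = pts_a + 0.01 * rng.random((6, 2))        # slightly perturbed view
common = keygraphs(pts_a, labels).keys() & keygraphs(pts_b, labels).keys()
print(len(common), "matching keygraph signatures")
```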
Evaluation of Three Vision Based Object Perception Methods for a Mobile Robot
This paper addresses object perception applied to mobile robotics. Being able
to perceive semantically meaningful objects in unstructured environments is a
key capability for enabling robots to perform high-level tasks in
home environments. However, finding a solution for this task is daunting: it
requires the ability to handle the variability in image formation in a moving
camera with tight time constraints. The paper draws attention to some of the
issues in applying three state-of-the-art object recognition and detection
methods in a mobile robotics scenario, and proposes methods to deal with
windowing/segmentation. Thus, this work aims to evaluate the state of the art
in object perception in an attempt to develop a lightweight solution for mobile
robotics use/research in typical indoor settings.
Comment: 37 pages, 11 figures
Depth-aware Blending of Smoothed Images for Bokeh Effect Generation
The bokeh effect is used in photography to capture images where the closer
objects look sharp and everything else stays out of focus. Bokeh photos are
generally captured using single-lens reflex cameras with a shallow depth of
field. Most modern smartphones can take bokeh images by leveraging dual rear
cameras or good auto-focus hardware. However, for smartphones with a single
rear camera and without good auto-focus hardware, we have to rely on software
to generate bokeh images. This kind of system is also useful for generating
the bokeh effect in already-captured images. In this paper, an
end-to-end deep learning framework is proposed to generate high-quality bokeh
effect from images. The original image and different versions of smoothed
images are blended to generate the bokeh effect with the help of a monocular
estimation network. The proposed approach is compared against a saliency
detection based baseline and a number of approaches proposed in AIM 2019
Challenge on Bokeh Effect Synthesis. Extensive experiments are presented to
analyze the different parts of the proposed algorithm. The network is
lightweight and can process an HD image in 0.03 seconds. This approach ranked
second in the Perceptual Track of the AIM 2019 Bokeh Effect Challenge.
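The depth-aware blending itself (separate from the learned components) can be sketched as picking, per pixel, among progressively smoothed copies of the image according to distance from a focal depth. The synthetic depth map, blur levels, and focal depth below are placeholders for the network outputs:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Minimal depth-aware blending sketch (not the paper's learned pipeline):
# pixels predicted far from the focal plane take their value from more
# aggressively smoothed copies of the image. `depth` would come from a
# monocular depth estimation network; here it is synthetic.
rng = np.random.default_rng(0)
img = rng.random((96, 96))
depth = np.linspace(0.0, 1.0, 96)[None, :].repeat(96, axis=0)  # fake depth

blurs = [img] + [gaussian_filter(img, s) for s in (1.5, 3.0, 6.0)]
focal = 0.2                                     # depth kept in focus
idx = np.clip((np.abs(depth - focal) * len(blurs)).astype(int),
              0, len(blurs) - 1)
bokeh = np.choose(idx, blurs)                   # pick blur level per pixel
print(bokeh.shape)
```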
From handcrafted to deep local features
This paper presents an overview of the evolution of local features from
handcrafted to deep-learning-based methods, followed by a discussion of several
benchmarks and papers evaluating such local features. Our investigations are
motivated by 3D reconstruction problems, where the precise location of the
features is important. As we describe these methods, we highlight and explain
the challenges of feature extraction and potential ways to overcome them. We
first present handcrafted methods, followed by methods based on classical
machine learning, and finally we discuss methods based on deep learning. This
largely chronological presentation will help the reader to fully
understand the topic of image and region description in order to make best use
of it in modern computer vision applications. In particular, understanding
handcrafted methods and their motivation can help to understand modern
approaches and how machine learning is used to improve the results. We also
provide references to most of the relevant literature and code.
Comment: Preprint
PROBE: Predictive Robust Estimation for Visual-Inertial Navigation
Navigation in unknown, chaotic environments continues to present a
significant challenge for the robotics community. Lighting changes,
self-similar textures, motion blur, and moving objects are all considerable
stumbling blocks for state-of-the-art vision-based navigation algorithms. In
this paper we present a novel technique for improving localization accuracy
within a visual-inertial navigation system (VINS). We make use of training data
to learn a model for the quality of visual features with respect to
localization error in a given environment. This model maps each visual
observation from a predefined prediction space of visual-inertial predictors
onto a scalar weight, which is then used to scale the observation covariance
matrix. In this way, our model can adjust the influence of each observation
according to its quality. We discuss our choice of predictors and report
substantial reductions in localization error on 4 km of data from the KITTI
dataset, as well as on experimental datasets consisting of 700 m of indoor and
outdoor driving on a small ground rover equipped with a Skybotix VI-Sensor.
Comment: In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'15), Hamburg, Germany, Sep. 28-Oct. 2, 2015.
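The covariance-scaling mechanism can be sketched independently of the learned model: some regressor maps per-observation predictors to a scalar weight that inflates the nominal observation covariance R before the filter update. The exponential regressor and its coefficients below are hypothetical stand-ins for the trained model:

```python
import numpy as np

# Sketch of covariance scaling: per-feature predictors (e.g., image
# location, optical flow magnitude, blur) are mapped to a positive scalar
# weight that scales the observation covariance R, down-weighting
# low-quality visual observations in the filter update.
def predicted_weight(predictors, coeffs):
    return float(np.exp(coeffs @ predictors))   # always positive

R = np.eye(2) * 0.5                             # nominal observation covariance
coeffs = np.array([0.3, 1.2])                   # stand-in learned parameters
for predictors in (np.array([0.1, 0.0]), np.array([0.5, 1.0])):
    w = predicted_weight(predictors, coeffs)
    R_scaled = w * R                            # low-quality feature -> larger R
    print(w, np.diag(R_scaled))
```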