3D keypoint detectors and descriptors for 3D objects recognition with TOF camera
The goal of this work is to evaluate 3D keypoint detectors and descriptors that could be used for quasi-real-time 3D object recognition. The work has three main objectives: extracting descriptors from real depth images, achieving a good degree of invariance and robustness to scale and viewpoint, and keeping the computation time as low as possible. Using a 3D time-of-flight (ToF) depth camera, we record a sequence for several objects at 3 different distances and from 5 viewpoints. 3D salient points are then extracted using 2 different curvature-based detectors. For each point, two local surface descriptors are computed by combining the shape index histogram and the normalized histogram of angles between the normal of the reference feature point and the normals of its neighbours. A comparison of the two detectors and descriptors was conducted on 4 different objects. Experiments show that both detectors and descriptors are rather invariant to variations of scale and viewpoint. We also find that the new 3D keypoint detector we propose is more stable than a previously proposed Shape Index based detector.
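The descriptor construction mentioned in this abstract (a shape index histogram combined with a normalized histogram of angles between the reference normal and the neighbour normals) can be illustrated with a minimal sketch. This is not the authors' implementation. The sign convention of the shape index, the bin count, and all function names (`shape_index`, `normalized_histogram`, `normal_angle`, `local_descriptor`) are assumptions made for illustration, and the principal curvatures and normals are assumed to have been estimated beforehand.

```python
import math

def shape_index(k1, k2):
    """Shape index in [-1, 1] from principal curvatures.

    Assumed convention: s = (2/pi) * atan((k1 + k2) / (k1 - k2)) with k1 >= k2.
    Other papers flip the sign or rescale to [0, 1].
    """
    if k1 < k2:
        k1, k2 = k2, k1  # enforce k1 >= k2
    if k1 == 0.0 and k2 == 0.0:
        return None  # the shape index is undefined on planar patches
    return (2.0 / math.pi) * math.atan2(k1 + k2, k1 - k2)

def normalized_histogram(values, lo, hi, bins):
    """Histogram of values over [lo, hi], normalized to sum to 1."""
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp edge values
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins

def normal_angle(n_ref, n):
    """Angle in [0, pi] between two 3D normals (not necessarily unit length)."""
    dot = sum(a * b for a, b in zip(n_ref, n))
    norm = math.sqrt(sum(a * a for a in n_ref)) * math.sqrt(sum(b * b for b in n))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def local_descriptor(curvatures, n_ref, neighbour_normals, bins=8):
    """Concatenate the shape index histogram and the normal-angle histogram."""
    s_vals = [s for s in (shape_index(k1, k2) for k1, k2 in curvatures)
              if s is not None]
    angles = [normal_angle(n_ref, n) for n in neighbour_normals]
    return (normalized_histogram(s_vals, -1.0, 1.0, bins)
            + normalized_histogram(angles, 0.0, math.pi, bins))
```

Here `curvatures` is a list of `(k1, k2)` pairs for the points in the neighbourhood of a keypoint, and the result has `2 * bins` entries. Since both halves are normalized, two such descriptors can be compared with any histogram distance.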
Don't Look Back: Robustifying Place Categorization for Viewpoint- and Condition-Invariant Place Recognition
When a human drives a car along a road for the first time, they later
recognize where they are on the return journey typically without needing to
look in their rear-view mirror or turn around to look back, despite significant
viewpoint and appearance change. Such navigation capabilities are typically
attributed to our semantic visual understanding of the environment [1] beyond
geometry to recognizing the types of places we are passing through such as
"passing a shop on the left" or "moving through a forested area". Humans are in
effect using place categorization [2] to perform specific place recognition
even when the viewpoint is 180 degrees reversed. Recent advances in deep neural
networks have enabled high-performance semantic understanding of visual places
and scenes, opening up the possibility of emulating what humans do. In this
work, we develop a novel methodology for using the semantics-aware higher-order
layers of deep neural networks for recognizing specific places from within a
reference database. To further improve the robustness to appearance change, we
develop a descriptor normalization scheme that builds on the success of
normalization schemes for pure appearance-based techniques such as SeqSLAM [3].
Using two different datasets, one road-based and one pedestrian-based, we
evaluate the performance of the system in performing place recognition on
reverse traversals of a route with a limited field of view camera and no
turn-back-and-look behaviours, and compare to existing state-of-the-art
techniques and vanilla off-the-shelf features. The results demonstrate
significant improvements over the existing state of the art, especially for
extreme perceptual challenges that involve both great viewpoint change and
environmental appearance change. We also provide experimental analyses of the
contributions of the various system components.
Comment: 9 pages, 11 figures, ICRA 201
On the Design and Analysis of Multiple View Descriptors
We propose an extension of popular descriptors based on gradient orientation
histograms (HOG, computed in a single image) to multiple views. It hinges on
interpreting HOG as a conditional density in the space of sampled images, where
the effects of nuisance factors such as viewpoint and illumination are
marginalized. However, such marginalization is performed with respect to a very
coarse approximation of the underlying distribution. Our extension leverages
the fact that multiple views of the same scene allow separating intrinsic from
nuisance variability, and thus afford better marginalization of the latter. The
result is a descriptor that has the same complexity as single-view HOG, and can
be compared in the same manner, but exploits multiple views to better trade off
insensitivity to nuisance variability with specificity to intrinsic
variability. We also introduce a novel multi-view wide-baseline matching
dataset, consisting of a mixture of real and synthetic objects with
ground-truthed camera motion and dense three-dimensional geometry.
Group Invariant Deep Representations for Image Instance Retrieval
Most image instance retrieval pipelines are based on comparison of vectors
known as global image descriptors between a query image and the database
images. Due to their success in large scale image classification,
representations extracted from Convolutional Neural Networks (CNN) are quickly
gaining ground on Fisher Vectors (FVs) as state-of-the-art global descriptors
for image instance retrieval. While CNN-based descriptors are generally
noted for good retrieval performance at lower bitrates, they nevertheless
present a number of drawbacks including the lack of robustness to common object
transformations such as rotations compared with their interest point based FV
counterparts.
In this paper, we propose a method for computing invariant global descriptors
from CNNs. Our method implements a recently proposed mathematical theory for
invariance in a sensory cortex modeled as a feedforward neural network. The
resulting global descriptors can be made invariant to multiple arbitrary
transformation groups while retaining good discriminativeness.
Based on a thorough empirical evaluation using several publicly available
datasets, we show that our method is able to significantly and consistently
improve retrieval results every time a new type of invariance is incorporated.
We also show that our method, which has few parameters, is not prone to
overfitting: improvements generalize well across datasets with different
properties with regard to invariances. Finally, we show that our descriptors
are able to compare favourably to other state-of-the-art compact descriptors in
similar bitranges, exceeding the highest retrieval results reported in the
literature on some datasets. A dedicated dimensionality reduction step
(quantization or hashing) may be able to further improve the competitiveness
of the descriptors.
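The idea of making a descriptor invariant to a transformation group can be sketched by pooling features over the orbit of the input under that group. The example below is a toy illustration under stated assumptions, not the method of the paper: the group is the four 90-degree rotations of a square image, `toy_features` is a hypothetical stand-in for CNN activations, and max pooling is one of several possible pooling choices.

```python
def rot90(img):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def toy_features(img):
    """Hypothetical stand-in for CNN activations; NOT rotation invariant."""
    top_row = sum(img[0])
    left_col = sum(row[0] for row in img)
    corner = img[0][0]
    return [float(top_row), float(left_col), float(corner)]

def orbit(img):
    """All images reachable from img under the group of 90-degree rotations."""
    images = [img]
    for _ in range(3):
        images.append(rot90(images[-1]))
    return images

def invariant_descriptor(img, pool=max):
    """Pool each feature dimension over the orbit of the input.

    Rotating the input only permutes the orbit, so any pooling function
    that ignores order (max, sum, mean) gives the same result.
    """
    feats = [toy_features(g) for g in orbit(img)]
    return [pool(column) for column in zip(*feats)]
```

For example, with `img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `toy_features(img)` and `toy_features(rot90(img))` differ, while `invariant_descriptor(img)` and `invariant_descriptor(rot90(img))` are identical. Handling several groups at once, as the abstract describes, would require pooling over each group in turn, which this sketch does not attempt.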
Deep Shape Matching
We cast shape matching as metric learning with convolutional networks. We
break the end-to-end process of image representation into two parts. Firstly,
well established efficient methods are chosen to turn the images into edge
maps. Secondly, the network is trained with edge maps of landmark images, which
are automatically obtained by a structure-from-motion pipeline. The learned
representation is evaluated on a range of different tasks, providing
improvements on challenging cases of domain generalization, generic
sketch-based image retrieval or its fine-grained counterpart. In contrast to
other methods that learn a different model per task, object category, or
domain, we use the same network throughout all our experiments, achieving
state-of-the-art results in multiple benchmarks.
Comment: ECCV 201
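Casting matching as metric learning means training an embedding so that matching pairs end up close together and non-matching pairs end up far apart. The abstract does not say which loss is used, so the sketch below shows a contrastive loss only as one common choice. The function names, the margin value, and the absence of a 1/2 scaling factor are assumptions, and the embeddings are plain lists rather than network outputs.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    Matching pairs (same=True) are penalized by their squared distance.
    Non-matching pairs are penalized only while they are closer than margin.
    """
    d = euclidean(emb_a, emb_b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

With `margin=1.0`, a non-matching pair at distance 0.5 incurs a loss of 0.25, while a non-matching pair at distance 5 incurs none because it is already beyond the margin.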
Place recognition: An Overview of Vision Perspective
Place recognition is one of the most fundamental topics in the computer vision
and robotics communities, where the task is to accurately and efficiently
recognize the location of a given query image. Despite years of wisdom
accumulated in this field, place recognition still remains an open problem due
to the various ways in which the appearance of real-world places may differ.
This paper presents an overview of the place recognition literature. Since
condition-invariant and viewpoint-invariant features are essential to a
long-term robust visual place recognition system, we start with the traditional
image description methodology, which exploits techniques
from the image retrieval field. Recently, rapid advances in related fields such
as object detection and image classification have inspired a new technique for
improving visual place recognition systems, namely convolutional neural networks
(CNNs). We therefore then introduce recent progress in visual place recognition
systems based on CNNs, which automatically learn better image representations for
places. Finally, we close with a discussion of place recognition and directions
for future work.
Comment: Applied Sciences (2018