Deep Learning Features at Scale for Visual Place Recognition
The success of deep learning techniques in the computer vision domain has
triggered a range of initial investigations into their utility for visual place
recognition, all using generic features from networks that were trained for
other types of recognition tasks. In this paper, we train, at large scale, two
CNN architectures for the specific place recognition task and employ a
multi-scale feature encoding method to generate condition- and
viewpoint-invariant features. To enable this training to occur, we have
developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of
place appearance change at thousands of different places, as opposed to the
semantic place type datasets currently available. This new dataset enables us
to set up a training regime that interprets place recognition as a
classification problem. We comprehensively evaluate our trained networks on
several challenging benchmark place recognition datasets and demonstrate that
they achieve an average 10% increase in performance over other place
recognition algorithms and pre-trained CNNs. By analyzing the network responses
and their differences from pre-trained networks, we provide insights into what
a network learns when training for place recognition, and what these results
signify for future research in this area.
Comment: 8 pages, 10 figures. Accepted by International Conference on Robotics
and Automation (ICRA) 2017. This is the submitted version. The final
published version may be slightly different.
Don't Look Back: Robustifying Place Categorization for Viewpoint- and Condition-Invariant Place Recognition
When a human drives a car along a road for the first time, they later
recognize where they are on the return journey typically without needing to
look in their rear-view mirror or turn around to look back, despite significant
viewpoint and appearance change. Such navigation capabilities are typically
attributed to our semantic visual understanding of the environment [1], which goes beyond
geometry to recognizing the types of places we are passing through, such as
"passing a shop on the left" or "moving through a forested area". Humans are in
effect using place categorization [2] to perform specific place recognition
even when the viewpoint is 180 degrees reversed. Recent advances in deep neural
networks have enabled high-performance semantic understanding of visual places
and scenes, opening up the possibility of emulating what humans do. In this
work, we develop a novel methodology for using the semantics-aware higher-order
layers of deep neural networks for recognizing specific places from within a
reference database. To further improve the robustness to appearance change, we
develop a descriptor normalization scheme that builds on the success of
normalization schemes for pure appearance-based techniques such as SeqSLAM [3].
Using two different datasets, one road-based and one pedestrian-based, we
evaluate the performance of the system in performing place recognition on
reverse traversals of a route with a limited field of view camera and no
turn-back-and-look behaviours, and compare to existing state-of-the-art
techniques and vanilla off-the-shelf features. The results demonstrate
significant improvements over the existing state of the art, especially for
extreme perceptual challenges that involve both great viewpoint change and
environmental appearance change. We also provide experimental analyses of the
contributions of the various system components.
Comment: 9 pages, 11 figures, ICRA 201
Exploring Convolutional Networks for End-to-End Visual Servoing
Present image-based visual servoing approaches rely on extracting hand-crafted
visual features from an image. Choosing the right set of features is
important as it directly affects the performance of any approach. Motivated by
recent breakthroughs in the performance of data-driven methods on recognition and
localization tasks, we aim to learn visual feature representations suitable for
servoing tasks in unstructured and unknown environments. In this paper, we
present an end-to-end learning-based approach for visual servoing in diverse
scenes where the knowledge of camera parameters and scene geometry is not
available a priori. This is achieved by training a convolutional neural network
over color images with synchronised camera poses. Through experiments performed
in simulation and on a quadrotor, we demonstrate the efficacy and robustness of
our approach for a wide range of camera poses in both indoor and outdoor
environments.
Comment: IEEE ICRA 201
Prototyping a Context-Aware Framework for Pervasive Entertainment Applications
Mixed reality participants in smart meeting rooms and smart home environments
Human–computer interaction requires modeling of the user. A user profile typically contains preferences, interests, characteristics, and interaction behavior. However, in multimodal interaction with a smart environment, the user displays characteristics that show how the user, not necessarily consciously, provides the smart environment with useful verbal and nonverbal input and feedback. Especially in ambient intelligence environments, we encounter situations where the environment supports interaction between the environment, smart objects (e.g., mobile robots, smart furniture), and human participants in the environment. It is therefore useful for the profile to contain a physical representation of the user obtained by multimodal capturing techniques. We discuss the modeling and simulation of interacting participants in a virtual meeting room and how remote meeting participants can take part in meeting activities, and we offer some observations on translating research results to smart home environments.