Exploiting Points and Lines in Regression Forests for RGB-D Camera Relocalization
Camera relocalization plays a vital role in many robotics and computer vision
tasks, such as global localization, recovery from tracking failure and loop
closure detection. Recent random-forest-based methods exploit randomly sampled pixel-comparison features to predict 3D world locations for 2D image locations, guiding the camera pose optimization. However, these features are sampled randomly across the image, without considering spatial structure or geometric information, which leads to large errors or outright failures in poorly textured areas or under motion blur. Line segment features are more robust in such conditions. In this work, we propose to jointly exploit points and lines within the framework of uncertainty-driven regression forests.
The proposed approach is thoroughly evaluated on three publicly available
datasets against several strong state-of-the-art baselines in terms of several
different error metrics. Experimental results demonstrate the efficacy of our method, showing performance superior to or on par with the state of the art.
Comment: published as a conference paper at the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
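For readers unfamiliar with the underlying machinery, the sketch below illustrates the kind of randomly sampled, depth-adaptive pixel-comparison feature such regression forests typically split on (in the style of scene coordinate regression forests); the function and its clipping policy are illustrative assumptions, not the paper's actual code. Line features would replace the single probe pixels with samples along a detected segment, which is what makes them more stable in low-texture regions.

```python
import numpy as np

def pixel_comparison_feature(rgb, depth, p, u, v, c1, c2):
    """Depth-adaptive pixel-comparison response (illustrative sketch).

    p is an (x, y) pixel; u and v are random 2D offsets scaled by the
    depth at p so the comparison is roughly invariant to how far the
    surface is from the camera. A tree node thresholds this scalar.
    """
    d = max(float(depth[p[1], p[0]]), 1e-3)        # guard against zero depth
    q1 = np.round(np.asarray(p) + np.asarray(u) / d).astype(int)
    q2 = np.round(np.asarray(p) + np.asarray(v) / d).astype(int)
    h, w = depth.shape
    q1 = np.clip(q1, [0, 0], [w - 1, h - 1])       # keep probes inside image
    q2 = np.clip(q2, [0, 0], [w - 1, h - 1])
    # difference of two colour channels at the two probe locations
    return float(rgb[q1[1], q1[0], c1]) - float(rgb[q2[1], q2[0], c2])
```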
Robust Visual Self-localization and Navigation in Outdoor Environments Using Slow Feature Analysis
Self-localization and navigation in outdoor environments are fundamental problems a mobile robot has to solve in order to autonomously execute tasks in a spatial environment. Techniques based on the Global Positioning System (GPS) or laser range finders are well established but suffer from limited satellite availability or high hardware effort and cost. Vision-based methods offer an interesting alternative, but they remain a field of active research due to the challenges of visual perception, such as illumination and weather changes or long-term seasonal effects.
This thesis approaches the problem of robust visual self-localization and navigation using
a biologically motivated model based on unsupervised Slow Feature Analysis (SFA). It
is inspired by the discovery of neurons in a rat’s brain that form a neural representation
of the animal’s spatial attributes. A similar hierarchical SFA network has been shown
to learn representations of either the position or the orientation directly from the visual
input of a virtual rat depending on the movement statistics during training.
An extension to the hierarchical SFA network is introduced that allows learning an orientation-invariant representation of the position by manipulating the perceived image statistics, exploiting the properties of panoramic vision. The model is applied to a mobile robot in real-world open-field experiments, obtaining localization accuracies comparable to state-of-the-art approaches. The self-localization performance can be further improved by incorporating wheel odometry into the purely vision-based approach.
To achieve this, a method for the unsupervised learning of a mapping from slow feature space to metric space is developed. Robustness with respect to short- and long-term appearance changes is tackled by restructuring the temporal order of the training image sequence based on the identification of crossings in the training trajectory. Re-inserting images of the same place under different conditions into the training sequence increases the temporal variation of environmental effects and thereby improves invariance, due to the slowness objective of SFA. Finally, a straightforward method for navigation in slow feature space is presented. Navigation can be performed efficiently by following the SFA gradient, approximated from distance measurements between the slow feature values at the target and the current location. It is shown that the properties of the learned representations enable complex navigation behaviors without explicit trajectory planning.
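To make the slowness objective concrete, here is a minimal linear-SFA sketch; the thesis uses a hierarchical, nonlinear SFA network, so this toy version only shows the core optimization: the slowest projections of a zero-mean signal are the smallest generalized eigenvectors pairing the derivative covariance with the signal covariance.

```python
import numpy as np
from scipy.linalg import eigh

def linear_sfa(x, n_components=2):
    """Minimal linear Slow Feature Analysis (illustrative sketch).

    x: (T, d) temporally ordered signal. Returns the n_components
    projections whose outputs vary most slowly over time, i.e. the
    smallest generalized eigenvectors of <x'x'^T> w = lam <x x^T> w.
    """
    x = x - x.mean(axis=0)            # zero-mean signal
    dx = np.diff(x, axis=0)           # discrete temporal derivative
    cov = x.T @ x / len(x)            # signal covariance
    dcov = dx.T @ dx / len(dx)        # derivative covariance
    # smallest generalized eigenvalues correspond to slowest features
    _, v = eigh(dcov, cov, subset_by_index=[0, n_components - 1])
    return x @ v                      # slow feature outputs, shape (T, k)
```

Navigation in slow feature space then amounts to descending the distance between the current output of such a network and the slow feature values recorded at the target.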
Semantic Localization and Mapping in Robot Vision
Integration of human semantics plays an increasing role in robotics tasks such as mapping, localization and detection. Increased use of semantics serves multiple purposes, including giving computers the ability to process and present data containing human-meaningful concepts and allowing computers to employ human reasoning to accomplish tasks.
This dissertation presents three solutions that incorporate semantics into visual data in order to address these problems. The first addresses the problem of constructing topological maps from a sequence of images. The proposed solution includes a novel image similarity score which uses dynamic programming to match images using both the appearance and the relative positions of local features simultaneously. A Markov random field (MRF) is constructed to model the probability of loop closures, and a locally optimal labeling is found using loopy belief propagation (Loopy-BP). The recovered loop closures are then used to generate a topological map. Results are presented on four urban sequences and one indoor sequence.
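As a hedged illustration of how dynamic programming can score two images by appearance and relative feature position simultaneously, the sketch below aligns two left-to-right-ordered feature sequences with a Needleman-Wunsch-style recurrence; the cost weighting and gap penalty are our own assumptions, not the dissertation's exact score.

```python
import numpy as np

def dp_image_similarity(desc1, xs1, desc2, xs2, alpha=0.5, gap=1.0):
    """Order-preserving similarity between two images (illustrative).

    desc1/desc2: (m, d) and (n, d) local feature descriptors;
    xs1/xs2: their horizontal positions normalized to [0, 1], both
    sorted left to right. The DP forces matches to respect spatial
    ordering while penalizing appearance and position mismatch.
    """
    m, n = len(desc1), len(desc2)
    D = np.zeros((m + 1, n + 1))
    D[1:, 0] = gap * np.arange(1, m + 1)   # skipping a feature costs `gap`
    D[0, 1:] = gap * np.arange(1, n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            appearance = np.linalg.norm(desc1[i - 1] - desc2[j - 1])
            position = abs(xs1[i - 1] - xs2[j - 1])
            match = alpha * appearance + (1 - alpha) * position
            D[i, j] = min(D[i - 1, j - 1] + match,   # match the pair
                          D[i - 1, j] + gap,         # skip in image 1
                          D[i, j - 1] + gap)         # skip in image 2
    return -D[m, n]               # higher (less negative) = more similar
```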
The second system uses video and annotated maps to solve localization. Data association is achieved through the detection of object classes, annotated in prior maps, rather than through the detection of visual features. To avoid the pitfalls of object recognition, a new representation of query images is introduced, consisting of a vector of detection scores for each object class. Using these soft object detections, hypotheses about the pose are refined through particle filtering. Experiments include both small office spaces and a large open urban rail station with semantically ambiguous places. This approach showcases a representation that is robust and can exploit the plethora of existing prior maps for GPS-denied environments, while avoiding the data association problems encountered when matching point clouds or visual features.
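A minimal sketch of the importance-weight update for such soft-detection localization is given below; expected_scores_at is a hypothetical helper that predicts the per-class detection vector from the annotated map at a candidate pose, and the Gaussian likelihood is our own simplification, not the dissertation's model.

```python
import numpy as np

def reweight_particles(particles, obs_scores, expected_scores_at, sigma=0.25):
    """One soft-detection particle-filter update (illustrative sketch).

    particles: (N, 3) pose hypotheses [x, y, theta].
    obs_scores: (C,) detector score per annotated object class in the
    query image. expected_scores_at(pose) -> (C,) is a hypothetical
    helper rendering the map's annotations into expected scores.
    """
    w = np.empty(len(particles))
    for i, pose in enumerate(particles):
        diff = obs_scores - expected_scores_at(pose)
        # Gaussian likelihood on the mismatch of the two score vectors
        w[i] = np.exp(-0.5 * (diff @ diff) / sigma**2)
    total = w.sum()
    if total == 0:                        # degenerate case: reset weights
        return np.full(len(particles), 1.0 / len(particles))
    return w / total                      # normalized importance weights
```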
Finally, we present a purely vision-based approach for constructing semantic maps given camera poses and simple object exemplar images. Object response heatmaps are combined with the known pose to back-project detection information onto the world. These back-projections update the world model, integrating information over time as the camera moves. The approach avoids making hard decisions on object recognition and aggregates evidence about objects in the world coordinate system.
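One simple way to realize this kind of aggregation, sketched below under our own assumptions about the grid layout, is to cast a ray through each pixel using the known pose and add the pixel's heatmap response to every voxel the ray crosses; evidence then concentrates where rays from many viewpoints intersect.

```python
import numpy as np

def accumulate_heatmap(voxels, origin, res, heatmap, K, T_wc,
                       max_range=5.0, step=0.1, stride=8):
    """Back-project an object-response heatmap into a voxel grid (sketch).

    voxels: (X, Y, Z) evidence accumulator; origin/res define its frame.
    heatmap: (H, W) per-pixel detector response. T_wc: 4x4 camera-to-
    world pose. Pixel subsampling (stride) keeps the toy loops cheap.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    h, w = heatmap.shape
    for v in range(0, h, stride):
        for u in range(0, w, stride):
            ray_c = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
            ray_w = R @ (ray_c / np.linalg.norm(ray_c))  # world-frame ray
            for d in np.arange(step, max_range, step):
                idx = np.floor((t + d * ray_w - origin) / res).astype(int)
                if np.all(idx >= 0) and np.all(idx < voxels.shape):
                    voxels[tuple(idx)] += heatmap[v, u]  # soft evidence
```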
These solutions simultaneously showcase the contribution of semantics in robotics and provide state-of-the-art solutions to these fundamental problems.
Enhancing 3D Visual Odometry with Single-Camera Stereo Omnidirectional Systems
We explore low-cost solutions for efficiently improving the 3D pose estimation problem of a single camera moving in an unfamiliar environment. The visual odometry (VO) task -- as it is called when using computer vision to estimate egomotion -- is of particular interest to mobile robots as well as humans with visual impairments. The payload capacity of small robots like micro-aerial vehicles (drones) requires the use of portable perception equipment, which is constrained by size, weight, energy consumption, and processing power. Using a single camera as the passive sensor for the VO task satisfies these requirements, and it motivates the proposed solutions presented in this thesis.
To meet the portability goal with a single off-the-shelf camera, we have taken two approaches. The first, and the most extensively studied here, revolves around an unorthodox camera-mirror configuration (catadioptrics) that achieves a stereo omnidirectional system (SOS). The second approach relies on expanding the visual features from the scene into higher dimensionalities to track the pose of a conventional camera in a photogrammetric fashion. The first goal has many interdependent challenges, which we address as part of this thesis: SOS design, projection model, adequate calibration procedure, and application to VO. We show several practical advantages of the single-camera SOS due to its complete 360-degree stereo views, which other conventional 3D sensors lack because of their limited field of view. Since our omnidirectional stereo (omnistereo) views are captured by a single camera, a truly instantaneous pair of panoramic images is possible for 3D perception tasks. Finally, we address the VO problem as a direct multichannel tracking approach, which increases the pose estimation accuracy over the baseline method (i.e., using only grayscale or color information), with photometric error minimization at the heart of the “direct” tracking algorithm. Currently, this solution has been tested on standard monocular cameras, but it could also be applied to an SOS.
We believe the challenges we attempted to solve have not previously been considered with the level of detail needed for successfully performing VO with a single camera, our ultimate goal, in both real-life and simulated scenes.
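As a rough sketch of the direct multichannel objective described above (under our own assumptions about data layout, with bilinear interpolation and robust weighting omitted for brevity), the residual vector a tracker would minimize over the pose T looks like this:

```python
import numpy as np

def photometric_residuals(ref, cur, depth_ref, K, T, channels):
    """Multichannel photometric residuals for direct tracking (sketch).

    Back-project each reference pixel with valid depth, transform it by
    the candidate 4x4 pose T, reproject into the current image, and
    stack per-channel intensity differences. A Gauss-Newton or
    Levenberg-Marquardt solver would minimize their squared sum over T.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    h, w = depth_ref.shape
    res = []
    for v in range(h):
        for u in range(w):
            z = depth_ref[v, u]
            if z <= 0:                       # skip pixels without depth
                continue
            pw = T @ np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
            if pw[2] <= 0:                   # point behind the camera
                continue
            u2 = int(fx * pw[0] / pw[2] + cx)
            v2 = int(fy * pw[1] / pw[2] + cy)
            if 0 <= u2 < w and 0 <= v2 < h:
                for c in channels:           # e.g. gray plus gradient maps
                    res.append(float(ref[v, u, c]) - float(cur[v2, u2, c]))
    return np.asarray(res)
```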
WSR: A WiFi Sensor for Collaborative Robotics
In this paper we derive a new capability for robots to measure relative
direction, or Angle-of-Arrival (AOA), to other robots operating in
non-line-of-sight and unmapped environments with occlusions, without requiring
external infrastructure. We do so by capturing all of the paths that a WiFi
signal traverses as it travels from a transmitting to a receiving robot, which
we term an AOA profile. The key intuition is to "emulate antenna arrays in the
air" as the robots move in 3D space, a method akin to Synthetic Aperture Radar
(SAR). The main contributions include the development of i) a framework that accommodates arbitrary 3D trajectories, as well as continuous mobility of all robots, while computing AOA profiles, and ii) an accompanying analysis that provides a lower bound on the variance of AOA estimation as a function of robot trajectory geometry, based on the Cramér-Rao bound. This is a critical distinction from previous work on SAR, which restricts robot mobility to prescribed motion patterns, does not generalize to 3D space, and/or requires transmitting robots to be static during data acquisition periods. Our method results in more accurate AOA profiles and thus better AOA estimation, and it formally characterizes this observation as the informativeness of the trajectory, a computable quantity for which we derive a closed form. All
theoretical developments are substantiated by extensive simulation and hardware
experiments. We also show that our formulation can be used with an
off-the-shelf trajectory estimation sensor. Finally, we demonstrate the
performance of our system on a multi-robot dynamic rendezvous task.
Comment: 28 pages, 25 figures, *co-primary author
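To make the "antenna array in the air" idea concrete, here is a hedged Bartlett-style sketch of an AOA profile: each channel measurement taken along the (estimated) trajectory is phase-compensated for a candidate arrival direction and coherently summed. Grid sizes and variable names are our own assumptions, not the authors' implementation.

```python
import numpy as np

def aoa_profile(csi, positions, wavelength, n_az=360, n_el=90):
    """Bartlett-style AOA profile from a synthetic aperture (sketch).

    csi: (K,) complex channel measurements collected along the path.
    positions: (K, 3) receiver displacements (e.g. from odometry).
    The moving receiver plays the role of the array elements: steering
    over candidate azimuth/elevation directions and coherently summing
    phase-compensated measurements gives the profile; its peak is the
    AOA estimate.
    """
    az = np.linspace(-np.pi, np.pi, n_az)
    el = np.linspace(0.0, np.pi / 2, n_el)
    A, E = np.meshgrid(az, el, indexing="ij")
    # unit direction vector for every (azimuth, elevation) candidate
    dirs = np.stack([np.cos(E) * np.cos(A),
                     np.cos(E) * np.sin(A),
                     np.sin(E)], axis=-1)            # (n_az, n_el, 3)
    # phase a plane wave from each direction accrues at each position
    phase = 2j * np.pi / wavelength * np.einsum("aed,kd->aek",
                                                dirs, positions)
    profile = np.abs(np.einsum("k,aek->ae", csi, np.exp(phase))) ** 2
    return az, el, profile
```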
Distributed scene reconstruction from multiple mobile platforms
Recent research on mobile robotics has produced new designs that provide household robots with omnidirectional motion. The image sensor embedded in these devices motivates the application of 3D vision techniques to them for navigation and mapping purposes. In addition, distributed cheap-sensing systems acting as a unitary entity have recently emerged as an efficient alternative to expensive mobile equipment.
In this work we present an implementation of a visual reconstruction method,
structure from motion (SfM), on a low-budget, omnidirectional mobile platform,
and extend this method to distributed 3D scene reconstruction with
several instances of such a platform.
Our approach overcomes the challenges posed by the platform. The unprecedented levels of noise produced by the image compression typical of the platform are handled by our feature filtering methods, which ensure suitable feature-matching populations for epipolar geometry estimation by means of a strict quality-based feature selection. The robust pose estimation algorithms implemented, along with a novel feature tracking system, enable our incremental SfM approach to deal, in a novel way, with the ill-conditioned inter-image configurations caused by the omnidirectional motion. The feature tracking system developed efficiently manages the feature scarcity produced by noise and outputs quality feature tracks, which allow robust 3D mapping of a given scene even if, due to noise, their length is shorter than what is usually assumed necessary for stable 3D reconstructions.
The distributed reconstruction from multiple instances of SfM is attained
by applying loop-closing techniques. Our multiple reconstruction system
merges individual 3D structures and resolves the global scale problem with
minimal overlaps, whereas in the literature 3D mapping is obtained by overlapping
stretches of sequences. The performance of this system is demonstrated
in the 2-session case.
The management of noise, the stability against ill-conditioned configurations, and the robustness of our SfM system are validated in a number of experiments and compared with state-of-the-art approaches. Possible future research areas are also discussed.
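A hedged sketch of the kind of strict, quality-based filtering pipeline described above, here assembled from standard OpenCV pieces (a Lowe ratio test followed by a RANSAC essential-matrix inlier mask); the thresholds and the choice of tools are our assumptions, not the thesis's implementation.

```python
import cv2
import numpy as np

def filtered_correspondences(kp1, des1, kp2, des2, K, ratio=0.7):
    """Strict quality-based match filtering before pose estimation.

    A ratio test discards ambiguous matches, which matters on noisy,
    heavily compressed images; the survivors feed a RANSAC essential-
    matrix estimate whose inlier mask prunes the remainder, leaving
    only geometry-consistent correspondences.
    """
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    inliers = mask.ravel().astype(bool)
    return E, pts1[inliers], pts2[inliers]
```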