589 research outputs found
3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection
Cameras are a crucial exteroceptive sensor for self-driving cars as they are
low-cost and small, provide appearance information about the environment, and
work in various weather conditions. They can be used for multiple purposes such
as visual navigation and obstacle detection. We can use a surround multi-camera
system to cover the full 360-degree field-of-view around the car. In this way,
we avoid blind spots which can otherwise lead to accidents. To minimize the
number of cameras needed for surround perception, we utilize fisheye cameras.
Consequently, standard vision pipelines for 3D mapping, visual localization,
obstacle detection, etc. need to be adapted to take full advantage of the
availability of multiple cameras rather than treat each camera individually. In
addition, processing of fisheye images has to be supported. In this paper, we
describe the camera calibration and subsequent processing pipeline for
multi-fisheye-camera systems developed as part of the V-Charge project. This
project seeks to enable automated valet parking for self-driving cars. Our
pipeline is able to precisely calibrate multi-camera systems, build sparse 3D
maps for visual navigation, visually localize the car with respect to these
maps, generate accurate dense maps, as well as detect obstacles based on
real-time depth map extraction
Real-Time Multi-Fisheye Camera Self-Localization and Egomotion Estimation in Complex Indoor Environments
In this work a real-time capable multi-fisheye camera self-localization and egomotion estimation framework is developed. The thesis covers all aspects ranging from omnidirectional camera calibration to the development of a complete multi-fisheye camera SLAM system based on a generic multi-camera bundle adjustment method
Towards Robust Visual Localization in Challenging Conditions
Visual localization is a fundamental problem in computer vision, with a multitude of applications in robotics, augmented reality and structure-from-motion. The basic problem is to, based on one or more images, figure out the position and orientation of the camera which captured these images relative to some model of the environment. Current visual localization approaches typically work well when the images to be localized are captured under similar conditions compared to those captured during mapping. However, when the environment exhibits large changes in visual appearance, due to e.g. variations in weather, seasons, day-night or viewpoint, the traditional pipelines break down. The reason is that the local image features used are based on low-level pixel-intensity information, which is not invariant to these transformations: when the environment changes, this will cause a different set of keypoints to be detected, and their descriptors will be different, making the long-term visual localization problem a challenging one. In this thesis, five papers are included, which present work towards solving the problem of long-term visual localization. Two of the articles present ideas for how semantic information may be included to aid in the localization process: one approach relies only on the semantic information for visual localization, and the other shows how the semantics can be used to detect outlier feature correspondences. The third paper considers how the output from a monocular depth-estimation network can be utilized to extract features that are less sensitive to viewpoint changes. The fourth article is a benchmark paper, where we present three new benchmark datasets aimed at evaluating localization algorithms in the context of long-term visual localization. Lastly, the fifth article considers how to perform convolutions on spherical imagery, which in the future might be applied to learning local image features for the localization problem
A multisensor SLAM for dense maps of large scale environments under poor lighting conditions
This thesis describes the development and implementation of a multisensor large scale autonomous mapping system for surveying tasks in underground mines. The hazardous nature of the underground mining industry has resulted in a push towards autonomous solutions to the most dangerous operations, including surveying tasks. Many existing autonomous mapping techniques rely on approaches to the Simultaneous Localization and Mapping (SLAM) problem which are not suited to the extreme characteristics of active underground mining environments. Our proposed multisensor system has been designed from the outset to address the unique challenges associated with underground SLAM. The robustness, self-containment and portability of the system maximize the potential applications.The multisensor mapping solution proposed as a result of this work is based on a fusion of omnidirectional bearing-only vision-based localization and 3D laser point cloud registration. By combining these two SLAM techniques it is possible to achieve some of the advantages of both approaches – the real-time attributes of vision-based SLAM and the dense, high precision maps obtained through 3D lasers. The result is a viable autonomous mapping solution suitable for application in challenging underground mining environments.A further improvement to the robustness of the proposed multisensor SLAM system is a consequence of incorporating colour information into vision-based localization. Underground mining environments are often dominated by dynamic sources of illumination which can cause inconsistent feature motion during localization. Colour information is utilized to identify and remove features resulting from illumination artefacts and to improve the monochrome based feature matching between frames.Finally, the proposed multisensor mapping system is implemented and evaluated in both above ground and underground scenarios. The resulting large scale maps contained a maximum offset error of ±30mm for mapping tasks with lengths over 100m
Automated Map Reading: Image Based Localisation in 2-D Maps Using Binary Semantic Descriptors
We describe a novel approach to image based localisation in urban
environments using semantic matching between images and a 2-D map. It contrasts
with the vast majority of existing approaches which use image to image database
matching. We use highly compact binary descriptors to represent semantic
features at locations, significantly increasing scalability compared with
existing methods and having the potential for greater invariance to variable
imaging conditions. The approach is also more akin to human map reading, making
it more suited to human-system interaction. The binary descriptors indicate the
presence or not of semantic features relating to buildings and road junctions
in discrete viewing directions. We use CNN classifiers to detect the features
in images and match descriptor estimates with a database of location tagged
descriptors derived from the 2-D map. In isolation, the descriptors are not
sufficiently discriminative, but when concatenated sequentially along a route,
their combination becomes highly distinctive and allows localisation even when
using non-perfect classifiers. Performance is further improved by taking into
account left or right turns over a route. Experimental results obtained using
Google StreetView and OpenStreetMap data show that the approach has
considerable potential, achieving localisation accuracy of around 85% using
routes corresponding to approximately 200 meters.Comment: 8 pages, submitted to IEEE/RSJ International Conference on
Intelligent Robots and Systems 201
Real-time Visual Flow Algorithms for Robotic Applications
Vision offers important sensor cues to modern robotic platforms.
Applications such as control of aerial vehicles, visual servoing,
simultaneous localization and mapping, navigation and more
recently, learning, are examples where visual information is
fundamental to accomplish tasks. However, the use of computer
vision algorithms carries the computational cost of extracting
useful information from the stream of raw pixel data. The most
sophisticated algorithms use complex mathematical formulations
leading typically to computationally expensive, and consequently,
slow implementations. Even with modern computing resources,
high-speed and high-resolution video feed can only be used for
basic image processing operations. For a vision algorithm to be
integrated on a robotic system, the output of the algorithm
should be provided in real time, that is, at least at the same
frequency as the control logic of the robot. With robotic
vehicles becoming more dynamic and ubiquitous, this places higher
requirements to the vision processing pipeline.
This thesis addresses the problem of estimating dense visual flow
information in real time. The contributions of this work are
threefold. First, it introduces a new filtering algorithm for the
estimation of dense optical flow at frame rates as fast as 800 Hz
for 640x480 image resolution. The algorithm follows a
update-prediction architecture to estimate dense optical flow
fields incrementally over time. A fundamental component of the
algorithm is the modeling of the spatio-temporal evolution of the
optical flow field by means of partial differential equations.
Numerical predictors can implement such PDEs to propagate current
estimation of flow forward in time. Experimental validation of
the algorithm is provided using high-speed ground truth image
dataset as well as real-life video data at 300 Hz.
The second contribution is a new type of visual flow named
structure flow. Mathematically, structure flow is the
three-dimensional scene flow scaled by the inverse depth at each
pixel in the image. Intuitively, it is the complete velocity
field associated with image motion, including both optical flow
and scale-change or apparent divergence of the image. Analogously
to optic flow, structure flow provides a robotic vehicle with
perception of the motion of the environment as seen by the
camera. However, structure flow encodes the full 3D image motion
of the scene whereas optic flow only encodes the component on the
image plane. An algorithm to estimate structure flow from image
and depth measurements is proposed based on the same filtering
idea used to estimate optical flow.
The final contribution is the spherepix data structure for
processing spherical images. This data structure is the numerical
back-end used for the real-time implementation of the structure
flow filter. It consists of a set of overlapping patches covering
the surface of the sphere. Each individual patch approximately
holds properties such as orthogonality and equidistance of
points, thus allowing efficient implementations of low-level
classical 2D convolution based image processing routines such as
Gaussian filters and numerical derivatives.
These algorithms are implemented on GPU hardware and can be
integrated to future Robotic Embedded Vision systems to provide
fast visual information to robotic vehicles
Dataset of Panoramic Images for People Tracking in Service Robotics
We provide a framework for constructing a guided robot for usage in hospitals in this thesis. The omnidirectional camera on the robot allows it to recognize and track the person who is following it. Furthermore, when directing the individual to their preferred position in the hospital, the robot must be aware of its surroundings and avoid accidents with other people or items. To train and evaluate our robot's performance, we developed an auto-labeling framework for creating a dataset of panoramic videos captured by the robot's omnidirectional camera. We labeled each person in the video and their real position in the robot's frame, enabling us to evaluate the accuracy of our tracking system and guide the development of the robot's navigation algorithms. Our research expands on earlier work that has established a framework for tracking individuals using omnidirectional cameras. We want to contribute to the continuing work to enhance the precision and dependability of these tracking systems, which is essential for the creation of efficient guiding robots in healthcare facilities, by developing a benchmark dataset. Our research has the potential to improve the patient experience and increase the efficiency of healthcare institutions by reducing staff time spent guiding patients through the facility.We provide a framework for constructing a guided robot for usage in hospitals in this thesis. The omnidirectional camera on the robot allows it to recognize and track the person who is following it. Furthermore, when directing the individual to their preferred position in the hospital, the robot must be aware of its surroundings and avoid accidents with other people or items. To train and evaluate our robot's performance, we developed an auto-labeling framework for creating a dataset of panoramic videos captured by the robot's omnidirectional camera. We labeled each person in the video and their real position in the robot's frame, enabling us to evaluate the accuracy of our tracking system and guide the development of the robot's navigation algorithms. Our research expands on earlier work that has established a framework for tracking individuals using omnidirectional cameras. We want to contribute to the continuing work to enhance the precision and dependability of these tracking systems, which is essential for the creation of efficient guiding robots in healthcare facilities, by developing a benchmark dataset. Our research has the potential to improve the patient experience and increase the efficiency of healthcare institutions by reducing staff time spent guiding patients through the facility
Distributed scene reconstruction from multiple mobile platforms
Recent research on mobile robotics has produced new designs that provide
house-hold robots with omnidirectional motion. The image sensor embedded
in these devices motivates the application of 3D vision techniques on them
for navigation and mapping purposes. In addition to this, distributed cheapsensing
systems acting as unitary entity have recently been discovered as an
efficient alternative to expensive mobile equipment.
In this work we present an implementation of a visual reconstruction method,
structure from motion (SfM), on a low-budget, omnidirectional mobile platform,
and extend this method to distributed 3D scene reconstruction with
several instances of such a platform.
Our approach overcomes the challenges yielded by the plaform. The unprecedented
levels of noise produced by the image compression typical of
the platform is processed by our feature filtering methods, which ensure
suitable feature matching populations for epipolar geometry estimation by
means of a strict quality-based feature selection. The robust pose estimation
algorithms implemented, along with a novel feature tracking system,
enable our incremental SfM approach to novelly deal with ill-conditioned
inter-image configurations provoked by the omnidirectional motion. The
feature tracking system developed efficiently manages the feature scarcity
produced by noise and outputs quality feature tracks, which allow robust
3D mapping of a given scene even if - due to noise - their length is shorter
than what it is usually assumed for performing stable 3D reconstructions.
The distributed reconstruction from multiple instances of SfM is attained
by applying loop-closing techniques. Our multiple reconstruction system
merges individual 3D structures and resolves the global scale problem with
minimal overlaps, whereas in the literature 3D mapping is obtained by overlapping
stretches of sequences. The performance of this system is demonstrated
in the 2-session case.
The management of noise, the stability against ill-configurations and the
robustness of our SfM system is validated on a number of experiments and
compared with state-of-the-art approaches. Possible future research areas
are also discussed
- …