3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection
Cameras are a crucial exteroceptive sensor for self-driving cars as they are
low-cost and small, provide appearance information about the environment, and
work in various weather conditions. They can be used for multiple purposes such
as visual navigation and obstacle detection. We can use a surround multi-camera
system to cover the full 360-degree field-of-view around the car. In this way,
we avoid blind spots which can otherwise lead to accidents. To minimize the
number of cameras needed for surround perception, we utilize fisheye cameras.
Consequently, standard vision pipelines for 3D mapping, visual localization,
obstacle detection, etc. need to be adapted to take full advantage of the
availability of multiple cameras rather than treat each camera individually. In
addition, processing of fisheye images has to be supported. In this paper, we
describe the camera calibration and subsequent processing pipeline for
multi-fisheye-camera systems developed as part of the V-Charge project. This
project seeks to enable automated valet parking for self-driving cars. Our
pipeline is able to precisely calibrate multi-camera systems, build sparse 3D
maps for visual navigation, visually localize the car with respect to these
maps, generate accurate dense maps, as well as detect obstacles based on
real-time depth map extraction.
Sensor node localisation using a stereo camera rig
In this paper, we use stereo vision processing techniques to
detect and localise sensors used for monitoring simulated
environmental events within an experimental sensor network testbed. Our sensor nodes communicate with the camera through patterns emitted by light-emitting diodes (LEDs). Ultimately, we envisage the use of very low-cost, low-power,
compact microcontroller-based sensing nodes that employ
LED communication rather than power-hungry RF to transmit data that is gathered via existing CCTV infrastructure.
To facilitate our research, we have constructed a controlled
environment where nodes and cameras can be deployed and
potentially hazardous chemical or physical plumes can be
introduced to simulate environmental pollution events in a
controlled manner. In this paper we show how 3D spatial
localisation of sensors becomes a straightforward task when
a stereo camera rig is used rather than a more usual 2D
CCTV camera.
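The following is a minimal sketch of why a stereo rig makes 3D localisation of a detected LED straightforward. It assumes an idealised, rectified pinhole stereo pair, which the abstract does not specify; the names `StereoRig` and `triangulate` and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StereoRig:
    """Hypothetical rectified stereo pair (assumption, not the paper's rig)."""
    focal_px: float    # focal length in pixels, assumed equal for both cameras
    baseline_m: float  # distance between the two camera centres in metres
    cx: float          # principal point x in pixels
    cy: float          # principal point y in pixels


def triangulate(rig, u_left, v_left, u_right):
    """Return (x, y, z) in metres in the left camera frame.

    For a rectified pair the same point lies on the same image row in both
    views, so depth follows from the horizontal disparity alone:
    z = focal_px * baseline_m / (u_left - u_right).
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = rig.focal_px * rig.baseline_m / disparity
    x = (u_left - rig.cx) * z / rig.focal_px
    y = (v_left - rig.cy) * z / rig.focal_px
    return (x, y, z)


if __name__ == "__main__":
    rig = StereoRig(focal_px=800.0, baseline_m=0.1, cx=320.0, cy=240.0)
    # An LED seen at column 360 in the left image and column 340 in the right.
    print(triangulate(rig, u_left=360.0, v_left=240.0, u_right=340.0))
```

A single 2D camera gives only the pixel coordinates of the LED, whereas the second view supplies the disparity that fixes the depth.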
Infrastructure-based Multi-Camera Calibration using Radial Projections
Multi-camera systems are an important sensor platform for intelligent systems
such as self-driving cars. Pattern-based calibration techniques can be used to
calibrate the intrinsics of the cameras individually. However, extrinsic
calibration of systems with little to no visual overlap between the cameras is
a challenge. Given the camera intrinsics, infrastructure-based calibration
techniques are able to estimate the extrinsics using 3D maps pre-built via SLAM
or Structure-from-Motion. In this paper, we propose to fully calibrate a
multi-camera system from scratch using an infrastructure-based approach.
Assuming that the distortion is mainly radial, we introduce a two-stage
approach. We first estimate the camera-rig extrinsics up to a single unknown
translation component per camera. Next, we solve for both the intrinsic
parameters and the missing translation components. Extensive experiments on
multiple indoor and outdoor scenes with multiple multi-camera systems show that
our calibration method achieves high accuracy and robustness. In particular,
our approach is more robust than the naive approach of first estimating
intrinsic parameters and pose per camera before refining the extrinsic
parameters of the system. The implementation is available at
https://github.com/youkely/InfrasCal.
Comment: ECCV 202
Efficient 2D-3D Matching for Multi-Camera Visual Localization
Visual localization, i.e., determining the position and orientation of a
vehicle with respect to a map, is a key problem in autonomous driving. We
present a multi-camera visual-inertial localization algorithm for large-scale
environments. To efficiently and effectively match features against a pre-built
global 3D map, we propose a prioritized feature matching scheme for
multi-camera systems. In contrast to existing works, designed for monocular
cameras, we (1) tailor the prioritization function to the multi-camera setup
and (2) run feature matching and pose estimation in parallel. This
significantly accelerates the matching and pose estimation stages and allows us
to dynamically adapt the matching efforts based on the surrounding environment.
In addition, we show how pose priors can be integrated into the localization
system to increase efficiency and robustness. Finally, we extend our algorithm
by fusing the absolute pose estimates with motion estimates from a multi-camera
visual-inertial odometry (VIO) pipeline. This results in a system that provides
reliable and drift-less pose estimation. Extensive experiments show that our
localization runs fast and robustly under varying conditions, and that our
extended algorithm enables reliable real-time pose estimation.
Comment: 7 pages, 5 figure
Don't Look Back: Robustifying Place Categorization for Viewpoint- and Condition-Invariant Place Recognition
When a human drives a car along a road for the first time, they later
recognize where they are on the return journey typically without needing to
look in their rear-view mirror or turn around to look back, despite significant
viewpoint and appearance change. Such navigation capabilities are typically
attributed to our semantic visual understanding of the environment [1] beyond
geometry to recognizing the types of places we are passing through such as
"passing a shop on the left" or "moving through a forested area". Humans are in
effect using place categorization [2] to perform specific place recognition
even when the viewpoint is 180 degrees reversed. Recent advances in deep neural
networks have enabled high-performance semantic understanding of visual places
and scenes, opening up the possibility of emulating what humans do. In this
work, we develop a novel methodology for using the semantics-aware higher-order
layers of deep neural networks for recognizing specific places from within a
reference database. To further improve the robustness to appearance change, we
develop a descriptor normalization scheme that builds on the success of
normalization schemes for pure appearance-based techniques such as SeqSLAM [3].
Using two different datasets - one road-based, one pedestrian-based, we
evaluate the performance of the system in performing place recognition on
reverse traversals of a route with a limited field of view camera and no
turn-back-and-look behaviours, and compare to existing state-of-the-art
techniques and vanilla off-the-shelf features. The results demonstrate
significant improvements over the existing state of the art, especially for
extreme perceptual challenges that involve both great viewpoint change and
environmental appearance change. We also provide experimental analyses of the
contributions of the various system components.
Comment: 9 pages, 11 figures, ICRA 201
Constrained Bundle Adjustment for Structure From Motion Using Uncalibrated Multi-Camera Systems
Structure from motion using uncalibrated multi-camera systems is a
challenging task. This paper proposes a bundle adjustment solution that
enforces a baseline constraint reflecting that these cameras are fixed relative
to each other. We assume these cameras are mounted on a mobile platform,
uncalibrated, and coarsely synchronized. To this end, we propose the baseline
constraint that is formulated for the scenario in which the cameras have
overlapping views. The constraint is incorporated in the bundle adjustment
solution to keep the relative motion of different cameras static. Experiments
were conducted using video frames of two collocated GoPro cameras mounted on a
vehicle with no system calibration. The two cameras were positioned to capture
overlapping content. We performed our bundle adjustment using the proposed
constraint and then produced 3D dense point clouds. Evaluations were performed
by comparing these dense point clouds against LiDAR reference data. We showed
that, as compared to traditional bundle adjustment, our proposed method
achieved an improvement of 29.38%.
Comment: to be published in ISPRS Congress 202
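As a rough illustration of the idea of a baseline constraint, the sketch below computes one residual per frame that penalises any change in the distance between the two camera centres. This is a deliberately simplified stand-in, not the formulation from the paper: it constrains only the baseline length, and the function name `baseline_residuals`, its inputs, and the sample values are all hypothetical.

```python
import math


def baseline_residuals(centres_a, centres_b, baseline):
    """Return one residual per frame: |c_a - c_b| - baseline.

    centres_a, centres_b: per-frame 3D camera centres (x, y, z) of the two
    rigidly mounted cameras, as estimated by the adjustment.
    baseline: the assumed fixed distance between the cameras.
    A residual of zero means the frame respects the rigid mounting.
    """
    if len(centres_a) != len(centres_b):
        raise ValueError("both cameras need a centre for every frame")
    return [math.dist(a, b) - baseline for a, b in zip(centres_a, centres_b)]


if __name__ == "__main__":
    cam_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    cam_b = [(0.5, 0.0, 0.0), (1.6, 0.0, 0.0)]
    # The second frame has drifted: the cameras appear 0.6 apart instead of 0.5.
    print(baseline_residuals(cam_a, cam_b, baseline=0.5))
```

In a full bundle adjustment, residuals of this kind would be stacked alongside the reprojection errors so that the optimiser cannot move the two cameras independently.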
Principled bundle block adjustment with multi-head cameras
This paper examines the effects of implementing relative orientation constraints on bundle adjustment and provides a full derivation of the Jacobian matrix for such an adjustment, which can be used to facilitate other implementations of bundle adjustment with constrained cameras. We present empirical evidence demonstrating improved accuracy and reduced computational load when these constraints are imposed.
Sensor fusion for flexible human-portable building-scale mapping
This paper describes a system enabling rapid multi-floor indoor map building using a body-worn sensor system fusing information from RGB-D cameras, LIDAR, inertial, and barometric sensors. Our work is motivated by rapid response missions by emergency personnel, in which the capability for one or more people to rapidly map a complex indoor environment is essential for public safety. Human-portable mapping raises a number of challenges not encountered in typical robotic mapping applications including complex 6-DOF motion and the traversal of challenging trajectories including stairs or elevators. Our system achieves robust performance in these situations by exploiting state-of-the-art techniques for robust pose graph optimization and loop closure detection. It achieves real-time performance in indoor environments of moderate scale. Experimental results are demonstrated for human-portable mapping of several floors of a university building, demonstrating the system's ability to handle motion up and down stairs and to organize initially disconnected sets of submaps in a complex environment.
Lincoln Laboratory; United States. Air Force (Contract FA8721-05-C-0002); United States. Office of Naval Research (Grant N00014-10-1-0936); United States. Office of Naval Research (Grant N00014-11-1-0688); United States. Office of Naval Research (Grant N00014-12-10020
Advances in Simultaneous Localization and Mapping in Confined Underwater Environments Using Sonar and Optical Imaging.
This thesis reports on the incorporation of surface information into a probabilistic simultaneous localization and mapping (SLAM) framework used on an autonomous underwater vehicle (AUV) designed for underwater inspection. AUVs operating in cluttered underwater environments, such as ship hulls or dams, are commonly equipped with Doppler-based sensors, which---in addition to navigation---provide a sparse representation of the environment in the form of a three-dimensional (3D) point cloud. The goal of this thesis is to develop perceptual algorithms that take full advantage of these sparse observations for correcting navigational drift and building a model of the environment. In particular, we focus on three objectives. First, we introduce a novel representation of this 3D point cloud as collections of planar features arranged in a factor graph. This factor graph representation probabilistically infers the spatial arrangement of each planar segment and can effectively model smooth surfaces (such as a ship hull). Second, we show how this technique can produce 3D models that serve as input to our pipeline that produces the first-ever 3D photomosaics using a two-dimensional (2D) imaging sonar. Finally, we propose a model-assisted bundle adjustment (BA) framework that allows for robust registration between surfaces observed from a Doppler sensor and visual features detected from optical images. Throughout this thesis, we show methods that produce 3D photomosaics using a combination of triangular meshes (derived from our SLAM framework or given a priori), optical images, and sonar images. Overall, the contributions of this thesis greatly increase the accuracy, reliability, and utility of in-water ship hull inspection with AUVs despite the challenges they face in underwater environments.
We provide results using the Hovering Autonomous Underwater Vehicle (HAUV) for autonomous ship hull inspection, which serves as the primary testbed for the algorithms presented in this thesis. The sensor payload of the HAUV consists primarily of: a Doppler velocity log (DVL) for underwater navigation and ranging, monocular and stereo cameras, and---for some applications---an imaging sonar.
PhD; Electrical Engineering: Systems; University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/120750/1/paulozog_1.pd
Self-Calibration of Multi-Camera Systems for Vehicle Surround Sensing
Multi-camera systems are being deployed in a variety of vehicles and mobile robots today. To eliminate the need for cost- and labor-intensive maintenance and calibration, continuous self-calibration is highly desirable. In this book we present such an approach for self-calibration of multi-camera systems for vehicle surround sensing. In an extensive evaluation we assess our algorithm quantitatively using real-world data.