Fusion of aerial images and sensor data from a ground vehicle for improved semantic mapping
This work investigates the use of semantic information to link ground level occupancy maps and aerial images. A ground level semantic map, which shows open ground and indicates the probability of cells being occupied by walls of buildings, is obtained by a mobile robot equipped with an omnidirectional camera, GPS and a laser range finder. This semantic information is used for local and global segmentation of an aerial image. The result is a map where the semantic information has been extended beyond the range of the robot sensors and predicts where the mobile robot can find buildings and potentially driveable ground.
Semantic evidential grid mapping using monocular and stereo cameras
Accurately estimating the current state of local traffic scenes is one of the key problems in the development of software components for automated vehicles. In addition to details on free space and drivability, static and dynamic traffic participants and information on the semantics may also be included in the desired representation. Multi-layer grid maps allow the inclusion of all of this information in a common representation. However, most existing grid mapping approaches only process range sensor measurements such as Lidar and Radar and solely model occupancy without semantic states. In order to add sensor redundancy and diversity, it is desirable to integrate vision-based sensor setups into a common grid map representation. In this work, we present a semantic evidential grid mapping pipeline, including estimates for eight semantic classes, that is designed for straightforward fusion with range sensor data. Unlike other publications, our representation explicitly models uncertainties in the evidential model. We present results of our grid mapping pipeline based on a monocular vision setup and a stereo vision setup. Our mapping results are accurate and dense owing to the incorporation of a disparity- or depth-based ground surface estimation in the inverse perspective mapping. We conclude this paper by providing a detailed quantitative evaluation based on real traffic scenarios in the KITTI odometry benchmark dataset and demonstrating the advantages compared to other semantic grid mapping approaches.
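The abstract above does not say which combination rule the evidential model uses. As an illustration only, the following minimal sketch assumes Dempster's rule of combination applied to a single grid cell, with belief masses on individual semantic classes plus one "unknown" mass on the full set of classes. The function name `combine_dempster`, the class labels, and the numeric masses are hypothetical.

```python
def combine_dempster(m1, m2):
    """Fuse two mass assignments for one grid cell with Dempster's rule.

    m1, m2: dicts mapping a class label to its belief mass. The key
    "unknown" holds the mass on the full set of classes (ignorance).
    Each dict is assumed to sum to 1.
    """
    classes = (set(m1) | set(m2)) - {"unknown"}
    u1 = m1.get("unknown", 0.0)
    u2 = m2.get("unknown", 0.0)

    fused = {}
    for c in classes:
        a = m1.get(c, 0.0)
        b = m2.get(c, 0.0)
        # Both sources agree on c, or one says c and the other is ignorant.
        fused[c] = a * b + a * u2 + u1 * b
    fused["unknown"] = u1 * u2

    # The unnormalized masses sum to 1 - K, where K is the conflict mass
    # (the two sources support different classes).
    total = sum(fused.values())
    if total == 0.0:
        raise ValueError("Sources are in total conflict; rule is undefined.")
    return {label: mass / total for label, mass in fused.items()}


# Hypothetical per-cell evidence from a camera and a range sensor.
camera = {"road": 0.6, "sidewalk": 0.1, "unknown": 0.3}
lidar = {"road": 0.5, "unknown": 0.5}
fused = combine_dempster(camera, lidar)
```

Keeping an explicit "unknown" mass is what lets a cell that no sensor has observed stay distinguishable from a cell on which the sensors disagree.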
Multi-sensor based object detection in driving scenes
The work done in this internship consists of two main parts. The first part is the design of an experimental platform to acquire data for testing and training. To design the experiments, onboard and onroad sensors have been considered. A calibration process has been conducted in order to integrate all the data from different sources. The second part was the use of a stereo system and a laser scanner to extract the free navigable space and to detect obstacles. This has been conducted through the use of an occupancy grid map representation.
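The abstract gives no details of the occupancy grid update. As a rough sketch, the widely used log-odds formulation is assumed here, with each sensor (for example the stereo system and the laser scanner) contributing lists of cells it observed as occupied or free. The function names and the increments `l_occ` and `l_free` are illustrative assumptions, not values from the work.

```python
import numpy as np


def update_log_odds(grid, hit_cells, free_cells, l_occ=0.85, l_free=-0.4):
    """Apply one sensor observation to a log-odds occupancy grid, in place.

    grid:       2D float array of log-odds, 0.0 meaning "unknown" (p = 0.5).
    hit_cells:  iterable of (row, col) cells observed as occupied.
    free_cells: iterable of (row, col) cells observed as free.
    """
    for r, c in hit_cells:
        grid[r, c] += l_occ
    for r, c in free_cells:
        grid[r, c] += l_free
    # Clamp so that a cell can still change state after many observations.
    np.clip(grid, -10.0, 10.0, out=grid)
    return grid


def to_probability(grid):
    """Convert log-odds back to occupancy probabilities."""
    return 1.0 / (1.0 + np.exp(-grid))


# Two sensors update the same grid; their evidence adds in log-odds space.
grid = np.zeros((3, 3))
update_log_odds(grid, hit_cells=[(1, 1)], free_cells=[(0, 0), (0, 1)])  # e.g. laser
update_log_odds(grid, hit_cells=[(1, 1)], free_cells=[(0, 0)])          # e.g. stereo
probabilities = to_probability(grid)
```

Free navigable space can then be read off as the cells whose probability falls below a chosen threshold.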
Robust Fusion of LiDAR and Wide-Angle Camera Data for Autonomous Mobile Robots
Autonomous robots that assist humans in day-to-day living tasks are becoming
increasingly popular. Autonomous mobile robots operate by sensing and
perceiving their surrounding environment to make accurate driving decisions. A
combination of several different sensors such as LiDAR, radar, ultrasound
sensors and cameras is utilized to sense the surrounding environment of
autonomous vehicles. These heterogeneous sensors simultaneously capture various
physical attributes of the environment. Such multimodality and redundancy of
sensing need to be positively utilized for reliable and consistent perception
of the environment through sensor data fusion. However, these multimodal sensor
data streams are different from each other in many ways, such as temporal and
spatial resolution, data format, and geometric alignment. For the subsequent
perception algorithms to utilize the diversity offered by multimodal sensing,
the data streams need to be spatially, geometrically and temporally aligned
with each other. In this paper, we address the problem of fusing the outputs of
a Light Detection and Ranging (LiDAR) scanner and a wide-angle monocular image
sensor for free space detection. The outputs of the LiDAR scanner and the image
sensor are of different spatial resolutions and need to be aligned with each
other. A geometrical model is used to spatially align the two sensor outputs,
followed by a Gaussian Process (GP) regression-based resolution matching
algorithm to interpolate the missing data with quantifiable uncertainty. The
results indicate that the proposed sensor data fusion framework significantly
aids the subsequent perception steps, as illustrated by the performance
improvement of an uncertainty-aware free space detection algorithm.
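The resolution-matching step can be illustrated with a minimal Gaussian Process regression sketch. It is not the authors' implementation: it assumes a one-dimensional case (sparse range samples along one image row), a squared-exponential kernel, and hypothetical hyperparameters. The returned standard deviation plays the role of the quantifiable uncertainty mentioned in the abstract.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1D arrays of positions."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_interpolate(x_train, y_train, x_query, noise_std=0.05, length_scale=1.0):
    """Predict mean and standard deviation at x_query from sparse samples."""
    y_mean = y_train.mean()  # centre the data; the GP prior has zero mean
    K = rbf_kernel(x_train, x_train, length_scale)
    K += noise_std ** 2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query, length_scale)

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = K_s.T @ alpha + y_mean

    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(rbf_kernel(x_query, x_query, length_scale))
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Sparse range readings (metres) at a few pixel columns, queried densely.
columns = np.array([0.0, 2.0, 4.0, 6.0])
ranges = np.array([5.0, 5.2, 5.1, 4.9])
dense_columns = np.linspace(0.0, 6.0, 13)
mean, std = gp_interpolate(columns, ranges, dense_columns)
```

The predicted standard deviation is small at the measured columns and grows between them, which a downstream free space detector could use to down-weight interpolated pixels.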
Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles
Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required for monitoring the environment and acting upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems. In this thesis, lidar-based obstacle detection and recognition in agricultural environments has been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets that represented a wide range of realistic agricultural environments and included both static and dynamic obstacles. For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other method showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen, as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied on mapped detections along the vehicle path, thus simulating an actual traversal. The proposed methods serve as a first step towards full autonomy for agricultural vehicles.
The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain, when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated with the release of the multi-modal obstacle dataset, FieldSAFE.
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists in the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de-facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and tutorial to those who are users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
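The abstract does not spell out the de-facto standard formulation it refers to. It is commonly written as maximum a posteriori (MAP) estimation over a factor graph. The notation below is an assumption chosen for illustration: $\mathcal{X}$ is the set of unknown variables (robot poses and map), $z_k$ the $k$-th measurement, $h_k$ its measurement model acting on the subset $\mathcal{X}_k$, and $\Omega_k$ its information matrix.

```latex
\mathcal{X}^{\star}
  = \operatorname*{argmax}_{\mathcal{X}} \; p(\mathcal{X} \mid Z)
  = \operatorname*{argmax}_{\mathcal{X}} \; p(\mathcal{X}) \prod_{k=1}^{m} p(z_k \mid \mathcal{X}_k)

% With zero-mean Gaussian measurement noise, this reduces to
% a nonlinear least-squares problem:
\mathcal{X}^{\star}
  = \operatorname*{argmin}_{\mathcal{X}} \; \sum_{k=0}^{m}
    \left\lVert h_k(\mathcal{X}_k) - z_k \right\rVert^{2}_{\Omega_k}
```

In the second expression the prior is folded into the sum as the $k = 0$ term, and $\lVert e \rVert^{2}_{\Omega} = e^{\top} \Omega \, e$.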
NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization
Monocular 3D object localization in driving scenes is a crucial task, but
challenging due to its ill-posed nature. Estimating 3D coordinates for each
pixel on the object surface holds great potential as it provides dense 2D-3D
geometric constraints for the underlying PnP problem. However, high-quality
ground truth supervision is not available in driving scenes due to sparsity and
various artifacts of Lidar data, as well as the practical infeasibility of
collecting per-instance CAD models. In this work, we present NeurOCS, a
framework that uses instance masks and 3D boxes as input to learn 3D object
shapes by means of differentiable rendering, which further serves as
supervision for learning dense object coordinates. Our approach rests on
insights in learning a category-level shape prior directly from real driving
scenes, while properly handling single-view ambiguities. Furthermore, we study
and make critical design choices to learn object coordinates more effectively
from an object-centric view. Altogether, our framework leads to new
state-of-the-art in monocular 3D localization that ranks 1st on the
KITTI-Object benchmark among published monocular methods.
Comment: Paper was accepted to CVPR 202
3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection
Cameras are a crucial exteroceptive sensor for self-driving cars as they are
low-cost and small, provide appearance information about the environment, and
work in various weather conditions. They can be used for multiple purposes such
as visual navigation and obstacle detection. We can use a surround multi-camera
system to cover the full 360-degree field-of-view around the car. In this way,
we avoid blind spots which can otherwise lead to accidents. To minimize the
number of cameras needed for surround perception, we utilize fisheye cameras.
Consequently, standard vision pipelines for 3D mapping, visual localization,
obstacle detection, etc. need to be adapted to take full advantage of the
availability of multiple cameras rather than treat each camera individually. In
addition, processing of fisheye images has to be supported. In this paper, we
describe the camera calibration and subsequent processing pipeline for
multi-fisheye-camera systems developed as part of the V-Charge project. This
project seeks to enable automated valet parking for self-driving cars. Our
pipeline is able to precisely calibrate multi-camera systems, build sparse 3D
maps for visual navigation, visually localize the car with respect to these
maps, generate accurate dense maps, as well as detect obstacles based on
real-time depth map extraction.