Scene Parsing using Multiple Modalities
Scene parsing is the task of assigning a semantic class
label to the elements of a scene. It has many applications in
autonomous systems when we need to understand the visual data
captured from our environment. Different sensing modalities, such
as RGB cameras, multi-spectral cameras and Lidar sensors, can be
beneficial when pursuing this goal. Scene analysis using
multiple modalities aims at leveraging complementary information
captured by multiple sensing modalities. When multiple modalities
are used together, the strength of each modality can combat the
weaknesses of other modalities. Therefore, working with multiple
modalities enables us to use powerful tools for scene analysis.
However, possible gains of using multiple modalities come with
new challenges such as dealing with misalignments between
different modalities. In this thesis, our aim is to take
advantage of multiple modalities to improve outdoor scene parsing
and address the associated challenges. We initially investigate
the potential of multi-spectral imaging for outdoor scene
analysis. Our approach is to combine the discriminative strength
of the multi-spectral signature in each pixel and the
corresponding nature of the surrounding texture. Many materials
that appear similar when viewed by a common RGB camera show
discriminating properties when viewed by a camera capturing a
greater number of separated wavelengths. When using imagery data
for scene parsing, a number of challenges stem from, e.g., color
saturation, shadow and occlusion. To address such challenges, we
focus on scene parsing using multiple modalities, panoramic RGB
images and 3D Lidar data in particular, and propose a multi-view
approach to select the best 2D view that describes each element
in the 3D point cloud data. Keeping our focus on using multiple
modalities, we then introduce a multi-modal graphical model to
address the problems of scene parsing using 2D-3D data exhibiting
extensive many-to-one correspondences. Existing methods often
impose a hard correspondence between the 2D and 3D data, where
the 2D and 3D corresponding regions are forced to receive
identical labels. This results in performance degradation due to
misalignments, 3D-2D projection errors and occlusions. We address
this issue by defining a graph over the entire set of data that
models soft correspondences between the two modalities. This
graph encourages each region in a modality to leverage the
information from its corresponding regions in the other modality
to better estimate its class label. Finally, we introduce latent
nodes to explicitly model inconsistencies between the modalities.
The latent nodes allow us not only to leverage information from
various domains in order to improve the labeling of the
modalities, but also to cut the edges between inconsistent
regions. To eliminate the need for hand tuning the parameters of
our model, we propose to learn potential functions from training
data. In addition to demonstrating the benefits of the proposed
approaches on publicly available multi-modality datasets, we
introduce a new multi-modal dataset of panoramic images and 3D
point cloud data captured from outdoor scenes (NICTA/2D3D
Dataset).
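The difference between hard and soft 2D-3D correspondences described in this abstract can be illustrated with a toy example. The sketch below is not the thesis's graphical model (which spans the whole dataset, uses learned potentials and adds latent nodes); it is a minimal two-region illustration, and the function name `best_labeling`, the cost values and the two labels are all hypothetical. Each region has a per-label cost, and a penalty is paid when the two corresponding regions disagree. An infinite penalty reproduces a hard correspondence; a finite one gives a soft correspondence.

```python
from itertools import product

def best_labeling(unary_2d, unary_3d, disagreement_cost):
    """Return the (label_2d, label_3d) pair with the lowest total energy.

    unary_2d / unary_3d: per-label costs for one 2D region and its
    corresponding 3D region (lower cost = more likely label).
    disagreement_cost: penalty added when the two labels differ.
    """
    labels = range(len(unary_2d))

    def energy(pair):
        label_2d, label_3d = pair
        penalty = disagreement_cost if label_2d != label_3d else 0.0
        return unary_2d[label_2d] + unary_3d[label_3d] + penalty

    # Brute force is fine for a two-node toy problem.
    return min(product(labels, labels), key=energy)

# Hypothetical costs for labels 0 = "road", 1 = "tree".
unary_2d = [0.2, 2.0]  # the image region looks like road
unary_3d = [2.5, 0.1]  # the projected 3D region looks like tree

# Hard correspondence: both regions are forced to share one label.
hard = best_labeling(unary_2d, unary_3d, float("inf"))

# Soft correspondence: disagreement is discouraged but allowed.
soft = best_labeling(unary_2d, unary_3d, 1.0)

print("hard:", hard)
print("soft:", soft)
```

With these illustrative numbers the hard setting pushes the image region to the label favoured by the 3D data, while the soft setting lets each region keep the label its own evidence supports, which is the behaviour one would want when the pairing is caused by a misalignment or occlusion.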
Multi-view terrain classification using panoramic imagery and LIDAR
The focus of this work is addressing the challenges of performing object recognition in real world scenes as captured by a commercial, state-of-the-art, surveying vehicle equipped with a 360° panoramic camera in conjunction with a 3D laser scanner (LIDA
Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles
Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required for monitoring the environment and acting upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems. In this thesis, lidar-based obstacle detection and recognition in agricultural environments has been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets that represented a wide range of realistic agricultural environments and included both static and dynamic obstacles. For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other method showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen, as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied on mapped detections along the vehicle path, thus simulating an actual traversal. The proposed methods serve as a first step towards full autonomy for agricultural vehicles.
The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain, when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated with the release of the multi-modal obstacle dataset, FieldSAFE.
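Occupancy grid mapping, mentioned in this abstract as the way detections are fused globally, is commonly implemented with a per-cell log-odds update. The sketch below shows that standard textbook update for a single cell; whether the thesis uses this exact formulation is an assumption, and the function names and the 0.7 / 0.3 measurement probabilities are hypothetical.

```python
import math

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

def update_cell(log_odds, p_occupied_given_measurement, prior=0.5):
    """Fuse one measurement into a cell using the additive log-odds rule."""
    return log_odds + logit(p_occupied_given_measurement) - logit(prior)

def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

# Start from the prior (log-odds 0 for a prior of 0.5).
cell = 0.0

# Two detections that each suggest "occupied" with probability 0.7.
cell = update_cell(cell, 0.7)
cell = update_cell(cell, 0.7)
after_two_hits = probability(cell)

# One later observation suggesting "free" (occupied with probability 0.3).
cell = update_cell(cell, 0.3)
after_one_miss = probability(cell)

print(round(after_two_hits, 4), round(after_one_miss, 4))
```

Because the update is additive, repeated consistent detections raise confidence in a cell, and a contradicting observation cancels exactly one earlier detection of the same strength.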