11 research outputs found
Online Classification in 3D Urban Datasets Based on Hierarchical Detection
One of the most significant problems in the area of 3D range image processing is that of segmentation and classification from 3D laser range data, especially in real time. In this work we introduce a novel multi-layer approach to the classification of 3D laser scan data. In particular, we build a hierarchical framework of online detection and identification procedures drawn from sequential analysis, namely the CUSUM (Cumulative Sum) and the SPRT (Sequential Probability Ratio Test), both of which are low-complexity algorithms. Each layer of algorithms builds upon the decisions made at the previous stage, thus providing a robust framework for online decision making. In our new framework we are not only able to classify into coarse classes such as vertical, horizontal, and vegetation, but also to identify objects characterized by more subtle or gradual changes, such as curbs or steps. Moreover, our new multi-layer approach combines information across scanlines and results in more accurate decision making. We perform experiments in complex urban scenes and provide quantitative results.
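As a rough illustration of the sequential-analysis building blocks named above, here is a minimal one-sided CUSUM change detector in Python. The function name, drift allowance, and threshold are illustrative assumptions, not the paper's layered formulation:

```python
import numpy as np

def cusum_detect(samples, target_mean, drift=0.5, threshold=5.0):
    """One-sided CUSUM: return the first index where the cumulative
    positive deviation from target_mean exceeds the threshold."""
    s = 0.0
    for i, x in enumerate(samples):
        # Accumulate deviation above the expected mean, minus a drift
        # allowance; clamp at zero so only upward shifts are tracked.
        s = max(0.0, s + (x - target_mean - drift))
        if s > threshold:
            return i  # change detected at sample i
    return None  # no change detected

# Toy example: range readings along a scanline stepping up at a curb-like edge.
readings = np.concatenate([np.random.normal(0.0, 0.2, 50),
                           np.random.normal(2.0, 0.2, 20)])
print(cusum_detect(readings, target_mean=0.0))
```

In the paper's hierarchical setting, detections like this one feed the next layer of decisions; the SPRT plays a similar sequential role with explicit error-rate guarantees.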
Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles
Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required for monitoring the environment and acting upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems.

In this thesis, lidar-based obstacle detection and recognition in agricultural environments have been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets that represented a wide range of realistic agricultural environments and included both static and dynamic obstacles.

For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied on mapped detections along the vehicle path, thus simulating an actual traversal.

The proposed methods serve as a first step towards full autonomy for agricultural vehicles. The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain, when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated by the release of the multi-modal obstacle dataset, FieldSAFE.
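As background for the "2D range images" mentioned in the abstract, a common way to flatten a rotating multi-beam lidar sweep is a spherical projection. The sketch below uses hypothetical resolution and field-of-view parameters and is not the thesis's exact pipeline:

```python
import numpy as np

def to_range_image(points, h=64, w=1024, fov_up=15.0, fov_down=-15.0):
    """Spherically project an (N, 3) lidar point cloud into an
    (h, w) range image; h typically matches the number of beams."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                      # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))  # elevation angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    # Normalize angles to pixel coordinates.
    u = ((1.0 - (yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = np.clip(((fov_up_r - pitch) / (fov_up_r - fov_down_r) * h).astype(int),
                0, h - 1)
    img = np.full((h, w), -1.0)  # -1 marks pixels with no return
    img[v, u] = r                # last write wins where points collide
    return img
```

A 2D convolutional network can then classify each pixel of the range image, sidestepping the density variations of the raw 3D point cloud.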
On the Role of Context at Different Scales in Scene Parsing
Scene parsing can be formulated as a labeling problem where each
visual data element, e.g., each pixel of an image or each 3D
point in a point cloud, is assigned a semantic class label. One
can approach this problem by training a classifier and predicting
a class label for the data elements purely based on their local
properties. This approach, however, does not take into account
any kind of contextual information between different elements in
the image or point cloud. For example, in an application where we
are interested in labeling roadside objects, the fact that most
of the utility poles are connected to some power wires can be
very helpful in disambiguating them from other similar-looking
classes. Recurrence of certain class combinations can also be
considered as a good contextual hint since they are very likely
to co-occur again. These forms of high-level contextual
information are often formulated using pairwise and higher-order
Conditional Random Fields (CRFs). A CRF is a probabilistic
graphical model that encodes the contextual relationships between
the data elements in a scene. In this thesis, we study the
potential of contextual information at different scales (ranges)
in scene parsing problems.
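For readers unfamiliar with the formulation, a pairwise CRF is typically scored by an energy that combines per-element (unary) costs with costs on neighboring pairs. The sketch below is the generic textbook form, not any of this thesis's specific models:

```python
import numpy as np

def crf_energy(labels, unary, edges, pairwise):
    """E(y) = sum_i unary[i, y_i] + sum_{(i,j)} pairwise[y_i, y_j].

    labels:   (N,) integer class assignment for each data element
    unary:    (N, C) per-element class costs, e.g. negative
              log-probabilities from a local classifier
    edges:    list of (i, j) pairs linking neighboring elements
    pairwise: (C, C) cost for each label combination on an edge
    """
    e = unary[np.arange(len(labels)), labels].sum()
    for i, j in edges:
        e += pairwise[labels[i], labels[j]]
    return e
```

Inference then searches for the labeling that minimizes this energy, trading local evidence against contextual consistency.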
First, we propose a model that utilizes the local context of the
scene via a pairwise CRF. Our model acquires contextual
interactions between different classes by assessing their
misclassification rates using only the local properties of data.
In other words, no extra training is required for obtaining the
class interaction information.
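One plausible reading of deriving class interactions from misclassification rates, offered here as an assumption rather than the thesis's exact construction, is to build the pairwise matrix from the local classifier's confusion statistics:

```python
import numpy as np

def interaction_from_confusion(confusion):
    """Map a (C, C) confusion matrix (rows: true class, columns:
    predicted class) to a (C, C) pairwise cost. Class pairs the local
    classifier often confuses get a low cost for co-occurring on an
    edge, since they are plausibly hard to tell apart locally."""
    rates = confusion / confusion.sum(axis=1, keepdims=True)
    return -np.log(rates + 1e-6)  # epsilon avoids log(0)
```

Because the confusion matrix is a byproduct of validating the local classifier, no extra training pass is needed, which matches the claim above.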
Next, we expand the context field of view from a local range to a
longer range, and make use of higher-order models to encode more
complex contextual cues. More specifically, we introduce a new
model to employ geometric higher-order terms in a CRF for
semantic labeling of 3D point cloud data.
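Higher-order terms act on cliques larger than pairs, for example all points of one segment. As a generic illustration, here is a P^n-Potts-style consistency potential; the thesis's geometric terms are not specified in this abstract, so this is a stand-in for the general idea:

```python
def pn_potts(clique_labels, gamma_consistent=0.0, gamma_mixed=1.0):
    """P^n Potts potential over a clique: cheap when all elements in
    the clique agree on one label, a flat penalty otherwise."""
    return gamma_consistent if len(set(clique_labels)) == 1 else gamma_mixed
```

A geometric variant could, for instance, lower the penalty only when the segment's shape (planarity, verticality) supports a single label.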
Despite the potential of the above models for capturing the
contextual cues in the scene, there are higher-level context cues
that cannot be encoded via pairwise and higher-order CRFs. For
instance, a vehicle is very unlikely to appear in a sea scene,
while buildings are frequently observed in a street scene. Such
information can be described using scene context and is modeled
using global image descriptors. In particular, through an image
retrieval procedure, we find images whose content is similar to
that of the query image, and use them for scene parsing. Another
problem of the above methods is that they rely on a
computationally expensive training process for classifying data
elements based on their local properties, a process that needs to be
repeated every time the training data is modified. We address
this issue by proposing a fast and efficient approach that
exempts us from the cumbersome training task, by transferring the
ground-truth information directly from the training data to the
test data.
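The training-free approach described here reads like classic non-parametric label transfer. A minimal sketch with hypothetical feature and retrieval choices:

```python
import numpy as np

def transfer_labels(query_feats, train_feats, train_labels, k=5):
    """Give each query element the majority label of its k nearest
    training elements in feature space; no classifier is trained.
    train_labels must be a NumPy array of non-negative integers."""
    # Pairwise squared Euclidean distances, shape (Q, T).
    d = ((query_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d, axis=1)[:, :k]   # indices of k nearest neighbors
    neigh = train_labels[knn]            # (Q, k) neighbor labels
    return np.array([np.bincount(row).argmax() for row in neigh])
```

In the retrieval setting described above, train_feats would first be restricted to elements from images whose global descriptors match the query image, so modified training data takes effect immediately with no retraining.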
Co-inference for Multi-modal Scene Analysis
We address the problem of understanding scenes from multiple sources of sensor data (e.g., a camera and a laser scanner) in the case where there is no one-to-one correspondence across modalities (e.g., pixels and 3-D points). This is an important scenario that frequently arises in practice not only when two different types of sensors are used, but also when the sensors are not co-located and have different sampling rates. Previous work has addressed this problem by restricting interpretation to a single representation in one of the domains, with augmented features that attempt to encode the information from the other modalities. Instead, we propose to analyze all modalities simultaneously while propagating information across domains during the inference procedure. In addition to the immediate benefit of generating a complete interpretation in all of the modalities, we demonstrate that this co-inference approach also improves performance over the canonical approach.
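As a toy illustration of propagating information across modalities during inference (an assumption-laden sketch, not the paper's actual joint inference procedure), one can alternate between per-modality beliefs and exchanges over nearest-neighbor cross-domain links:

```python
import numpy as np

def co_inference(probs_a, probs_b, links, weight=0.5, iters=10):
    """Alternately blend class beliefs between two modalities.

    probs_a: (Na, C) label distributions for modality A (e.g., pixels)
    probs_b: (Nb, C) label distributions for modality B (e.g., 3-D points)
    links:   (i, j) pairs joining element i of A to a spatially nearby
             element j of B; no one-to-one correspondence is assumed
    """
    a, b = probs_a.copy(), probs_b.copy()
    for _ in range(iters):
        for i, j in links:
            # Each side pulls its belief toward its cross-modal neighbor.
            a[i] = (1 - weight) * a[i] + weight * b[j]
            b[j] = (1 - weight) * b[j] + weight * a[i]
        # Renormalize rows so they remain valid distributions.
        a /= a.sum(axis=1, keepdims=True)
        b /= b.sum(axis=1, keepdims=True)
    return a, b
```

Both domains end up fully labeled, which is the "complete interpretation in all of the modalities" that the abstract highlights.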