11 research outputs found

    Online Classification in 3D Urban Datasets Based on Hierarchical Detection

    One of the most significant problems in the area of 3D range image processing is that of segmentation and classification from 3D laser range data, especially in real time. In this work we introduce a novel multi-layer approach to the classification of 3D laser scan data. In particular, we build a hierarchical framework of online detection and identification procedures drawn from sequential analysis, namely the CUSUM (Cumulative Sum) and SPRT (Sequential Probability Ratio Test), both of which are low-complexity algorithms. Each layer of algorithms builds upon the decisions made at the previous stage, thus providing a robust framework of online decision making. In our new framework we are able not only to classify into coarse classes such as vertical, horizontal and/or vegetation, but also to identify objects characterized by more subtle or gradual changes, such as curbs or steps. Moreover, our new multi-layer approach combines information across scanlines and results in more accurate decision making. We perform experiments in complex urban scenes and provide quantitative results.
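To illustrate the two sequential-analysis procedures named in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes scalar measurements with Gaussian noise of known standard deviation; the function names, parameters, and thresholds are hypothetical choices made for the example.

```python
import math


def cusum_change_point(samples, mean_before, mean_after, sigma, threshold):
    """One-sided Gaussian CUSUM: return the index of the first alarm, or None.

    Assumes (for illustration only) known means before and after the change
    and a known noise standard deviation sigma.
    """
    g = 0.0
    for k, x in enumerate(samples):
        # Log-likelihood ratio of the "after" model against the "before" model.
        llr = (mean_after - mean_before) / sigma**2 * (x - (mean_before + mean_after) / 2.0)
        # Accumulate evidence, resetting to zero whenever it turns negative.
        g = max(0.0, g + llr)
        if g > threshold:
            return k
    return None


def sprt(samples, mean_h0, mean_h1, sigma, alpha=0.01, beta=0.01):
    """Wald's SPRT between two Gaussian means: return (decision, samples used)."""
    upper = math.log((1.0 - beta) / alpha)   # crossing this accepts H1
    lower = math.log(beta / (1.0 - alpha))   # crossing this accepts H0
    s = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        s += (mean_h1 - mean_h0) / sigma**2 * (x - (mean_h0 + mean_h1) / 2.0)
        if s >= upper:
            return "H1", n
        if s <= lower:
            return "H0", n
    return "undecided", n
```

Both procedures keep only a running sum per stream, which is consistent with the abstract's description of them as low-complexity algorithms. In a layered scheme such as the one described, a CUSUM alarm could mark a candidate boundary along a scanline and an SPRT could then decide between two class hypotheses for the segment that follows; that pairing is an assumption of this sketch.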

    Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles

    Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required for monitoring the environment and acting upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems. In this thesis, lidar-based obstacle detection and recognition in agricultural environments has been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets that represented a wide range of realistic agricultural environments and included both static and dynamic obstacles. For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other method showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied on mapped detections along the vehicle path, thus simulating an actual traversal. The proposed methods serve as a first step towards full autonomy for agricultural vehicles.
The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated with the release of the multi-modal obstacle dataset, FieldSAFE.
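The occupancy grid mapping mentioned in this abstract is commonly implemented with a per-cell log-odds update. The sketch below shows that standard update under stated assumptions; it is not taken from the thesis. The grid is a plain dictionary keyed by hypothetical integer cell indices, and the detection probabilities are made-up values.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def probability(log_odds):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def fuse_detection(grid, cell, p_obstacle, prior=0.5):
    """Fuse one detection into a grid cell using the additive log-odds rule.

    grid maps (ix, iy) cell indices to accumulated log-odds; unseen cells
    start at the prior. Detections are assumed independent for illustration.
    """
    current = grid.get(cell, logit(prior))
    grid[cell] = current + logit(p_obstacle) - logit(prior)
    return grid[cell]


# Two sensors agreeing on the same cell raise its obstacle probability.
grid = {}
fuse_detection(grid, (4, 7), 0.7)   # e.g. a lidar-based detection
fuse_detection(grid, (4, 7), 0.7)   # e.g. a camera-based detection
fused = probability(grid[(4, 7)])
```

Because the update is additive, detections from different sensors and different time steps can be accumulated in any order, which is one reason a grid representation is convenient for global fusion.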

    On the Role of Context at Different Scales in Scene Parsing

    Scene parsing can be formulated as a labeling problem where each visual data element, e.g., each pixel of an image or each 3D point in a point cloud, is assigned a semantic class label. One can approach this problem by training a classifier and predicting a class label for the data elements purely based on their local properties. This approach, however, does not take into account any kind of contextual information between different elements in the image or point cloud. For example, in an application where we are interested in labeling roadside objects, the fact that most utility poles are connected to power wires can be very helpful in disambiguating them from other similar-looking classes. Recurrence of certain class combinations can also be considered a good contextual hint, since such classes are very likely to co-occur again. These forms of high-level contextual information are often formulated using pairwise and higher-order Conditional Random Fields (CRFs). A CRF is a probabilistic graphical model that encodes the contextual relationships between the data elements in a scene. In this thesis, we study the potential of contextual information at different scales (ranges) in scene parsing problems. First, we propose a model that utilizes the local context of the scene via a pairwise CRF. Our model acquires contextual interactions between different classes by assessing their misclassification rates using only the local properties of data. In other words, no extra training is required for obtaining the class interaction information. Next, we expand the context field of view from a local range to a longer range, and make use of higher-order models to encode more complex contextual cues. More specifically, we introduce a new model to employ geometric higher-order terms in a CRF for semantic labeling of 3D point cloud data.
Despite the potential of the above models for capturing the contextual cues in the scene, there are higher-level context cues that cannot be encoded via pairwise and higher-order CRFs. For instance, a vehicle is very unlikely to appear in a sea scene, whereas buildings are frequently observed in a street scene. Such information can be described as scene context and is modeled using global image descriptors. In particular, through an image retrieval procedure, we find images whose content is similar to that of the query image, and use them for scene parsing. Another problem of the above methods is that they rely on a computationally expensive training process for the classification using the local properties of data elements, which needs to be repeated every time the training data is modified. We address this issue by proposing a fast and efficient approach that exempts us from the cumbersome training task by transferring the ground-truth information directly from the training data to the test data.
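As a small illustration of how a pairwise CRF scores a labeling, the sketch below sums per-element unary costs and adds a penalty for each pair of neighbors with different labels. It uses a simple Potts penalty, which is a stand-in chosen for this example and not the class-interaction model proposed in the thesis; the function name, costs, and edges are hypothetical.

```python
def crf_energy(labels, unary, edges, pairwise_weight=1.0):
    """Energy of a labeling under a pairwise CRF with a Potts smoothness term.

    labels[i] is the class index of element i, unary[i][c] is the local cost
    of giving element i class c, and edges lists neighboring element pairs.
    Lower energy corresponds to a more probable labeling.
    """
    energy = sum(unary[i][labels[i]] for i in range(len(labels)))
    # Potts penalty: a fixed cost for every neighboring pair that disagrees.
    energy += sum(pairwise_weight for i, j in edges if labels[i] != labels[j])
    return energy


# Three elements in a chain, two classes; costs are made-up values.
unary = [[0.1, 2.0], [0.9, 1.0], [0.2, 1.5]]
edges = [(0, 1), (1, 2)]
smooth = crf_energy([0, 0, 0], unary, edges)   # neighbors agree
noisy = crf_energy([0, 1, 0], unary, edges)    # middle element disagrees
```

Here the middle element's local costs are nearly tied, so the pairwise term is what makes the consistent labeling clearly preferable. A class-dependent pairwise cost, for example one derived from misclassification rates as the abstract describes, would replace the fixed `pairwise_weight`.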

    Co-inference for Multi-modal Scene Analysis

    We address the problem of understanding scenes from multiple sources of sensor data (e.g., a camera and a laser scanner) in the case where there is no one-to-one correspondence across modalities (e.g., pixels and 3-D points). This is an important scenario that frequently arises in practice not only when two different types of sensors are used, but also when the sensors are not co-located and have different sampling rates. Previous work has addressed this problem by restricting interpretation to a single representation in one of the domains, with augmented features that attempt to encode the information from the other modalities. Instead, we propose to analyze all modalities simultaneously while propagating information across domains during the inference procedure. In addition to the immediate benefit of generating a complete interpretation in all of the modalities, we demonstrate that this co-inference approach also improves performance over the canonical approach.