
    Extraction of Vehicle Groups in Airborne Lidar Point Clouds with Two-Level Point Processes

    In this paper we present a new object-based hierarchical model for the joint probabilistic extraction of vehicles and groups of corresponding vehicles, called traffic segments, in airborne Lidar point clouds collected from dense urban areas. First, the 3-D point set is classified into terrain, vehicle, roof, vegetation, and clutter classes. Then the points, with their class labels and echo strength (i.e., intensity) values, are projected to the ground. In the resulting 2-D class and intensity maps we approximate the top-view projections of vehicles by rectangles. Since our task is simultaneously to extract the rectangle population describing the position, size, and orientation of the vehicles and to group the vehicles into traffic segments, we propose a hierarchical Two-Level Marked Point Process (L2MPP) model for the problem. The output vehicle and traffic-segment configurations are extracted by an iterative stochastic optimization algorithm. We have tested the proposed method with real data from a discrete-return Lidar sensor providing up to four range measurements for each laser pulse. Using manually annotated ground truth on a data set containing 1009 vehicles, we provide quantitative evaluation results showing that the L2MPP model surpasses two earlier grid-based approaches, a 3-D point-cloud-based process, and a single-layer MPP solution. The accuracy of the proposed method, measured in F-rate, is 97% at object level, 83% at pixel level, and 95% at group level.
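
    A minimal sketch of the rectangle marks and a heavily simplified stochastic birth-and-death loop of the kind the abstract describes. The toy data term, the cooling schedule, and all parameter values are hypothetical illustrations, not the authors' actual L2MPP energy or optimizer:

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Rectangle:
    """Top-view vehicle candidate: position, size, and orientation."""
    x: float
    y: float
    w: float
    h: float
    theta: float  # orientation in radians

def data_energy(rect, vehicle_mask):
    """Toy data term: negative (good) if the rectangle centre falls on a
    pixel labelled 'vehicle' in the projected 2-D class map."""
    i, j = int(rect.y), int(rect.x)
    if 0 <= i < len(vehicle_mask) and 0 <= j < len(vehicle_mask[0]):
        return -1.0 if vehicle_mask[i][j] else 1.0
    return 1.0

def birth_death_optimize(vehicle_mask, n_iters=500, temperature=1.0):
    """Simplified multiple-birth-and-death loop: propose rectangles at
    random, and kill high-energy ones with a cooling-dependent chance."""
    config = []
    rows, cols = len(vehicle_mask), len(vehicle_mask[0])
    for _ in range(n_iters):
        # Birth: propose a rectangle; weak proposals survive only early on.
        cand = Rectangle(random.uniform(0, cols), random.uniform(0, rows),
                         random.uniform(4.0, 6.0), random.uniform(2.0, 3.0),
                         random.uniform(0.0, math.pi))
        if data_energy(cand, vehicle_mask) < 0 or random.random() < 0.1 * temperature:
            config.append(cand)
        # Death: prune rectangles with a poor data term, more aggressively
        # as the temperature cools.
        config = [r for r in config
                  if data_energy(r, vehicle_mask) < 0
                  or random.random() < temperature]
        temperature *= 0.99  # geometric cooling schedule
    return config
```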

    3D object detection from point clouds with dense pose voters

    Object recognition has always been a challenging task for computer vision. It finds application in many fields, chiefly in industry, for example to let a robot find the objects to grasp. In recent decades such tasks have found new ways of being solved thanks to the rediscovery of neural networks, in particular convolutional neural networks. Networks of this kind have achieved excellent results in many object recognition and classification applications. The trend now is to use such networks in the automotive industry as well, in an effort to make the dream of self-driving cars a reality. There are many important works on detecting cars from images. In this thesis we present our convolutional neural network architecture for detecting cars and their pose in space using lidar input only. Storing the bounding-box information around a car at the point level ensures a good prediction even in situations where cars are occluded. Tests are run on the dataset most widely used for car and pedestrian detection in autonomous driving applications.
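
    A minimal sketch of the per-point voting idea the abstract describes: each lidar point carries a predicted offset to its object centre, and votes are accumulated so occluded cars can still be recovered from their visible points. The network that would predict the offsets is omitted, and all names, thresholds, and grid sizes here are hypothetical:

```python
import numpy as np

def aggregate_votes(points, offsets, voxel=0.5, min_votes=10):
    """points: (N, 3) lidar points; offsets: (N, 3) per-point offsets to
    the predicted box centre. Accumulates the votes in a coarse voxel
    grid and returns the centres of well-supported cells."""
    votes = points + offsets                    # each point votes for a centre
    keys = np.floor(votes / voxel).astype(int)  # quantize votes into cells
    cells, counts = np.unique(keys, axis=0, return_counts=True)
    strong = cells[counts >= min_votes]         # cells backed by many points
    return (strong + 0.5) * voxel               # back to metric coordinates

# Usage with stand-in data (a trained network would predict the offsets):
pts = np.random.rand(200, 3) * 4.0          # points on one hypothetical object
offs = np.array([[2.0, 2.0, 1.0]]) - pts    # all vote for centre (2, 2, 1)
print(aggregate_votes(pts, offs, min_votes=50))
```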

    Multi-Modal Learning For Adaptive Scene Understanding

    Modern robotic systems typically possess sensors of different modalities. Segmenting scenes observed by the robot into a discrete set of classes is a central requirement for autonomy. Equally, when a robot navigates through an unknown environment, it is often necessary to adjust the parameters of the scene segmentation model to maintain the same level of accuracy in changing situations. This thesis explores efficient means of adaptive semantic scene segmentation in an online setting with the use of multiple sensor modalities. First, we devise a novel conditional random field (CRF) inference method for scene segmentation that incorporates global constraints, enforcing particular sets of nodes to be assigned the same class label. To do this efficiently, the CRF is formulated as a relaxed quadratic program whose maximum a posteriori (MAP) solution is found using a gradient-based optimization approach. These global constraints are useful, since they can encode "a priori" information about the final labeling. This new formulation also reduces the dimensionality of the original image-labeling problem. The proposed model is employed in an urban street scene understanding task. Camera data is used for the CRF-based semantic segmentation, while global constraints are derived from 3D laser point clouds. Second, an approach to learn CRF parameters without the need for manually labeled training data is proposed. The model parameters are estimated by optimizing a novel loss function using self-supervised reference labels, obtained from camera and laser information with a minimal amount of human supervision. Third, an approach is proposed that conducts the parameter optimization while increasing the model's robustness to non-stationary data distributions over long trajectories. We adopt stochastic gradient descent to achieve this goal, with a learning rate that can appropriately grow or diminish to gain adaptability to changes in the data distribution.
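
    A minimal sketch of MAP inference for a relaxed CRF of the kind described: the discrete labeling is relaxed to one distribution per node and improved with exponentiated-gradient steps that keep each distribution on the simplex. The toy energy, the pairwise weight, and the update rule are illustrative assumptions, not the thesis' exact quadratic program or constraint handling:

```python
import numpy as np

def relaxed_crf_map(unary, edges, w_pair=1.0, steps=100, lr=0.5):
    """unary: (N, L) per-node class scores; edges: list of (i, j)
    neighbour pairs. Returns a hard labeling from the relaxed solution."""
    n, L = unary.shape
    q = np.full((n, L), 1.0 / L)  # relaxed labeling: one distribution per node
    for _ in range(steps):
        grad = unary.copy()
        for i, j in edges:
            # Smoothness term: neighbouring nodes are encouraged to agree.
            grad[i] += w_pair * q[j]
            grad[j] += w_pair * q[i]
        # Exponentiated-gradient ascent keeps each row on the simplex.
        q *= np.exp(lr * grad)
        q /= q.sum(axis=1, keepdims=True)
    return q.argmax(axis=1)

# Global constraints of the kind described (a set of nodes forced to share
# one label) could be imposed by tying the rows of q for that set.
labels = relaxed_crf_map(np.random.rand(6, 3), [(0, 1), (1, 2), (3, 4)])
```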

    Event Blob Tracking: An Asynchronous Real-Time Algorithm

    Event-based cameras have become increasingly popular for tracking fast-moving objects due to their high temporal resolution, low latency, and high dynamic range. In this paper, we propose a novel algorithm for tracking event blobs using raw events asynchronously in real time. We introduce the concept of an event blob as a spatio-temporal likelihood of event occurrence whose conditional spatial likelihood is blob-like. Many real-world objects generate event blob data; for example, flickering LEDs such as car headlights, or any small foreground object moving against a static or slowly varying background. The proposed algorithm uses a nearest-neighbour classifier with a dynamic threshold criterion for data association, coupled with a Kalman filter to track the event blob state. Our algorithm achieves highly accurate tracking and event blob shape estimation even under challenging lighting conditions and high-speed motions. The microsecond time resolution achieved means that the filter output can be used to derive secondary information such as time-to-contact or range estimates, enabling applications to real-world problems such as collision avoidance in autonomous driving. Comment: 17 pages, 8 figures, preprint version
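
    A minimal sketch of the per-event loop the abstract describes: a nearest-neighbour gate decides whether a raw event belongs to the blob, and a constant-velocity Kalman filter updates the blob state at event time. The noise values and the fixed gate are illustrative stand-ins for the paper's dynamic threshold and tuned parameters:

```python
import numpy as np

class EventBlobTracker:
    def __init__(self, x0, y0):
        self.x = np.array([x0, y0, 0.0, 0.0])  # state: [px, py, vx, vy]
        self.P = np.eye(4) * 10.0              # state covariance
        self.R = np.eye(2) * 2.0               # per-event measurement noise
        self.q = 1e-3                          # process noise intensity
        self.gate = 5.0                        # association gate (pixels)

    def update(self, t_event, x_event, y_event, t_prev):
        """Predict to the event timestamp, gate the event, and update."""
        dt = t_event - t_prev
        F = np.eye(4); F[0, 2] = F[1, 3] = dt  # constant-velocity model
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.q * dt * np.eye(4)
        z = np.array([x_event, y_event])
        H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0
        innov = z - H @ self.x
        if np.linalg.norm(innov) > self.gate:  # nearest-neighbour gating
            return False                       # event not associated
        S = H @ self.P @ H.T + self.R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innov
        self.P = (np.eye(4) - K @ H) @ self.P
        return True
```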

    Multi Sensor Multi Target Perception and Tracking for Informed Decisions in Public Road Scenarios

    Multi-target tracking in public traffic calls for a tracking system with automated track initiation and termination facilities in a randomly evolving driving environment. Besides, the key problem of data association needs to be handled effectively considering the limited computational resources on board an autonomous car. The challenge of the tracking problem is further evident in the use of high-resolution automotive sensors which return multiple detections per object. Furthermore, it is customary to use multiple sensors that cover different and/or overlapping fields of view and to fuse sensor detections to provide robust and reliable tracking. As a consequence, in high-resolution multi-sensor settings the data association uncertainty and the corresponding tracking complexity increase, pointing to the need for a systematic approach to handle and process sensor detections. In this work, we present a multi-target tracking system that addresses target birth/initiation and death/termination processes with automatic track management features. These tracking functionalities can help facilitate perception during common events in public traffic, as participants (suddenly) change lanes, navigate intersections, overtake, and/or brake in emergencies. Various tracking approaches, including ones based on the joint integrated probabilistic data association (JIPDA) filter, the linear multi-target integrated probabilistic data association (LMIPDA) filter, and their multi-detection variants, are adapted to specifically include algorithms that handle track initiation and termination, clutter density estimation, and track management. The utility of the filtering module is further elaborated by integrating it into a trajectory tracking problem based on model predictive control. To cope with tracking complexity in the case of multiple high-resolution sensors, we propose a hybrid scheme that combines data clustering at the local sensor with multiple-detection tracking schemes at the fusion layer. We implement a track-to-track fusion scheme that de-correlates local (sensor) tracks to avoid double counting, and apply a measurement partitioning scheme to re-purpose the LMIPDA tracking algorithm for multi-detection cases. In addition to the measurement partitioning approach, a joint extent and kinematic state estimation scheme is integrated into the LMIPDA approach to facilitate perception and tracking of individual as well as group targets in multi-lane public traffic. We formulate the tracking problem as a two-layer hierarchy. This arrangement enhances multi-target tracking performance in situations including, but not limited to, target initialization (birth process), target occlusion, missed detections, unresolved measurements, and target maneuvers. Also, target groups expose complex individual target interactions that help in situation assessment, which is challenging to capture otherwise. The simulation studies are complemented by experimental studies performed on single and multiple (group) targets. Target detections are collected from a high-resolution radar at a frequency of 20 Hz, whereas RTK-GPS data is available as ground truth for the trajectory of one of the target vehicles.
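
    A minimal sketch of the automatic track management idea described above: each track carries an existence probability that associated detections raise and missed detections decay, with thresholds driving birth and deletion. This greedy, toy IPDA-flavoured scheme stands in for the JIPDA/LMIPDA machinery; all names and numbers are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Track:
    state: tuple            # e.g. (x, y) position estimate
    p_exist: float = 0.5    # existence probability of the track

def manage_tracks(tracks, detections, gate=3.0, p_delete=0.1):
    """One cycle of gating, existence update, track birth, and death."""
    used = set()
    for tr in tracks:
        # Greedy nearest-neighbour gating (stand-in for JIPDA weights).
        best, best_d = None, gate
        for k, (dx, dy) in enumerate(detections):
            d = ((tr.state[0] - dx) ** 2 + (tr.state[1] - dy) ** 2) ** 0.5
            if k not in used and d < best_d:
                best, best_d = k, d
        if best is not None:
            used.add(best)
            tr.state = detections[best]
            tr.p_exist = min(1.0, tr.p_exist + 0.2)  # detection supports track
        else:
            tr.p_exist *= 0.7                        # missed detection decays it
    # Death: drop tracks whose existence probability fell too low.
    tracks = [t for t in tracks if t.p_exist > p_delete]
    # Birth: unassociated detections start tentative tracks.
    tracks += [Track(detections[k]) for k in range(len(detections))
               if k not in used]
    return tracks

# One cycle with two fresh detections spawns two tentative tracks.
tracks = manage_tracks([], [(0.0, 0.0), (5.0, 5.0)])
```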

    BirdNet+: two-stage 3D object detection in LiDAR through a sparsity-invariant bird's eye view

    Autonomous navigation relies upon an accurate understanding of the elements in the surroundings. Among the different on-board perception tasks, 3D object detection allows the identification of dynamic objects that cannot be registered by maps, being key for safe navigation. Thus, it often requires the use of LiDAR data, which is able to faithfully represent the scene geometry. However, although raw laser point clouds contain rich features to perform object detection, more compact representations such as the bird's eye view (BEV) projection are usually preferred in order to meet the time requirements of the control loop. This paper presents an end-to-end object detection network based on the well-known Faster R-CNN architecture that uses BEV images as input to produce the final 3D boxes. Our regression branches can infer not only the axis-aligned bounding boxes but also the rotation angle, height, and elevation of the objects in the scene. The proposed network provides state-of-the-art results for car, pedestrian, and cyclist detection with a single forward pass when evaluated on the KITTI 3D Object Detection Benchmark, with an accuracy that exceeds 64% mAP 3D for the Moderate difficulty. Further experiments on the challenging nuScenes dataset show the generalizability of both the method and the proposed BEV representation to different LiDAR devices and across a wider set of object categories, reaching more than 30% mAP with a single LiDAR sweep and almost 40% mAP with the usual 10-sweep accumulation. This work was supported in part by the Government of Madrid (Comunidad de Madrid) under the Multiannual Agreement with the University Carlos III of Madrid (UC3M) in the line of "Fostering Young Doctors Research" (PEAVAUTO-CM-UC3M), and in part in the context of the V Regional Programme of Research and Technological Innovation (PRICIT).
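
    A minimal sketch of the BEV projection the abstract builds on: the point cloud is discretized into a 2-D ground grid and each cell keeps a simple statistic (here, maximum height), yielding a compact image a 2-D detector can consume. The grid ranges, resolution, and max-height encoding are illustrative assumptions, not BirdNet+'s exact input channels:

```python
import numpy as np

def lidar_to_bev(points, x_range=(0, 70), y_range=(-35, 35), res=0.1):
    """points: (N, 3) array of x, y, z in the lidar frame.
    Returns an (H, W) height map holding the max z per ground cell."""
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    cols = ((pts[:, 0] - x_range[0]) / res).astype(int)
    rows = ((pts[:, 1] - y_range[0]) / res).astype(int)
    H = int((y_range[1] - y_range[0]) / res)
    W = int((x_range[1] - x_range[0]) / res)
    bev = np.full((H, W), -np.inf)
    np.maximum.at(bev, (rows, cols), pts[:, 2])  # keep max height per cell
    bev[np.isinf(bev)] = 0.0                     # cells with no points
    return bev
```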

    Automatic registration of multi-modal airborne imagery

    This dissertation presents a novel technique based on maximization of mutual information (MMI) and multi-resolution analysis to design an algorithm for automatic registration of multi-sensor images captured by various airborne cameras. In contrast to conventional methods that extract and employ feature points, MMI-based algorithms utilize the mutual information found between two given images to compute the registration parameters. These, in turn, are then utilized to perform multi-sensor registration for remote sensing images. The results indicate that the proposed algorithms are very effective in registering infrared images taken at three different wavelengths with a high-resolution visual image of a given scene. The MMI technique has proven to be very robust with images acquired with the Wild Airborne Sensor Program (WASP) multi-sensor instrument. This dissertation also shows how wavelet-based techniques can be used in a multi-resolution analysis framework to significantly increase computational efficiency for images captured at different resolutions. The fundamental result of this thesis is the technique of using image features to enhance the robustness, accuracy, and speed of MMI registration. This is done by using features to focus MMI on places that are rich in information. The new algorithm integrates smoothly with MMI and avoids any need for feature matching; the applications of such extensions are then studied. The first extension is the registration of cartographic maps and image data, which is very important for map updating and change detection. This is a difficult problem because map features such as roads and buildings may be mis-located, and features extracted from images may not correspond to map features. Nonetheless, it is possible to obtain a general global registration of maps and images by applying statistical techniques to map and image features. To solve the map-to-image registration problem, this research extends the MMI technique through a focus-of-attention mechanism that forces MMI to utilize correspondences that have a high probability of being information-rich. The gradient-based and exhaustive parameter search methods are also compared. Both qualitative and quantitative analyses are used to assess the registration accuracy. Another difficult application is the fusion of LIDAR elevation or intensity data with imagery. Such applications are even more challenging when automated registration algorithms are needed. To improve registration robustness, a salient area extraction algorithm is developed to overcome the distortion in airborne and satellite images from different sensors. This extension combines the SIFT and Harris feature detection algorithms with MMI and the Harris corner label map to address difficult multi-modal registration problems through a combination of selection and focus-of-attention mechanisms together with mutual information. This two-step approach overcomes the above problems and provides a good initialization for the final step of the registration process. Experimental results are provided that demonstrate a variety of mapping applications including multi-modal IR imagery, map and image registration, and image and LIDAR registration.
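
    A minimal sketch of the mutual-information score at the heart of MMI registration: a joint intensity histogram over the overlap gives I(A; B), which a search over transform parameters then maximizes. The bin count and the toy exhaustive 1-D shift search are illustrative simplifications of the gradient-based and exhaustive searches compared above:

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Mutual information between two equally sized grayscale images."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of image A
    py = pxy.sum(axis=0, keepdims=True)       # marginal of image B
    nz = pxy > 0                              # avoid log(0); empty cells add 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def best_horizontal_shift(fixed, moving, max_shift=20):
    """Toy exhaustive search over horizontal shifts, maximizing MI."""
    scores = {}
    for s in range(-max_shift, max_shift + 1):
        shifted = np.roll(moving, s, axis=1)  # candidate transform
        scores[s] = mutual_information(fixed, shifted)
    return max(scores, key=scores.get)
```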