
    Pseudo-labels for Supervised Learning on Dynamic Vision Sensor Data, Applied to Object Detection under Ego-motion

    In recent years, dynamic vision sensors (DVS), also known as event-based cameras or neuromorphic sensors, have seen increased use thanks to several advantages over conventional frame-based cameras. Operating on principles inspired by the retina, these sensors offer high temporal resolution that overcomes motion blur, a high dynamic range that copes with extreme illumination, and low power consumption that makes them well suited to embedded systems on platforms such as drones and self-driving cars. However, event-based datasets are scarce, and labels are even rarer for tasks such as object detection. We transferred discriminative knowledge from a state-of-the-art frame-based convolutional neural network (CNN) to the event-based modality via intermediate pseudo-labels, which are used as targets for supervised learning. We show, for the first time, event-based car detection under ego-motion in a real environment at 100 frames per second, with a test average precision of 40.3% relative to our annotated ground truth. The event-based car detector handles motion blur and poor illumination despite not being explicitly trained to do so, and even complements frame-based CNN detectors, suggesting that it has learnt generalized visual representations.
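
    As a rough illustration of the pseudo-labelling step described above, the sketch below pairs confident detections from a frame-based detector with accumulated event frames to build a supervised training set for the event-based modality. The detector interface, the events_to_frame accumulation, and the 0.5 confidence threshold are illustrative assumptions, not the paper's exact pipeline.

    import numpy as np

    def events_to_frame(events, shape=(180, 240)):
        """Accumulate (x, y, t, polarity) events into a simple 2D count image."""
        img = np.zeros(shape, dtype=np.float32)
        np.add.at(img, (events[:, 1].astype(int), events[:, 0].astype(int)), 1.0)
        return img

    def make_pseudo_labels(frames, event_windows, frame_detector, conf_thresh=0.5):
        """Pair each accumulated event frame with confident frame-based detections."""
        dataset = []
        for rgb, events in zip(frames, event_windows):
            boxes, scores = frame_detector(rgb)    # off-the-shelf frame-based CNN detector
            keep = scores >= conf_thresh           # keep only confident boxes as pseudo-labels
            if keep.any():
                dataset.append((events_to_frame(events), boxes[keep]))
        return dataset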

    Anomaly Detection in 3D Space for Autonomous Driving

    Current state-of-the-art perception models do not always detect all objects in an image and therefore cannot yet be relied upon in safety-critical applications such as autonomous driving. Objects that cannot be detected are called anomalies. Current work on anomaly detection is primarily based on camera data. This work evaluates to what extent anomaly detection in 3D is possible today on pseudo-lidar data. A pseudo-lidar is a model that estimates 3D depth for each pixel of an image; currently, no approach performs anomaly detection on pseudo-lidar data.
    Research Question 1 (RQ1) asks whether dissimilarities between lidar and pseudo-lidar are an indicator of anomalies. For this purpose, it is evaluated whether the deviation between pseudo-lidar and lidar point clouds is larger for anomalies than for non-anomalies. Since no multi-modal dataset for anomaly detection is directly available, each instance in the multi-modal KITTI-360 dataset that was not correctly segmented by a panoptic segmentation model was labeled as an anomaly; it is shown how this anomaly definition depends on the chosen segmentation criterion. The dataset contains 2D images with instance labels and corresponding 3D lidar point clouds with matching instance labels, so pseudo-lidar and lidar can be compared instance by instance, and the 2D instance labels define which instances are correctly segmented. The Chamfer distance between the lidar and pseudo-lidar point clouds of each instance is then calculated, with lidar used as the physically captured ground truth. It was found that the deviation between lidar and pseudo-lidar is similar for anomalies and non-anomalies; this dissimilarity therefore cannot be used as an indicator of an anomaly.
    Research Question 2 (RQ2) analyzes how well a pseudo-lidar can map anomalies of the novelty type in 3D. One augmented anomaly dataset and two real-world anomaly datasets, all image based, were considered, and both a quantitative and a qualitative analysis were carried out. For the quantitative analysis, Monte Carlo Dropout was applied to these datasets to evaluate the uncertainty of the model; for the qualitative analysis, the 3D point clouds estimated with the pseudo-lidar were visualized. The qualitative analysis shows that some anomalies are mapped well in 3D while others are not mapped at all, and that augmented anomalies can sometimes be mapped ambiguously in 3D. The quantitative analysis shows, for all considered datasets, that the pseudo-lidar is more certain for anomaly regions than for non-anomaly regions, which is interpreted as overconfidence of the model. Furthermore, the anomaly concept is not always consistent across modalities.
    Research Question 3 (RQ3) analyzes whether anomalies can be found using flow estimation on point clouds predicted by the pseudo-lidar. In theory, an anomaly would be present if the motion segmentation model contradicted the panoptic segmentation model. To this end, it was investigated whether the pseudo-lidar point clouds are consistent enough over time to support flow estimation, again using the multi-modal KITTI-360 dataset. For each instance in the pseudo-lidar, it was determined how much its point cloud differs from the same instance in the next frame; for this consistency evaluation of static and dynamic instances, the ego-motion has to be factored out. The pseudo-lidar prediction is consistent between frames if the distance is small for static instances and equal to the instance's motion for dynamic instances. It was shown that the pseudo-lidar makes inconsistent predictions over time, so static and dynamic instances cannot be distinguished from pseudo-lidar point clouds. It follows that a flow-based approach to anomaly detection is not possible for point clouds predicted by current single-image pseudo-lidars.
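
    A minimal sketch of the per-instance comparison used in RQ1, computing a symmetric Chamfer distance between the lidar and pseudo-lidar points of one instance and averaging it over a set of instances; the use of SciPy's KD-tree and the variable names are illustrative assumptions rather than the thesis implementation.

    import numpy as np
    from scipy.spatial import cKDTree

    def chamfer_distance(lidar_pts, pseudo_pts):
        """Symmetric Chamfer distance between two (N, 3) point clouds."""
        d_lp, _ = cKDTree(pseudo_pts).query(lidar_pts)   # lidar -> nearest pseudo-lidar point
        d_pl, _ = cKDTree(lidar_pts).query(pseudo_pts)   # pseudo-lidar -> nearest lidar point
        return d_lp.mean() + d_pl.mean()

    def mean_instance_deviation(instances):
        """Average deviation over instances; each instance holds both point clouds."""
        return np.mean([chamfer_distance(i["lidar"], i["pseudo_lidar"]) for i in instances])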

    Anomaly Detection in Lidar Data by Combining Supervised and Self-Supervised Methods

    To enable safe autonomous driving, a reliable and redundant perception of the environment is required. In the context of autonomous vehicles, perception is mainly based on machine learning models that analyze data from sensors such as camera, Radio Detection and Ranging (radar), and Light Detection and Ranging (lidar). Since model performance depends significantly on the training data used, perception must be ensured even in situations that are difficult to analyze and deviate from the training dataset. Such situations are called corner cases or anomalies. Motivated by the need to detect them, this thesis presents a new approach for detecting anomalies in lidar data by combining Supervised (SV) and Self-Supervised (SSV) models. In particular, inconsistent point-wise predictions between an SV and an SSV part serve as an indication of anomalies arising from the models themselves, e.g., due to a lack of knowledge. The SV part is composed of an SV semantic segmentation model and an SV moving object segmentation model, which together assign a semantic motion class to each point of the point cloud; from these classes, a first motion label, denoting whether the point is static or dynamic, is predicted for each point. The SSV part consists mainly of an SSV scene flow model and an SSV odometry model and predicts a second motion label for each point: the scene flow model estimates a displacement vector per point which, after compensating for ego-motion using the odometry model, represents only the point's own motion. A separate quantitative analysis of the two parts and a qualitative analysis of the anomaly detection capabilities obtained by combining them are performed. In the qualitative analysis, frames are classified into four main categories: correctly consistent, incorrectly consistent, anomalies detected by the SSV part, and anomalies detected by the SV part. In addition, weaknesses were identified in both the SV and SSV parts.
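
    The consistency check described above can be sketched as follows: a point is flagged when the SV motion label disagrees with an SSV label derived from ego-motion-compensated scene flow. The array shapes, the flow threshold, and the function names are illustrative assumptions, not the thesis code.

    import numpy as np

    def ssv_motion_labels(points, scene_flow, ego_transform, flow_thresh=0.1):
        """Label a point as dynamic if its flow exceeds what ego-motion alone explains."""
        homog = np.hstack([points, np.ones((len(points), 1))])
        ego_flow = (homog @ ego_transform.T)[:, :3] - points   # displacement due to the ego-vehicle
        residual = np.linalg.norm(scene_flow - ego_flow, axis=1)
        return residual > flow_thresh                          # True = dynamic

    def flag_anomalies(sv_dynamic, points, scene_flow, ego_transform):
        """Boolean mask of points where the SV and SSV motion labels disagree."""
        return sv_dynamic != ssv_motion_labels(points, scene_flow, ego_transform)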

    Towards Object-Centric Scene Understanding

    Visual perception for autonomous agents continues to attract community attention due to the disruptive technologies involved and the wide applicability of such solutions. Autonomous Driving (AD), a major application in this domain, promises to revolutionize our approach to mobility while bringing critical advantages in reducing accident fatalities. Fueled by recent advances in Deep Learning (DL), more computer vision tasks are being addressed using a learning paradigm. Deep Neural Networks (DNNs) have consistently pushed performance to unprecedented levels and demonstrated the ability of such approaches to generalize to an increasing number of difficult problems, such as 3D vision tasks. In this thesis, we address two main challenges arising from current approaches: the computational complexity of multi-task pipelines, and the increasing need for manual annotations. On the one hand, AD systems need to perceive the surrounding environment at different levels of detail and subsequently take timely actions; this multitasking further limits the time available for each perception task. On the other hand, the need for such systems to generalize to massively diverse situations requires large-scale datasets covering long-tailed cases. This requirement renders traditional supervised approaches, despite the data readily available in the AD domain, unsustainable in terms of annotation cost, especially for 3D tasks. Driven by the nature of the AD environment, whose complexity (unlike indoor scenes) is dominated by the presence of other scene elements (mainly cars and pedestrians), we focus on the above-mentioned challenges in object-centric tasks. We then situate our contributions appropriately in a fast-paced literature, supporting our claims with extensive experimental analysis that leverages up-to-date state-of-the-art results and community-adopted benchmarks.

    Radars for Autonomous Driving: A Review of Deep Learning Methods and Challenges

    Radar is a key component of the suite of perception sensors used for safe and reliable navigation of autonomous vehicles. Its unique capabilities include high-resolution velocity imaging, detection of agents in occlusion and over long ranges, and robust performance in adverse weather conditions. However, using radar data presents challenges: it is characterized by low resolution, sparsity, clutter, high uncertainty, and a lack of good datasets. These challenges have limited radar deep learning research. As a result, current radar models are often influenced by lidar and vision models, which focus on optical features that are relatively weak in radar data, resulting in under-utilization of radar's capabilities and diminishing its contribution to autonomous perception. This review seeks to encourage further deep learning research on autonomous radar data by 1) identifying key research themes, and 2) offering a comprehensive overview of current opportunities and challenges in the field. Topics covered include early and late fusion, occupancy flow estimation, uncertainty modeling, and multipath detection. The paper also discusses radar fundamentals and data representation, presents a curated list of recent radar datasets, and reviews state-of-the-art lidar and vision models relevant for radar research. For a summary of the paper and more results, visit the website: autonomous-radars.github.io

    MS3D: Leveraging Multiple Detectors for Unsupervised Domain Adaptation in 3D Object Detection

    We introduce Multi-Source 3D (MS3D), a new self-training pipeline for unsupervised domain adaptation in 3D object detection. Despite the remarkable accuracy of 3D detectors, they often overfit to specific domain biases, leading to suboptimal performance in different sensor setups and environments. Existing methods typically focus on adapting a single detector to the target domain, overlooking the fact that different detectors possess distinct expertise on different unseen domains. MS3D leverages this by combining pre-trained detectors from multiple source domains and incorporating temporal information to produce high-quality pseudo-labels for fine-tuning. Our proposed Kernel-Density Estimation (KDE) Box Fusion method fuses box proposals from multiple domains to obtain pseudo-labels that surpass the performance of the best source-domain detectors. MS3D exhibits greater robustness to domain shift and produces accurate pseudo-labels over greater distances, making it well suited for high-to-low beam domain adaptation and vice versa. Our method achieved state-of-the-art performance on all evaluated datasets, and we demonstrate that the choice of pre-trained source detectors has minimal impact on the self-training result, making MS3D suitable for real-world applications. Our code is available at https://github.com/darrenjkt/MS3
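
    The sketch below illustrates the general idea of fusing box proposals with kernel density estimation: proposals from several detectors for the same object are combined by selecting the proposal closest to the density mode of their centres. It is a simplification under an assumed box encoding, not the MS3D implementation (see the linked repository for that).

    import numpy as np
    from scipy.stats import gaussian_kde

    def fuse_boxes(boxes):
        """boxes: (N, 7) array of [x, y, z, l, w, h, yaw] proposals for one object."""
        if len(boxes) < 4:                             # too few proposals for a stable 3D KDE
            return boxes[0].copy()                     # fall back to the first proposal
        centres = boxes[:, :3].T                       # gaussian_kde expects shape (dims, N)
        density = gaussian_kde(centres)(centres)       # density evaluated at each proposed centre
        fused = boxes[int(np.argmax(density))].copy()  # proposal nearest the density peak
        fused[3:6] = boxes[:, 3:6].mean(axis=0)        # average the box dimensions
        return fused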

    Deep Learning Assisted Intelligent Visual and Vehicle Tracking Systems

    Sensor fusion and tracking is the ability to bring together measurements from multiple sensors, both current and past, to estimate the current state of a system. The resulting state estimate is more accurate than the direct sensor measurement because it balances the state prediction based on the assumed motion model against the noisy sensor measurement. Systems can then use the information provided by the fusion and tracking process to support more intelligent actions and achieve autonomy, for example in an autonomous vehicle.
    In the past, widely used sensor data were structured and could be fed directly into the tracking system, e.g., distance, temperature, acceleration, and force, with measurement uncertainty estimated from experiments. Today, however, a large amount of unstructured data is generated by sensors such as cameras and LiDAR, which brings new challenges to fusion and tracking. Traditional algorithms cannot use these unstructured data directly; another method or process is needed to "understand" them first. For example, if a system tries to track a particular person in a video sequence, it must first determine where the person is, a task traditional tracking methods cannot accomplish, and the measurement model for unstructured data is usually difficult to construct. Deep learning techniques provide promising solutions to this type of problem: a deep learning method can learn from the unstructured data to accomplish tasks such as object detection in images, object localization in LiDAR point clouds, and driver behavior prediction from current traffic conditions. Deep learning architectures such as deep neural networks, deep belief networks, recurrent neural networks, and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, and machine translation, where they have produced results comparable with human expert performance. How to incorporate information obtained via deep learning into our tracking system is one topic of this dissertation.
    Another challenging task is using learning methods to improve a tracking filter's performance. In a tracking system, many manually tuned parameters affect tracking performance, e.g., the process noise covariance and measurement noise covariance in a Kalman Filter (KF). These parameters are usually estimated by running the tracking algorithm several times and selecting the configuration that gives the best performance. How to learn the system parameters automatically from data, and how to use machine learning techniques directly to provide useful information to the tracking system, are critical to the proposed work.
    The proposed research on the intelligent tracking system has two objectives. The first is to make a visual tracking filter smart enough to understand unstructured data sources. The second is to apply learning algorithms to improve a tracking filter's performance. The goal is to develop an intelligent tracking system that can understand unstructured data and use it to improve itself.
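
    To make the tuning problem concrete, here is a minimal constant-velocity Kalman filter showing where the hand-tuned process noise Q and measurement noise R enter; the dissertation's aim is to learn such parameters from data rather than tune them by hand. The matrix values below are illustrative assumptions.

    import numpy as np

    dt = 0.1                                   # time step in seconds
    F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity state transition
    H = np.array([[1.0, 0.0]])                 # only position is measured
    Q = np.diag([1e-3, 1e-2])                  # process noise covariance (hand-tuned)
    R = np.array([[0.5]])                      # measurement noise covariance (hand-tuned)

    def kf_step(x, P, z):
        """One predict/update cycle for state x = [position, velocity] with covariance P."""
        x, P = F @ x, F @ P @ F.T + Q          # predict
        S = H @ P @ H.T + R                    # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        x = x + K @ (z - H @ x)                # update with measurement z
        P = (np.eye(2) - K @ H) @ P
        return x, P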