17,448 research outputs found

    Semantic Label Reduction Techniques for Autonomous Driving

    Semantic segmentation maps can be used as input to models for maneuvering the controls of a car. However, not all labels may be necessary for making the control decision. One would expect certain labels, such as road lanes or sidewalks, to be more critical than labels for vegetation or buildings, which may not have a direct influence on the car's driving decision. In this appendix, we evaluate and quantify how sensitive and important the different semantic labels are for controlling the car. Labels that do not influence the driving decision are remapped to other classes, thereby simplifying the task by reducing the label set to only those labels critical for driving the vehicle.
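
    As a rough illustration of the label-remapping idea described above, the following sketch collapses a few hypothetical non-critical Cityscapes classes into a single background label; the class IDs and grouping are illustrative assumptions, not the remapping evaluated in the paper.

        import numpy as np

        # Hypothetical remapping: classes assumed to have little influence on the
        # driving decision (building, vegetation, terrain, sky in the Cityscapes
        # trainId convention) are collapsed into a single background/ignore label.
        REMAP = {2: 255, 8: 255, 9: 255, 10: 255}

        def reduce_labels(label_map: np.ndarray) -> np.ndarray:
            """Remap non-critical semantic labels to a common background class."""
            reduced = label_map.copy()
            for src, dst in REMAP.items():
                reduced[label_map == src] = dst
            return reduced

        # Tiny example label map: road, vegetation, sky / sidewalk, building, car
        labels = np.array([[0, 8, 10],
                           [1, 2, 13]], dtype=np.uint8)
        print(reduce_labels(labels))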

    Automated Evaluation of Semantic Segmentation Robustness for Autonomous Driving

    One of the fundamental challenges in the design of perception systems for autonomous vehicles is validating the performance of each algorithm under a comprehensive variety of operating conditions. In the case of vision-based semantic segmentation, there are known issues when encountering new scenarios that are sufficiently different from the training data. In addition, even small variations in environmental conditions such as illumination and precipitation can affect the classification performance of the segmentation model. Given the reliance on visual information, these effects often translate into poor semantic pixel classification, which can potentially lead to catastrophic consequences when driving autonomously. This paper presents a novel method for analysing the robustness of semantic segmentation models and provides a number of metrics to evaluate the classification performance over a variety of environmental conditions. The evaluation incorporates an additional sensor (lidar) to automate the process, eliminating the need for labour-intensive hand labelling of validation data. The system integrity can be monitored as the performance of the vision sensors is validated against a different sensor modality. This is necessary for detecting failures that are inherent to vision technology. Experimental results are presented based on multiple datasets collected at different times of the year under different environmental conditions. These results show that the semantic segmentation performance varies depending on the weather, camera parameters, existence of shadows, etc. The results also demonstrate how the metrics can be used to compare and validate the performance after making improvements to a model, and to compare the performance of different networks.
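
    A minimal sketch of how per-condition evaluation against a lidar-derived reference could be organised; the data layout, the use of IoU as the metric, and all names are assumptions for illustration rather than the paper's exact procedure.

        import numpy as np

        def iou_per_class(pred, ref, num_classes, ignore=255):
            """Per-class IoU between predicted labels and a lidar-derived reference."""
            ious = np.full(num_classes, np.nan)
            valid = ref != ignore
            for c in range(num_classes):
                inter = np.logical_and(pred == c, ref == c)[valid].sum()
                union = np.logical_or(pred == c, ref == c)[valid].sum()
                if union > 0:
                    ious[c] = inter / union
            return ious

        def evaluate_by_condition(frames, num_classes):
            """Average mIoU per environmental condition.

            `frames` is assumed to be a list of (condition, pred, ref) tuples where
            `ref` comes from projecting labelled lidar returns into the image plane.
            """
            results = {}
            for cond, pred, ref in frames:
                miou = np.nanmean(iou_per_class(pred, ref, num_classes))
                results.setdefault(cond, []).append(miou)
            return {cond: float(np.mean(v)) for cond, v in results.items()}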

    The ApolloScape Open Dataset for Autonomous Driving and its Application

    Autonomous driving has attracted tremendous attention, especially in the past few years. The key techniques for a self-driving car include solving tasks like 3D map construction, self-localization, parsing the driving road and understanding objects, which enable vehicles to reason and act. However, large-scale datasets for training and system evaluation are still a bottleneck for developing robust perception models. In this paper, we present the ApolloScape dataset [1] and its applications for autonomous driving. Compared with existing public datasets from real scenes, e.g. KITTI [2] or Cityscapes [3], ApolloScape contains much larger and richer labelling, including holistic semantic dense point clouds for each site, stereo, per-pixel semantic labelling, lanemark labelling, instance segmentation, 3D car instances, and highly accurate locations for every frame in various driving videos from multiple sites, cities and daytimes. For each task, it contains at least 15x more images than SOTA datasets. To label such a complete dataset, we developed various tools and algorithms specific to each task to accelerate the labelling process, such as 3D-2D segment labelling tools, active labelling in videos, etc. Building on ApolloScape, we are able to develop algorithms that jointly consider the learning and inference of multiple tasks. In this paper, we provide a sensor fusion scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robust self-localization and semantic segmentation for autonomous driving. We show that, in practice, sensor fusion and joint learning of multiple tasks are beneficial for achieving a more robust and accurate system. We expect our dataset and the proposed algorithms to support and motivate researchers towards further development of multi-sensor fusion and multi-task learning in the field of computer vision. Comment: Version 4: Accepted by TPAMI. Version 3: 17 pages, 10 tables, 11 figures, added the application (DeLS-3D) based on the ApolloScape Dataset. Version 2: 7 pages, 6 figures, added comparison with the BDD100K dataset.

    Unlimited Road-scene Synthetic Annotation (URSA) Dataset

    In training deep neural networks for semantic segmentation, the main limiting factor is the small amount of ground truth annotation data available in currently existing datasets. The limited availability of such data is due to the time cost and human effort required to accurately and consistently label real images at the pixel level. Modern sandbox video game engines provide open-world environments in which traffic and pedestrians behave in a pseudo-realistic manner. This caters well to the collection of a believable road-scene dataset. Utilizing open-source tools and resources found in single-player modding communities, we provide a method for persistent, ground truth, asset annotation of a game world. By collecting a synthetic dataset containing upwards of 1,000,000 images, we demonstrate the real-time, on-demand, ground truth data annotation capability of our method. Supplementing the Cityscapes dataset with this synthetic data, we show that our data generation method provides qualitative as well as quantitative improvements for training networks over previous methods that use video games as a surrogate. Comment: Accepted at the 21st IEEE International Conference on Intelligent Transportation Systems.
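
    The following sketch shows one way to supplement the Cityscapes dataset with such synthetic data during training, assuming PyTorch-style datasets with a shared label space; the mixing strategy and ratio are illustrative assumptions, not the paper's training recipe.

        from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

        def mixed_loader(real_ds, synth_ds, batch_size=8, real_fraction=0.5):
            """Sample training batches from a real dataset (e.g. Cityscapes) and a
            synthetic one so that, in expectation, `real_fraction` of each batch
            comes from the real data. Both datasets are assumed to yield
            (image, label) pairs with a shared label space."""
            combined = ConcatDataset([real_ds, synth_ds])
            w_real = real_fraction / max(len(real_ds), 1)
            w_synth = (1.0 - real_fraction) / max(len(synth_ds), 1)
            weights = [w_real] * len(real_ds) + [w_synth] * len(synth_ds)
            sampler = WeightedRandomSampler(weights, num_samples=len(combined),
                                            replacement=True)
            return DataLoader(combined, batch_size=batch_size, sampler=sampler)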

    Unbiasing Semantic Segmentation For Robot Perception using Synthetic Data Feature Transfer

    Robot perception systems need to perform reliable image segmentation in real time on noisy, raw perception data. State-of-the-art segmentation approaches use large CNN models and carefully constructed datasets; however, these models focus on accuracy at the cost of real-time inference. Furthermore, the standard semantic segmentation datasets are not large enough for training CNNs without augmentation and are not representative of noisy, uncurated robot perception data. We propose improving the performance of real-time segmentation frameworks on robot perception data by transferring features learned from synthetic segmentation data. We show that pretraining real-time segmentation architectures with synthetic segmentation data instead of ImageNet improves fine-tuning performance by reducing the bias learned in pretraining and, as a result, closing the transfer gap. Our experiments show that our real-time robot perception models pretrained on synthetic data outperform those pretrained on ImageNet for every scale of fine-tuning data examined. Moreover, the degree to which synthetic pretraining outperforms ImageNet pretraining increases as the availability of robot data decreases, making our approach attractive for robotics domains where dataset collection is hard and/or expensive.
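
    A minimal sketch of the pretrain-then-fine-tune setup described above, assuming a synthetic-segmentation checkpoint is available on disk; the checkpoint path, data loader, and hyperparameters are placeholders, not the authors' configuration.

        import torch
        import torch.nn as nn

        def finetune_from_synthetic(model: nn.Module, synth_ckpt_path: str,
                                    train_loader, epochs=10, lr=1e-3):
            """Initialise a real-time segmentation model from synthetic-segmentation
            pretraining (instead of ImageNet) and fine-tune it on robot data."""
            state = torch.load(synth_ckpt_path, map_location="cpu")
            model.load_state_dict(state, strict=False)  # head shapes may differ
            optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
            criterion = nn.CrossEntropyLoss(ignore_index=255)
            model.train()
            for _ in range(epochs):
                for images, labels in train_loader:
                    optimizer.zero_grad()
                    loss = criterion(model(images), labels)
                    loss.backward()
                    optimizer.step()
            return model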

    Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation

    Mobile robots and autonomous vehicles rely on multi-modal sensor setups to perceive and understand their surroundings. Aside from cameras, LiDAR sensors represent a central component of state-of-the-art perception systems. In addition to accurate spatial perception, a comprehensive semantic understanding of the environment is essential for efficient and safe operation. In this paper, we present a novel deep neural network architecture called LiLaNet for point-wise, multi-class semantic labeling of semi-dense LiDAR data. The network utilizes virtual image projections of the 3D point clouds for efficient inference. Further, we propose an automated process for large-scale cross-modal training data generation called Autolabeling, in order to boost semantic labeling performance while keeping the manual annotation effort low. The effectiveness of the proposed network architecture as well as the automated data generation process is demonstrated on a manually annotated ground truth dataset. LiLaNet is shown to significantly outperform current state-of-the-art CNN architectures for LiDAR data. Applying our automatically generated large-scale training data yields a boost of up to 14 percentage points compared to networks trained on manually annotated data only.
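
    In the spirit of the cross-modal Autolabeling idea, the sketch below transfers labels from an image-based segmentation result to lidar points by projecting the points into the camera frame; the calibration conventions and names are illustrative assumptions, not the paper's implementation.

        import numpy as np

        def autolabel_pointcloud(points_lidar, seg_image, K, T_lidar_to_cam, unlabeled=255):
            """Transfer image-based semantic labels to lidar points by projection.

            points_lidar: (N, 3) points in the lidar frame, seg_image: (H, W) class IDs,
            K: (3, 3) camera intrinsics, T_lidar_to_cam: (4, 4) extrinsic calibration.
            Points behind the camera or outside the image keep the `unlabeled` value.
            """
            h, w = seg_image.shape
            pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
            pts_cam = (T_lidar_to_cam @ pts_h.T).T[:, :3]
            labels = np.full(points_lidar.shape[0], unlabeled, dtype=np.uint8)
            in_front = pts_cam[:, 2] > 0.1
            uvw = (K @ pts_cam[in_front].T).T
            uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)
            ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
            idx = np.flatnonzero(in_front)[ok]
            labels[idx] = seg_image[uv[ok, 1], uv[ok, 0]]
            return labels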

    Universal Semi-Supervised Semantic Segmentation

    In recent years, the need for semantic segmentation has arisen across several different applications and environments. However, the expense and redundancy of annotation often limit the quantity of labels available for training in any domain, while deployment is easier if a single model works well across domains. In this paper, we pose the novel problem of universal semi-supervised semantic segmentation and propose a solution framework to meet the dual needs of lower annotation and deployment costs. In contrast to counterpoints such as fine-tuning, joint training or unsupervised domain adaptation, universal semi-supervised segmentation ensures that across all domains: (i) a single model is deployed, (ii) unlabeled data is used, (iii) performance is improved, (iv) only a few labels are needed and (v) label spaces may differ. To address this, we minimize supervised as well as within- and cross-domain unsupervised losses, introducing a novel feature alignment objective based on pixel-aware entropy regularization for the latter. We demonstrate quantitative advantages over other approaches on several combinations of segmentation datasets across different geographies (Germany, England, India) and environments (outdoors, indoors), as well as qualitative insights on the aligned representations. Comment: Accepted as a poster presentation at ICCV 2019.
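
    A simplified sketch of a combined objective in this spirit: supervised cross-entropy on the few labelled pixels plus an entropy term on unlabeled images from any domain. The paper's pixel-aware feature alignment is not reproduced here, and lambda_ent is an assumed hyperparameter.

        import torch
        import torch.nn.functional as F

        def universal_semisup_loss(logits_labeled, targets, logits_unlabeled,
                                   lambda_ent=0.1, ignore_index=255):
            """Supervised cross-entropy on labelled pixels plus an entropy term
            on unlabeled pixels (illustrative weighting via lambda_ent)."""
            sup = F.cross_entropy(logits_labeled, targets, ignore_index=ignore_index)
            probs = F.softmax(logits_unlabeled, dim=1)
            ent = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1).mean()
            return sup + lambda_ent * ent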

    SoilingNet: Soiling Detection on Automotive Surround-View Cameras

    Cameras are an essential part of the sensor suite in autonomous driving. Surround-view cameras are directly exposed to the external environment and are vulnerable to getting soiled. Cameras suffer a much higher degradation in performance due to soiling compared to other sensors. Thus, it is critical to accurately detect soiling on the cameras, particularly for higher levels of autonomous driving. We created a new dataset containing multiple types of soiling, namely opaque and transparent. It will be released publicly as part of our WoodScape dataset [yogamani2019woodscape] to encourage further research. We demonstrate high accuracy using a Convolutional Neural Network (CNN) based architecture. We also show that it can be combined with the existing object detection task in a multi-task learning framework. Finally, we make use of Generative Adversarial Networks (GANs) to generate more images for data augmentation and show that this works successfully, similar to style transfer. Comment: Accepted for Oral Presentation at the IEEE Intelligent Transportation Systems Conference (ITSC) 2019.
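
    As a rough sketch of combining soiling classification with an existing detection task in a multi-task setting, the snippet below simply sums the two weighted losses; the head interfaces, class set, and weights are assumptions, not the paper's formulation.

        import torch.nn as nn

        class SoilingMultiTaskLoss(nn.Module):
            """Weighted sum of a soiling-classification loss and a detection loss
            produced by a separate head sharing the same encoder (illustrative)."""
            def __init__(self, w_soiling=1.0, w_detection=1.0):
                super().__init__()
                self.w_soiling = w_soiling
                self.w_detection = w_detection
                self.soiling_ce = nn.CrossEntropyLoss()  # e.g. clean / transparent / opaque

            def forward(self, soiling_logits, soiling_targets, detection_loss):
                # `detection_loss` is whatever scalar the detection head already computes.
                return (self.w_soiling * self.soiling_ce(soiling_logits, soiling_targets)
                        + self.w_detection * detection_loss)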

    DSNet for Real-Time Driving Scene Semantic Segmentation

    We focus on the very challenging task of semantic segmentation for autonomous driving systems, which must deliver decent semantic segmentation results for traffic-critical objects in real time. In this paper, we propose a very efficient yet powerful deep neural network for driving scene semantic segmentation, termed the Driving Segmentation Network (DSNet). DSNet achieves a state-of-the-art balance between accuracy and inference speed through efficient units and an architecture design inspired by ShuffleNet V2 and ENet. More importantly, DSNet highlights the classes most critical to driving decision making through our novel Driving Importance-weighted Loss. We evaluate DSNet on the Cityscapes dataset, where it achieves 71.8% mean Intersection-over-Union (IoU) on the validation set and 69.3% on the test set. Class-wise IoU scores show that the Driving Importance-weighted Loss improves most driving-critical classes by a large margin. Compared with ENet, DSNet is 18.9% more accurate and 1.1+ times faster, which implies great potential for autonomous driving applications. Comment: We have discovered that some reported numbers are unreproducible and decided to redesign the methods and rewrite most of the paper.
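
    One plausible reading of a driving importance-weighted loss is a class-weighted cross-entropy with larger weights on driving-critical classes; the sketch below uses Cityscapes trainIds and illustrative weights, not the values or exact formulation from the paper.

        import torch
        import torch.nn as nn

        # Illustrative per-class weights over the 19 Cityscapes trainIds: classes
        # assumed to be driving-critical (pole, traffic light/sign, person, rider,
        # car, motorcycle, bicycle) get a larger weight than the remaining classes.
        NUM_CLASSES = 19
        class_weights = torch.ones(NUM_CLASSES)
        for critical_id in (5, 6, 7, 11, 12, 13, 17, 18):
            class_weights[critical_id] = 3.0

        driving_weighted_loss = nn.CrossEntropyLoss(weight=class_weights, ignore_index=255)

        # Usage with dummy predictions: logits (N, 19, H, W), labels (N, H, W)
        logits = torch.randn(2, NUM_CLASSES, 64, 128)
        labels = torch.randint(0, NUM_CLASSES, (2, 64, 128))
        print(driving_weighted_loss(logits, labels).item())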

    Detection and Retrieval of Out-of-Distribution Objects in Semantic Segmentation

    When deploying deep learning technology in self-driving cars, deep neural networks are constantly exposed to domain shifts. These include, e.g., changes in weather conditions, time of day, and long-term temporal shift. In this work, we utilize a deep neural network trained on the Cityscapes dataset containing urban street scenes and run inference on images from a different dataset, the A2D2 dataset, which also contains countryside and highway images. We present a novel pipeline for semantic segmentation that detects out-of-distribution (OOD) segments by means of the deep neural network's prediction and performs image retrieval after feature extraction and dimensionality reduction on image patches. In our experiments, we demonstrate that the deployed OOD approach is suitable for detecting out-of-distribution concepts. Furthermore, we evaluate the image patch retrieval qualitatively as well as quantitatively by means of the semi-compatible A2D2 ground truth and obtain mAP values of up to 52.2%.
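
    A minimal sketch of flagging OOD pixels from the network's prediction, using normalised softmax entropy as the uncertainty signal; the threshold, the pixel-count filter, and the choice of entropy itself are illustrative assumptions rather than the paper's exact detection rule.

        import numpy as np

        def ood_mask_from_softmax(probs, entropy_threshold=0.7, min_pixels=500):
            """Flag potential out-of-distribution pixels via high prediction entropy.

            probs: (C, H, W) softmax output of the segmentation network.
            Returns a boolean mask; detections covering fewer than `min_pixels`
            pixels in total are discarded as noise.
            """
            entropy = -(probs * np.log(np.clip(probs, 1e-8, 1.0))).sum(axis=0)
            entropy /= np.log(probs.shape[0])  # normalise to [0, 1]
            mask = entropy > entropy_threshold
            return mask if mask.sum() >= min_pixels else np.zeros_like(mask)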