Semantic Label Reduction Techniques for Autonomous Driving
Semantic segmentation maps can be used as input to models that operate the controls of a car. However, not all labels may be necessary for making the
control decision. One would expect that certain labels such as road lanes or
sidewalks would be more critical in comparison with labels for vegetation or
buildings which may not have a direct influence on the car's driving decision.
In this appendix, we evaluate and quantify how sensitive and important the
different semantic labels are for controlling the car. Labels that do not
influence the driving decision are remapped to other classes, thereby
simplifying the task by reducing it to only the labels critical for driving the vehicle.
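As a rough illustration of the idea, the sketch below collapses hypothetically non-critical classes of a segmentation map into a single background class via a lookup table. The class IDs and the choice of remapped classes are placeholders for this example, not the paper's actual label set or remapping.

```python
import numpy as np

# Hypothetical class IDs for illustration only; the paper's label set differs.
ROAD, LANE, SIDEWALK, VEGETATION, BUILDING, SKY, BACKGROUND = range(7)
NUM_CLASSES = 7

# Classes assumed (for this sketch) not to influence the driving decision.
REMAP = {VEGETATION: BACKGROUND, BUILDING: BACKGROUND, SKY: BACKGROUND}

def reduce_labels(seg_map: np.ndarray) -> np.ndarray:
    """Remap non-critical labels to a catch-all class via a lookup table."""
    lut = np.arange(NUM_CLASSES)
    for src, dst in REMAP.items():
        lut[src] = dst
    return lut[seg_map]

seg = np.random.randint(0, NUM_CLASSES, size=(4, 8))   # toy segmentation map
print(reduce_labels(seg))
```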
Automated Evaluation of Semantic Segmentation Robustness for Autonomous Driving
One of the fundamental challenges in the design of perception systems for
autonomous vehicles is validating the performance of each algorithm under a
comprehensive variety of operating conditions. In the case of vision-based
semantic segmentation, there are known issues when encountering new scenarios
that are sufficiently different to the training data. In addition, even small
variations in environmental conditions such as illumination and precipitation
can affect the classification performance of the segmentation model. Given the
reliance on visual information, these effects often translate into poor
semantic pixel classification which can potentially lead to catastrophic
consequences when driving autonomously. This paper presents a novel method for
analysing the robustness of semantic segmentation models and provides a number
of metrics to evaluate the classification performance over a variety of
environmental conditions. The evaluation incorporates an additional sensor (lidar) to automate the process, eliminating the need for labour-intensive hand labelling of validation data. System integrity can be monitored as the performance of the vision sensors is validated against a different sensor
modality. This is necessary for detecting failures that are inherent to vision
technology. Experimental results are presented based on multiple datasets
collected at different times of the year with different environmental
conditions. These results show that the semantic segmentation performance
varies depending on the weather, camera parameters, the presence of shadows, etc.
The results also demonstrate how the metrics can be used to validate the performance of a model after improvements are made, and to compare the performance of different networks.
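To make the kind of per-class metric concrete, here is a minimal sketch of per-class IoU computed between a camera-based segmentation and a reference labelling assumed to come from a second modality (e.g. lidar points already projected into the image). The actual automated labelling procedure and metric suite in the paper are more involved.

```python
import numpy as np

def per_class_iou(pred: np.ndarray, ref: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-class IoU of predicted pixel labels against a reference labelling.

    `ref` stands in for labels derived from another sensor modality (e.g.
    projected lidar returns); producing those labels is out of scope here.
    """
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (ref.ravel(), pred.ravel()), 1)   # rows: reference, cols: prediction
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    return tp / np.maximum(tp + fp + fn, 1)           # guard against empty classes

pred = np.random.randint(0, 5, size=(64, 64))
ref = np.random.randint(0, 5, size=(64, 64))
print(per_class_iou(pred, ref, num_classes=5))
```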
The ApolloScape Open Dataset for Autonomous Driving and its Application
Autonomous driving has attracted tremendous attention especially in the past
few years. The key techniques for a self-driving car include solving tasks like
3D map construction, self-localization, parsing the driving road and
understanding objects, which enable vehicles to reason and act. However, large-scale datasets for training and system evaluation are still a bottleneck for developing robust perception models. In this paper, we present the ApolloScape
dataset [1] and its applications for autonomous driving. Compared with existing
public datasets from real scenes, e.g. KITTI [2] or Cityscapes [3], ApolloScape contains much larger and richer labelling, including holistic semantic dense point clouds for each site, stereo, per-pixel semantic labelling, lane mark labelling, instance segmentation, 3D car instances, and highly accurate location for every frame in driving videos from multiple sites, cities and times of day. For each task, it contains at least 15x more images than state-of-the-art datasets. To label such a complete dataset, we develop various tools and algorithms tailored to each task to accelerate the labelling process, such as 3D-2D segment labelling tools and active labelling in videos. Building on ApolloScape, we are able to develop algorithms that jointly consider the learning and inference of multiple tasks. In this paper, we provide a sensor fusion
scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and
a 3D semantic map in order to achieve robust self-localization and semantic
segmentation for autonomous driving. We show that, in practice, sensor fusion and joint learning of multiple tasks are beneficial for achieving a more robust and accurate system. We expect that our dataset and the proposed algorithms can support and motivate researchers in the further development of multi-sensor fusion and multi-task learning in the field of computer vision.
Comment: Version 4: Accepted by TPAMI. Version 3: 17 pages, 10 tables, 11 figures; added the application (DeLS-3D) based on the ApolloScape Dataset. Version 2: 7 pages, 6 figures; added comparison with the BDD100K dataset.
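As a generic toy illustration of why fusing a smooth relative motion estimate with a drift-free absolute measurement stabilises localisation, the 1D constant-gain filter below blends dead-reckoned motion with noisy GPS-like fixes. This is not the paper's DeLS-3D pipeline, which additionally uses camera video and a 3D semantic map; all numbers here are arbitrary.

```python
import numpy as np

def fuse(pred: float, meas: float, gain: float = 0.2) -> float:
    """Constant-gain correction of a predicted position with an absolute fix."""
    return pred + gain * (meas - pred)

rng = np.random.default_rng(0)
true_pos = np.cumsum(np.full(100, 0.5))                  # ground-truth 1D track
gps = true_pos + rng.normal(0.0, 2.0, size=100)          # noisy absolute fixes
pos, fused = 0.0, []
for t in range(100):
    pos += 0.5 + rng.normal(0.0, 0.05)                   # dead-reckoning step (IMU-like)
    pos = fuse(pos, gps[t])                              # correct with the absolute fix
    fused.append(pos)
print(np.mean(np.abs(np.array(fused) - true_pos)))       # mean absolute error
```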
Unlimited Road-scene Synthetic Annotation (URSA) Dataset
In training deep neural networks for semantic segmentation, the main limiting
factor is the low amount of ground truth annotation data that is available in
currently existing datasets. The limited availability of such data is due to
the time cost and human effort required to accurately and consistently label
real images on a pixel level. Modern sandbox video game engines provide open
world environments where traffic and pedestrians behave in a pseudo-realistic
manner. This caters well to the collection of a believable road-scene dataset.
Utilizing open-source tools and resources found in single-player modding
communities, we provide a method for persistent, ground truth, asset annotation
of a game world. By collecting a synthetic dataset containing upwards of
images, we demonstrate real-time, on-demand, ground truth data
annotation capability of our method. Supplementing the Cityscapes dataset with this synthetic data, we show that our data generation method provides qualitative as well as quantitative improvements---for training networks---over previous methods that use video games as a surrogate.
Comment: Accepted at the 21st IEEE International Conference on Intelligent Transportation Systems.
Unbiasing Semantic Segmentation For Robot Perception using Synthetic Data Feature Transfer
Robot perception systems need to perform reliable image segmentation in
real-time on noisy, raw perception data. State-of-the-art segmentation
approaches use large CNN models and carefully constructed datasets; however,
these models focus on accuracy at the cost of real-time inference. Furthermore,
the standard semantic segmentation datasets are not large enough for training
CNNs without augmentation and are not representative of noisy, uncurated robot
perception data. We propose improving the performance of real-time segmentation
frameworks on robot perception data by transferring features learned from
synthetic segmentation data. We show that pretraining real-time segmentation
architectures with synthetic segmentation data instead of ImageNet improves
fine-tuning performance by reducing the bias learned in pretraining and closing
the \textit{transfer gap} as a result. Our experiments show that our real-time
robot perception models pretrained on synthetic data outperform those
pretrained on ImageNet for every scale of fine-tuning data examined. Moreover,
the degree to which synthetic pretraining outperforms ImageNet pretraining
increases as the availability of robot data decreases, making our approach
attractive for robotics domains where dataset collection is hard and/or
expensive.
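A minimal sketch of the pretrain-then-fine-tune recipe described above, assuming a generic torchvision segmentation model and a hypothetical checkpoint file holding weights learned on synthetic segmentation data; the paper's actual real-time architectures and datasets are not reproduced here.

```python
import os
import torch
import torch.nn as nn
import torchvision

# Generic segmentation model; the paper targets real-time architectures instead.
model = torchvision.models.segmentation.fcn_resnet50(weights=None, num_classes=19)

# Initialise from weights pretrained on synthetic segmentation data rather than
# ImageNet. "synthetic_pretrained.pth" is a hypothetical file name.
if os.path.exists("synthetic_pretrained.pth"):
    state = torch.load("synthetic_pretrained.pth", map_location="cpu")
    model.load_state_dict(state, strict=False)   # load whatever shapes match

# Fine-tune on the (scarce) real robot-perception data.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss(ignore_index=255)

def fine_tune_step(images: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    logits = model(images)["out"]                # (N, C, H, W)
    loss = criterion(logits, targets)            # targets: (N, H, W) class indices
    loss.backward()
    optimizer.step()
    return loss.item()
```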
Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation
Mobile robots and autonomous vehicles rely on multi-modal sensor setups to
perceive and understand their surroundings. Aside from cameras, LiDAR sensors
represent a central component of state-of-the-art perception systems. In
addition to accurate spatial perception, a comprehensive semantic understanding
of the environment is essential for efficient and safe operation. In this paper
we present a novel deep neural network architecture called LiLaNet for
point-wise, multi-class semantic labeling of semi-dense LiDAR data. The network
utilizes virtual image projections of the 3D point clouds for efficient
inference. Further, we propose an automated process for large-scale cross-modal
training data generation called Autolabeling, in order to boost semantic
labeling performance while keeping the manual annotation effort low. The
effectiveness of the proposed network architecture as well as the automated
data generation process is demonstrated on a manually annotated ground truth
dataset. LiLaNet is shown to significantly outperform current state-of-the-art
CNN architectures for LiDAR data. Applying our automatically generated
large-scale training data yields a boost of up to 14 percentage points compared
to networks trained on manually annotated data only.
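For intuition, here is a minimal sketch of the kind of virtual image projection mentioned above: a point cloud rendered into a range image over azimuth and elevation. The image resolution and field of view are placeholders, and the exact projection used by LiLaNet may differ.

```python
import numpy as np

def spherical_projection(points: np.ndarray, h: int = 64, w: int = 512,
                         fov_up: float = 15.0, fov_down: float = -15.0) -> np.ndarray:
    """Project an (N, 3) lidar point cloud onto an image grid (range channel only)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                                    # azimuth
    pitch = np.arcsin(z / np.maximum(depth, 1e-6))            # elevation

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * w                         # column from azimuth
    v = (fov_up_r - pitch) / (fov_up_r - fov_down_r) * h      # row from elevation

    u = np.clip(np.floor(u), 0, w - 1).astype(int)
    v = np.clip(np.floor(v), 0, h - 1).astype(int)

    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = depth                                       # keep last hit per cell
    return image

cloud = np.random.uniform(-20, 20, size=(1000, 3))
print(spherical_projection(cloud).shape)                      # (64, 512)
```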
Universal Semi-Supervised Semantic Segmentation
In recent years, the need for semantic segmentation has arisen across several
different applications and environments. However, the expense and redundancy of
annotation often limits the quantity of labels available for training in any
domain, while deployment is easier if a single model works well across domains.
In this paper, we pose the novel problem of universal semi-supervised semantic
segmentation and propose a solution framework, to meet the dual needs of lower
annotation and deployment costs. In contrast to alternatives such as fine-tuning, joint training or unsupervised domain adaptation, universal
semi-supervised segmentation ensures that across all domains: (i) a single
model is deployed, (ii) unlabeled data is used, (iii) performance is improved,
(iv) only a few labels are needed and (v) label spaces may differ. To address
this, we minimize supervised as well as within and cross-domain unsupervised
losses, introducing a novel feature alignment objective based on pixel-aware
entropy regularization for the latter. We demonstrate quantitative advantages
over other approaches on several combinations of segmentation datasets across
different geographies (Germany, England, India) and environments (outdoors,
indoors), as well as qualitative insights on the aligned representations.
Comment: Accepted as a poster presentation at ICCV 201
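As a rough sketch of an entropy-based objective on unlabeled pixels, the function below computes the mean per-pixel prediction entropy from segmentation logits. The paper's pixel-aware alignment objective is more specific than this plain regularizer; this only illustrates the general idea.

```python
import torch
import torch.nn.functional as F

def pixelwise_entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Mean per-pixel prediction entropy for unlabeled images (logits: N, C, H, W)."""
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)
    entropy = -(probs * log_probs).sum(dim=1)     # (N, H, W)
    return entropy.mean()

logits = torch.randn(2, 19, 32, 32, requires_grad=True)
loss = pixelwise_entropy_loss(logits)
loss.backward()                                   # gradients flow to the logits
print(loss.item())
```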
SoilingNet: Soiling Detection on Automotive Surround-View Cameras
Cameras are an essential part of the sensor suite in autonomous driving. Surround-view cameras are directly exposed to the external environment and are vulnerable to getting soiled. Cameras suffer a much higher degradation in performance due to soiling than other sensors. Thus, it is critical to accurately detect soiling on the cameras, particularly for higher levels of autonomous driving. We created a new dataset covering multiple types of soiling, namely
opaque and transparent. It will be released publicly as part of our WoodScape
dataset \cite{yogamani2019woodscape} to encourage further research. We
demonstrate high accuracy using a Convolutional Neural Network (CNN) based
architecture. We also show that it can be combined with the existing object
detection task in a multi-task learning framework. Finally, we make use of
Generative Adversarial Networks (GANs) to generate more images for data augmentation and show that this works successfully, similar to style transfer.
Comment: Accepted for oral presentation at the IEEE Intelligent Transportation Systems Conference (ITSC) 201
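To illustrate the multi-task framing in the simplest possible terms, the sketch below shares one tiny encoder between an image-level soiling-classification head and a second dense-prediction head. The layer sizes are arbitrary, and the paper's second task is object detection rather than the placeholder head used here.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Toy shared-encoder network with a soiling head and a second task head."""
    def __init__(self, num_soiling_classes: int = 2, num_seg_classes: int = 19):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Soiling head: image-level classification (e.g. opaque vs. transparent).
        self.soiling_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_soiling_classes)
        )
        # Placeholder second head sharing the same encoder features.
        self.seg_head = nn.Conv2d(32, num_seg_classes, 1)

    def forward(self, x):
        feats = self.encoder(x)
        return self.soiling_head(feats), self.seg_head(feats)

model = MultiTaskNet()
soiling_logits, seg_logits = model(torch.randn(1, 3, 128, 128))
print(soiling_logits.shape, seg_logits.shape)
```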
DSNet for Real-Time Driving Scene Semantic Segmentation
We focus on the very challenging task of semantic segmentation for autonomous driving systems, which must deliver decent semantic segmentation results for traffic-critical objects in real time. In this paper, we propose a very efficient yet powerful deep neural network for driving-scene semantic segmentation termed the Driving Segmentation Network (DSNet). DSNet achieves a state-of-the-art balance between accuracy and inference speed through efficient units and architecture design inspired by ShuffleNet V2 and ENet. More importantly, DSNet highlights the classes most critical to driving decision making through our novel Driving Importance-weighted Loss. We evaluate DSNet on the Cityscapes dataset, where it achieves 71.8% mean Intersection-over-Union (IoU) on the validation set and 69.3% on the test set. Class-wise IoU scores show that the Driving Importance-weighted Loss could improve most driving-critical classes by a large margin. Compared with ENet, DSNet is 18.9% more accurate and over 1.1 times faster, which implies great potential for autonomous driving applications.
Comment: We have discovered that some reported numbers are unreproducible, and have decided to redesign the methods and rewrite most of the paper.
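A minimal sketch of a class-importance-weighted cross-entropy in the spirit of the Driving Importance-weighted Loss named above; the class indices and weight values are illustrative placeholders, not the paper's actual weighting scheme.

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 19
class_weights = torch.ones(NUM_CLASSES)
# Up-weight classes assumed critical for driving decisions (placeholder indices).
for cls in (0, 1, 5, 6, 7, 11, 12, 13):
    class_weights[cls] = 2.0

def driving_weighted_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: (N, C, H, W); targets: (N, H, W) with class indices.
    return F.cross_entropy(logits, targets, weight=class_weights, ignore_index=255)

logits = torch.randn(2, NUM_CLASSES, 64, 64)
targets = torch.randint(0, NUM_CLASSES, (2, 64, 64))
print(driving_weighted_loss(logits, targets).item())
```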
Detection and Retrieval of Out-of-Distribution Objects in Semantic Segmentation
When deploying deep learning technology in self-driving cars, deep neural
networks are constantly exposed to domain shifts. These include, e.g., changes
in weather conditions, time of day, and long-term temporal shift. In this work
we utilize a deep neural network trained on the Cityscapes dataset, which contains urban street scenes, and run inference on images from a different dataset, the A2D2 dataset, which also contains countryside and highway images. We present a novel pipeline for semantic segmentation that detects out-of-distribution (OOD)
segments by means of the deep neural network's prediction and performs image
retrieval after feature extraction and dimensionality reduction on image
patches. In our experiments we demonstrate that the deployed OOD approach is
suitable for detecting out-of-distribution concepts. Furthermore, we evaluate
the image patch retrieval qualitatively as well as quantitatively by means of
the semi-compatible A2D2 ground truth and obtain mAP values of up to 52.2%.
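For a concrete baseline of the detection step, the sketch below scores pixels by normalised prediction entropy and thresholds them into candidate OOD regions. The paper's full pipeline additionally extracts features, reduces dimensionality, and retrieves similar patches, none of which is shown here; the threshold is an arbitrary placeholder.

```python
import numpy as np

def ood_score_map(probs: np.ndarray) -> np.ndarray:
    """Per-pixel OOD score from softmax probabilities of shape (C, H, W)."""
    entropy = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=0)  # (H, W)
    return entropy / np.log(probs.shape[0])        # normalise to [0, 1]

# Toy per-pixel class probabilities for 19 classes on a 32x32 image.
probs = np.random.dirichlet(np.ones(19), size=(32, 32)).transpose(2, 0, 1)
mask = ood_score_map(probs) > 0.7                  # candidate OOD pixels
print(mask.mean())
```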