Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer
Semantic annotations are vital for training models for object recognition,
semantic segmentation or scene understanding. Unfortunately, pixelwise
annotation of images at very large scale is labor-intensive, and little
labeled data is available, particularly at the instance level and for street
scenes. In this paper, we propose to tackle this problem by lifting the
semantic instance labeling task from 2D into 3D. Given reconstructions from
stereo or laser data, we annotate static 3D scene elements with rough bounding
primitives and develop a model which transfers this information into the image
domain. We leverage our method to obtain 2D labels for a novel suburban video
dataset which we have collected, resulting in 400k semantic and instance image
annotations. A comparison of our method to state-of-the-art label transfer
baselines reveals that 3D information enables more efficient annotation while
at the same time resulting in improved accuracy and time-coherent labels.
Comment: 10 pages, in Conference on Computer Vision and Pattern Recognition (CVPR), 2016
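To illustrate the core idea of transferring labels from annotated 3D geometry into the image domain, here is a minimal Python/NumPy sketch. It assumes a pinhole camera with known intrinsics and pose; all names are hypothetical, and it rasterizes labeled 3D points rather than the paper's bounding primitives and learned transfer model:

```python
import numpy as np

def transfer_labels_3d_to_2d(points, labels, K, T_cam_world, img_shape):
    """Project labeled 3D points into an image and rasterize a label map.

    points: (N, 3) world-frame points; labels: (N,) integer class IDs.
    K: (3, 3) camera intrinsics; T_cam_world: (4, 4) world-to-camera pose.
    Returns an (H, W) uint8 label image (255 = unlabeled).
    """
    h, w = img_shape
    # Transform points into the camera frame.
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1              # keep points ahead of the camera
    pts_cam, lab = pts_cam[in_front], labels[in_front]
    # Pinhole projection to pixel coordinates.
    uv = (K @ pts_cam.T).T
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    z = pts_cam[:, 2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, lab = u[inside], v[inside], z[inside], lab[inside]
    # Poor man's z-buffer: write far-to-near so the nearest label wins per pixel.
    order = np.argsort(-z)
    label_img = np.full((h, w), 255, dtype=np.uint8)
    label_img[v[order], u[order]] = lab[order]
    return label_img
```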
Online Inference and Detection of Curbs in Partially Occluded Scenes with Sparse LIDAR
Road boundaries, or curbs, provide autonomous vehicles with essential
information when interpreting road scenes and generating behaviour plans.
Although curbs convey important information, they are difficult to detect in
complex urban environments (in particular in comparison to other elements of
the road such as traffic signs and road markings). These difficulties arise
from occlusions by other traffic participants as well as changing lighting
and/or weather conditions. Moreover, road boundaries have various shapes,
colours and structures while motion planning algorithms require accurate and
precise metric information in real-time to generate their plans.
In this paper, we present a real-time LIDAR-based approach for accurate curb
detection around the vehicle (360 degrees). Our approach deals with both
occlusions from traffic and changing environmental conditions. To this end, we
project 3D LIDAR point cloud data into 2D bird's-eye-view images (akin to
Inverse Perspective Mapping). These images are then processed by trained deep
networks to infer both visible and occluded road boundaries. Finally, a
post-processing step filters detected curb segments and tracks them over time.
Experimental results demonstrate the effectiveness of the proposed approach on
real-world driving data. Hence, we believe that our LIDAR-based approach
provides an efficient and effective way to detect visible and occluded curbs
around the vehicle in challenging driving scenarios.
Comment: Accepted at the 22nd IEEE Intelligent Transportation Systems Conference (ITSC19), October 2019, Auckland, New Zealand
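The bird's-eye-view projection step described above can be sketched in a few lines of NumPy. This is a simplified rasterization under assumed ranges and resolution (all parameter names are hypothetical), not the authors' exact preprocessing:

```python
import numpy as np

def lidar_to_bev(points, x_range=(-20.0, 20.0), y_range=(-20.0, 20.0), res=0.1):
    """Rasterize a LIDAR scan (N, 4: x, y, z, intensity) into a 2D
    bird's-eye-view grid with height and intensity channels."""
    x, y, z, intensity = points.T
    keep = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]))
    x, y, z, intensity = x[keep], y[keep], z[keep], intensity[keep]
    # Metric coordinates -> integer grid cells.
    col = ((x - x_range[0]) / res).astype(int)
    row = ((y - y_range[0]) / res).astype(int)
    h = int((y_range[1] - y_range[0]) / res)
    w = int((x_range[1] - x_range[0]) / res)
    bev = np.zeros((h, w, 2), dtype=np.float32)
    # Sort by height so, per cell, the highest point's values win.
    order = np.argsort(z)
    bev[row[order], col[order], 0] = z[order]          # height channel
    bev[row[order], col[order], 1] = intensity[order]  # intensity of that point
    return bev
```

Such an image can then be fed to a segmentation network that labels visible and occluded curb cells, followed by filtering and temporal tracking of the detected segments.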
Is the Pedestrian going to Cross? Answering by 2D Pose Estimation
Our recent work suggests that, thanks to today's powerful CNNs, image-based
2D pose estimation is a promising cue for determining pedestrian intentions
such as crossing the road in the path of the ego-vehicle, stopping before
entering the road, and starting to walk or bending towards the road. This
statement is based on results obtained on non-naturalistic sequences (the
Daimler dataset), i.e., sequences choreographed specifically for the study.
Fortunately, a new publicly available dataset (JAAD) has recently appeared,
allowing the development of methods for detecting pedestrian intentions in
naturalistic driving conditions; more specifically, for addressing the
relevant question: is the pedestrian going to cross? Accordingly, in this
paper we use
JAAD to assess the usefulness of 2D pose estimation for answering such a
question. We combine CNN-based pedestrian detection, tracking and pose
estimation to predict the crossing action from monocular images. Overall, the
proposed pipeline provides new state-of-the-art results.
Comment: Presented at the IEEE Intelligent Vehicles Symposium (IEEE IV 2018)
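As a rough illustration of the final stage of such a detection/tracking/pose pipeline, the sketch below flattens a short track of 2D skeletons into a feature vector and trains an off-the-shelf classifier. Joint indices, the normalization, and the Random Forest choice are assumptions for illustration, not the paper's exact design:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pose_track_features(keypoint_track):
    """Turn a (T, J, 2) track of 2D pose keypoints into one feature vector.

    Coordinates are normalized relative to a root joint and torso length so
    the feature is invariant to the pedestrian's image position and scale.
    Joint 0 as hip and joint 1 as neck are hypothetical index choices.
    """
    kp = keypoint_track.astype(np.float32)
    hip = kp[:, 0:1, :]
    torso = np.linalg.norm(kp[:, 1, :] - kp[:, 0, :], axis=-1, keepdims=True)
    kp = (kp - hip) / np.maximum(torso[:, :, None], 1e-6)
    return kp.reshape(-1)

# Hypothetical usage with synthetic data (100 tracks, T=10 frames, J=18 joints):
X = np.stack([pose_track_features(np.random.rand(10, 18, 2)) for _ in range(100)])
y = np.random.randint(0, 2, size=100)   # 1 = pedestrian crosses, 0 = does not
clf = RandomForestClassifier(n_estimators=200).fit(X, y)
```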
MOZARD: Multi-Modal Localization for Autonomous Vehicles in Urban Outdoor Environments
Visually poor scenarios are one of the main sources of failure in visual
localization systems in outdoor environments. To address this challenge, we
present MOZARD, a multi-modal localization system for urban outdoor
environments using vision and LiDAR. By extending our preexisting
key-point-based visual multi-session localization approach with the use of
semantic data, an improved localization recall can be achieved across vastly
different appearance conditions. In particular, we focus on the use of
curbstone information because of its broad distribution and reliability
within urban
environments. We present thorough experimental evaluations over several
kilometers of driving in challenging urban outdoor environments, analyze the
recall and accuracy of our localization system, and demonstrate possible
failure cases of each subsystem in a case study. We show that MOZARD is able
to bridge scenarios where our previous work, VIZARD, fails, yielding
increased recall performance while a similar localization accuracy of 0.2 m
is achieved.
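One way to picture the multi-modal design is as a fallback between subsystems: prefer the key-point-based visual estimate when it is well supported, and fall back to the semantic (curbstone) estimate in visually poor scenarios. The sketch below is a deliberately simplified selection rule with hypothetical types and thresholds; the actual fusion in MOZARD is more involved:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class PoseEstimate:
    t_map: np.ndarray   # estimated translation in the map frame
    inliers: int        # number of feature matches supporting the estimate

def select_pose(visual: Optional[PoseEstimate],
                curb: Optional[PoseEstimate],
                min_inliers: int = 30) -> Optional[PoseEstimate]:
    """Prefer the visual estimate when enough key-point matches support it;
    otherwise fall back to the curbstone-based estimate (may be None)."""
    if visual is not None and visual.inliers >= min_inliers:
        return visual
    return curb
```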
The Cityscapes Dataset for Semantic Urban Scene Understanding
Visual understanding of complex urban street scenes is an enabling factor for
a wide range of applications. Object detection has benefited enormously from
large-scale datasets, especially in the context of deep learning. For semantic
urban scene understanding, however, no current dataset adequately captures the
complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale
dataset to train and test approaches for pixel-level and instance-level
semantic labeling. Cityscapes comprises a large, diverse set of stereo
video sequences recorded in the streets of 50 different cities. 5,000 of these
images have high-quality pixel-level annotations; 20,000 additional images have
coarse annotations to enable methods that leverage large volumes of
weakly-labeled data. Crucially, our effort exceeds previous attempts in terms
of dataset size, annotation richness, scene variability, and complexity. Our
accompanying empirical study provides an in-depth analysis of the dataset
characteristics, as well as a performance evaluation of several
state-of-the-art approaches based on our benchmark.
Comment: Includes supplemental material
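For readers working with the dataset, instance-level annotations are distributed as ID images where, by Cityscapes convention, pixels of "thing" classes encode class_id * 1000 + instance_index while "stuff" classes store the bare class ID. A small decoding sketch (file path and function name are illustrative):

```python
import numpy as np
from PIL import Image

def decode_instance_ids(path):
    """Decode a Cityscapes *_instanceIds.png into semantic and instance maps.

    Pixels >= 1000 belong to 'thing' classes and encode
    class_id * 1000 + instance_index; smaller values are plain class IDs.
    """
    ids = np.array(Image.open(path), dtype=np.int32)
    semantic = np.where(ids >= 1000, ids // 1000, ids)
    instance = np.where(ids >= 1000, ids % 1000, 0)
    return semantic, instance
```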