Navigation-Oriented Scene Understanding for Robotic Autonomy: Learning to Segment Driveability in Egocentric Images
This work tackles scene understanding for outdoor robotic navigation, relying
solely on images captured by an on-board camera. Conventional visual scene
understanding interprets the environment in terms of fixed descriptive
categories. However, such a representation is not directly interpretable for
decision-making and constrains robot operation to a specific domain. Thus, we
propose to segment egocentric images directly in terms of how a robot can
navigate in them, and tailor the learning problem to an autonomous navigation
task. Building around an image segmentation network, we present a generic
affordance consisting of three driveability levels that applies broadly to both
urban and off-road scenes. By encoding these levels with soft ordinal labels,
we incorporate inter-class distances during learning, which improves
segmentation compared to standard "hard" one-hot labelling. In addition, we
propose a navigation-oriented pixel-wise loss weighting method which assigns
higher importance to safety-critical areas. We evaluate our approach on
large-scale public image segmentation datasets ranging from sunny city streets
to snowy forest trails. In a cross-dataset generalization experiment, we show
that our affordance learning scheme can be applied across a diverse mix of
datasets and improves driveability estimation in unseen environments compared
to general-purpose, single-dataset segmentation.

Comment: Accepted in IEEE Robotics and Automation Letters (RA-L 2022).
Supplementary video available at https://youtu.be/q_XfjUDO39
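
To make the soft ordinal labelling and pixel-wise loss weighting concrete, the following is a minimal PyTorch sketch. The distance-based softening, the temperature parameter, and the 2x weight on the non-driveable level are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def soft_ordinal_labels(hard, num_levels=3, temperature=1.0):
        # Hard level indices (H, W) -> soft targets (H, W, L) whose probability
        # mass decays with ordinal distance from the annotated level.
        levels = torch.arange(num_levels, dtype=torch.float32)      # (L,)
        dist = (levels - hard.unsqueeze(-1).float()).abs()          # (H, W, L)
        return F.softmax(-dist / temperature, dim=-1)

    def weighted_soft_cross_entropy(logits, soft_targets, pixel_weights):
        # Cross-entropy against soft targets with a per-pixel importance weight.
        # logits, soft_targets: (H, W, L); pixel_weights: (H, W).
        log_probs = F.log_softmax(logits, dim=-1)
        per_pixel = -(soft_targets * log_probs).sum(dim=-1)         # (H, W)
        return (pixel_weights * per_pixel).sum() / pixel_weights.sum()

    # Toy usage: level 0 = non-driveable (safety-critical), up-weighted 2x
    # (an assumed weighting scheme for demonstration).
    hard = torch.randint(0, 3, (4, 4))
    weights = 1.0 + (hard == 0).float()
    loss = weighted_soft_cross_entropy(torch.randn(4, 4, 3),
                                       soft_ordinal_labels(hard), weights)

Because each soft target places some mass on adjacent levels, a prediction that is off by one driveability level incurs a smaller penalty than one off by two, which is exactly the inter-class distance information that hard one-hot labels discard.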
From CAD models to soft point cloud labels: An automatic annotation pipeline for cheaply supervised 3D semantic segmentation
We propose a fully automatic annotation scheme that takes a raw 3D point
cloud with a set of fitted CAD models as input and outputs convincing
point-wise labels that can be used as cheap training data for point cloud
segmentation. Compared to manual annotations, we show that our automatic labels
are accurate while drastically reducing annotation time and eliminating
the need for manual intervention or dataset-specific parameters. Our labeling
pipeline outputs semantic classes and soft point-wise object scores which can
either be binarized into standard one-hot-encoded labels, thresholded into weak
labels with ambiguous points left unlabeled, or used directly as soft labels
during training. We evaluate the label quality and segmentation performance of
PointNet++ on a dataset of real industrial point clouds and Scan2CAD, a public
dataset of indoor scenes. Our results indicate that reducing supervision in
areas that are more difficult to label automatically is beneficial compared
to the conventional approach of naively assigning a hard "best guess" label to
every point.
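
As an illustration of the three ways the pipeline's soft point-wise scores can be consumed, here is a small NumPy sketch; the 0.5 / 0.2 / 0.8 thresholds and the random score distribution are assumptions for demonstration, not values from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    scores = rng.random(10_000)                # soft per-point object score in [0, 1]

    # (a) Binarize into standard hard (one-hot-style) labels.
    hard = (scores >= 0.5).astype(np.int64)    # 1 = object, 0 = background

    # (b) Threshold into weak labels, leaving ambiguous points unlabeled (-1).
    weak = np.full(scores.shape, -1, dtype=np.int64)
    weak[scores >= 0.8] = 1
    weak[scores <= 0.2] = 0                    # points in (0.2, 0.8) stay unlabeled

    # (c) Use the scores directly as soft targets in a binary cross-entropy.
    def soft_bce(pred_prob, soft_target, eps=1e-7):
        p = np.clip(pred_prob, eps, 1 - eps)
        return float(-(soft_target * np.log(p)
                       + (1 - soft_target) * np.log(1 - p)).mean())

Option (b) is the one the abstract's conclusion favours: points whose automatic score is ambiguous are simply excluded from supervision rather than forced into a hard "best guess" class.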