We address the problem of localizing waste objects from a color image and an
optional depth image, which is a key perception component for robotic
interaction with such objects. Specifically, our method integrates intensity
and depth information at multiple levels of spatial granularity.
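To make the input representation concrete, the following is a minimal sketch
(not our implementation) of early-fusion RGBD input construction in PyTorch;
the zero-filled fallback channel and the min-max depth normalization are
illustrative assumptions:

    from typing import Optional

    import torch


    def build_input(rgb: torch.Tensor, depth: Optional[torch.Tensor]) -> torch.Tensor:
        """Stack an RGB image (3xHxW) with a normalized depth channel (1xHxW)."""
        if depth is None:
            # Depth is optional: substitute a zero channel so the network
            # input shape stays fixed at 4xHxW.
            depth = torch.zeros(1, rgb.shape[1], rgb.shape[2])
        else:
            # Min-max normalize depth so its scale matches the color channels.
            depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
        return torch.cat([rgb, depth], dim=0)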
First, a scene-level deep network produces an initial coarse segmentation,
based on which we select a few candidate object regions to zoom in on and
perform fine-grained segmentation.
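A rough sketch of this coarse-to-fine step is shown below; the
connected-component region proposal and the coarse_net/fine_net placeholders
are illustrative assumptions rather than our exact selection strategy:

    import torch.nn.functional as F
    from scipy import ndimage


    def coarse_to_fine(rgbd, coarse_net, fine_net, crop_size=256, max_regions=5):
        # Scene-level pass: coarse HxW label map over the full image.
        coarse = coarse_net(rgbd.unsqueeze(0)).argmax(1)[0]
        # Propose candidate object regions as connected foreground blobs
        # (illustrative: the first few blobs in label order).
        labels, _ = ndimage.label(coarse.cpu().numpy() > 0)
        fine_masks = []
        for region in ndimage.find_objects(labels)[:max_regions]:
            crop = rgbd[:, region[0], region[1]].unsqueeze(0)
            # Zoom in: upsample the crop before running the fine network.
            crop = F.interpolate(crop, size=(crop_size, crop_size),
                                 mode="bilinear", align_corners=False)
            fine_masks.append((region, fine_net(crop).argmax(1)[0]))
        return coarse, fine_masks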
The results of these two steps are then integrated into a densely connected
conditional random field (CRF) that learns to respect appearance, depth, and
spatial affinities with pixel-level accuracy.
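The sketch below approximates this refinement step with the pydensecrf
package, using hand-set Gaussian kernels over position, color, and depth; in
our model these affinities are learned rather than fixed, so the kernel
widths and compatibilities here are illustrative only:

    import numpy as np
    import pydensecrf.densecrf as dcrf
    from pydensecrf.utils import (unary_from_softmax, create_pairwise_gaussian,
                                  create_pairwise_bilateral)


    def crf_refine(probs, rgb, depth, iters=5):
        """probs: CxHxW softmax scores; rgb: HxWx3 uint8; depth: HxW float."""
        C, H, W = probs.shape
        d = dcrf.DenseCRF(H * W, C)
        d.setUnaryEnergy(unary_from_softmax(probs))
        # Spatial smoothness kernel (position only).
        d.addPairwiseEnergy(create_pairwise_gaussian(sdims=(3, 3), shape=(H, W)),
                            compat=3)
        # Appearance kernel (position + color).
        d.addPairwiseEnergy(create_pairwise_bilateral(sdims=(60, 60),
                                                      schan=(13, 13, 13),
                                                      img=rgb, chdim=2),
                            compat=10)
        # Depth kernel: treat depth as a one-channel bilateral feature.
        d.addPairwiseEnergy(create_pairwise_bilateral(sdims=(60, 60),
                                                      schan=(0.1,),
                                                      img=depth[..., None],
                                                      chdim=2),
                            compat=10)
        return np.argmax(d.inference(iters), axis=0).reshape(H, W)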
In addition, we create MJU-Waste, a new RGBD waste object segmentation
dataset, which we make publicly available to facilitate future research in
this area. The efficacy of our method is validated on both MJU-Waste and the
Trash Annotations in Context (TACO) dataset.

Comment: Paper appears in Sensors 2020, 20(14), 381