Perceiving accurate 3D object shape is important for robots to interact with
the physical world. Current research along this direction has been primarily
relying on visual observations. Vision, however useful, has inherent
limitations due to occlusions and the 2D-3D ambiguities, especially for
perception with a monocular camera. In contrast, touch gets precise local shape
information, though its efficiency for reconstructing the entire shape could be
low. In this paper, we propose a novel paradigm that efficiently perceives
accurate 3D object shape by incorporating visual and tactile observations, as
well as prior knowledge of common object shapes learned from large-scale shape
repositories. We use vision first, applying neural networks with learned shape
priors to predict an object's 3D shape from a single-view color image. We then
use tactile sensing to refine the shape; the robot actively touches the object
regions where the visual prediction has high uncertainty. Our method
efficiently builds the 3D shape of common objects from a color image and a
small number of tactile explorations (around 10). Our setup is easy to apply
and has potentials to help robots better perform grasping or manipulation tasks
on real-world objects.Comment: IROS 2018. The first two authors contributed equally to this wor