Vision-based Deep Learning Model for Guiding Multi-fingered Robotic Grasping

Abstract

Grasping is an area where humans still vastly outperform robots. By leveraging recent advances in deep learning, we propose a vision-based model that generates human-inspired sequences of grasping primitives suitable for transfer to multi-fingered robotic hands. The proposed model, inspired by Neural Image Captioning, consists of a convolutional part and a recurrent part. The convolutional part employs a pre-trained model from ILSVRC-2014, adapted to combine features from multiple viewpoints of a single object through a view-pooling layer. The extracted features then seed Long Short-Term Memory recurrent units, which generate sequences of primitives that can guide a sophisticated multi-fingered robotic hand during the approach leading to a grasp.

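The pipeline described above (a pre-trained ILSVRC-2014 convolutional backbone, a view-pooling layer that merges features from several views of one object, and an LSTM decoder that emits a sequence of grasping-primitive tokens) can be sketched as follows. This is a minimal illustration only, assuming a VGG-16 backbone, element-wise max view pooling, and PyTorch; all class names, parameters, and hyperparameters are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models


class MultiViewGraspCaptioner(nn.Module):
    """Hypothetical sketch: CNN features per view, view pooling, LSTM decoder."""

    def __init__(self, num_primitives, hidden_size=512, embed_size=256):
        super().__init__()
        # Convolutional part: VGG-16 features (an ILSVRC-2014 model), frozen here.
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.cnn = vgg.features
        for p in self.cnn.parameters():
            p.requires_grad = False
        self.fc = nn.Linear(512 * 7 * 7, hidden_size)

        # Recurrent part: LSTM seeded with the pooled visual features.
        self.embed = nn.Embedding(num_primitives, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_primitives)

    def forward(self, views, primitive_tokens):
        # views: (batch, n_views, 3, 224, 224)
        b, v = views.shape[:2]
        feats = self.cnn(views.flatten(0, 1))          # (b*v, 512, 7, 7)
        feats = feats.view(b, v, -1)                   # (b, v, 512*7*7)
        pooled = feats.max(dim=1).values               # view pooling: element-wise max

        # Seed the LSTM state with the pooled multi-view features.
        h0 = torch.tanh(self.fc(pooled)).unsqueeze(0)  # (1, b, hidden)
        c0 = torch.zeros_like(h0)

        # Decode a sequence of grasping-primitive logits (teacher forcing).
        emb = self.embed(primitive_tokens)             # (b, seq_len, embed)
        out, _ = self.lstm(emb, (h0, c0))
        return self.classifier(out)                    # (b, seq_len, num_primitives)


if __name__ == "__main__":
    model = MultiViewGraspCaptioner(num_primitives=12)
    views = torch.randn(2, 3, 3, 224, 224)             # 2 objects, 3 views each
    tokens = torch.randint(0, 12, (2, 5))              # primitive IDs for decoding
    print(model(views, tokens).shape)                  # torch.Size([2, 5, 12])
```

At inference time, such a decoder would typically be run autoregressively from a start token, feeding each predicted primitive back in until an end-of-sequence primitive is produced.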