More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch
For humans, the process of grasping an object relies heavily on rich tactile
feedback. Most recent robotic grasping work, however, has been based only on
visual input, and thus cannot easily benefit from feedback after initiating
contact. In this paper, we investigate how a robot can learn to use tactile
information to iteratively and efficiently adjust its grasp. To this end, we
propose an end-to-end action-conditional model that learns regrasping policies
from raw visuo-tactile data. This model -- a deep, multimodal convolutional
network -- predicts the outcome of a candidate grasp adjustment, and then
executes a grasp by iteratively selecting the most promising actions. Our
approach requires neither calibration of the tactile sensors, nor any
analytical modeling of contact forces, thus reducing the engineering effort
required to obtain efficient grasping policies. We train our model with data
from about 6,450 grasping trials on a two-finger gripper equipped with GelSight
high-resolution tactile sensors on each finger. Across extensive experiments,
our approach outperforms a variety of baselines at (i) estimating grasp
adjustment outcomes, (ii) selecting efficient grasp adjustments for quick
grasping, and (iii) reducing the amount of force applied at the fingers, while
maintaining competitive performance. Finally, we study the choices made by our
model and show that it has successfully acquired useful and interpretable
grasping behaviors.
Comment: 8 pages. Published in IEEE Robotics and Automation Letters (RA-L).
Website: https://sites.google.com/view/more-than-a-feelin
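As a reading aid, here is a minimal, hypothetical PyTorch sketch of the regrasping loop this abstract describes: a multimodal network scores candidate grasp adjustments from an RGB image and two GelSight images, and the highest-scoring adjustment is executed at each iteration. The network shape, the action parameterization, and the sampling ranges are illustrative assumptions, not the authors' released model.

import torch
import torch.nn as nn

class GraspOutcomeNet(nn.Module):
    # Predicts grasp-success probability for an (observation, action) pair.
    def __init__(self, action_dim=4):
        super().__init__()
        # Shared CNN encoder applied to the RGB image and both GelSight images.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.action_mlp = nn.Sequential(nn.Linear(action_dim, 32), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(32 * 3 + 32, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, rgb, tactile_left, tactile_right, action):
        feats = torch.cat([self.encoder(rgb),
                           self.encoder(tactile_left),
                           self.encoder(tactile_right),
                           self.action_mlp(action)], dim=-1)
        return torch.sigmoid(self.head(feats))  # P(success | observation, action)

def select_adjustment(model, rgb, tactile_left, tactile_right, n_candidates=64):
    # Sample candidate adjustments (e.g. dx, dy, dz, gripper-force delta) and
    # return the one the model scores as most likely to succeed.
    candidates = torch.empty(n_candidates, 4).uniform_(-1.0, 1.0)
    with torch.no_grad():
        scores = model(rgb.expand(n_candidates, -1, -1, -1),
                       tactile_left.expand(n_candidates, -1, -1, -1),
                       tactile_right.expand(n_candidates, -1, -1, -1),
                       candidates).squeeze(-1)
    best = scores.argmax()
    return candidates[best], scores[best].item()

Here the observation tensors are assumed to have shape (1, 3, H, W); in the paper's setting this selection step would be repeated after each executed adjustment, re-reading the tactile images until the predicted success probability is high enough to lift the object.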
Grasp Stability Assessment Through Attention-Guided Cross-Modality Fusion and Transfer Learning
Extensive research has been conducted on assessing grasp stability, a crucial
prerequisite for achieving optimal grasping strategies, including the minimum
force grasping policy. However, existing works employ basic feature-level
fusion techniques to combine visual and tactile modalities, resulting in
inadequate use of complementary information and an inability to model
interactions between unimodal features. This work proposes an attention-guided
cross-modality fusion architecture to comprehensively integrate visual and
tactile features. This model mainly comprises convolutional neural networks
(CNNs), self-attention, and cross-attention mechanisms. In addition, most
existing methods collect datasets from real-world systems, which is
time-consuming and costly, and the resulting datasets are comparatively
small. This work establishes a robotic grasping system through
physics simulation to collect a multimodal dataset. To address the sim-to-real
transfer gap, we propose a migration strategy encompassing domain randomization
and domain adaptation techniques. The experimental results demonstrate that the
proposed fusion framework achieves markedly enhanced prediction performance
(approximately 10%) compared to other baselines. Moreover, our findings suggest
that the trained model can be reliably transferred to real robotic systems,
indicating its potential to address real-world challenges.
Comment: Accepted by IROS 202
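For a concrete picture of the fusion stage, the following is a minimal PyTorch sketch in the spirit of this abstract: CNN encoders turn each modality into a token sequence, self-attention operates within each modality, cross-attention lets each modality query the other, and a small classifier predicts grasp stability. The dimensions, the shared attention weights, and the mean pooling are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # Per-modality CNN encoders; feature maps are flattened into tokens.
        self.vis_cnn = nn.Sequential(nn.Conv2d(3, dim, 5, stride=4), nn.ReLU())
        self.tac_cnn = nn.Sequential(nn.Conv2d(3, dim, 5, stride=4), nn.ReLU())
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(),
                                        nn.Linear(64, 1))

    @staticmethod
    def _tokens(feat_map):
        # (batch, channels, H, W) -> (batch, H*W, channels)
        return feat_map.flatten(2).transpose(1, 2)

    def forward(self, rgb, tactile):
        vis = self._tokens(self.vis_cnn(rgb))
        tac = self._tokens(self.tac_cnn(tactile))
        # Self-attention within each modality.
        vis, _ = self.self_attn(vis, vis, vis)
        tac, _ = self.self_attn(tac, tac, tac)
        # Cross-attention: visual tokens query tactile tokens, and vice versa.
        vis2tac, _ = self.cross_attn(vis, tac, tac)
        tac2vis, _ = self.cross_attn(tac, vis, vis)
        fused = torch.cat([vis2tac.mean(dim=1), tac2vis.mean(dim=1)], dim=-1)
        return torch.sigmoid(self.classifier(fused))  # P(grasp is stable)

Calling the model as CrossModalFusion()(rgb, tactile) on (batch, 3, H, W) tensors yields a per-sample stability probability; in practice, separate attention modules per modality and a sim-to-real stage on top of this skeleton (domain randomization during simulated data collection plus domain adaptation at transfer time, as the abstract indicates) would complete the pipeline.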