Self-Supervised Visuo-Tactile Pretraining to Locate and Follow Garment Features

Calandra, Roberto; Goldberg, Ken; Hoque, Ryan; Huang, Huang; Ichnowski, Jeffrey; Kerr, Justin; Wilcox, Albert

Publication date: 31 July 2023

Abstract

Humans make extensive use of vision and touch as complementary senses, with vision providing global information about the scene and touch measuring local information during manipulation without suffering from occlusions. While prior work demonstrates the efficacy of tactile sensing for precise manipulation of deformables, it typically relies on supervised, human-labeled datasets. We propose Self-Supervised Visuo-Tactile Pretraining (SSVTP), a framework for learning multi-task visuo-tactile representations in a self-supervised manner through cross-modal supervision. We design a mechanism that enables a robot to autonomously collect precisely spatially-aligned visual and tactile image pairs, then train visual and tactile encoders to embed these pairs into a shared latent space using a cross-modal contrastive loss. We apply this latent space to downstream perception and control of deformable garments on flat surfaces, and evaluate the flexibility of the learned representations without fine-tuning on 5 tasks: feature classification, contact localization, anomaly detection, feature search from a visual query (e.g., garment feature localization under occlusion), and edge following along cloth edges. The pretrained representations achieve a 73-100% success rate on these 5 tasks.

Comment: RSS 2023, site: https://sites.google.com/berkeley.edu/ssvt
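
The abstract names a cross-modal contrastive loss over paired visual and tactile embeddings but does not give its form. As a rough illustration only, the sketch below uses a symmetric InfoNCE-style loss, which is one common choice for this kind of objective and is an assumption here, not the authors' confirmed formulation. The function names, the temperature value, and the use of plain Python lists in place of encoder outputs are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def cross_modal_contrastive_loss(visual, tactile, temperature=0.1):
    """Symmetric InfoNCE-style loss (assumed form, not taken from the paper).

    visual[i] and tactile[i] are embeddings of the same spatially aligned
    pair; every other pairing in the batch is treated as a negative.
    """
    v = [l2_normalize(x) for x in visual]
    t = [l2_normalize(x) for x in tactile]
    # Cosine similarity between every visual and every tactile embedding.
    logits = [
        [sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
        for vi in v
    ]
    logits_t = [list(col) for col in zip(*logits)]  # tactile-to-visual view
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))
```

Under these assumptions, a batch whose visual and tactile embeddings agree pair by pair yields a loss near zero, while a batch whose pairs are swapped yields a large loss. That is the property that would push aligned pairs together in the shared latent space.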

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2209.13042

Last time updated on 20/11/2022