More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch
For humans, the process of grasping an object relies heavily on rich tactile
feedback. Most recent robotic grasping work, however, has been based only on
visual input, and thus cannot easily benefit from feedback after initiating
contact. In this paper, we investigate how a robot can learn to use tactile
information to iteratively and efficiently adjust its grasp. To this end, we
propose an end-to-end action-conditional model that learns regrasping policies
from raw visuo-tactile data. This model -- a deep, multimodal convolutional
network -- predicts the outcome of a candidate grasp adjustment, and then
executes a grasp by iteratively selecting the most promising actions. Our
approach requires neither calibration of the tactile sensors, nor any
analytical modeling of contact forces, thus reducing the engineering effort
required to obtain efficient grasping policies. We train our model with data
from about 6,450 grasping trials on a two-finger gripper equipped with GelSight
high-resolution tactile sensors on each finger. Across extensive experiments,
our approach outperforms a variety of baselines at (i) estimating grasp
adjustment outcomes, (ii) selecting efficient grasp adjustments for quick
grasping, and (iii) reducing the amount of force applied at the fingers, while
maintaining competitive performance. Finally, we study the choices made by our
model and show that it has successfully acquired useful and interpretable
grasping behaviors. Comment: 8 pages. Published in IEEE Robotics and Automation Letters (RAL).
Website: https://sites.google.com/view/more-than-a-feelin
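The abstract describes a loop in which the model scores candidate grasp adjustments and the robot repeatedly executes the most promising one. The following is a minimal Python sketch of that greedy selection loop, under stated assumptions. The learned visuo-tactile network is replaced by any callable `predict(obs, action)` that returns a success probability, and candidates are sampled uniformly at random. The names `regrasp_loop`, `choose_adjustment` and `STAY`, the 3-D action tuple, and the 0.9 stopping threshold are all hypothetical and are not taken from the paper. The toy predictor at the end exists only to make the sketch runnable.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical action parametrization: a relative gripper adjustment (dx, dy, dforce).
Action = Tuple[float, float, float]
STAY: Action = (0.0, 0.0, 0.0)  # "no adjustment": its score estimates success of grasping now


def choose_adjustment(predict: Callable[[Sequence[float], Action], float],
                      obs: Sequence[float],
                      candidates: List[Action]) -> Tuple[Action, float]:
    """Score every candidate with the outcome predictor and return the best one."""
    scored = [(predict(obs, a), a) for a in candidates]
    best_score, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_score


def regrasp_loop(predict, obs, apply_action, n_candidates=16,
                 success_threshold=0.9, max_steps=5, seed=0):
    """Greedy regrasping: keep applying the most promising adjustment until the
    predicted success of the current grasp clears the threshold or steps run out."""
    rng = random.Random(seed)
    history = []
    for _ in range(max_steps):
        if predict(obs, STAY) >= success_threshold:
            break  # current grasp already looks good enough
        # Always include STAY so the chosen action never scores below the current grasp.
        candidates = [STAY] + [tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
                               for _ in range(n_candidates)]
        action, score = choose_adjustment(predict, obs, candidates)
        history.append((action, score))
        obs = apply_action(obs, action)
    return obs, history


# Toy stand-in for the learned visuo-tactile network (illustrative only):
# predicted success decays with distance from an arbitrary "ideal" grasp.
TARGET = (0.3, -0.2, 0.5)


def toy_predict(obs, action):
    d2 = sum((o + a - t) ** 2 for o, a, t in zip(obs, action, TARGET))
    return math.exp(-d2)


def toy_apply(obs, action):
    return tuple(o + a for o, a in zip(obs, action))


start = (1.0, 1.0, 0.0)
final, steps = regrasp_loop(toy_predict, start, toy_apply)
print(f"success {toy_predict(start, STAY):.3f} -> {toy_predict(final, STAY):.3f} in {len(steps)} steps")
```

Including the no-op adjustment among the candidates is a design choice of this sketch. It guarantees that no executed step lowers the predicted success of the current grasp.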
Self-Supervised Visuo-Tactile Pretraining to Locate and Follow Garment Features
Humans make extensive use of vision and touch as complementary senses, with
vision providing global information about the scene and touch measuring local
information during manipulation without suffering from occlusions. While prior
work demonstrates the efficacy of tactile sensing for precise manipulation of
deformables, it typically relies on supervised, human-labeled datasets. We
propose Self-Supervised Visuo-Tactile Pretraining (SSVTP), a framework for
learning multi-task visuo-tactile representations in a self-supervised manner
through cross-modal supervision. We design a mechanism that enables a robot to
autonomously collect precisely spatially-aligned visual and tactile image
pairs, then train visual and tactile encoders to embed these pairs into a
shared latent space using cross-modal contrastive loss. We apply this latent
space to downstream perception and control of deformable garments on flat
surfaces, and evaluate the flexibility of the learned representations without
fine-tuning on 5 tasks: feature classification, contact localization, anomaly
detection, feature search from a visual query (e.g., garment feature
localization under occlusion), and edge following along cloth edges. The
pretrained representations achieve a 73-100% success rate on these 5 tasks. Comment: RSS 2023, site: https://sites.google.com/berkeley.edu/ssvt
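The abstract does not spell out the exact form of the cross-modal contrastive loss. A common choice for this kind of objective is a symmetric, CLIP-style InfoNCE loss. It is sketched below in plain Python as an assumption, not as the SSVTP formulation. The encoders are omitted, so the inputs are already-computed embeddings, and the temperature of 0.07 and all function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]


def mean_cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # log-sum-exp trick for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def cross_modal_infonce(visual: List[Vector], tactile: List[Vector],
                        temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of aligned pairs: (visual[i], tactile[i]) is a
    positive pair and every other tactile (resp. visual) embedding is a negative."""
    v = [l2_normalize(x) for x in visual]
    t = [l2_normalize(x) for x in tactile]
    # Cosine-similarity matrix scaled by the temperature (visual -> tactile direction).
    logits = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t] for vi in v]
    logits_t = [list(col) for col in zip(*logits)]  # tactile -> visual direction
    return 0.5 * (mean_cross_entropy_diagonal(logits) + mean_cross_entropy_diagonal(logits_t))


eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = cross_modal_infonce(eye, eye)
shuffled = cross_modal_infonce(eye, [eye[1], eye[2], eye[0]])
print(f"aligned {aligned:.4f}, mismatched {shuffled:.4f}")
```

Spatially aligned pairs sit on the diagonal of the similarity matrix. The loss is therefore small when each visual embedding is closest to its own tactile partner and large when the pairing is scrambled, as the two example batches illustrate.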
3D Shape Perception from Monocular Vision, Touch, and Shape Priors
Perceiving accurate 3D object shape is important for robots to interact with
the physical world. Current research along this direction has been primarily
relying on visual observations. Vision, however useful, has inherent
limitations due to occlusions and the 2D-3D ambiguities, especially for
perception with a monocular camera. In contrast, touch gets precise local shape
information, though its efficiency for reconstructing the entire shape could be
low. In this paper, we propose a novel paradigm that efficiently perceives
accurate 3D object shape by incorporating visual and tactile observations, as
well as prior knowledge of common object shapes learned from large-scale shape
repositories. We use vision first, applying neural networks with learned shape
priors to predict an object's 3D shape from a single-view color image. We then
use tactile sensing to refine the shape; the robot actively touches the object
regions where the visual prediction has high uncertainty. Our method
efficiently builds the 3D shape of common objects from a color image and a
small number of tactile explorations (around 10). Our setup is easy to apply
and has the potential to help robots better perform grasping or manipulation
tasks on real-world objects. Comment: IROS 2018. The first two authors contributed equally to this work.
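The active-touch step, which probes wherever the single-view prediction is least certain, can be illustrated with a greedy selection rule. This is a hypothetical sketch. The abstract does not say how uncertainty is estimated or how the shape is refined after each touch, so the uncertainty map here is a plain dictionary of surface points. The assumption that one touch resolves everything within a fixed `radius` is purely illustrative, and `plan_touches` and its parameters are invented names. The default budget of 10 mirrors the roughly 10 explorations reported in the abstract.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def plan_touches(uncertainty: Dict[Point, float], budget: int = 10,
                 radius: float = 0.05) -> List[Point]:
    """Greedy active touching: probe the surface point where the vision-only shape
    estimate is least certain, mark its neighborhood as resolved, and repeat."""
    remaining = dict(uncertainty)
    touches: List[Point] = []
    for _ in range(budget):
        if not remaining:
            break
        target = max(remaining, key=remaining.get)
        touches.append(target)
        # Simplifying assumption: one touch resolves shape only within `radius`
        # of the contact, reflecting that touch is precise but local.
        remaining = {p: u for p, u in remaining.items() if math.dist(p, target) > radius}
    return touches


uncertainty_map = {
    (0.00, 0.0, 0.0): 0.9,
    (0.01, 0.0, 0.0): 0.8,  # close to the first point, so covered by the same touch
    (1.00, 0.0, 0.0): 0.5,
    (2.00, 0.0, 0.0): 0.1,
}
print(plan_touches(uncertainty_map, budget=2))  # → [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```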
Multimodal imagery in music: Active ingredients and mechanisms underlying musical engagement
Clinicians and researchers have provided strong evidence for the efficacy of Guided Imagery and Music (GIM) and similar therapies across a wide range of clinical conditions. What is still lacking is a theoretical framework that would allow identification of the ‘active ingredients’ in this process. This paper seeks to introduce a new systemic framework for investigating such therapies by examining the biological roots as well as the role of music in regulating individual and social life to maintain homeostasis through multimodality, by means of arousal, imagery, attentional engagement, emotion, memory and analogous processes. Taking the work of Edelman, Damasio and other leaders of modern neuroscience as a point of departure, homeostasis and multimodality are presented as essential not only to the human life process in terms of our active mental life but also to the fullness of Edelman's "primary consciousness" and Damasio's "core self." The implications of these intricate cross-connections are considered, as well as music's unique propensity to engage these connections spontaneously and multimodally. Proposals are made to evaluate these ideas and to stimulate further research in both basic science and clinical practice.
Touch and Go: Learning from Human-Collected Vision and Touch
The ability to associate touch with sight is essential for tasks that require
physically interacting with objects in the world. We propose Touch and Go, a
dataset of paired visual and tactile data in which human data
collectors probe objects in natural environments using tactile sensors, while
simultaneously recording egocentric video. In contrast to previous efforts,
which have largely been confined to lab settings or simulated environments, our
dataset spans a large number of "in the wild" objects and scenes. To
demonstrate our dataset's effectiveness, we successfully apply it to a variety
of tasks: 1) self-supervised visuo-tactile feature learning, 2) tactile-driven
image stylization, i.e., making the visual appearance of an object more
consistent with a given tactile signal, and 3) predicting future frames of a
tactile signal from visuo-tactile inputs. Comment: Accepted to the NeurIPS 2022 Datasets and Benchmarks Track.
MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things
The Internet of Things (IoT), the network integrating billions of smart
physical devices embedded with sensors, software, and communication
technologies for the purpose of connecting and exchanging data with other
devices and systems, is a critical and rapidly expanding component of our
modern world. The IoT ecosystem provides a rich source of real-world modalities
such as motion, thermal, geolocation, imaging, depth, sensors, video, and audio
for prediction tasks involving the pose, gaze, activities, and gestures of
humans, as well as the touch, contact, pose, and 3D structure of physical
objects. Machine learning presents a rich opportunity to automatically process
IoT data at scale, enabling efficient inference with impact on understanding
human wellbeing, controlling physical devices, and interconnecting smart cities. To
develop machine learning technologies for IoT, this paper proposes MultiIoT,
the most expansive IoT benchmark to date, encompassing over 1.15 million
samples from 12 modalities and 8 tasks. MultiIoT introduces unique challenges
involving (1) learning from many sensory modalities, (2) fine-grained
interactions across long temporal ranges, and (3) extreme heterogeneity due to
unique structure and noise topologies in real-world sensors. We also release a
set of strong modeling baselines, ranging from modality- and task-specific
methods to multisensory and multitask models, to encourage future research in
multisensory representation learning for IoT.