Pouring by Feel: An Analysis of Tactile and Proprioceptive Sensing for Accurate Pouring
As service robots begin to be deployed to assist humans, it is important for
them to be able to perform a skill as ubiquitous as pouring. Specifically, we
focus on the task of pouring an exact amount of water without any environmental
instrumentation, that is, using only the robot's own sensors to perform the
task robustly and in a general way. Our approach uses a simple PID controller
that supervises the pour using the measured change in weight of the held
container. Unlike previous methods, which rely on specialized force-torque
sensors at the robot wrist, we use the robot's joint torque sensors and
investigate the added benefit of tactile sensors at the fingertips. We train
three estimators that regress the weight poured out of the source container
and show that we can accurately pour within 10 ml of the target on average,
while remaining robust enough to pour at novel locations and with different
grasps on the source container.
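The weight-feedback control loop this abstract describes can be sketched as a textbook PID controller. The class below is an illustrative assumption, not the authors' implementation: the gains, control period, and the mapping to a tilt-velocity command are all placeholders.

```python
# Illustrative sketch of a weight-feedback PID pour controller.
# Gains, control period, and the tilt-command mapping are hypothetical.

class PIDPourController:
    def __init__(self, target_ml, kp=0.8, ki=0.05, kd=0.2, dt=0.02):
        self.target = target_ml          # desired poured amount (ml)
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt                     # control period (s)
        self.integral = 0.0
        self.prev_error = None

    def step(self, poured_ml):
        """Map the estimated poured amount to a tilt-velocity command."""
        error = self.target - poured_ml
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        # Positive command tilts toward pouring; clamp to a safe range.
        command = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-1.0, min(1.0, command))
```

In practice `poured_ml` would come from the learned estimators that regress the poured weight from joint-torque and tactile signals.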
MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Learning via Interactive Perception
A holistic understanding of object properties across diverse sensory
modalities (e.g., visual, audio, and haptic) is essential for tasks ranging
from object categorization to complex manipulation. Drawing inspiration from
cognitive science studies that emphasize the significance of multi-sensory
integration in human perception, we introduce MOSAIC (Multimodal Object
property learning with Self-Attention and Interactive Comprehension), a novel
framework designed to facilitate the learning of unified multi-sensory object
property representations. While it is undeniable that visual information plays
a prominent role, we acknowledge that many fundamental object properties extend
beyond the visual domain to encompass attributes like texture, mass
distribution, or sounds, which significantly influence how we interact with
objects. In MOSAIC, we leverage this profound insight by distilling knowledge
from multimodal foundation models and aligning these representations not only
across vision but also haptic and auditory sensory modalities. Through
extensive experiments on a dataset where a humanoid robot interacts with 100
objects across 10 exploratory behaviors, we demonstrate the versatility of
MOSAIC in two task families: object categorization and object-fetching tasks.
Our results underscore the efficacy of MOSAIC's unified representations,
showing competitive performance in category recognition through a simple linear
probe setup and excelling in the object-fetching task under zero-shot transfer
conditions. This work pioneers the application of sensory grounding in
foundation models for robotics, promising a significant leap in multi-sensory
perception capabilities for autonomous systems. We have released the code,
datasets, and additional results: https://github.com/gtatiya/MOSAIC.
Comment: Accepted to the 2024 IEEE International Conference on Robotics and
Automation (ICRA), May 13 to 17, 2024; Yokohama, Japan
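The "simple linear probe" evaluation mentioned in the abstract can be sketched generically: fit a linear softmax classifier on frozen embeddings and measure accuracy. The embeddings below are synthetic stand-ins, and the training loop is a minimal assumption rather than the paper's setup.

```python
# Minimal linear-probe sketch: fit a softmax classifier on frozen
# embeddings (synthetic stand-ins here) and report accuracy.
import numpy as np

rng = np.random.default_rng(0)

def linear_probe(train_x, train_y, test_x, test_y, n_classes, lr=0.1, epochs=200):
    """Train weights W, b with softmax cross-entropy on frozen features."""
    d = train_x.shape[1]
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[train_y]
    for _ in range(epochs):
        logits = train_x @ W + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(train_x)      # averaged cross-entropy gradient
        W -= lr * (train_x.T @ grad)
        b -= lr * grad.sum(axis=0)
    pred = (test_x @ W + b).argmax(axis=1)
    return (pred == test_y).mean()

# Two well-separated synthetic "object categories" in an 8-d embedding space.
x0 = rng.normal(0.0, 0.3, (50, 8))
x1 = rng.normal(2.0, 0.3, (50, 8))
X = np.vstack([x0, x1])
y = np.array([0] * 50 + [1] * 50)
acc = linear_probe(X, y, X, y, n_classes=2)
```

The point of a linear probe is that only `W` and `b` are trained; if a linear map suffices to separate categories, the frozen representation already encodes them.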
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
Sound is one of the most informative and abundant modalities in the real
world, and it can be sensed robustly and without contact by small, cheap
sensors placed on mobile devices. Although deep learning is capable of
extracting information from multiple sensory inputs, there has been little use
of sound for the control and learning of robotic actions. For unsupervised
reinforcement learning, an agent is expected to actively collect experiences
and jointly learn representations and policies in a self-supervised way. We
build realistic robotic manipulation scenarios with physics-based sound
simulation and propose the Intrinsic Sound Curiosity Module (ISCM). The ISCM
provides feedback to a reinforcement learner to learn robust representations
and to reward a more efficient exploration behavior. We perform experiments
with sound enabled during pre-training and disabled during adaptation, and show
that representations learned by the ISCM outperform those learned by
vision-only baselines, and that the pre-trained policies accelerate learning
when applied to downstream tasks.
Comment: Accepted at IROS 202
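The intrinsic reward the abstract describes can be sketched in the generic curiosity-module style: the agent is rewarded by how poorly a learned forward model predicts the next sound observation. The linear forward model, shapes, and update rule below are assumptions, not the ISCM's actual architecture.

```python
# Generic curiosity-style intrinsic reward sketch: reward equals the
# forward model's prediction error on the next sound embedding.
# The linear model and update rule are illustrative assumptions.
import numpy as np

class SoundCuriosity:
    def __init__(self, obs_dim, act_dim, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Linear forward model: predicts next sound embedding from (obs, action).
        self.W = rng.normal(0, 0.01, (obs_dim + act_dim, obs_dim))
        self.lr = lr

    def reward_and_update(self, sound_emb, action, next_sound_emb):
        x = np.concatenate([sound_emb, action])
        pred = x @ self.W
        err = next_sound_emb - pred
        reward = float(np.mean(err ** 2))      # surprise = intrinsic reward
        self.W += self.lr * np.outer(x, err)   # gradient step on squared error
        return reward
```

As the forward model improves on familiar transitions, their reward decays, so the agent is pushed toward acoustically novel interactions.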
Top-1 CORSMAL Challenge 2020 Submission: Filling Mass Estimation Using Multi-modal Observations of Human-robot Handovers
Human-robot object handover is a key skill for the future of human-robot
collaboration. The CORSMAL 2020 Challenge focuses on the perception part of this
problem: the robot needs to estimate the filling mass of a container held by a
human. Although there are powerful methods in image processing and audio
processing individually, answering such a problem requires processing data from
multiple sensors together. The appearance of the container, the sound of the
filling, and the depth data provide essential information. We propose a
multi-modal method to predict three key indicators of the filling mass: filling
type, filling level, and container capacity. These indicators are then combined
to estimate the filling mass of a container. Our method obtained the Top-1
overall performance among all submissions to the CORSMAL 2020 Challenge on both
the public and private subsets, while showing no evidence of overfitting. Our
source code is publicly available: https://github.com/v-iashin/CORSMAL
Comment: Code: https://github.com/v-iashin/CORSMAL; Docker: https://hub.docker.com/r/iashin/corsma
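Combining the three predicted indicators into a mass estimate is straightforward: capacity times fill fraction times the density of the predicted content type. The density table below holds rough reference values chosen for illustration, not the challenge's ground-truth figures.

```python
# Illustrative combination of the three predicted indicators into a
# filling-mass estimate. Densities (g/ml) are rough reference values,
# not the challenge's ground truth.
DENSITY_G_PER_ML = {"water": 1.0, "pasta": 0.41, "rice": 0.85, "empty": 0.0}

def filling_mass(filling_type, filling_level, capacity_ml):
    """mass (g) = container capacity (ml) x fill fraction x content density."""
    if not 0.0 <= filling_level <= 1.0:
        raise ValueError("filling_level must be a fraction in [0, 1]")
    return capacity_ml * filling_level * DENSITY_G_PER_ML[filling_type]
```

Because the mass is a product of the three estimates, an error in any one indicator propagates multiplicatively, which is why the challenge evaluates each indicator separately as well.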
The CORSMAL benchmark for the prediction of the properties of containers
13 pages, 6 tables, 7 figures. Authors' post-print accepted for publication in IEEE Access, see https://doi.org/10.1109/ACCESS.2022.3166906

The contactless estimation of the weight of a container and the amount of its content manipulated by a person are key prerequisites for safe human-to-robot handovers. However, the opaqueness and transparency of the container and the content, and the variability of materials, shapes, and sizes, make this estimation difficult. In this paper, we present a range of methods and an open framework to benchmark acoustic and visual perception for the estimation of the capacity of a container, and the type, mass, and amount of its content. The framework includes a dataset, specific tasks, and performance measures. We conduct an in-depth comparative analysis of methods that used this framework and of audio-only or vision-only baselines designed from related works. Based on this analysis, we conclude that audio-only and audio-visual classifiers are suitable for estimating the type and amount of the content, using different types of convolutional neural networks combined with either recurrent neural networks or a majority-voting strategy, whereas computer-vision methods are suitable for determining the capacity of the container using regression and geometric approaches. Classifying the content type and level using only audio achieves a weighted average F1-score of up to 81% and 97%, respectively. Estimating the container capacity with vision-only approaches and the filling mass with audio-visual multi-stage approaches reaches up to 65% weighted average capacity and mass scores. These results show that there is still room for improvement in the design of new methods, which can be ranked and compared on the individual leaderboards provided by our open framework.
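The weighted average F1-score used to report the classification results is the per-class F1 averaged with weights proportional to class support. A self-contained sketch of that metric, on toy labels:

```python
# Sketch of the weighted-average F1 score: per-class F1 weighted by
# class support. Labels below are toy examples.
from collections import Counter

def weighted_f1(y_true, y_pred):
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
        fp = sum(p == cls and t != cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (n / total) * f1          # weight by class frequency
    return score
```

Weighting by support matters here because the content-type and level classes in such datasets are rarely balanced; an unweighted (macro) average would score the same predictions differently.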
Haptics: Science, Technology, Applications
This open access book constitutes the proceedings of the 13th International Conference on Human Haptic Sensing and Touch Enabled Computer Applications, EuroHaptics 2022, held in Hamburg, Germany, in May 2022. The 36 regular papers included in this book were carefully reviewed and selected from 129 submissions. They were organized in topical sections as follows: haptic science; haptic technology; and haptic applications.
Haptics: Science, Technology, Applications
This open access book constitutes the proceedings of the 12th International Conference on Human Haptic Sensing and Touch Enabled Computer Applications, EuroHaptics 2020, held in Leiden, The Netherlands, in September 2020. The 60 papers presented in this volume were carefully reviewed and selected from 111 submissions. They were organized in topical sections on haptic science, haptic technology, and haptic applications. This year's focus is on accessibility.
Virtual Reality Games for Motor Rehabilitation
This paper presents a fuzzy logic based method to track user satisfaction without the need for devices that monitor users' physiological conditions. User satisfaction is key to any product's acceptance; computer applications and video games provide a unique opportunity to tailor the environment to each user to better suit their needs. We have implemented a non-adaptive fuzzy logic model of emotion, based on the emotional component of the Fuzzy Logic Adaptive Model of Emotion (FLAME) proposed by El-Nasr, to estimate player emotion in Unreal Tournament 2004. In this paper we describe the implementation of this system and present the results of one of several play tests. Our research contradicts the current literature, which suggests physiological measurements are needed; we show that it is possible to estimate user emotion with a software-only method.
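The fuzzy-inference idea behind such an emotion model can be sketched with triangular membership functions over a single game signal. The memberships, the "goal progress" input, and the defuzzification rule below are illustrative assumptions, not El-Nasr's FLAME rules.

```python
# Minimal fuzzy-inference sketch in the spirit of a FLAME-like emotion
# model: triangular memberships over "goal progress" drive a frustration/
# satisfaction estimate. Shapes and rules are illustrative assumptions.
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def satisfaction(goal_progress):
    """Defuzzify with a weighted average (0 = frustrated, 1 = satisfied)."""
    low = tri(goal_progress, -0.5, 0.0, 0.5)   # little progress -> frustration
    mid = tri(goal_progress, 0.0, 0.5, 1.0)    # moderate progress -> neutral
    high = tri(goal_progress, 0.5, 1.0, 1.5)   # high progress -> satisfaction
    weights = low + mid + high
    return (low * 0.0 + mid * 0.5 + high * 1.0) / weights
```

A full model would fuzzify several in-game signals and combine them through a rule base, but the membership-then-defuzzify pipeline is the same.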