Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring
In this paper, we focus on the challenging perception problem in robotic pouring. Most existing approaches leverage either visual or haptic information; however, these techniques may generalize poorly to opaque containers or suffer from limited measurement precision. To address these drawbacks, we propose to use audio vibration sensing and design a deep neural network, PouringNet, that predicts the liquid height from the audio fragment recorded during the robotic pouring task. PouringNet is trained on a real-world pouring dataset we collected with multimodal sensing, containing more than 3000 recordings of audio, force feedback, video and hand-trajectory data of a human performing the pouring task. Each recording represents a complete pouring procedure. We conduct several evaluations of PouringNet on our dataset and on robotic hardware. The results demonstrate that PouringNet generalizes well across different liquid containers, positions of the audio receiver, initial liquid heights and types of liquid, and enables more robust and accurate audio-based perception for robotic pouring.
Comment: Check out the project page for video, code and dataset: https://lianghongzhuo.github.io/AudioPourin
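Since the abstract does not detail PouringNet's architecture, the following is a minimal sketch of the underlying idea, regressing liquid height from an audio spectrogram; the class name, layer sizes and the spectrogram input format are illustrative assumptions, not the authors' design.

    # Sketch only: regress liquid height from an audio spectrogram.
    # Layer sizes and input format are assumptions, not PouringNet itself.
    import torch
    import torch.nn as nn

    class AudioHeightRegressor(nn.Module):
        def __init__(self):
            super().__init__()
            # Conv stack over the (freq bins x time frames) spectrogram
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(1),
            )
            # Regress a single scalar: the current liquid height
            self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1))

        def forward(self, spec):  # spec: (batch, 1, n_freq, n_frames)
            return self.head(self.features(spec))

Such a model could be trained with an L2 loss against the ground-truth heights recorded in the multimodal dataset.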
PourIt!: Weakly-supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring
Liquid perception is critical for robotic pouring tasks; it usually requires robust visual detection of the flowing liquid. However, while recent works have shown promising results in liquid perception, they typically require labeled data for model training, which is time-consuming and labor-intensive. To address this, this paper proposes a simple yet effective framework, PourIt!, to serve as a tool for robotic pouring tasks. We design a simple data collection pipeline that needs only image-level labels, reducing the reliance on tedious pixel-wise annotations. A binary classification model is then trained to generate a Class Activation Map (CAM) that focuses on the visual difference between the two kinds of collected data, i.e., whether flowing liquid is present or not. We also devise a feature contrast strategy to improve the quality of the CAM so that it covers the actual liquid regions completely and tightly. The container pose is then used to recover a 3D point cloud of the detected liquid region. Finally, the liquid-to-container distance is computed for visual closed-loop control of the physical robot. To validate the effectiveness of the proposed method, we also contribute a novel dataset for this task, the PourIt! dataset. Extensive experiments on this dataset and on a physical Franka robot demonstrate the utility and effectiveness of our method in robotic pouring tasks. Our dataset, code and pre-trained models will be available on the project page.
Comment: ICCV202
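As a concrete illustration of the CAM idea the abstract relies on, the sketch below derives a liquid activation map from a binary classifier trained only with image-level labels; the backbone and names are assumptions, and the paper's feature contrast strategy is omitted for brevity.

    # Sketch only: Class Activation Map from a binary "liquid / no liquid"
    # classifier. Backbone and names are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LiquidCAMNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            self.fc = nn.Linear(64, 2)  # trained from image-level labels

        def forward(self, x):
            feat = self.backbone(x)                  # (B, 64, H, W)
            logits = self.fc(feat.mean(dim=(2, 3)))  # global average pooling
            # CAM: weight the feature maps by the "liquid" class weights.
            w = self.fc.weight[1].view(1, -1, 1, 1)
            cam = F.relu((feat * w).sum(dim=1, keepdim=True))
            cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear",
                                align_corners=False)
            return logits, cam

At inference, thresholding the upsampled CAM yields a liquid mask that can then be lifted to a 3D point cloud using the container pose.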
Estimating Properties of Solid Particles Inside Container Using Touch Sensing
Solid particles, such as rice and coffee beans, are commonly stored in containers and are ubiquitous in our daily lives. Understanding these particles' properties can inform later decisions and manipulation tasks such as pouring. Humans typically interact with a container to infer the properties of the particles inside it, but this remains challenging for robots. This work uses tactile sensing to estimate multiple properties of solid particles enclosed in a container: content mass, content volume, particle size, and particle shape. We design a sequence of robot actions to interact with the container. Guided by physical understanding, we extract static force/torque values from an F/T sensor, and vibration-related and topple-related features from a newly designed high-speed GelSight tactile sensor, to estimate these four particle properties. We test our method on a diverse set of everyday particles, including powder, rice, beans, tablets, etc. Experiments show that our approach accurately estimates content mass, content volume and particle size, and achieves high accuracy for particle shape estimation. In addition, our method generalizes to unseen particles with unknown volumes. By estimating these particle properties, our method can help robots better perceive granular media and assist with manipulation tasks in daily life and industry.
Comment: 8 pages, 14 figures
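The abstract does not list the exact features, so the snippet below only illustrates the kind of vibration features one might extract from a high-rate tactile or force signal; the band-energy design and parameter values are assumptions, not the paper's feature set.

    # Sketch only: generic vibration features from a high-rate 1-D signal.
    # The band-energy design is an assumption, not the paper's features.
    import numpy as np

    def vibration_features(signal, fs, n_bands=8):
        """Log-spaced spectral band energies plus simple time statistics."""
        signal = signal - signal.mean()            # drop the static component
        power = np.abs(np.fft.rfft(signal)) ** 2
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
        edges = np.logspace(np.log10(10.0), np.log10(fs / 2), n_bands + 1)
        bands = [power[(freqs >= lo) & (freqs < hi)].sum()
                 for lo, hi in zip(edges[:-1], edges[1:])]
        return np.array(bands + [signal.std(), np.abs(signal).max()])

Features like these could feed a standard regressor or classifier for the four particle properties.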
Top-1 CORSMAL Challenge 2020 Submission: Filling Mass Estimation Using Multi-modal Observations of Human-robot Handovers
Human-robot object handover is a key skill for the future of human-robot collaboration. The CORSMAL 2020 Challenge focuses on the perception part of this problem: the robot needs to estimate the filling mass of a container held by a human. Although powerful methods exist for image processing and audio processing individually, this problem requires processing data from multiple sensors together: the appearance of the container, the sound of the filling, and the depth data all provide essential information. We propose a multi-modal method that predicts three key indicators of the filling mass: filling type, filling level, and container capacity. These indicators are then combined to estimate the filling mass of the container. Our method obtained the top overall performance among all submissions to the CORSMAL 2020 Challenge on both the public and private subsets, while showing no evidence of overfitting. Our source code is publicly available: https://github.com/v-iashin/CORSMAL
Comment: Code: https://github.com/v-iashin/CORSMAL Docker: https://hub.docker.com/r/iashin/corsma
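To make the final combination step concrete, here is a minimal sketch under the common assumption that filling mass equals container capacity times filling level times the density of the filling type; the density values and function names are illustrative, not the authors' exact procedure.

    # Sketch only: combine the three predicted indicators into a mass.
    # Density values are illustrative assumptions.
    DENSITY_G_PER_ML = {"water": 1.00, "rice": 0.85, "pasta": 0.41}

    def filling_mass(filling_type, filling_level, capacity_ml):
        """filling_level is a fraction in [0, 1]; returns grams."""
        return capacity_ml * filling_level * DENSITY_G_PER_ML[filling_type]

    # e.g. a 500 ml container half-filled with water -> 250.0 g
    print(filling_mass("water", 0.5, 500.0))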