Grasping small objects surrounded by unstable or non-rigid material plays a
crucial role in applications such as surgery, harvesting, construction,
disaster recovery, and assisted feeding. This task is especially difficult when
fine manipulation is required in the presence of sensor noise and perception
errors; these errors inevitably trigger dynamic motion, which is challenging to
model precisely. Data-driven methods such as reinforcement learning (RL)
circumvent the difficulty of building accurate models of contacts and dynamics
by optimizing task performance through trial and error. Applying RL methods to
real robots, however, has been hindered by factors such as prohibitively high
sample complexity and the high infrastructure cost of providing resets on
hardware during training. This work presents CherryBot, an RL system that uses
chopsticks for fine manipulation and surpasses human reactiveness on some
dynamic grasping tasks. By
integrating imprecise simulators, suboptimal demonstrations and external state
estimation, we study how to make a real-world robot learning system sample
efficient and general while reducing the human effort required for supervision.
Our system improves continually over 30 minutes of real-world
interaction: through reactive retry, it achieves an almost 100% success rate on
the demanding task of using chopsticks to grasp small objects swinging in the
air. We demonstrate the reactiveness, robustness and generalizability of
CherryBot to varying object shapes and dynamics (e.g., external disturbances
like wind and human perturbations). Videos are available at
https://goodcherrybot.github.io/