PICO: Primitive Imitation for COntrol
In this work, we explore a novel framework for control of complex systems
called Primitive Imitation for COntrol (PICO). The approach combines ideas from
imitation learning, task decomposition, and novel task sequencing to generalize
from demonstrations to new behaviors. Demonstrations are automatically
decomposed into existing or missing sub-behaviors, which allows the framework
to identify novel behaviors without duplicating existing ones.
Generalization to new tasks is achieved through dynamic blending of behavior
primitives. We evaluated the approach using demonstrations from two different
robotic platforms. The experimental results show that PICO is able to detect
the presence of a novel behavior primitive and build the missing control
policy.
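
To make the blending idea concrete, here is a minimal sketch of dynamically
blending behavior primitives as a convex combination of their actions. The
class and function names, the linear primitives, and the softmax gate are
illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: PICO's actual primitive representation and
# gating mechanism are not specified here; linear policies and a softmax
# gate are assumptions for brevity.
import numpy as np

class PrimitivePolicy:
    """One behavior primitive: maps a state vector to an action vector."""
    def __init__(self, weights: np.ndarray):
        self.weights = weights  # shape (action_dim, state_dim)

    def act(self, state: np.ndarray) -> np.ndarray:
        return self.weights @ state

def blend_primitives(primitives, gate_weights, state):
    """Blend primitive actions with state-dependent gate weights.

    gate_weights is assumed non-negative and summing to 1 (e.g., a softmax
    over per-primitive relevance scores), so the blended action stays in
    the convex hull of the primitives' actions.
    """
    actions = np.stack([p.act(state) for p in primitives])  # (K, action_dim)
    return gate_weights @ actions

# Usage: blend three random linear primitives (4-D state, 2-D action).
rng = np.random.default_rng(0)
primitives = [PrimitivePolicy(rng.normal(size=(2, 4))) for _ in range(3)]
scores = rng.normal(size=3)
gates = np.exp(scores) / np.exp(scores).sum()  # softmax gate
action = blend_primitives(primitives, gates, rng.normal(size=4))
```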
Accelerated Robot Learning via Human Brain Signals
In reinforcement learning (RL), sparse rewards are a natural way to specify
the task to be learned. However, most RL algorithms struggle to learn in this
setting since the learning signal is mostly zeros. In contrast, humans are good
at assessing and predicting the future consequences of actions and can serve as
good reward/policy shapers to accelerate the robot learning process. Previous
works have shown that the human brain generates an error-related signal,
measurable using electroencephalography (EEG), when the human perceives the
task being done erroneously. In this work, we propose a method that uses
evaluative feedback obtained from human brain signals measured via scalp EEG to
accelerate RL for robotic agents in sparse reward settings. As the robot learns
the task, the EEG of a human observer watching the robot attempts is recorded
and decoded into noisy error feedback signal. From this feedback, we use
supervised learning to obtain a policy that subsequently augments the behavior
policy and guides exploration in the early stages of RL. This bootstraps the RL
learning process to enable learning from sparse rewards. Using a robotic
navigation task as a test bed, we show that our method achieves a stable
obstacle-avoidance policy with a high success rate, outperforming learning
from sparse rewards alone, which either struggles to achieve obstacle
avoidance or fails to advance to the goal.
Comment: 2020 IEEE International Conference on Robotics and Automation (ICRA 2020)
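
As an illustration of how decoded feedback might bootstrap exploration, the
following is a minimal sketch under stated assumptions: a binary error flag
per (state, action) pair stands in for the EEG decoder's output, a majority
vote stands in for the supervised policy, and an annealed mixing coefficient
guides early exploration. All names and the tabular setting are hypothetical,
not the paper's actual pipeline.

```python
# Hypothetical sketch: the EEG decoder output, the supervised model, and
# the exploration schedule below are illustrative assumptions.
import numpy as np

def train_feedback_policy(states, actions, error_flags):
    """Supervised stand-in: keep (state, action) pairs the noisy decoder
    did NOT flag as erroneous, then pick the majority action per state."""
    prefs = {}
    for s, a, err in zip(states, actions, error_flags):
        if not err:  # decoder judged this action acceptable
            prefs.setdefault(s, []).append(a)
    return {s: max(set(acts), key=acts.count) for s, acts in prefs.items()}

def explore_action(state, q_values, feedback_policy, beta, rng):
    """With probability beta, defer to the feedback-derived policy;
    otherwise act greedily on current Q-values. Annealing beta toward 0
    hands control back to RL once sparse rewards start arriving."""
    if state in feedback_policy and rng.random() < beta:
        return feedback_policy[state]
    return int(np.argmax(q_values[state]))

# Usage: 4 discrete states, 3 actions, a few decoded observations.
rng = np.random.default_rng(0)
fb = train_feedback_policy(states=[0, 0, 1, 2],
                           actions=[2, 2, 0, 1],
                           error_flags=[False, False, True, False])
q = np.zeros((4, 3))
a = explore_action(0, q, fb, beta=0.8, rng=rng)
```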