8 research outputs found
Review of the techniques used in motor-cognitive human-robot skill transfer
Abstract Conventional robot programming methods severely limit the reusability of skills: engineers programme a robot in a targeted manner to realise predefined skills. The low reusability of general-purpose robot skills is mainly reflected in their inability to cope with novel and complex scenarios. Skill transfer aims to transfer human skills to general-purpose manipulators or mobile robots so that they can replicate human-like behaviours. Commonly used skill transfer methods, such as learning from demonstration (LfD) or imitation learning, endow the robot with the expert's low-level motor and high-level decision-making abilities, so that skills can be reproduced and generalised according to the perceived context. Improving robot cognition usually means improving the autonomous high-level decision-making ability. Based on the idea of establishing a generic or specialised robot skill library, robots are expected to reason autonomously about when skills are needed and to plan compound movements according to sensory input. In recent years, many successful studies in this area have demonstrated their effectiveness. Herein, a detailed review is provided of skill transfer techniques, applications, advancements, and limitations, especially in LfD. Future research directions are also suggested.
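The simplest instance of the LfD idea described above is behavioural cloning: fit a policy by regressing the expert's demonstrated actions on the states in which they were taken. The sketch below is purely illustrative (the data, the linear policy class, and the variable names are all assumptions, not taken from the review):

```python
import numpy as np

# Hypothetical demonstration data: states and the expert's actions.
# Behavioural cloning, one simple form of LfD, fits a policy by
# regressing actions on states; here a linear policy via least squares.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(100, 2))     # 100 demos of a 2-D state
true_gain = np.array([[1.5], [-0.5]])          # expert's (unknown) feedback gains
actions = states @ true_gain                   # demonstrated actions

# Fit policy parameters from the demonstrations alone.
gain_hat, *_ = np.linalg.lstsq(states, actions, rcond=None)

# The learned policy generalises to a state not seen in the demos.
new_state = np.array([[0.2, -0.4]])
print(new_state @ gain_hat)  # close to new_state @ true_gain
```

With noiseless linear demonstrations the least-squares fit recovers the expert's gains exactly; real LfD pipelines use richer policy classes and noisy data, but the structure is the same.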
Inverse reinforcement learning from failure
Inverse reinforcement learning (IRL) allows autonomous agents to learn to solve complex tasks from successful demonstrations. However, in many settings, e.g., when a human learns the task by trial and error, failed demonstrations are also readily available. In addition, in some tasks, purposely generating failed demonstrations may be easier than generating successful ones. Since existing IRL methods cannot make use of failed demonstrations, in this paper we propose inverse reinforcement learning from failure (IRLF), which exploits both successful and failed demonstrations. Starting from the state-of-the-art maximum causal entropy IRL method, we propose a new constrained optimisation formulation that accommodates both types of demonstrations while remaining convex. We then derive update rules for learning reward functions and policies. Experiments on both simulated and real-robot data demonstrate that IRLF converges faster and generalises better than maximum causal entropy IRL, especially when few successful demonstrations are available.
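The flavour of feature-matching IRL with failed demonstrations can be sketched in a few lines. This is a schematic gradient step, not the paper's exact IRLF formulation: the reward is assumed linear in features, successful demonstrations pull the reward weights towards their feature expectations, and failed ones (with an assumed weight `beta`) push the weights away:

```python
import numpy as np

# Schematic feature-matching update (an illustration, not IRLF itself).
# Reward is assumed linear in features, r(s) = w . phi(s). Successful
# demos attract the policy's feature expectations; failed demos repel them.
def irl_from_failure_step(w, mu_success, mu_failure, mu_policy, lr=0.1, beta=0.5):
    grad = (mu_success - mu_policy) - beta * (mu_failure - mu_policy)
    return w + lr * grad

w = np.zeros(3)
mu_success = np.array([1.0, 0.0, 0.5])  # feature counts from successes
mu_failure = np.array([0.0, 1.0, 0.5])  # feature counts from failures
mu_policy  = np.array([0.5, 0.5, 0.5])  # current policy's expectations
w = irl_from_failure_step(w, mu_success, mu_failure, mu_policy)
print(w)  # weight rises on the success feature, falls on the failure feature
```

Features that only the failures exhibit end up with negative reward weight, which is the intuition the abstract formalises via a convex constrained optimisation.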
Rapidly exploring learning trees
Inverse Reinforcement Learning (IRL) for path planning enables robots to learn cost functions for difficult tasks from demonstration, instead of hard-coding them. However, IRL methods face practical limitations that stem from the need to repeat costly planning procedures. In this paper, we propose Rapidly Exploring Learning Trees (RLT*), which learns the cost functions of Optimal Rapidly Exploring Random Trees (RRT*) from demonstration, thereby making inverse learning methods applicable to more complex tasks. Our approach extends Maximum Margin Planning to work with RRT* cost functions. Furthermore, we propose a caching scheme that greatly reduces the computational cost of this approach. Experimental results on simulated and real-robot data from a social navigation scenario show that RLT* achieves better performance at lower computational cost than existing methods. We also successfully deploy control policies learned with RLT* on a real telepresence robot.
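The place where a learned cost function enters RRT* is parent selection: a new node is wired to whichever candidate parent minimises cost-to-come. The snippet below is an illustrative sketch (not the paper's implementation); the weighted length-plus-social-proximity edge cost stands in for whatever cost function the learner would supply:

```python
import math

# Illustrative RRT*-style parent selection with a hypothetical learned
# edge cost: a weighted sum of path length and a "social" proximity
# penalty for passing near a person, as in a social-navigation setting.
def edge_cost(a, b, w_length=1.0, w_social=2.0, person=(0.0, 0.0)):
    length = math.dist(a, b)
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    social = 1.0 / (0.1 + math.dist(mid, person))  # higher when near the person
    return w_length * length + w_social * social

def choose_parent(new_node, candidates, cost_to_come):
    # Wire new_node to the candidate minimising total cost-to-come.
    return min(candidates, key=lambda n: cost_to_come[n] + edge_cost(n, new_node))

nodes = [(0.0, 1.0), (1.0, 1.0)]
c2c = {nodes[0]: 0.0, nodes[1]: 1.0}
parent = choose_parent((2.0, 1.0), nodes, c2c)
print(parent)
```

Here the longer edge from (0.0, 1.0) passes closer to the person for longer, so the planner prefers the parent at (1.0, 1.0) despite its nonzero cost-to-come. Learning the weights `w_length` and `w_social` from demonstrations is exactly the inverse problem the abstract addresses.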
VariBAD: a very good method for Bayes-adaptive deep RL via meta-learning
Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent's uncertainty about the environment. Computing a Bayes-optimal policy is however intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher online return than existing methods.
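The Bayes-adaptive idea that variBAD approximates with learned variational inference can be shown exactly in a toy setting: the agent maintains a belief over the unknown task and conditions its behaviour on that belief, updating it by Bayes' rule after each outcome. The two-armed task, success probabilities, and names below are assumptions for illustration only:

```python
import numpy as np

# Toy Bayes-adaptive belief update: the unknown "task" is which of two
# arms pays off (the good arm succeeds with prob 0.9, the other 0.1).
# A Bayes-optimal agent acts on the pair (state, belief), not state alone.
def update_belief(belief, arm, reward, p_success=(0.9, 0.1)):
    # belief[k] = P(arm k is the good one); compute outcome likelihoods
    probs = p_success if arm == 0 else p_success[::-1]
    like = np.array([p if reward else 1 - p for p in probs])
    post = belief * like
    return post / post.sum()

belief = np.array([0.5, 0.5])               # uniform prior over tasks
belief = update_belief(belief, arm=0, reward=1)
print(belief)  # posterior mass shifts towards "arm 0 is good"
```

In variBAD this exact posterior is replaced by an approximate one produced by a meta-learned inference network, and the policy receives its parameters alongside the state, so exploration automatically scales with remaining task uncertainty.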
TERESA: A Socially Intelligent Semi-autonomous Telepresence System
TERESA is a socially intelligent semi-autonomous telepresence system that is currently being developed as part of an FP7-STREP project funded by the European Union. The ultimate goal of the project is to deploy this system in an elderly day centre to allow elderly people to participate in social events even when they are unable to travel to the centre. In this paper, we present an overview of our progress on TERESA. We discuss the most significant scientific and technical challenges including: understanding and automatically recognizing social behaviour; defining social norms for the interaction between a telepresence robot and its users; navigating the environment while taking into account social features and constraints; and learning to estimate the social impact of the robot's actions from multiple sources of feedback. We report on our current progress on each of these challenges, as well as our plans for future work.