Hierarchical policy design for sample-efficient learning of robot table tennis through self-play
Training robots with physical bodies requires developing new methods and action representations that allow the learning agents to explore the space of policies efficiently. This work studies sample-efficient learning of complex policies in the context of robot table tennis. It incorporates learning into a hierarchical control framework using a model-free strategy layer (which requires complex reasoning about opponents that is difficult to do in a model-based way), model-based prediction of external objects (which are difficult to control directly with analytic control methods, but governed by learnable and relatively simple laws of physics), and analytic controllers for the robot itself. Human demonstrations are used to train dynamics models, which together with the analytic controller allow any robot that is physically capable to play table tennis without training episodes. Using only about 7000 demonstrated trajectories, a striking policy can hit ball targets with about 20 cm error. Self-play is used to train cooperative and adversarial strategies on top of model-based striking skills trained from human demonstrations. After only about 24000 strikes in self-play the agent learns to best exploit the human dynamics models for longer cooperative games. Further experiments demonstrate that more flexible variants of the policy can discover new strikes not demonstrated by humans and achieve higher performance at the expense of lower sample-efficiency. Experiments are carried out in a virtual reality environment using sensory observations that are obtainable in the real world. The high sample-efficiency demonstrated in the evaluations shows that the proposed method is suitable for learning directly on physical robots without transfer of models or policies from simulation.
Computer Science
Optimal Stroke Learning with Policy Gradient Approach for Robotic Table Tennis
Learning to play table tennis is a challenging task for robots, as a wide
variety of strokes is required. Recent advances have shown that deep Reinforcement
Learning (RL) is able to successfully learn the optimal actions in a simulated
environment. However, the applicability of RL in real scenarios remains limited
due to the high exploration effort. In this work, we propose a realistic
simulation environment in which multiple models are built for the dynamics of
the ball and the kinematics of the robot. Instead of training an end-to-end RL
model, a novel policy gradient approach with TD3 backbone is proposed to learn
the racket strokes based on the predicted state of the ball at the hitting
time. In the experiments, we show that the proposed approach significantly
outperforms the existing RL methods in simulation. Furthermore, to cross the
domain from simulation to reality, we adopt an efficient retraining method and
test it in three real scenarios. The resulting success rate is 98% and the
distance error is around 24.9 cm. The total training time is about 1.5 hours.
Sample-efficient Reinforcement Learning in Robotic Table Tennis
Reinforcement learning (RL) has achieved some impressive recent successes in
various computer games and simulations. Most of these successes are based on
having large numbers of episodes from which the agent can learn. In typical
robotic applications, however, the number of feasible attempts is very limited.
In this paper we present a sample-efficient RL algorithm applied to the example
of a table tennis robot. In table tennis every stroke is different, with
varying placement, speed and spin. An accurate return therefore has to be found
depending on a high-dimensional continuous state space. To make learning in few
trials possible the method is embedded into our robot system. In this way we
can use a one-step environment. The state space depends on the ball at hitting
time (position, velocity, spin) and the action is the racket state
(orientation, velocity) at hitting. An actor-critic based deterministic policy
gradient algorithm was developed for accelerated learning. Our approach
performs competitively both in a simulation and on the real robot in a number
of challenging scenarios. Accurate results are obtained without pre-training in
under episodes of training. The video presenting our experiments is
available at https://youtu.be/uRAtdoL6Wpw.
Comment: accepted at ICRA 2021 (Xian, China)
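The one-step formulation in the abstract above can be illustrated with a minimal sketch. It assumes a 9-dimensional ball state (position, velocity, spin) and a racket action made of an orientation and a velocity, as the abstract describes. The names `BallState`, `RacketAction`, `placement_reward` and `critic_target`, the two-angle orientation, and the distance-based reward are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class BallState:
    """Ball at hitting time: the state of the one-step environment."""
    position: Vec3
    velocity: Vec3
    spin: Vec3

    def as_vector(self) -> List[float]:
        # Flatten to the 9-dimensional input an actor or critic would consume.
        return [*self.position, *self.velocity, *self.spin]


@dataclass(frozen=True)
class RacketAction:
    """Racket at hitting time: the single action of an episode."""
    orientation: Tuple[float, float]  # assumed two-angle parameterization
    velocity: Vec3


def placement_reward(landing_xy: Tuple[float, float],
                     target_xy: Tuple[float, float]) -> float:
    # Hypothetical reward: negative distance between landing point and target.
    return -math.hypot(landing_xy[0] - target_xy[0],
                       landing_xy[1] - target_xy[1])


def critic_target(reward: float, next_q: float = 0.0,
                  gamma: float = 0.99, terminal: bool = True) -> float:
    # In a one-step environment every transition is terminal, so the
    # critic regresses directly on the reward with no bootstrapped term.
    return reward if terminal else reward + gamma * next_q
```

Because each episode ends after one stroke, the critic target reduces to the observed reward, which is one plausible reason such a formulation needs few trials.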
Stylized Table Tennis Robots Skill Learning with Incomplete Human Demonstrations
In recent years, Reinforcement Learning (RL) has become a popular technique
for training controllers for robots. However, for complex dynamic robot control
tasks, RL-based methods often produce controllers with unrealistic styles. In
contrast, humans can learn well-stylized skills under supervision. For
example, people learn table tennis skills by imitating the motions of coaches.
Such reference motions are often incomplete, e.g. without the presence of an
actual ball. Inspired by this, we propose an RL-based algorithm to train a
robot that can learn the playing style from such incomplete human
demonstrations. We collect data through the teaching-and-dragging method. We
also propose data augmentation techniques to enable our robot to adapt to balls
of different velocities. We finally evaluate our policy in different simulators
with varying dynamics.
Comment: Submitted to ICRA 202
Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Intrinsically motivated spontaneous exploration is a key enabler of
autonomous lifelong learning in human children. It enables the discovery and
acquisition of large repertoires of skills through self-generation,
self-selection, self-ordering and self-experimentation of learning goals. We
present an algorithmic approach called Intrinsically Motivated Goal Exploration
Processes (IMGEP) to enable similar properties of autonomous or self-supervised
learning in machines. The IMGEP algorithmic architecture relies on several
principles: 1) self-generation of goals, generalized as fitness functions; 2)
selection of goals based on intrinsic rewards; 3) exploration with incremental
goal-parameterized policy search and exploitation of the gathered data with a
batch learning algorithm; 4) systematic reuse of information acquired when
targeting a goal for improving towards other goals. We present a particularly
efficient form of IMGEP, called Modular Population-Based IMGEP, that uses a
population-based policy and an object-centered modularity in goals and
mutations. We provide several implementations of this architecture and
demonstrate their ability to automatically generate a learning curriculum
within several experimental setups including a real humanoid robot that can
explore multiple spaces of goals with several hundred continuous dimensions.
While no particular target goal is provided to the system, this curriculum
allows the discovery of skills that act as stepping stones for learning more
complex skills, e.g. nested tool use. We show that learning diverse spaces of
goals with intrinsic motivations is more efficient for learning complex skills
than only trying to directly learn these complex skills.
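The goal exploration loop described in the abstract above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: goals are sampled uniformly rather than selected by intrinsic reward, there is no batch exploitation step, and `toy_environment`, `explore` and all numeric ranges are hypothetical stand-ins. It shows only self-generated goals, mutation-based policy search, and reuse of stored outcomes across goals.

```python
import random
from typing import List, Tuple

Params = Tuple[float, float]


def toy_environment(params: Params) -> float:
    """Hypothetical stand-in for a robot rollout: 2 policy parameters -> 1-D outcome."""
    x, y = params
    return x * x + y  # arbitrary nonlinear mapping, outcome lies in [-1, 2]


def explore(iterations: int = 200, seed: int = 0) -> List[Tuple[Params, float]]:
    rng = random.Random(seed)
    memory: List[Tuple[Params, float]] = []  # shared across all goals

    # Bootstrap the memory with a few random policies.
    for _ in range(5):
        params = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        memory.append((params, toy_environment(params)))

    for _ in range(iterations):
        # Self-generate a goal in outcome space (uniform sampling is an
        # assumption here; the abstract selects goals via intrinsic rewards).
        goal = rng.uniform(-1.0, 2.0)
        # Reuse: start from the stored policy whose outcome is closest to the goal.
        best_params, _ = min(memory, key=lambda entry: abs(entry[1] - goal))
        # Explore: mutate those parameters, clipped to the valid range.
        params = tuple(min(1.0, max(-1.0, p + rng.gauss(0.0, 0.1)))
                       for p in best_params)
        # Every rollout is stored, whichever goal it was aimed at.
        memory.append((params, toy_environment(params)))
    return memory
```

Calling `explore(iterations=200)` returns every (parameters, outcome) pair collected, so coverage of the outcome space can be inspected afterwards.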