Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in Reinforcement Learning
Reinforcement learning is a promising paradigm for learning robot control,
allowing complex control policies to be learned without requiring a dynamics
model. However, even state-of-the-art algorithms can be difficult to tune for
optimal performance. We propose employing an ensemble of multiple reinforcement
learning agents, each with a different set of hyperparameters, along with a
mechanism for choosing the best-performing set(s) online. In the literature,
the ensemble technique is used to improve performance in general, but the
current work specifically addresses decreasing the hyperparameter tuning
effort. Furthermore, our approach targets online learning on a single robotic
system and does not require running multiple simulators in parallel. Although
the idea is generic, Deep Deterministic Policy Gradient (DDPG) was chosen as
the base algorithm, being a representative deep actor-critic method with good
performance in continuous-action settings but known high variance. We compare
our online weighted Q-ensemble approach to the Q-average ensemble strategies
addressed in the literature, using both alternate policy training and online
training, demonstrating the advantage of the new approach in eliminating
hyperparameter tuning. The applicability to real-world systems was validated in
common robotic benchmark environments: the bipedal robot HalfCheetah and the
Swimmer. Online Weighted Q-Ensemble presented overall lower variance and
superior results compared with Q-average ensembles using randomized
parameterizations.
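The core idea of choosing among ensemble members online can be illustrated with a toy sketch. This is not the paper's algorithm; the class, weight-update rule, and parameter names below are our own illustrative assumptions: each member keeps a weight that grows when its observed episode return beats a running baseline, and members are sampled in proportion to their weights.

```python
import random

class WeightedEnsemble:
    """Toy sketch of online member weighting (illustrative, not the paper's method)."""

    def __init__(self, n_members, lr=0.1):
        # Start with uniform weights over the ensemble members.
        self.weights = [1.0 / n_members] * n_members
        self.lr = lr  # multiplicative step for weight updates (assumed)

    def select_member(self):
        # Sample a member index proportionally to its current weight.
        r, acc = random.random(), 0.0
        for i, w in enumerate(self.weights):
            acc += w
            if r <= acc:
                return i
        return len(self.weights) - 1

    def update(self, member, episode_return, baseline):
        # Multiplicatively reward members that beat the running baseline,
        # penalize those that do not, then renormalize.
        if episode_return > baseline:
            self.weights[member] *= 1.0 + self.lr
        else:
            self.weights[member] /= 1.0 + self.lr
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
```

Under this sketch, a member whose hyperparameters consistently yield higher returns gradually dominates selection, which is the rough intuition behind reducing manual tuning effort.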
Local Navigation Among Movable Obstacles with Deep Reinforcement Learning
Autonomous robots would benefit greatly from the ability to manipulate
their environment to solve path-planning tasks, a setting known as the
Navigation Among Movable Obstacles (NAMO) problem. In this paper, we present a deep reinforcement
learning approach for solving NAMO locally, near narrow passages. We train
parallel agents in physics simulation using an Advantage Actor-Critic based
algorithm with a multi-modal neural network. We present an online policy that
is able to push obstacles in a non-axial-aligned fashion, react to unexpected
obstacle dynamics in real-time, and solve the local NAMO problem. Experimental
validation in simulation shows that the presented approach generalises to
unseen NAMO problems in unknown environments. We further demonstrate the
implementation of the policy on a real quadrupedal robot, showing that the
policy can deal with real-world sensor noise and uncertainties in unseen NAMO
tasks.
Comment: 7 pages, 7 figures, 4 tables
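As background for the Advantage Actor-Critic training mentioned above, here is a minimal sketch of the n-step advantage estimate such methods typically use (the function name and signature are illustrative, not taken from the paper):

```python
def n_step_advantages(rewards, values, bootstrap, gamma=0.99):
    """Sketch of A2C-style advantage estimation (illustrative):
    A_t = (r_t + gamma * r_{t+1} + ... + gamma^n * V(s_{t+n})) - V(s_t)."""
    # Accumulate discounted returns backwards from the bootstrap value.
    returns = []
    g = bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    # The advantage is the discounted return minus the critic's value estimate.
    return [g - v for g, v in zip(returns, values)]
```

The actor's policy gradient is then weighted by these advantages, while the critic is regressed toward the discounted returns; parallel agents simply batch these estimates from several simulation instances.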