5,518 research outputs found
Increasing the Action Gap: New Operators for Reinforcement Learning
This paper introduces new optimality-preserving operators on Q-functions. We
first describe an operator for tabular representations, the consistent Bellman
operator, which incorporates a notion of local policy consistency. We show that
this local consistency leads to an increase in the action gap at each state;
increasing this gap, we argue, mitigates the undesirable effects of
approximation and estimation errors on the induced greedy policies. This
operator can also be applied to discretized continuous space and time problems,
and we provide empirical results evidencing superior performance in this
context. Extending the idea of a locally consistent operator, we then derive
sufficient conditions for an operator to preserve optimality, leading to a
family of operators which includes our consistent Bellman operator. As
corollaries we provide a proof of optimality for Baird's advantage learning
algorithm and derive other gap-increasing operators with interesting
properties. We conclude with an empirical study on 60 Atari 2600 games
illustrating the strong potential of these new operators.
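The gap-increasing effect described in this abstract can be illustrated with a small tabular sketch. The abstract itself gives no formulas, so the code assumes the commonly cited form of the advantage learning operator, (T_AL Q)(x, a) = (T Q)(x, a) - alpha * (max_b Q(x, b) - Q(x, a)), where T is the standard Bellman optimality operator. The random MDP and the names `random_mdp`, `iterate`, and `action_gap` are hypothetical and exist only for this illustration.

```python
import numpy as np


def random_mdp(n_states=6, n_actions=3, seed=0):
    """Hypothetical random MDP: P[x, a, x'] transition probabilities, R[x, a] rewards."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalise each (x, a) row to a distribution
    R = rng.random((n_states, n_actions))
    return P, R


def bellman(Q, P, R, gamma):
    # (T Q)(x, a) = R(x, a) + gamma * sum_x' P(x' | x, a) * max_b Q(x', b)
    return R + gamma * (P @ Q.max(axis=1))


def advantage_learning(Q, P, R, gamma, alpha):
    # Assumed form: (T_AL Q)(x, a) = (T Q)(x, a) - alpha * (V(x) - Q(x, a)),
    # with V(x) = max_b Q(x, b). The penalty is zero for the greedy action.
    V = Q.max(axis=1, keepdims=True)
    return bellman(Q, P, R, gamma) - alpha * (V - Q)


def iterate(operator, P, R, n_iter=2000, **kwargs):
    """Apply the operator repeatedly, starting from Q = 0."""
    Q = np.zeros(R.shape)
    for _ in range(n_iter):
        Q = operator(Q, P, R, **kwargs)
    return Q


def action_gap(Q):
    """Difference between the best and second-best action value in each state."""
    ordered = np.sort(Q, axis=1)
    return ordered[:, -1] - ordered[:, -2]
```

Running `iterate(bellman, P, R, gamma=0.9)` and `iterate(advantage_learning, P, R, gamma=0.9, alpha=0.5)` on the same MDP and comparing `Q.argmax(axis=1)` and `action_gap(Q)` for the two results shows the intended behaviour: the greedy policy is unchanged while the per-state gap grows.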
A Benchmark Environment Motivated by Industrial Control Problems
In the research area of reinforcement learning (RL), novel and promising
methods are frequently developed and introduced to the RL community. However,
although many researchers are keen to apply their methods to real-world
problems, implementing such methods in real industry environments is often a
frustrating and tedious process. Generally, academic research groups have only
limited access to real industrial data and applications. For this reason, new
methods are usually developed, evaluated and compared by using artificial
software benchmarks. On the one hand, these benchmarks are designed to provide
interpretable RL training scenarios and detailed insight into the learning
process of the method at hand. On the other hand, they usually do not share
much similarity with industrial real-world applications. For this reason we
used our industry experience to design a benchmark which bridges the gap
between freely available, documented, and motivated artificial benchmarks and
properties of real industrial problems. The resulting industrial benchmark (IB)
has been made publicly available to the RL community by publishing its Java and
Python code, including an OpenAI Gym wrapper, on GitHub. In this paper we
motivate and describe in detail the IB's dynamics and identify prototypic
experimental settings that capture common situations in real-world industry
control problems.
Automating Vehicles by Deep Reinforcement Learning using Task Separation with Hill Climbing
Within the context of autonomous driving, a model-based reinforcement learning
algorithm is proposed for the design of neural network-parameterized
controllers. Classical model-based control methods, which include sampling- and
lattice-based algorithms and model predictive control, suffer from the
trade-off between model complexity and computational burden required for the
online solution of expensive optimization or search problems at every short
sampling time. To circumvent this trade-off, a two-step procedure is motivated:
first, a controller is learned during offline training based on an arbitrarily
complicated mathematical system model; then, the trained controller is
evaluated online by fast feedforward computation. The contribution of this paper is the
proposition of a simple gradient-free and model-based algorithm for deep
reinforcement learning using task separation with hill climbing (TSHC). In
particular, (i) simultaneous training on separate deterministic tasks with the
purpose of encoding many motion primitives in a neural network, and (ii) the
employment of maximally sparse rewards in combination with virtual velocity
constraints (VVCs) in setpoint proximity are advocated.
Comment: 10 pages, 6 figures, 1 table
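The general idea of gradient-free hill climbing over controller parameters, scored on several separate deterministic tasks with a sparse reward, can be sketched as follows. This is not the TSHC algorithm from the paper: the one-dimensional integrator model, the two-parameter linear controller, the setpoints in `TARGETS`, and the accept-on-ties rule are all assumptions chosen to keep the example small, and virtual velocity constraints are not modelled.

```python
import numpy as np

# Hypothetical separate deterministic tasks: setpoints to reach from x = 0.
TARGETS = (-2.0, -1.0, 1.0, 2.0)


def rollout(params, target, steps=50, dt=0.1, tol=0.05):
    """Simulate x' = x + dt * u with a clipped linear controller.

    Sparse reward (an assumption): 1.0 if the final state is within `tol`
    of the target, otherwise 0.0.
    """
    w, b = params
    x = 0.0
    for _ in range(steps):
        u = float(np.clip(w * (target - x) + b, -1.0, 1.0))
        x += dt * u
    return 1.0 if abs(target - x) < tol else 0.0


def evaluate(params):
    """Total sparse reward accumulated over all tasks for one parameter vector."""
    return sum(rollout(params, target) for target in TARGETS)


def hill_climb(n_iter=200, sigma=1.0, seed=0):
    """Gradient-free search: perturb the parameters, keep the candidate if it is no worse."""
    rng = np.random.default_rng(seed)
    best = np.zeros(2)
    best_score = evaluate(best)
    history = [best_score]
    for _ in range(n_iter):
        candidate = best + sigma * rng.standard_normal(2)
        score = evaluate(candidate)
        # Accepting ties lets the search drift across flat regions of a sparse reward.
        if score >= best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history
```

Because every candidate is scored on all tasks at once, a single parameter vector is only accepted if it does not degrade the combined result, which is one simple way to keep one controller useful for several motion primitives.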
Learning from Outside the Viability Kernel: Why we Should Build Robots that can Fall with Grace
Despite impressive results using reinforcement learning to solve complex
problems from scratch, in robotics this has still been largely limited to
model-based learning with very informative reward functions. One of the major
challenges is that the reward landscape often has large patches with no
gradient, making it difficult to sample gradients effectively. We show here
that the robot state-initialization can have a more important effect on the
reward landscape than is generally expected. In particular, we show the
counter-intuitive benefit of including initializations that are unviable, in
other words initializing in states that are doomed to fail.
Comment: Proceedings of the 2018 IEEE International Conference on Simulation,
Modeling and Programming for Autonomous Robots (SIMPAR), Brisbane, Australia,
16-19 201