Potential-Based Shaping and Q-Value Initialization are Equivalent
Shaping has proven to be a powerful but precarious means of improving
reinforcement learning performance. Ng, Harada, and Russell (1999) proposed the
potential-based shaping algorithm for adding shaping rewards in a way that
guarantees the learner will learn optimal behavior. In this note, we prove
certain similarities between this shaping algorithm and the initialization step
required for several reinforcement learning algorithms. More specifically, we
prove that a reinforcement learner with initial Q-values based on the shaping
algorithm's potential function makes the same updates throughout learning as a
learner receiving potential-based shaping rewards. We further prove that, under
a broad category of policies, the behavior of these two learners is
indistinguishable. The comparison provides intuition on the theoretical
properties of the shaping algorithm as well as a suggestion for a simpler
method for capturing the algorithm's benefit. In addition, the equivalence
raises previously unaddressed issues concerning the efficiency of learning with
potential-based shaping.
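The equivalence the note proves can be checked numerically: a Q-learner that receives the shaping bonus F(s, s') = γΦ(s') − Φ(s) with zero-initialized Q-values, and a Q-learner that receives plain rewards but starts from Q₀(s, a) = Φ(s), make matching updates when fed the same experience. The following is a minimal sketch; the chain MDP, the potential values, and the hyperparameters are illustrative choices, not taken from the note.

```python
import random

# Toy deterministic chain MDP: states 0..4, actions {0: left, 1: right},
# reward 1.0 on reaching the rightmost state, standard Q-learning updates.
GAMMA, ALPHA = 0.9, 0.5
N_STATES = 5

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

phi = [0.0, 0.25, 0.5, 0.75, 1.0]   # hypothetical potential function

# Learner A: zero-initialized Q, receives the shaped reward r + F(s, s').
qA = [[0.0, 0.0] for _ in range(N_STATES)]
# Learner B: Q-values initialized to the potential, receives plain rewards.
qB = [[phi[s], phi[s]] for s in range(N_STATES)]

rng = random.Random(0)
s = 0
for _ in range(200):
    a = rng.randrange(2)             # identical action sequence for both learners
    s2, r = step(s, a)
    f = GAMMA * phi[s2] - phi[s]     # potential-based shaping bonus
    qA[s][a] += ALPHA * (r + f + GAMMA * max(qA[s2]) - qA[s][a])
    qB[s][a] += ALPHA * (r + GAMMA * max(qB[s2]) - qB[s][a])
    s = 0 if s2 == N_STATES - 1 else s2

# The note's result: qB(s, a) = qA(s, a) + phi(s) throughout learning.
for st in range(N_STATES):
    for a in (0, 1):
        assert abs(qB[st][a] - (qA[st][a] + phi[st])) < 1e-9
```

The invariant Q_init(s, a) = Q_shaped(s, a) + Φ(s) is preserved by every update, which is why greedy (and, more broadly, advantage-based) action selection cannot distinguish the two learners.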
Automating Vehicles by Deep Reinforcement Learning using Task Separation with Hill Climbing
Within the context of autonomous driving a model-based reinforcement learning
algorithm is proposed for the design of neural network-parameterized
controllers. Classical model-based control methods, which include sampling- and
lattice-based algorithms and model predictive control, suffer from the
trade-off between model complexity and computational burden required for the
online solution of expensive optimization or search problems at every short
sampling time. To circumvent this trade-off, a two-step procedure is motivated:
a controller is first learned during offline training from an arbitrarily
complicated mathematical system model, and the trained controller is then
evaluated online as a fast feedforward policy. The contribution of this paper is the
proposition of a simple gradient-free and model-based algorithm for deep
reinforcement learning using task separation with hill climbing (TSHC). In
particular, (i) simultaneous training on separate deterministic tasks with the
purpose of encoding many motion primitives in a neural network, and (ii) the
employment of maximally sparse rewards in combination with virtual velocity
constraints (VVCs) in setpoint proximity are advocated. Comment: 10 pages, 6 figures, 1 table
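The core idea of gradient-free hill climbing with sparse, setpoint-proximity rewards over several deterministic tasks can be sketched in a few lines. This is a toy scalar illustration, not the paper's TSHC algorithm: the linear "controller", the task targets, and the reward threshold are all hypothetical, and virtual velocity constraints are omitted.

```python
import random

TASKS = [0.5, -1.0, 2.0]   # hypothetical setpoint targets, one per task

def rollout(w, target, steps=30):
    """Deterministic task: a linear controller u = w*(target - x) drives a
    scalar state toward the target; reward is sparse, paid only near it."""
    x, reward = 0.0, 0.0
    for _ in range(steps):
        x += 0.1 * w * (target - x)     # controller acts on the setpoint error
        if abs(x - target) < 0.05:      # maximally sparse reward in proximity
            reward += 1.0
    return reward

def hill_climb(iters=200, sigma=1.0, seed=0):
    """Gradient-free hill climbing: perturb the parameter at random and keep
    the perturbation only if the summed reward over all tasks improves."""
    rng = random.Random(seed)
    w = 0.0
    best = sum(rollout(w, t) for t in TASKS)   # train on all tasks jointly
    for _ in range(iters):
        w_new = w + rng.gauss(0.0, sigma)      # random perturbation, no gradients
        score = sum(rollout(w_new, t) for t in TASKS)
        if score > best:
            w, best = w_new, score
    return w, best

w, best = hill_climb()
```

Summing the sparse rewards across separate deterministic tasks is what lets a single parameter vector encode behavior for all of them, mirroring the paper's motivation for encoding many motion primitives in one network.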
Controlled Lagrangians and Potential Shaping for Stabilization of Discrete Mechanical Systems
The method of controlled Lagrangians for discrete mechanical systems is
extended to include potential shaping in order to achieve complete state-space
asymptotic stabilization. New terms in the controlled shape equation that are
necessary for matching in the discrete context are introduced. The theory is
illustrated with the problem of stabilization of the cart-pendulum system on an
incline. We also discuss digital and model predictive control. Comment: IEEE Conference on Decision and Control, 2006. 6 pages, 4 figures
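The shape of the construction can be sketched as follows; the symbols and the particular discretization are illustrative, not taken from the paper.

```latex
% A discrete Lagrangian approximates the action over one time step h,
% e.g. with a midpoint rule:
\[
  L_d(q_k, q_{k+1}) \approx h\, L\!\left(\tfrac{q_k + q_{k+1}}{2},\,
                                          \tfrac{q_{k+1} - q_k}{h}\right).
\]
% The forced discrete Euler--Lagrange equations with control forces read
\[
  D_2 L_d(q_{k-1}, q_k) + D_1 L_d(q_k, q_{k+1}) + f_{k-1}^{+} + f_k^{-} = 0,
\]
% and matching asks for a feedback making the closed loop again the discrete
% Euler--Lagrange dynamics of a controlled Lagrangian, here with the potential
% shaped by an extra term $V_\epsilon$:
\[
  L_c(q, \dot q) = \tfrac{1}{2}\, \dot q^{\top} M_c(q)\, \dot q
                   - V(q) - V_\epsilon(q).
\]
```

Kinetic shaping alone typically stabilizes only up to an unactuated symmetry direction (for the cart-pendulum, the cart position); adding the potential term is what yields complete state-space asymptotic stabilization.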