Resetting the Optimizer in Deep RL: An Empirical Study
We focus on the task of approximating the optimal value function in deep
reinforcement learning. This iterative process consists of solving a
sequence of optimization problems where the loss function changes per
iteration. The common approach to solving this sequence of problems is to
employ modern variants of the stochastic gradient descent algorithm such as
Adam. These optimizers maintain their own internal parameters such as estimates
of the first-order and the second-order moments of the gradient, and update
them over time. Therefore, information obtained in previous iterations is used
to solve the optimization problem in the current iteration. We demonstrate that
this can contaminate the moment estimates because the optimization landscape
can change arbitrarily from one iteration to the next one. To hedge against
this negative effect, a simple idea is to reset the internal parameters of the
optimizer when starting a new iteration. We empirically investigate this
resetting idea by employing various optimizers in conjunction with the Rainbow
algorithm. We demonstrate that this simple modification significantly improves
the performance of deep RL on the Atari benchmark.
Comment: Accepted at the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023).
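The resetting idea this abstract describes is simple enough to sketch in a few lines. Everything below (the `TinyAdam` class, the toy quadratic loss, the loop structure) is a hypothetical illustration of the idea, not the paper's implementation:

```python
# Minimal Adam-style optimizer with a reset() method, sketching the paper's
# idea: discard the internal moment estimates each time the loss changes.
class TinyAdam:
    def __init__(self, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.reset()

    def reset(self):
        # Drop first/second moment estimates and the step counter so that
        # stale gradient statistics from the previous optimization problem
        # do not contaminate the new one.
        self.m = 0.0
        self.v = 0.0
        self.t = 0

    def step(self, param, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad * grad
        m_hat = self.m / (1 - self.b1 ** self.t)   # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        return param - self.lr * m_hat / (v_hat ** 0.5 + self.eps)

# Each outer iteration stands in for a new loss (e.g. a new bootstrap
# target in value-based deep RL); the optimizer is reset at its start.
opt = TinyAdam()
w = 1.0
for iteration in range(3):
    opt.reset()                    # the proposed modification
    for _ in range(10):
        grad = 2.0 * w             # toy quadratic loss for this iteration
        w = opt.step(w, grad)
```

In a deep-learning framework the same effect would typically be achieved by re-instantiating the optimizer object (or clearing its state) whenever the target of the loss changes.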
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
We consider the problem of control in the setting of reinforcement learning
(RL), where model information is not available. Policy gradient algorithms are
a popular solution approach for this problem and are usually shown to converge
to a stationary point of the value function. In this paper, we propose two
policy Newton algorithms that incorporate cubic regularization. Both algorithms
employ the likelihood ratio method to form estimates of the gradient and
Hessian of the value function using sample trajectories. The first algorithm
requires an exact solution of the cubic regularized problem in each iteration,
while the second algorithm employs an efficient gradient descent-based
approximation to the cubic regularized problem. We establish convergence of our
proposed algorithms to a second-order stationary point (SOSP) of the value
function, which results in the avoidance of traps in the form of saddle points.
In particular, the sample complexity of our algorithms to find an
$\epsilon$-SOSP improves on the state-of-the-art sample complexity for this
problem.
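The cubic-regularized Newton step this abstract refers to can be sketched in its standard (Nesterov–Polyak-style) form; the symbols below are assumptions, with $J$ the value function, hats denoting likelihood-ratio estimates from sample trajectories, and $M$ a cubic regularization parameter:

```latex
\theta_{k+1} \;=\; \theta_k \;+\; \arg\max_{\Delta}
\Big\{ \hat{\nabla} J(\theta_k)^{\top} \Delta
     \;+\; \tfrac{1}{2}\,\Delta^{\top} \hat{\nabla}^2 J(\theta_k)\,\Delta
     \;-\; \tfrac{M}{6}\,\lVert \Delta \rVert^{3} \Big\}
```

The cubic penalty is what distinguishes this from a plain Newton step: it keeps the subproblem well defined even when the Hessian estimate is indefinite, which is why such methods can escape saddle points and converge to a second-order stationary point.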
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying files.
A vision-guided parallel parking system for a mobile robot using approximate policy iteration
Reinforcement Learning (RL) methods enable autonomous robots to learn skills from scratch by interacting with the environment. However, reinforcement learning can be very time-consuming. This paper focuses on accelerating the reinforcement learning process on a mobile robot in an unknown environment. The presented algorithm is based on approximate policy iteration with a continuous state space and a fixed number of actions. The action-value function is represented by a weighted combination of basis functions.
Furthermore, a complexity analysis is provided to show that the implemented approach is guaranteed to converge to an optimal policy in less computational time.
A parallel parking task is selected for testing purposes. In the experiments, the efficiency of the proposed approach is demonstrated and analyzed through a set of simulated and real robot experiments, in comparison with two well-known algorithms (Dyna-Q and Q-learning).
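The representation this abstract describes, an action-value function as a weighted combination of basis functions over a continuous state and a fixed action set, can be sketched as follows; the radial basis functions, centers, and block-per-action encoding are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

# Linear action-value approximation: Q(s, a) = w . phi(s, a), where phi
# places the basis values in the block corresponding to the chosen action.
def features(state, action, n_actions, basis):
    phi = np.zeros(n_actions * len(basis))
    vals = [b(state) for b in basis]
    phi[action * len(basis):(action + 1) * len(basis)] = vals
    return phi

def q_value(weights, state, action, n_actions, basis):
    return weights @ features(state, action, n_actions, basis)

def greedy_action(weights, state, n_actions, basis):
    # Policy improvement step of approximate policy iteration: act
    # greedily with respect to the current weighted Q approximation.
    return max(range(n_actions),
               key=lambda a: q_value(weights, state, a, n_actions, basis))

# Toy usage: 1-D state, three Gaussian radial basis functions, 3 actions.
basis = [lambda s, c=c: np.exp(-(s - c) ** 2) for c in (-1.0, 0.0, 1.0)]
w = np.zeros(3 * len(basis))
a = greedy_action(w, 0.2, 3, basis)
```

The weights `w` would be fit in the policy-evaluation step (e.g. by least-squares regression on sampled transitions); only the fixed basis and the greedy improvement step are shown here.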
Learning Agent for a Heat-Pump Thermostat With a Set-Back Strategy Using Model-Free Reinforcement Learning
The conventional control paradigm for a heat pump with a less efficient
auxiliary heating element is to keep its temperature set point constant during
the day. This constant temperature set point ensures that the heat pump
operates in its more efficient heat-pump mode and minimizes the risk of
activating the less efficient auxiliary heating element. As an alternative to a
constant set-point strategy, this paper proposes a learning agent for a
thermostat with a set-back strategy. This set-back strategy relaxes the
set-point temperature during convenient moments, e.g. when the occupants are
not at home. Finding an optimal set-back strategy requires solving a sequential
decision-making process under uncertainty, which presents two challenges. A
first challenge is that for most residential buildings a description of the
thermal characteristics of the building is unavailable and challenging to
obtain. A second challenge is that the relevant information on the state, i.e.
the building envelope, cannot be measured by the learning agent. In order to
overcome these two challenges, our paper proposes an auto-encoder coupled with
a batch reinforcement learning technique. The proposed approach is validated
for two building types with different thermal characteristics for heating in
the winter and cooling in the summer. The simulation results indicate that the
proposed learning agent can reduce the energy consumption by 4-9% during 100
winter days and by 9-11% during 80 summer days compared to the conventional
constant set-point strategy.
Comment: Submitted to Energies (MDPI).
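The batch reinforcement learning component this abstract mentions can be illustrated with a toy fitted-Q-iteration-style sketch; all names, states, rewards, and the tabular lookup standing in for the paper's function approximator are invented for illustration:

```python
# Batch RL sketch: repeatedly regress Q targets from a fixed batch of
# transitions (state, action, reward, next_state), with no further
# interaction with the building during learning.
def fitted_q_iteration(batch, actions, n_iters=20, gamma=0.95):
    q = {}  # tabular stand-in for a learned function approximator
    for _ in range(n_iters):
        new_q = {}
        for s, a, r, s2 in batch:
            target = r + gamma * max(q.get((s2, a2), 0.0) for a2 in actions)
            new_q[(s, a)] = target
        q = new_q
    return q

# Toy batch: two thermal "states"; action 0 = set back, 1 = heat.
batch = [("cold", 1, -1.0, "ok"),
         ("ok",   0,  0.0, "ok"),
         ("ok",   1, -0.5, "ok")]
q = fitted_q_iteration(batch, actions=(0, 1))
policy = {s: max((0, 1), key=lambda a: q.get((s, a), 0.0))
          for s in ("cold", "ok")}
```

In the paper's setting, the unobservable building-envelope state would additionally be summarized by the auto-encoder before being fed to the batch learner; that compression step is omitted here.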
Towards Automatic Learning of Heuristics for Mechanical Transformations of Procedural Code
The current trend in next-generation exascale systems goes towards
integrating a wide range of specialized (co-)processors into traditional
supercomputers. However, the integration of different specialized devices
increases the degree of heterogeneity and the complexity of programming such
systems. Because heterogeneous systems are efficient in terms of power (watts)
and FLOPS per unit area, opening access to heterogeneous platforms to a wider
range of users is an important problem to tackle. In order to
bridge the gap between heterogeneous systems and programmers, in this paper we
propose a machine learning-based approach to learn heuristics for defining
transformation strategies of a program transformation system. Our approach
proposes a novel combination of reinforcement learning and classification
methods to efficiently tackle the problems inherent in such systems.
Preliminary results demonstrate the suitability of the approach for easing the
programmability of heterogeneous systems.
Comment: Part of the Program Transformation for Programmability in Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March 2016, 9 pages, LaTeX.
…