Combining Subgoal Graphs with Reinforcement Learning to Build a Rational Pathfinder
In this paper, we present a hierarchical path planning framework called SG-RL
(subgoal graphs-reinforcement learning), to plan rational paths for agents
maneuvering in continuous and uncertain environments. By "rational", we mean
(1) efficient path planning that eliminates first-move lags; and (2) collision-free,
smooth paths that satisfy the agents' kinematic constraints. SG-RL works in a
two-level manner. At the first level, SG-RL uses a geometric path-planning
method, i.e., Simple Subgoal Graphs (SSG), to efficiently find optimal abstract
paths, also called subgoal sequences. At the second level, SG-RL uses an RL
method, i.e., Least-Squares Policy Iteration (LSPI), to learn near-optimal
motion-planning policies which can generate kinematically feasible and
collision-free trajectories between adjacent subgoals. The first advantage of
the proposed method is that SSG overcomes the sparse-reward and local-minima
problems that RL agents face; thus, LSPI can be used to generate paths in
complex environments. The second advantage is that, when the environment
changes slightly (e.g., when unexpected obstacles appear), SG-RL does not need to
reconstruct subgoal graphs and replan subgoal sequences using SSG, since LSPI
can deal with uncertainties by exploiting its generalization ability to handle
changes in environments. Simulation experiments in representative scenarios
demonstrate that, compared with existing methods, SG-RL can work well on
large-scale maps with relatively low action-switching frequencies and shorter
path lengths, and SG-RL can deal with small changes in environments. We further
demonstrate that the design of reward functions and the types of training
environments are important factors for learning feasible policies.
Comment: 20 pages
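The two-level structure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain Dijkstra search stands in for SSG, and a hand-written straight-line stepper stands in for the learned LSPI policy. All function names, the toy graph, and the step size are assumptions made for illustration only.

```python
import heapq
import math


def plan_subgoal_sequence(graph, start, goal):
    """High level: Dijkstra over a weighted subgoal graph (stand-in for SSG)."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == goal:
            break
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if goal not in dist:
        return None  # goal is unreachable from start
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def straight_line_policy(state, subgoal, step=1.0):
    """Low level: hypothetical stand-in for a learned motion policy.

    Moves at most `step` units directly toward the subgoal.
    """
    dx, dy = subgoal[0] - state[0], subgoal[1] - state[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        return state
    s = min(step, d)
    return (state[0] + s * dx / d, state[1] + s * dy / d)


def follow_subgoals(policy, subgoals, tol=1e-6, max_steps=1000):
    """Roll the low-level policy out between each pair of adjacent subgoals."""
    state = subgoals[0]
    trajectory = [state]
    for subgoal in subgoals[1:]:
        for _ in range(max_steps):
            if math.dist(state, subgoal) <= tol:
                break
            state = policy(state, subgoal)
            trajectory.append(state)
    return trajectory


# Toy subgoal graph: nodes are (x, y) points, edge costs are Euclidean lengths.
A, B, C, D = (0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 5.0)
graph = {}
for u, v in [(A, B), (B, C), (A, D), (D, C)]:
    w = math.dist(u, v)
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

subgoals = plan_subgoal_sequence(graph, A, C)
trajectory = follow_subgoals(straight_line_policy, subgoals)
print(subgoals)
print(trajectory)
```

Because the low-level policy only ever sees the current state and the next subgoal, it could in principle be swapped for a learned policy without touching the high-level search, which is the separation the abstract relies on.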
Game-theoretical control with continuous action sets
Motivated by the recent applications of game-theoretical learning techniques
to the design of distributed control systems, we study a class of control
problems that can be formulated as potential games with continuous action sets,
and we propose an actor-critic reinforcement learning algorithm that provably
converges to equilibrium in this class of problems. The method employed is to
analyse the learning process under study through a mean-field dynamical system
that evolves in an infinite-dimensional function space (the space of
probability distributions over the players' continuous controls). To do so, we
extend the theory of finite-dimensional two-timescale stochastic approximation
to an infinite-dimensional, Banach space setting, and we prove that the
continuous dynamics of the process converge to equilibrium in the case of
potential games. These results combine to give a provably-convergent learning
algorithm in which players do not need to keep track of the controls selected
by the other agents.
Comment: 19 pages
Reinforcement Learning in Different Phases of Quantum Control
The ability to prepare a physical system in a desired quantum state is
central to many areas of physics such as nuclear magnetic resonance, cold
atoms, and quantum computing. Yet, preparing states quickly and with high
fidelity remains a formidable challenge. In this work we implement cutting-edge
Reinforcement Learning (RL) techniques and show that their performance is
comparable to that of optimal control methods in the task of finding short,
high-fidelity driving protocols from an initial to a target state in
non-integrable many-body quantum systems of interacting qubits. RL methods
learn about the underlying physical system solely through a single scalar
reward (the fidelity of the resulting state) calculated from numerical
simulations of the physical system. We further show that quantum state
manipulation, viewed as an optimization problem, exhibits a spin-glass-like
phase transition in the space of protocols as a function of the protocol
duration. Our RL-aided approach helps identify variational protocols with
nearly optimal fidelity, even in the glassy phase, where optimal state
manipulation is exponentially hard. This study highlights the potential
usefulness of RL for applications in out-of-equilibrium quantum physics.
Comment: A legend for the videos referred to in the paper is available at
https://mgbukov.github.io/RL_movies
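The idea of a single scalar reward computed from a numerical simulation can be sketched on a toy problem. The example below is an assumption-laden illustration, not the paper's setup: it uses one qubit rather than an interacting many-body system, a hypothetical Hamiltonian H = -Sz - h Sx with a piecewise-constant control field h, and arbitrary field values and time step. The reward an RL agent would receive for a protocol is the fidelity of the final state with the target.

```python
import numpy as np

# Spin-1/2 operators (hbar = 1).
SX = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
SZ = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)


def hamiltonian(h):
    """Toy single-qubit Hamiltonian with control field h (an assumption)."""
    return -SZ - h * SX


def ground_state(h):
    """Lowest-energy eigenvector of the toy Hamiltonian at field h."""
    _, vecs = np.linalg.eigh(hamiltonian(h))
    return vecs[:, 0]


def evolve(psi, fields, dt):
    """Apply a piecewise-constant protocol: one field value per time step."""
    for h in fields:
        energies, vecs = np.linalg.eigh(hamiltonian(h))
        # U = V diag(exp(-i E dt)) V^dagger for the Hermitian H = V diag(E) V^dagger
        U = (vecs * np.exp(-1j * energies * dt)) @ vecs.conj().T
        psi = U @ psi
    return psi


def fidelity(psi, target):
    """Scalar reward: squared overlap between the final and target states."""
    return float(abs(np.vdot(target, psi)) ** 2)


psi0 = ground_state(-2.0)           # initial state (arbitrary choice)
target = ground_state(2.0)          # target state (arbitrary choice)
protocol = [4.0, -4.0, 4.0, -4.0]   # hypothetical bang-bang field sequence
final = evolve(psi0, protocol, dt=0.25)
reward = fidelity(final, target)
print(reward)
```

An agent in this setting never inspects the Hamiltonian; it only proposes a `protocol` and observes `reward`, which matches the abstract's description of learning from the fidelity alone.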
Manifold Representations for Continuous-State Reinforcement Learning
Reinforcement learning (RL) has shown itself to be an effective paradigm for solving optimal control problems with a finite number of states. Generalizing RL techniques to problems with a continuous state space has proven a difficult task. We present an approach to modeling the RL value function using a manifold representation. By explicitly modeling the topology of the value function domain, traditional problems with discontinuities and resolution can be addressed without resorting to complex function approximators. We describe how manifold techniques can be applied to value-function approximation, and present methods for constructing manifold representations in both batch and online settings. We present empirical results demonstrating the effectiveness of our approach.
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying files
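The exploration-exploitation trade-off named in this abstract can be made concrete with a small sketch. The example below is not taken from the survey itself: it shows epsilon-greedy action selection on a Bernoulli multi-armed bandit, and the arm probabilities, step count, and epsilon value are arbitrary assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(arm_probs, steps, epsilon, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit.

    Returns (pull counts, estimated mean reward) per arm.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=values.__getitem__)
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


counts, values = epsilon_greedy_bandit([0.2, 0.5, 0.8], steps=5000, epsilon=0.1)
best_arm = max(range(len(values)), key=values.__getitem__)
print(counts, values, best_arm)
```

With epsilon set to 0 the agent can lock onto whichever arm paid off first; a small positive epsilon keeps sampling the other arms so their estimates continue to improve.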