Online algorithms for POMDPs with continuous state, action, and observation spaces
Online solvers for partially observable Markov decision processes have been
applied to problems with large discrete state spaces, but continuous state,
action, and observation spaces remain a challenge. This paper begins by
investigating double progressive widening (DPW) as a solution to this
challenge. However, we prove that this modification alone is not sufficient
because the belief representations in the search tree collapse to a single
particle, causing the algorithm to converge to a policy that is suboptimal
regardless of the computation time. This paper proposes and evaluates two new
algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using
weighted particle filtering. Simulation results show that these modifications
allow the algorithms to be successful where previous approaches fail.
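To make the two ingredients of the fix concrete, here is a minimal Python sketch of a progressive-widening criterion (which limits branching in continuous spaces) and a weighted particle-filter belief update (which reweights propagated particles by observation likelihood instead of keeping only exact observation matches). The generative model `gen`, the density `obs_likelihood`, and the default constants are illustrative assumptions, not the authors' implementation.

```python
def progressive_widening(children, visits, k=4.0, alpha=0.25):
    """Allow adding a new child only while |children| <= k * N^alpha.
    k and alpha are illustrative defaults, not values from the paper."""
    return len(children) <= k * visits ** alpha

def update_weighted_belief(particles, weights, action, observation,
                           gen, obs_likelihood):
    """One weighted particle-filter step: propagate each particle through
    an assumed generative model gen(s, a) -> (s', o, r) and reweight by an
    assumed observation density obs_likelihood(o, s', a)."""
    new_particles, new_weights = [], []
    for s, w in zip(particles, weights):
        s_next, _, _ = gen(s, action)           # sample a successor state
        new_particles.append(s_next)
        new_weights.append(w * obs_likelihood(observation, s_next, action))
    total = sum(new_weights)
    if total == 0.0:                            # degenerate case: fall back to uniform
        return new_particles, [1.0 / len(new_particles)] * len(new_particles)
    return new_particles, [w / total for w in new_weights]
```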
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
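As a concrete illustration of two of the surveyed themes, trading off exploration against exploitation and learning from delayed reinforcement, here is a minimal tabular Q-learning sketch with epsilon-greedy action selection. The environment interface (`reset`, `step`, `actions`) is an assumption made for the example, not an API from the survey.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning: learns from delayed reinforcement via bootstrapped
    one-step updates, exploring with probability epsilon and otherwise
    exploiting the current value estimates."""
    Q = defaultdict(float)                      # Q[(state, action)] -> value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:       # explore
                action = random.choice(env.actions(state))
            else:                               # exploit
                action = max(env.actions(state), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(state, action)
            target = reward if done else reward + gamma * max(
                Q[(next_state, a)] for a in env.actions(next_state))
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```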
Decentralized Learning for Optimality in Stochastic Dynamic Teams and Games with Local Control and Global State Information
Stochastic dynamic teams and games are rich models for decentralized systems
and challenging testing grounds for multi-agent learning. Previous work that
guaranteed team optimality assumed stateless dynamics, an explicit
coordination mechanism, or joint-control sharing. In this paper, we present an
algorithm with guarantees of convergence to team optimal policies in teams and
common interest games. The algorithm is a two-timescale method that uses a
variant of Q-learning on the finer timescale to perform policy evaluation while
exploring the policy space on the coarser timescale. Agents following this
algorithm are "independent learners": they use only local controls, local cost
realizations, and global state information, without access to controls of other
agents. The results presented here are the first, to our knowledge, to give
formal guarantees of convergence to team optimality using independent learners
in stochastic dynamic teams and common interest games.
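A rough sketch of how one independent learner might realize the two-timescale scheme described above: Q-learning on the fast timescale evaluates the agent's current baseline policy (with occasional experimentation), while the slow timescale revises the baseline toward near-best replies. The environment interface and all constants are illustrative assumptions; this is not the paper's algorithm verbatim.

```python
import random
from collections import defaultdict

def independent_learner(agent_env, num_phases=100, phase_len=10_000,
                        alpha=0.05, gamma=0.95, rho=0.2, tol=1e-3):
    """One independent learner: uses only the global state, its own control,
    and its own realized cost; other agents act in the background."""
    Q = defaultdict(float)                      # Q[(state, action)] -> cost-to-go
    policy = {}                                 # baseline policy: state -> action
    for _ in range(num_phases):
        # Fast timescale: evaluate the current baseline policy,
        # experimenting with probability rho.
        state = agent_env.reset()
        for _ in range(phase_len):
            a0 = policy.setdefault(state, random.choice(agent_env.actions(state)))
            action = a0 if random.random() > rho else random.choice(agent_env.actions(state))
            next_state, cost = agent_env.step(action)
            target = cost + gamma * min(Q[(next_state, a)]
                                        for a in agent_env.actions(next_state))
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
        # Slow timescale: keep the baseline where it is a near-best reply,
        # otherwise re-sample from the improving action set.
        for s in list(policy):
            best = min(Q[(s, a)] for a in agent_env.actions(s))
            if Q[(s, policy[s])] > best + tol:
                improving = [a for a in agent_env.actions(s)
                             if Q[(s, a)] <= best + tol]
                policy[s] = random.choice(improving)
    return policy
```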
