Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems
In this paper we consider infinite horizon discounted dynamic programming
problems with finite state and control spaces, and partial state observations.
We discuss an algorithm that uses multistep lookahead, truncated rollout with a
known base policy, and a terminal cost function approximation. This algorithm
is also used for policy improvement in an approximate policy iteration scheme,
where successive policies are approximated by using a neural network
classifier. A novel feature of our approach is that it is well suited for
distributed computation through an extended belief space formulation and the
use of a partitioned architecture, which is trained with multiple neural
networks. We apply our methods in simulation to a class of sequential repair
problems where a robot inspects and repairs a pipeline with potentially several
rupture sites under partial information about the state of the pipeline.
Comment: Total 9 pages, 9 figures, 1 table, submitted and accepted to be
published in IEEE RA-L 202
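The lookahead scheme described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation. It uses one-step lookahead rather than multistep, and it assumes a hypothetical one-site repair model in which the belief is a single number (the probability that the site is damaged) and there are no observations. The names `toy_model`, `base_cost`, `rollout_control`, the costs, and the discount factor are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Belief = float                         # toy belief: P(site is damaged)
Outcome = Tuple[float, float, Belief]  # (probability, stage cost, next belief)
Model = Callable[[Belief, str], List[Outcome]]
Policy = Callable[[Belief], str]
CostFn = Callable[[Belief], float]


def toy_model(b: Belief, u: str) -> List[Outcome]:
    """Hypothetical one-site repair model with no observations."""
    if u == "repair":
        return [(1.0, 1.0, 0.0)]   # pay 1; the site is then certainly fixed
    return [(1.0, 2.0 * b, b)]     # waiting costs 2 per stage while damaged


def base_cost(b: Belief, model: Model, base_policy: Policy,
              terminal_cost: CostFn, alpha: float, steps: int) -> float:
    """Expected discounted cost of `steps` base-policy stages from belief b,
    followed by the terminal cost approximation (truncated rollout)."""
    if steps == 0:
        return terminal_cost(b)
    return sum(
        p * (g + alpha * base_cost(nb, model, base_policy,
                                   terminal_cost, alpha, steps - 1))
        for p, g, nb in model(b, base_policy(b))
    )


def rollout_control(b: Belief, controls: Sequence[str], model: Model,
                    base_policy: Policy, terminal_cost: CostFn,
                    alpha: float, steps: int) -> str:
    """One-step lookahead: pick the control with the smallest Q-factor,
    where the cost-to-go is estimated by truncated rollout."""
    def q_factor(u: str) -> float:
        return sum(
            p * (g + alpha * base_cost(nb, model, base_policy,
                                       terminal_cost, alpha, steps))
            for p, g, nb in model(b, u)
        )
    return min(controls, key=q_factor)


CONTROLS = ("wait", "repair")


def always_wait(b: Belief) -> str:
    """Base policy assumed for illustration: never repair."""
    return "wait"


def zero_terminal(b: Belief) -> float:
    """Trivial terminal cost approximation assumed for illustration."""
    return 0.0


likely_damaged = rollout_control(0.5, CONTROLS, toy_model,
                                 always_wait, zero_terminal, 0.9, 5)
likely_intact = rollout_control(0.1, CONTROLS, toy_model,
                                always_wait, zero_terminal, 0.9, 5)
```

In this toy setting the rollout policy repairs when damage is likely and waits when it is unlikely, although the base policy never repairs. A model with observations would return several outcomes per control, and the same recursion would average over them.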
Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems
In this paper we consider infinite horizon discounted dynamic programming
problems with finite state and control spaces, partial state observations, and
a multiagent structure. We discuss and compare algorithms that simultaneously
or sequentially optimize the agents' controls by using multistep lookahead,
truncated rollout with a known base policy, and a terminal cost function
approximation. Our methods specifically address the computational challenges of
partially observable multiagent problems. In particular: 1) We consider rollout
algorithms that dramatically reduce required computation while preserving the
key cost improvement property of the standard rollout method. The per-step
computational requirements for our methods are on the order of $O(Cm)$, as
compared with $O(C^m)$ for standard rollout, where $C$ is the maximum
cardinality of the constraint set for the control component of each agent, and
$m$ is the number of agents. 2) We show that our methods can be applied to
challenging problems with a graph structure, including a class of robot repair
problems whereby multiple robots collaboratively inspect and repair a system
under partial information. 3) We provide a simulation study that compares our
methods with existing methods, and demonstrate that our methods can handle
larger and more complex partially observable multiagent problems (state space
size and control space size , respectively). Finally, we
incorporate our multiagent rollout algorithms as building blocks in an
approximate policy iteration scheme, where successive rollout policies are
approximated by using neural network classifiers. While this scheme requires a
strictly off-line implementation, it works well in our computational
experiments and produces additional significant performance improvement over
the single online rollout iteration method.
Comment: 8 pages + 3 pages appendix + 9 figures + 3 tables, accepted in
Conference on Robot Learnin
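The difference between optimizing the agents' controls simultaneously and sequentially, as described in the abstract above, can be made concrete with a small sketch. This is not the authors' code. The Q-factor `q` is a hypothetical stand-in for the value that truncated rollout would estimate, and the separable cost, the targets, and the function names are assumptions chosen so that the example runs on its own. With `C` controls per agent and `m` agents, the joint minimization evaluates `C**m` candidates, while the one-agent-at-a-time version evaluates `C * m`.

```python
import itertools
from typing import Callable, Sequence, Tuple

Joint = Tuple[int, ...]
QFactor = Callable[[Joint], float]


def standard_rollout(q: QFactor,
                     control_sets: Sequence[Sequence[int]]) -> Tuple[Joint, int]:
    """Minimize q over all joint controls; returns (best joint control, #evaluations)."""
    best, best_cost, evals = None, float("inf"), 0
    for joint in itertools.product(*control_sets):
        evals += 1
        cost = q(joint)
        if cost < best_cost:
            best, best_cost = joint, cost
    return best, evals


def multiagent_rollout(q: QFactor,
                       control_sets: Sequence[Sequence[int]],
                       base_controls: Joint) -> Tuple[Joint, int]:
    """Optimize one agent at a time: earlier agents keep their chosen controls,
    later agents keep the base policy's controls."""
    joint = list(base_controls)
    evals = 0
    for i, controls in enumerate(control_sets):
        best_u, best_cost = joint[i], float("inf")
        for u in controls:
            trial = joint.copy()
            trial[i] = u
            evals += 1
            cost = q(tuple(trial))
            if cost < best_cost:
                best_u, best_cost = u, cost
        joint[i] = best_u
    return tuple(joint), evals


# Hypothetical separable Q-factor: each agent has its own preferred control.
TARGETS = (2, 0, 1, 2)


def q(joint: Joint) -> float:
    return float(sum((u - t) ** 2 for u, t in zip(joint, TARGETS)))


control_sets = [range(3)] * 4   # C = 3 controls per agent, m = 4 agents
base = (0, 0, 0, 0)             # controls chosen by the assumed base policy

joint_std, evals_std = standard_rollout(q, control_sets)
joint_ma, evals_ma = multiagent_rollout(q, control_sets, base)
```

Because the toy cost is separable, both procedures reach the same joint control here. In general the sequential version is not guaranteed to find the joint minimizer, but in this sketch it never ends with a higher `q` value than the base controls, since each agent's candidate set includes the control it started with.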