Solving Partially Observable Markov Decision Processes by Optimization Neural Networks

Abstract

Partially Observable Markov Decision Processes (POMDPs) model sequential decision problems in which an agent tries to maximize some reward without complete knowledge of the underlying process. These models are of interest for quality control, machine maintenance, reinforcement learning, and related areas. More generally, Monahan [9] has shown that many tasks in partially observable environments can be viewed as POMDPs. A solution to a POMDP prescribes the agent's best behavior with respect to the environment over the entire belief space, which is continuous and contained in an integral polytope (the probability simplex over the hidden states). The approaches proposed so far use linear programming (LP) to solve the optimization problem arising in these processes. On the other hand, Neural Networks (NNs) have shown promise for solving optimization problems; in particular, they have been used to solve quadratic 0-1 programming problems [4, 6]. In this paper, we use optimization neural networks as an alternative way to solve the optimization problem in POMDPs, an approach that admits a parallel hardware implementation.
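The quadratic 0-1 programming connection mentioned above can be illustrated with a minimal sketch of a discrete Hopfield-style optimization network. Everything below (the random weight matrix, bias vector, and asynchronous update schedule) is a hypothetical illustration of the general technique, not the formulation used in this paper.

```python
import numpy as np

# Hypothetical quadratic 0-1 problem: minimize E(x) = -1/2 x^T W x - b^T x
# over x in {0,1}^n, with W symmetric and zero-diagonal (random demo data).
rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)
b = rng.standard_normal(n)

def energy(x):
    """Quadratic 0-1 energy the network descends."""
    return -0.5 * x @ W @ x - b @ x

def hopfield_descend(x, sweeps=50):
    """Asynchronous threshold updates: x_i <- 1 if local field > 0, else 0.
    With W symmetric and zero-diagonal, each flip never increases E."""
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            new = 1.0 if W[i] @ x + b[i] > 0 else 0.0
            if new != x[i]:
                x[i] = new
                changed = True
        if not changed:          # fixed point reached: a local minimum of E
            break
    return x

x0 = rng.integers(0, 2, n).astype(float)  # random initial 0-1 state
e0 = energy(x0)
x_star = hopfield_descend(x0.copy())
```

Because each asynchronous flip can only lower the energy, the dynamics settle at a local minimum of the 0-1 objective; this monotone-descent property, realizable in parallel hardware, is what optimization-NN approaches exploit.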
