We consider the problem of finding an optimal feedback controller for a network of interconnected subsystems, each of which is a Markov decision process. Each subsystem is coupled to its neighbors via communication links by which signals are delayed but are otherwise transmitted noise-free. One of the subsystems receives input from a controller, and the controller receives delayed statemeasurements from all of the subsystems. We show that an optimal controller requires only a finite amount of memory which does not grow with time, and obtain a bound on the amount of memory that a controller needs to have for each subsystem. This makes the computation of an optimal controller through dynamic programming tractable. We illustrate our result by a numerical example, and show that it generalizes previous results on Markov decision processes with delayed state measurements.
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.