Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs
Policy Iteration (PI) is a widely used family of algorithms to compute
optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on
the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in
which every state-action pair has a unique next state. Our results include a
non-trivial upper bound that applies to the entire family of PI algorithms;
another to all "max-gain" switching variants; and affirmation that a conjecture
regarding Howard's PI on MDPs is true for DMDPs. Our analysis is based on
certain graph-theoretic results, which may be of independent interest.
Comment: Added new bounds for two-state MDPs
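
To make the DMDP setting concrete, the following is a minimal sketch of a Howard-style PI loop on a toy deterministic MDP. It is illustrative only and is not taken from the paper: the discounted-reward criterion, the discount factor, the two-state instance, and the names `evaluate`, `howard_pi`, `NEXT_STATE`, and `REWARD` are all assumptions made for this example. Each iteration evaluates the current policy, then switches every improvable state to its highest-gain action.

```python
from typing import List, Tuple

# Hypothetical toy DMDP (an assumption for illustration): every
# state-action pair has exactly one next state and one reward.
NEXT_STATE: List[List[int]] = [[0, 1], [0, 1]]   # NEXT_STATE[s][a]
REWARD: List[List[float]] = [[0.0, 1.0], [0.0, 2.0]]  # REWARD[s][a]


def evaluate(policy: List[int], next_state: List[List[int]],
             reward: List[List[float]], gamma: float,
             sweeps: int = 200) -> List[float]:
    """Approximate the discounted value of a fixed policy by repeated backups."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [reward[s][policy[s]] + gamma * v[next_state[s][policy[s]]]
             for s in range(n)]
    return v


def howard_pi(next_state: List[List[int]], reward: List[List[float]],
              gamma: float = 0.5, eps: float = 1e-9
              ) -> Tuple[List[int], List[float], int]:
    """Switch all improvable states at once, each to its max-gain action."""
    n = len(next_state)
    policy = [0] * n          # arbitrary initial policy: action 0 everywhere
    iterations = 0
    while True:
        v = evaluate(policy, next_state, reward, gamma)
        new_policy = list(policy)
        for s in range(n):
            # One-step lookahead value of each action under the current values.
            q = [reward[s][a] + gamma * v[next_state[s][a]]
                 for a in range(len(next_state[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            # Switch only on a strict improvement (beyond tolerance eps).
            if q[best] > q[policy[s]] + eps:
                new_policy[s] = best
        if new_policy == policy:
            return policy, v, iterations
        policy = new_policy
        iterations += 1


if __name__ == "__main__":
    final_policy, values, num_switch_rounds = howard_pi(NEXT_STATE, REWARD)
    print(final_policy, values, num_switch_rounds)
```

The sketch counts the number of policy-switching rounds, which is the quantity that running-time bounds for PI are concerned with; the iterative evaluation step is a simplification chosen to keep the example dependency-free.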