
Weighted Markov Decision Processes with perturbation

Abstract

In this paper we consider weighted reward MDPs with perturbation. We prove the existence of a δ-optimal simple ultimately deterministic policy under the “scalar value” assumption. We also prove that, for all ε ∈ [0, ε*), there exists a δ-i-optimal simple ultimately deterministic policy in the perturbed weighted MDP, even without the “scalar value” assumption.
