In this paper we consider weighted reward MDPs
with perturbation. We prove the existence of a
delta-optimal simple ultimately deterministic policy under
the assumption of “scalar value”. We also prove
that there exists a delta-i-optimal simple ultimately deterministic
policy in the perturbed weighted MDP, for
all ε ∈ [0, ε*), even without the assumption of “scalar
value”.