This paper describes a novel algorithm called CON-MODP
for computing Pareto optimal policies for deterministic
multi-objective sequential decision problems. CON-MODP is
a value iteration based multi-objective dynamic programming
algorithm that only computes stationary policies. We observe that,
to guarantee convergence to the unique Pareto optimal set of
deterministic stationary policies, the algorithm must perform
a policy evaluation step on particular policies that are inconsistent
in a single state, namely the state being expanded. We prove that the
algorithm converges to the Pareto optimal set of value functions
and policies for deterministic infinite-horizon discounted multi-objective
Markov decision processes. Experiments show that
CON-MODP is much faster than the previous multi-objective value
iteration algorithm.
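To make the target of convergence concrete, the sketch below computes the Pareto optimal set of deterministic stationary policies for a tiny deterministic multi-objective MDP by brute force. It is not CON-MODP: it simply enumerates every policy, which is exponential in the number of states. It only illustrates the set such an algorithm should converge to. Several assumptions apply. The two-state, two-objective toy problem is invented for illustration. Dominance is judged at a single designated start state rather than over the whole value function. The names `evaluate_from`, `dominates` and `pareto_policies` are hypothetical. The evaluation exploits determinism: from any start state, a deterministic stationary policy produces a finite prefix followed by a cycle, so the discounted vector return has a closed form, computed here exactly with `Fraction`.

```python
from fractions import Fraction
from itertools import product


def evaluate_from(start, policy, next_state, reward, gamma):
    """Exact discounted vector return of a deterministic stationary policy.

    Hypothetical helper: the trajectory from `start` is a prefix followed by a
    cycle, so V = sum_prefix g^t r_t + g^p * sum_cycle g^j r_j / (1 - g^L).
    """
    first_visit, path, s = {}, [], start
    while s not in first_visit:          # walk until a state repeats
        first_visit[s] = len(path)
        path.append(s)
        s = next_state[s][policy[s]]
    p = first_visit[s]                   # index where the cycle begins
    prefix, cycle = path[:p], path[p:]
    k = len(reward[start][policy[start]])
    value = []
    for i in range(k):                   # one component per objective
        pre = sum(gamma ** t * reward[st][policy[st]][i] for t, st in enumerate(prefix))
        cyc = sum(gamma ** j * reward[st][policy[st]][i] for j, st in enumerate(cycle))
        value.append(pre + gamma ** p * cyc / (1 - gamma ** len(cycle)))
    return tuple(value)


def dominates(u, w):
    """u Pareto-dominates w: at least as good in every objective, not equal."""
    return all(a >= b for a, b in zip(u, w)) and u != w


def pareto_policies(start, next_state, reward, gamma):
    """Brute force: keep every policy whose start-state value is undominated."""
    n_states, n_actions = len(next_state), len(next_state[0])
    values = {pi: evaluate_from(start, pi, next_state, reward, gamma)
              for pi in product(range(n_actions), repeat=n_states)}
    return {pi: v for pi, v in values.items()
            if not any(dominates(w, v) for w in values.values())}


# Hypothetical toy problem: in state 0, action 0 loops with reward (1, 0) and
# action 1 moves to state 1; in state 1, action 0 loops with reward (0, 1)
# and action 1 moves back to state 0. Moves between states earn (0, 0).
NEXT = [[0, 1], [1, 0]]
REWARD = [[(1, 0), (0, 0)], [(0, 1), (0, 0)]]
front = pareto_policies(0, NEXT, REWARD, Fraction(1, 2))
for pi, v in sorted(front.items()):
    print(pi, tuple(str(x) for x in v))
# (0, 0) ('2', '0')
# (0, 1) ('2', '0')
# (1, 0) ('0', '1')
```

The policy that shuttles between the two states forever earns (0, 0) and is dominated, so it is filtered out. The remaining policies trade one objective against the other, and none of them dominates another.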