Computing Optimal Stationary Policies for Multi-objective Markov Decision Processes

By M.A. Wiering and E.D. de Jong


This paper describes a novel algorithm called CON-MODP for computing Pareto optimal policies for deterministic multi-objective sequential decision problems. CON-MODP is a value-iteration-based multi-objective dynamic programming algorithm that only computes stationary policies. We observe that, to guarantee convergence to the unique Pareto optimal set of deterministic stationary policies, the algorithm needs to perform a policy evaluation step on particular policies that are inconsistent in a single state that is being expanded. We prove that the algorithm converges to the Pareto optimal set of value functions and policies for deterministic infinite horizon discounted multi-objective Markov decision processes. Experiments show that CON-MODP is much faster than previous multi-objective value iteration algorithms.
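
To make the notion of a Pareto optimal set of deterministic stationary policies concrete, the following is a minimal brute-force sketch. It is not CON-MODP and is not taken from the paper: it simply enumerates every stationary policy of a tiny deterministic problem, evaluates each one by rolling out its discounted vector return, and keeps the non-dominated results. The toy problem (two states, actions `stay` and `go`, two objectives, discount 0.5) and all function names are assumptions made for illustration.

```python
from itertools import product

def dominates(u, v):
    """True if u is at least as good as v in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(vectors):
    """Keep only the vectors that no other vector dominates."""
    vectors = list(vectors)
    return [u for u in vectors if not any(dominates(v, u) for v in vectors)]

def evaluate(policy, next_state, reward, gamma, start, steps=200):
    """Approximate discounted vector return of a deterministic stationary policy.

    The infinite sum is truncated after `steps` transitions (an assumption;
    the truncation error shrinks like gamma ** steps).
    """
    n_obj = len(next(iter(reward.values())))
    total = [0.0] * n_obj
    state, discount = start, 1.0
    for _ in range(steps):
        action = policy[state]
        total = [t + discount * r for t, r in zip(total, reward[(state, action)])]
        discount *= gamma
        state = next_state[(state, action)]
    return tuple(total)

# Hypothetical toy problem: deterministic transitions, two-objective rewards.
states, actions = [0, 1], ["stay", "go"]
next_state = {(0, "stay"): 0, (0, "go"): 1, (1, "stay"): 1, (1, "go"): 0}
reward = {
    (0, "stay"): (1.0, 0.0),  # staying in state 0 pays objective 1
    (0, "go"): (0.0, 0.0),
    (1, "stay"): (0.0, 1.0),  # staying in state 1 pays objective 2
    (1, "go"): (0.0, 0.0),
}

# Enumerate all stationary policies (one fixed action per state) and evaluate them.
values = {}
for choice in product(actions, repeat=len(states)):
    policy = dict(zip(states, choice))
    values[choice] = evaluate(policy, next_state, reward, gamma=0.5, start=0)

# Non-dominated value vectors at the start state.
front = sorted(pareto_front(set(values.values())))
print(front)
```

Enumeration grows exponentially with the number of states, which is the cost a dynamic programming method such as the one described above is meant to avoid; the sketch only shows what the target set looks like on a problem small enough to check by hand.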

Topics: Mathematics and Computer Science
Year: 2007