Many sequential decision problems involve optimizing one objective function
while imposing constraints on other objectives. Constrained Partially
Observable Markov Decision Processes (C-POMDP) model this case with transition
uncertainty and partial observability. In this work, we first show that
C-POMDPs violate the optimal substructure property over successive decision
steps and thus may exhibit behaviors that are undesirable for some (e.g.,
safety-critical) applications. Additionally, online re-planning in C-POMDPs is
often ineffective due to the inconsistency resulting from this violation. To
address these drawbacks, we introduce the Recursively-Constrained POMDP
(RC-POMDP), which imposes additional history-dependent cost constraints on the
C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic
optimal policies and that optimal policies obey Bellman's principle of
optimality. We also present a point-based dynamic programming algorithm for
RC-POMDPs. Evaluations on benchmark problems demonstrate the efficacy of our
algorithm and show that policies for RC-POMDPs produce more desirable behaviors
than policies for C-POMDPs.