Many sequential decision problems involve optimizing one objective function
while imposing constraints on other objectives. Constrained Partially
Observable Markov Decision Processes (C-POMDP) model this case with transition
uncertainty and partial observability. In this work, we first show that
C-POMDPs violate the optimal substructure property over successive decision
steps and thus may exhibit behaviors that are undesirable for some (e.g.,
safety-critical) applications. Additionally, online re-planning in C-POMDPs is
often ineffective due to the inconsistency resulting from this violation. To
address these drawbacks, we introduce the Recursively-Constrained POMDP
(RC-POMDP), which imposes additional history-dependent cost constraints on the
C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic
optimal policies and that optimal policies obey Bellman's principle of
optimality. We also present a point-based dynamic programming algorithm for
RC-POMDPs. Evaluations on benchmark problems demonstrate the efficacy of our
algorithm and show that policies for RC-POMDPs produce more desirable behaviors
than policies for C-POMDPs.