A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints
Constrained Markov Decision Processes (CMDPs) formalize sequential
decision-making problems whose objective is to minimize a cost function while
satisfying constraints on various cost functions. In this paper, we consider
the setting of episodic fixed-horizon CMDPs. We propose an online algorithm
that leverages the linear programming formulation of finite-horizon CMDPs for
repeated optimistic planning to provide a probably approximately correct (PAC)
guarantee on the number of episodes needed to ensure an $\epsilon$-optimal
policy, i.e., a policy whose objective value is within $\epsilon$ of the optimal
value and which satisfies the constraints within $\epsilon$-tolerance, with
probability at least $1-\delta$. The number of episodes needed is shown to be
of the order
$\tilde{\mathcal{O}}\big(\frac{|S||A|C^{2}H^{2}}{\epsilon^{2}}\log\frac{1}{\delta}\big)$,
where $C$ is the upper bound on the number of possible successor states for a
state-action pair. Therefore, if $C \ll |S|$, the number of episodes needed
has a linear dependence on the state and action space sizes $|S|$ and $|A|$,
respectively, and a quadratic dependence on the time horizon $H$.
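As a minimal sketch of the $\epsilon$-optimality criterion above (not the paper's optimistic LP planning algorithm), the Python below evaluates a randomized, time-dependent policy in a small tabular CMDP by backward induction and checks whether its objective is within $\epsilon$ of a known optimal value while every constraint holds up to $\epsilon$-tolerance. The tabular layout and the names `evaluate_policy` and `is_eps_optimal` are hypothetical, and the optimal value is assumed to be known (for the toy instance it was worked out by hand).

```python
from typing import Dict, List, Sequence

# Hypothetical tabular layout (an assumption, not the paper's notation):
#   P[s][a]       -> {next_state: probability}
#   cost[h][s][a] -> per-step cost at step h
#   policy[h][s]  -> {action: probability}  (randomized, time-dependent)
Kernel = List[List[Dict[int, float]]]
Costs = List[List[List[float]]]
Policy = List[List[Dict[int, float]]]


def evaluate_policy(P: Kernel, cost: Costs, policy: Policy, mu: Sequence[float], H: int) -> float:
    """Expected total cost over H steps via backward induction:
    V_h(s) = sum_a pi_h(a|s) * (c_h(s,a) + sum_{s'} P(s'|s,a) V_{h+1}(s')), with V_H = 0."""
    n = len(P)
    V = [0.0] * n  # terminal values V_H(s) = 0
    for h in reversed(range(H)):
        V_next = V
        V = [
            sum(
                pa * (cost[h][s][a] + sum(p * V_next[s2] for s2, p in P[s][a].items()))
                for a, pa in policy[h][s].items()
            )
            for s in range(n)
        ]
    return sum(mu[s] * V[s] for s in range(n))


def is_eps_optimal(P, cost, constraint_costs, thresholds, policy, mu, H, opt_value, eps) -> bool:
    """Costs are minimized: the objective must be at most opt_value + eps and
    each constraint cost at most its threshold + eps."""
    if evaluate_policy(P, cost, policy, mu, H) > opt_value + eps:
        return False
    return all(
        evaluate_policy(P, d, policy, mu, H) <= tau + eps
        for d, tau in zip(constraint_costs, thresholds)
    )


# Toy instance (hypothetical): being in state 0 costs 1, and action 1 moves
# between states but incurs one unit of constraint cost.
H = 2
P = [[{0: 1.0}, {1: 1.0}],   # state 0: action 0 stays, action 1 moves to state 1
     [{1: 1.0}, {0: 1.0}]]   # state 1: action 0 stays, action 1 moves back
cost = [[[1.0, 1.0], [0.0, 0.0]] for _ in range(H)]
risk = [[[0.0, 1.0], [0.0, 1.0]] for _ in range(H)]
mu = [1.0, 0.0]
mixed = [[{0: 0.5, 1: 0.5}, {0: 1.0}], [{0: 1.0}, {0: 1.0}]]   # randomize at step 0
greedy = [[{1: 1.0}, {0: 1.0}], [{0: 1.0}, {0: 1.0}]]          # always leave state 0

print(is_eps_optimal(P, cost, [risk], [0.5], mixed, mu, H, opt_value=1.5, eps=0.01))
print(is_eps_optimal(P, cost, [risk], [0.5], greedy, mu, H, opt_value=1.5, eps=0.01))
```

In this toy instance only the step-0 action affects the cost: taking action 1 with probability $p$ gives objective $2-p$ and constraint cost $p$, so with threshold $0.5$ the optimum is $1.5$ at $p=0.5$. The greedy deterministic policy reaches cost $1$ but violates the constraint, which illustrates why optimal CMDP policies are in general randomized.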
Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes
Autonomous systems often have logical constraints arising, for example, from
safety, operational, or regulatory requirements. Such constraints can be
expressed using temporal logic specifications. The system state is often
partially observable. Moreover, the system may comprise a team of multiple agents
with a common objective but disparate information structures and constraints.
In this paper, we first introduce an optimal control theory for partially
observable Markov decision processes (POMDPs) with finite linear temporal logic
constraints. We provide a structured methodology for synthesizing policies that
maximize a cumulative reward while ensuring that the probability of satisfying
a temporal logic constraint is sufficiently high. Our approach comes with
guarantees on approximate reward optimality and constraint satisfaction. We
then build on this approach to design an optimal control framework for
logically constrained multi-agent settings with information asymmetry. We
illustrate the effectiveness of our approach by implementing it on several case
studies.
Comment: arXiv admin note: substantial text overlap with arXiv:2203.0903
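The notion of a sufficiently high probability of satisfying a finite temporal logic constraint can be illustrated with a heavily simplified sketch. It assumes the policy is already fixed and the state is fully observed, so the closed loop is a Markov chain (the paper's POMDP and multi-agent settings are not modeled), and that the constraint has already been translated into a deterministic finite automaton (DFA) over state labels. Propagating the probability mass over (state, DFA state) pairs for $H$ steps then gives the satisfaction probability under finite-trace semantics, which can be compared against a threshold $1-\delta$. The function name, the toy chain, and the labels are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple


def ltlf_satisfaction_probability(
    chain: Dict[int, Dict[int, float]],        # chain[s] = {next_state: probability}
    init: Dict[int, float],                    # initial state distribution
    label: Dict[int, str],                     # atomic label of each state
    dfa_delta: Dict[Tuple[str, str], str],     # (dfa_state, label) -> dfa_state
    dfa_init: str,
    dfa_accepting: FrozenSet[str],
    H: int,
) -> float:
    """Probability that the length-H trace s_0..s_{H-1} is accepted by the DFA."""
    # The DFA reads the label of every visited state, starting with s_0.
    dist: Dict[Tuple[int, str], float] = {}
    for s, p in init.items():
        key = (s, dfa_delta[(dfa_init, label[s])])
        dist[key] = dist.get(key, 0.0) + p
    # Push probability mass through the product chain for the remaining H-1 steps.
    for _ in range(H - 1):
        nxt: Dict[Tuple[int, str], float] = {}
        for (s, q), p in dist.items():
            for s2, t in chain[s].items():
                key = (s2, dfa_delta[(q, label[s2])])
                nxt[key] = nxt.get(key, 0.0) + p * t
        dist = nxt
    # Finite-trace semantics: the trace is accepted iff the DFA ends in an accepting state.
    return sum(p for (_, q), p in dist.items() if q in dfa_accepting)


# Toy instance (hypothetical): specification "eventually goal".
chain = {0: {0: 0.3, 1: 0.5, 2: 0.2}, 1: {1: 1.0}, 2: {2: 1.0}}
label = {0: "none", 1: "goal", 2: "none"}
dfa_delta = {("q0", "none"): "q0", ("q0", "goal"): "q1",
             ("q1", "none"): "q1", ("q1", "goal"): "q1"}
prob = ltlf_satisfaction_probability(chain, {0: 1.0}, label, dfa_delta, "q0", frozenset({"q1"}), H=3)
delta = 0.4
print(prob, prob >= 1 - delta)
```

Here the goal is reached within three steps with probability $0.5 + 0.3 \cdot 0.5 = 0.65$, so the constraint holds for $\delta = 0.4$. Handling partial observability would require working over beliefs or observation histories instead of the true state, which this sketch deliberately omits.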