2 research outputs found

A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

Authors: Jain Rahul; Kalagarla Krishna C.; Nuzzo Pierluigi
Publication date: 23/09/2020

Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of finite-horizon CMDP for repeated optimistic planning to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an $\epsilon$-optimal policy, i.e., with resulting objective value within $\epsilon$ of the optimal value and satisfying the constraints within $\epsilon$-tolerance, with probability at least $1-\delta$. The number of episodes needed is shown to be of the order $\tilde{\mathcal{O}}\big(\frac{|S||A|C^{2}H^{2}}{\epsilon^{2}}\log\frac{1}{\delta}\big)$, where $C$ is the upper bound on the number of possible successor states for a state-action pair. Therefore, if $C \ll |S|$, the number of episodes needed has a linear dependence on the state and action space sizes $|S|$ and $|A|$, respectively, and quadratic dependence on the time horizon $H$.
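To make the scaling in the stated bound concrete, the following is a minimal sketch that evaluates only the leading term $\frac{|S||A|C^{2}H^{2}}{\epsilon^{2}}\log\frac{1}{\delta}$. The function name, parameter names, and example values are hypothetical, and the constants and polylogarithmic factors hidden by $\tilde{\mathcal{O}}$ are dropped, so the result is useful for comparing settings rather than for predicting an actual episode count.

```python
import math


def episode_bound_order(num_states, num_actions, max_successors,
                        horizon, epsilon, delta):
    """Leading term |S||A| C^2 H^2 / eps^2 * log(1/delta).

    Assumption: constants and polylog factors hidden by the
    O-tilde notation are ignored, so only ratios are meaningful.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    size_term = num_states * num_actions * max_successors ** 2
    return size_term * horizon ** 2 / epsilon ** 2 * math.log(1.0 / delta)


# Hypothetical problem sizes, chosen only to illustrate the scaling.
base = episode_bound_order(100, 4, 3, 10, 0.1, 0.05)
double_horizon = episode_bound_order(100, 4, 3, 20, 0.1, 0.05)
double_states = episode_bound_order(200, 4, 3, 10, 0.1, 0.05)

# Ratio relative to the baseline: quadratic in H, linear in |S|.
horizon_ratio = double_horizon / base
states_ratio = double_states / base
```

With $C$ held fixed, doubling $H$ multiplies the term by four while doubling $|S|$ multiplies it by two, which matches the quadratic and linear dependence described in the abstract.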

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes

Authors: Jain Rahul; Kalagarla Krishna C.; Kartik Dhruva; Nayyar Ashutosh; Nuzzo Pierluigi; Shen Dongming
Publication date: 26/05/2023

Autonomous systems often have logical constraints arising, for example, from safety, operational, or regulatory requirements. Such constraints can be expressed using temporal logic specifications. The system state is often partially observable. Moreover, it could encompass a team of multiple agents with a common objective but disparate information structures and constraints. In this paper, we first introduce an optimal control theory for partially observable Markov decision processes (POMDPs) with finite linear temporal logic constraints. We provide a structured methodology for synthesizing policies that maximize a cumulative reward while ensuring that the probability of satisfying a temporal logic constraint is sufficiently high. Our approach comes with guarantees on approximate reward optimality and constraint satisfaction. We then build on this approach to design an optimal control framework for logically constrained multi-agent settings with information asymmetry. We illustrate the effectiveness of our approach by implementing it on several case studies.

Comment: arXiv admin note: substantial text overlap with arXiv:2203.0903
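As an illustration of what a finite-trace temporal logic constraint and its satisfaction probability can look like, here is a minimal sketch. It is not the paper's synthesis method: it only checks one hypothetical reach-avoid property ("never visit a step labeled unsafe, and visit a step labeled goal at least once") on finite traces and computes the fraction of sample traces that satisfy it. The label names, the traces, and the threshold are assumptions made for the example.

```python
def satisfies_reach_avoid(trace, goal="goal", unsafe="unsafe"):
    """Check 'always not unsafe' and 'eventually goal' on a finite trace.

    `trace` is a sequence of label sets, one per time step.
    """
    reached = False
    for labels in trace:
        if unsafe in labels:
            return False  # the safety part is violated at this step
        if goal in labels:
            reached = True
    return reached


def satisfaction_rate(traces):
    """Fraction of traces satisfying the property (empirical estimate)."""
    if not traces:
        raise ValueError("need at least one trace")
    return sum(satisfies_reach_avoid(t) for t in traces) / len(traces)


# Hypothetical traces, written by hand for illustration.
sample_traces = [
    [set(), {"goal"}],        # satisfies: reaches goal, never unsafe
    [{"unsafe"}, {"goal"}],   # violates: unsafe at the first step
    [set(), set()],           # violates: never reaches goal
    [{"goal"}, set()],        # satisfies
]
rate = satisfaction_rate(sample_traces)
threshold = 0.5  # assumed minimum acceptable satisfaction probability
meets_constraint = rate >= threshold
```

A constraint of the kind described in the abstract requires the true satisfaction probability under a policy to be sufficiently high; the empirical rate here is only a stand-in for that quantity.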

arXiv.org e-Print Archive