Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes
Autonomous systems often have logical constraints arising, for example, from
safety, operational, or regulatory requirements. Such constraints can be
expressed using temporal logic specifications. The system state is often
partially observable. Moreover, it could encompass a team of multiple agents
with a common objective but disparate information structures and constraints.
In this paper, we first introduce an optimal control theory for partially
observable Markov decision processes (POMDPs) with finite linear temporal logic
constraints. We provide a structured methodology for synthesizing policies that
maximize a cumulative reward while ensuring that the probability of satisfying
a temporal logic constraint is sufficiently high. Our approach comes with
guarantees on approximate reward optimality and constraint satisfaction. We
then build on this approach to design an optimal control framework for
logically constrained multi-agent settings with information asymmetry. We
illustrate the effectiveness of our approach by implementing it on several case
studies.
Comment: arXiv admin note: substantial text overlap with arXiv:2203.0903
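The constrained objective described above — maximize cumulative reward subject to the probability of satisfying a temporal-logic constraint being sufficiently high — can be illustrated on a toy fully observable model. The following is a minimal sketch, not the paper's method: the MDP, policies, and numbers are all invented, and the constraint is a simple reachability property ("eventually reach the goal") standing in for a general temporal-logic formula.

```python
# Toy sketch: among all deterministic policies on a tiny MDP, pick the one
# with the highest expected reward whose probability of satisfying the
# constraint (eventually reaching state 2) meets a threshold.
import itertools

# 3-state chain: state 2 is the goal; TRAP violates the constraint.
# transitions[state][action] = list of (next_state, probability)
TRAP = -1
transitions = {
    0: {"safe": [(1, 1.0)], "fast": [(2, 0.6), (TRAP, 0.4)]},
    1: {"safe": [(2, 1.0)], "fast": [(2, 0.8), (TRAP, 0.2)]},
}
rewards = {"safe": 1.0, "fast": 3.0}  # per-step reward depends on the action

def evaluate(policy):
    """Return (expected total reward, probability of reaching the goal)."""
    total_reward, p_goal = 0.0, 0.0
    def walk(state, prob, reward):        # exact trajectory enumeration
        nonlocal total_reward, p_goal
        if state == TRAP:
            total_reward += prob * reward
            return
        if state == 2:                    # goal reached: constraint satisfied
            total_reward += prob * reward
            p_goal += prob
            return
        a = policy[state]
        for nxt, p in transitions[state][a]:
            walk(nxt, prob * p, reward + rewards[a])
    walk(0, 1.0, 0.0)
    return total_reward, p_goal

best = None
for choice in itertools.product(["safe", "fast"], repeat=2):
    policy = {0: choice[0], 1: choice[1]}
    r, p = evaluate(policy)
    if p >= 0.9 and (best is None or r > best[1]):  # constraint threshold 0.9
        best = (policy, r, p)

print(best)  # the risky "fast" policies violate the 0.9 threshold
```

Here the unconstrained optimum would take the high-reward risky action, but the constraint rules it out; the paper's contribution is doing this tractably under partial observability and with general temporal-logic constraints.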
Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints
We consider synthesis of control policies that maximize the probability of
satisfying given temporal logic specifications in unknown, stochastic
environments. We model the interaction between the system and its environment
as a Markov decision process (MDP) with initially unknown transition
probabilities. The solution we develop builds on the so-called model-based
probably approximately correct Markov decision process (PAC-MDP) methodology.
The algorithm attains an ε-approximately optimal policy with
probability at least 1−δ using samples (i.e., observations), time, and space that
grow polynomially with the size of the MDP, the size of the automaton
expressing the temporal logic specification, 1/ε, 1/δ,
and a finite time horizon. In this approach, the system
maintains a model of the initially unknown MDP, and constructs a product MDP
based on its learned model and the specification automaton that expresses the
temporal logic constraints. During execution, the policy is iteratively updated
using observations of the transitions taken by the system. The iteration
terminates in finitely many steps. With high probability, the resulting policy
is such that, for any state, the difference between the probability of
satisfying the specification under this policy and the optimal one is within a
predefined bound.
Comment: 9 pages, 5 figures, Accepted by 2014 Robotics: Science and Systems
(RSS)
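The core construction in this abstract — forming a product of the (learned) MDP with the automaton expressing the specification, then maximizing the probability of reaching accepting automaton states — can be sketched concretely. This is an illustrative toy, not the paper's PAC algorithm: the MDP, labeling, and DFA below are all invented, and the DFA encodes the simple property "eventually goal".

```python
# Sketch: build a product of an MDP with a DFA and value-iterate the maximum
# probability of reaching an accepting automaton state within a finite horizon.

# MDP: mdp[s][a] = list of (next_state, prob); labels[s] = atomic proposition
mdp = {
    "s0": {"a": [("s1", 0.9), ("s0", 0.1)]},
    "s1": {"a": [("s1", 1.0)]},
}
labels = {"s0": "none", "s1": "goal"}

# DFA for "eventually goal": dfa[q][label] = next automaton state; q1 accepts.
dfa = {"q0": {"none": "q0", "goal": "q1"}, "q1": {"none": "q1", "goal": "q1"}}
accepting = {"q1"}

def sat_prob(horizon):
    """Max probability of acceptance from each product state (s, q)."""
    states = [(s, q) for s in mdp for q in dfa]
    v = {x: (1.0 if x[1] in accepting else 0.0) for x in states}
    for _ in range(horizon):
        nv = {}
        for (s, q) in states:
            if q in accepting:
                nv[(s, q)] = 1.0   # acceptance is absorbing in this DFA
                continue
            best = 0.0
            for a, succ in mdp[s].items():
                # the automaton advances on the label of the successor state
                p = sum(pr * v[(s2, dfa[q][labels[s2]])] for s2, pr in succ)
                best = max(best, p)
            nv[(s, q)] = best
        v = nv
    return v

v = sat_prob(horizon=20)
print(v[("s0", "q0")])  # approaches 1: "goal" is reached almost surely
```

In the paper's setting the transition probabilities of `mdp` are not known but estimated from samples; the PAC guarantee bounds how far the value computed on the learned product model can be from the true optimum.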
MDP Optimal Control under Temporal Logic Constraints
In this paper, we develop a method to automatically generate a control policy
for a dynamical system modeled as a Markov Decision Process (MDP). The control
specification is given as a Linear Temporal Logic (LTL) formula over a set of
propositions defined on the states of the MDP. We synthesize a control policy
such that the MDP satisfies the given specification almost surely, if such a
policy exists. In addition, we designate an "optimizing proposition" to be
repeatedly satisfied, and we formulate a novel optimization criterion in terms
of minimizing the expected cost in between satisfactions of this proposition.
We propose a sufficient condition for a policy to be optimal, and develop a
dynamic programming algorithm that synthesizes a policy that is optimal under
some conditions, and sub-optimal otherwise. This problem is motivated by
robotic applications requiring persistent tasks, such as environmental
monitoring or data gathering, to be performed.
Comment: Technical report accompanying the CDC2011 submission
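The optimization criterion in this abstract — minimizing the expected cost accrued between consecutive satisfactions of the optimizing proposition — has, as its inner step, a stochastic-shortest-path flavor: minimize the expected cost to reach the next state where the proposition holds. The sketch below illustrates only that inner step on an invented toy model; it is not the paper's full dynamic programming algorithm, which must also respect the LTL constraint.

```python
# Sketch: value iteration for the minimum expected cost to reach a state
# satisfying the optimizing proposition (a stochastic-shortest-path problem).

mdp = {
    "s0": {"left": [("goal", 0.5), ("s0", 0.5)], "right": [("s1", 1.0)]},
    "s1": {"go": [("goal", 1.0)]},
}
cost = {"left": 4.0, "right": 1.0, "go": 1.0}
satisfying = {"goal"}        # states where the optimizing proposition holds

def expected_cost(iters=200):
    """Value-iterate J(s) = min_a [cost(a) + sum_s' P(s'|s,a) * J(s')]."""
    J = {s: 0.0 for s in list(mdp) + list(satisfying)}  # J = 0 at satisfying states
    for _ in range(iters):
        for s in mdp:
            J[s] = min(cost[a] + sum(p * J[s2] for s2, p in succ)
                       for a, succ in mdp[s].items())
    return J

J = expected_cost()
print(J["s0"])  # the detour via s1 beats repeatedly retrying "left"
```

Here the seemingly direct action `left` has expected cost 8 (it must be retried on failure), while the two-step route via `s1` costs 2, so the value iteration selects the latter, mirroring the paper's point that minimizing cost between satisfactions can favor non-obvious policies.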