
Optimal Control of MDPs with Temporal Logic Constraints

Authors: Belta Calin; Cerna Ivana; Svorenova Maria
Publication date: 01/01/2013

In this paper, we focus on formal synthesis of control policies for finite Markov decision processes with non-negative real-valued costs. We develop an algorithm to automatically generate a policy that guarantees the satisfaction of a correctness specification expressed as a formula of Linear Temporal Logic, while at the same time minimizing the expected average cost between two consecutive satisfactions of a desired property. The existing solutions to this problem are sub-optimal. By leveraging ideas from automata-based model checking and game theory, we provide an optimal solution. We demonstrate the approach on an illustrative example. Comment: Technical report accompanying the CDC 2013 paper.

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)
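For a single fixed policy, the objective in the abstract above (expected average cost between two consecutive satisfactions) can be illustrated with a short calculation. The Python sketch below assumes the policy is already fixed, so that the MDP induces an irreducible Markov chain with transition matrix `P`, a per-step cost `cost[s]`, and a set `accepting` of states where the desired property holds. All of these names are hypothetical. By the renewal-reward theorem, the expected cost per cycle equals the long-run average cost divided by the long-run rate of visits to `accepting`. The sketch only evaluates that ratio. It is not the paper's synthesis algorithm, which optimizes over policies on an automaton product.

```python
from typing import List, Set

def stationary_distribution(P: List[List[float]], tol: float = 1e-12,
                            max_iters: int = 100_000) -> List[float]:
    """Power iteration on the lazy chain (I + P) / 2.

    The lazy chain has the same stationary distribution as P and is
    aperiodic, so the iteration converges for any irreducible P.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iters):
        nxt = [0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n))
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

def cost_per_cycle(P: List[List[float]], cost: List[float],
                   accepting: Set[int]) -> float:
    """Expected cost between consecutive visits to `accepting` (renewal-reward)."""
    pi = stationary_distribution(P)
    avg_cost = sum(p * c for p, c in zip(pi, cost))   # long-run cost per step
    visit_rate = sum(pi[s] for s in accepting)        # long-run visits per step
    return avg_cost / visit_rate

# From state 0: pay 1, then either return to 0 or detour through state 1 (cost 4).
P = [[0.5, 0.5],
     [1.0, 0.0]]
print(cost_per_cycle(P, cost=[1.0, 4.0], accepting={0}))  # approximately 3.0
```

As a check, a cycle from state 0 costs 1 with probability 0.5 and 1 + 4 with probability 0.5, so its expected cost is 3. The lazy-chain step is a design choice. It avoids non-convergence of plain power iteration on periodic chains without changing the stationary distribution.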

Probably Approximately Correct MDP Learning and Control With Temporal Logic Constraints

Authors: Fu Jie; Topcu Ufuk
Publication date: 01/01/2014

We consider synthesis of control policies that maximize the probability of satisfying given temporal logic specifications in unknown, stochastic environments. We model the interaction between the system and its environment as a Markov decision process (MDP) with initially unknown transition probabilities. The solution we develop builds on the so-called model-based probably approximately correct Markov decision process (PAC-MDP) methodology. The algorithm attains an $\varepsilon$-approximately optimal policy with probability $1-\delta$ using samples (i.e. observations), time and space that grow polynomially with the size of the MDP, the size of the automaton expressing the temporal logic specification, $\frac{1}{\varepsilon}$, $\frac{1}{\delta}$ and a finite time horizon. In this approach, the system maintains a model of the initially unknown MDP and constructs a product MDP based on its learned model and the specification automaton that expresses the temporal logic constraints. During execution, the policy is iteratively updated using observations of the transitions taken by the system. The iteration terminates in finitely many steps. With high probability, the resulting policy is such that, for any state, the difference between the probability of satisfying the specification under this policy and the optimal one is within a predefined bound. Comment: 9 pages, 5 figures, Accepted by 2014 Robotics: Science and Systems (RSS).

arXiv.org e-Print Archive

CiteSeerX
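The two model-based ingredients named in the abstract above can be sketched in Python: estimating the unknown MDP from observed transitions, and planning on its product with a specification automaton. This sketch makes several simplifying assumptions. The specification is a deterministic finite automaton for a reachability-style (co-safe) property, not a general LTL automaton. The learned model is a plain maximum-likelihood estimate. The exploration strategy and confidence bounds that give the PAC guarantee are left out. The encodings `counts`, `label`, and `delta`, and the functions `estimate_model` and `max_reach_probability`, are hypothetical.

```python
from typing import Dict, Hashable

State = Hashable
# Hypothetical encodings used only for this sketch:
#   counts[(s, a)][t]  number of observed transitions s --a--> t
#   label[s]           the letter the automaton reads when the system enters s
#   delta[q][letter]   deterministic automaton transition function

def estimate_model(counts):
    """Maximum-likelihood estimate of P(t | s, a) from observed counts."""
    model: Dict[State, Dict[str, Dict[State, float]]] = {}
    for (s, a), succ in counts.items():
        total = sum(succ.values())
        model.setdefault(s, {})[a] = {t: n / total for t, n in succ.items()}
    return model

def max_reach_probability(model, label, delta, accepting,
                          tol=1e-10, max_iters=10_000):
    """Value iteration on the product MDP with states (s, q).

    V[(s, q)] approximates the maximum probability of eventually driving the
    automaton into an accepting state from product state (s, q).
    """
    states = [(s, q) for s in model for q in delta]
    V = {x: (1.0 if x[1] in accepting else 0.0) for x in states}
    for _ in range(max_iters):
        change = 0.0
        for (s, q) in states:
            if q in accepting:
                continue  # already satisfied: value stays 1
            best = max(
                sum(p * V[(t, delta[q][label[t]])] for t, p in succ.items())
                for succ in model[s].values()
            )
            change = max(change, abs(best - V[(s, q)]))
            V[(s, q)] = best
        if change < tol:
            break
    return V

# Toy example: reach "goal" while avoiding "trap".
counts = {
    (0, "safe"): {1: 8, 0: 2},
    (0, "risky"): {1: 9, 2: 1},
    (1, "stay"): {1: 10},
    (2, "stay"): {2: 10},
}
label = {0: "none", 1: "goal", 2: "trap"}
delta = {
    "q0": {"none": "q0", "goal": "acc", "trap": "sink"},
    "acc": {"none": "acc", "goal": "acc", "trap": "acc"},
    "sink": {"none": "sink", "goal": "sink", "trap": "sink"},
}
model = estimate_model(counts)
V = max_reach_probability(model, label, delta, accepting={"acc"})
q_init = delta["q0"][label[0]]  # automaton reads the initial state's label
print(round(V[(0, q_init)], 6))  # 1.0: retrying "safe" eventually reaches goal
```

In this toy model, the "risky" action reaches the goal immediately with estimated probability 0.9 but can fall into the trap. Repeating "safe" reaches the goal with probability 1, and the value iteration recovers this. In the PAC-MDP setting, `counts` would grow as the system executes, and the product would be rebuilt from the updated model.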

Toward Specification-Guided Active Mars Exploration for Cooperative Robot Teams

Authors: Agha-Mohammadi Ali-Akbar; Ames Aaron D.; Haesaert Sofie; Murray Richard M.; Nilsson Petter; Otsu Kyohei; Thakker Rohan; Vasile Cristian-Ioan
Publication venue: Robotics: Science and Systems Foundation
Publication date: 01/06/2018

As a step towards achieving autonomy in space exploration missions, we consider a cooperative robotics system consisting of a copter and a rover. The goal of the copter is to explore an unknown environment so as to maximize knowledge about a science mission, expressed in linear temporal logic, that is to be executed by the rover. We model environmental uncertainty as a belief space Markov decision process and formulate the problem as a two-step stochastic dynamic program that we solve in a way that leverages the decomposed nature of the overall system. We demonstrate in simulations that the robot team makes intelligent decisions in the face of uncertainty.

Caltech Authors
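The abstract above describes reasoning in belief space. The Python sketch below shows one small piece of that idea. It keeps a Bayesian belief that each terrain cell is traversable, updates the belief after a noisy binary observation from the copter, and picks the cell whose observation is expected to reduce entropy the most. The sensor model (`accuracy`), the cell names, and the greedy one-step information-gain rule are all assumptions made for illustration. The paper itself formulates a two-step stochastic dynamic program.

```python
import math

def entropy(p: float) -> float:
    """Binary entropy, in bits, of a Bernoulli(p) belief."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def update(prior: float, obs: bool, accuracy: float) -> float:
    """Bayes update of P(cell is traversable) after one noisy observation.

    Hypothetical sensor model: the copter reports the true status with
    probability `accuracy` and the wrong status otherwise.
    """
    like_trav = accuracy if obs else 1 - accuracy
    like_blocked = (1 - accuracy) if obs else accuracy
    num = like_trav * prior
    return num / (num + like_blocked * (1 - prior))

def expected_info_gain(prior: float, accuracy: float) -> float:
    """Expected entropy reduction from observing one cell once."""
    p_obs = accuracy * prior + (1 - accuracy) * (1 - prior)  # P(report "traversable")
    expected_post = (p_obs * entropy(update(prior, True, accuracy))
                     + (1 - p_obs) * entropy(update(prior, False, accuracy)))
    return entropy(prior) - expected_post

beliefs = {"A": 0.5, "B": 0.9}  # hypothetical prior beliefs per cell
target = max(beliefs, key=lambda c: expected_info_gain(beliefs[c], 0.8))
print(target)  # A: the most uncertain cell is the most informative to observe
beliefs[target] = update(beliefs[target], True, 0.8)  # copter reports "traversable"
print(beliefs)  # belief in A rises to approximately 0.8
```

The greedy rule looks only one observation ahead. A full belief-space MDP would instead plan sequences of observations against the rover's mission objective.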

Deception in Optimal Control

Authors: Ornik Melkior; Topcu Ufuk
Publication date: 08/05/2018

In this paper, we consider an adversarial scenario where one agent seeks to achieve an objective and its adversary seeks to learn the agent's intentions and prevent the agent from achieving its objective. The agent has an incentive to try to deceive the adversary about its intentions, while at the same time working to achieve its objective. The primary contribution of this paper is to introduce a mathematically rigorous framework for the notion of deception within the context of optimal control. The central notion introduced in the paper is that of a belief-induced reward: a reward dependent not only on the agent's state and action, but also on the adversary's beliefs. Design of an optimal deceptive strategy then becomes a question of optimal control design on the product of the agent's state space and the adversary's belief space. The proposed framework allows for deception to be defined in an arbitrary control system endowed with a reward function, as well as with additional specifications limiting the agent's control policy. In addition to defining deception, we discuss the design of optimally deceptive strategies under uncertainty in the agent's knowledge about the adversary's learning process. In the latter part of the paper, we focus on a setting where the agent's behavior is governed by a Markov decision process, and show that the design of optimally deceptive strategies under lack of knowledge about the adversary naturally reduces to previously discussed problems in control design on partially observable or uncertain Markov decision processes. Finally, we present two examples of deceptive strategies: a "cops and robbers" scenario and an example where an agent may use camouflage while moving. We show that optimally deceptive strategies in such examples follow the intuitive idea of how to deceive an adversary in the above settings.

arXiv.org e-Print Archive

Crossref
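The belief-induced reward from the abstract above can be made concrete with a toy model. In this hypothetical setup, the agent moves on a line toward one of two candidate goals. The adversary updates a Bayesian belief over goals, assuming the agent steps toward its true goal with probability `p_toward`. The reward is task progress minus a penalty proportional to the adversary's updated belief in the true goal. The likelihood model, the weight `lam`, and the function names are assumptions, not the paper's formulation. The sketch is only meant to show that the reward takes the agent's state, its action, and the adversary's belief as inputs, so planning happens on the product of state and belief.

```python
from typing import Dict, Tuple

def update_belief(belief: Dict[int, float], pos: int, step: int,
                  p_toward: float = 0.8) -> Dict[int, float]:
    """Adversary's Bayes update over candidate goals after seeing `step` at `pos`.

    Hypothetical likelihood: under hypothesis g, a step toward g has
    probability p_toward and any other step has probability 1 - p_toward.
    """
    post = {g: b * (p_toward if (g - pos) * step > 0 else 1 - p_toward)
            for g, b in belief.items()}
    z = sum(post.values())
    return {g: v / z for g, v in post.items()}

def belief_induced_reward(pos: int, step: int, belief: Dict[int, float],
                          true_goal: int, lam: float = 1.0
                          ) -> Tuple[float, Dict[int, float]]:
    """r(s, a, b): +1 for progress toward the true goal, -1 otherwise,
    minus lam times the adversary's updated belief in the true goal."""
    new_belief = update_belief(belief, pos, step)
    progress = 1.0 if (true_goal - pos) * step > 0 else -1.0
    return progress - lam * new_belief[true_goal], new_belief

belief = {5: 0.5, -5: 0.5}  # candidate goals at +5 and -5; the true goal is +5
for step in (+1, -1):
    r, b = belief_induced_reward(0, step, belief, true_goal=5)
    print(step, round(r, 3), {g: round(p, 3) for g, p in b.items()})
```

With `lam = 1`, stepping toward +5 earns progress 1 but raises the adversary's belief in +5 to 0.8, for a reward of about 0.2. Stepping away gives about -1.2. For `lam` above 10/3, the penalty outweighs the progress term, and the one-step reward favors the misleading step.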