Robust Control of Uncertain Markov Decision Processes with Temporal Logic Specifications
We present a method for designing robust controllers for dynamical systems with linear temporal logic specifications. We abstract the original system by a finite Markov Decision Process (MDP) that has transition probabilities in a specified uncertainty set. A robust control policy for the MDP is generated that maximizes the worst-case probability of satisfying the specification over all transition probabilities in the uncertainty set. To do this, we use a procedure from probabilistic model checking to combine the system model with an automaton representing the specification. This new MDP is then transformed into an equivalent form that satisfies assumptions for stochastic shortest path dynamic programming. A robust version of dynamic programming allows us to solve for a κ-suboptimal robust control policy with time complexity O(log(1/κ)) times that for the non-robust case. We then implement this control policy on the original dynamical system.
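The worst-case maximization described above can be illustrated with a minimal robust value iteration sketch. Everything here is illustrative and not from the paper: the uncertainty set is modeled crudely as a finite list of candidate transition distributions per state-action pair, and nature picks the one that minimizes the agent's probability of reaching a goal state.

```python
# Toy robust value iteration: V(s) = worst-case probability of eventually
# reaching `goal` when transition probabilities are only known to lie in a
# finite uncertainty set. All states, actions, and numbers are illustrative.

def robust_value_iteration(states, actions, uncertainty, goal, iters=100):
    """uncertainty[(s, a)] is a list of candidate transition distributions,
    each a dict {next_state: probability}. Nature (the adversary) picks the
    distribution that minimises the agent's value; the agent maximises."""
    V = {s: 1.0 if s == goal else 0.0 for s in states}
    for _ in range(iters):
        new_V = {}
        for s in states:
            if s == goal:
                new_V[s] = 1.0
                continue
            best = 0.0
            for a in actions:
                if (s, a) not in uncertainty:
                    continue
                # Inner minimisation over the uncertainty set.
                worst = min(
                    sum(p * V[t] for t, p in dist.items())
                    for dist in uncertainty[(s, a)]
                )
                best = max(best, worst)
            new_V[s] = best
        V = new_V
    return V

# From "s0", action "go" reaches the goal with probability 0.8 or 0.6,
# depending on which distribution nature chooses; otherwise a trap absorbs.
states = ["s0", "goal", "trap"]
uncertainty = {("s0", "go"): [{"goal": 0.8, "trap": 0.2},
                              {"goal": 0.6, "trap": 0.4}]}
V = robust_value_iteration(states, ["go"], uncertainty, goal="goal")
# The robust (worst-case) satisfaction probability from "s0" is 0.6.
```

The paper's actual procedure works on the product with a specification automaton and handles richer uncertainty sets; this sketch only shows the max-min structure of the backup.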
Deception in Optimal Control
In this paper, we consider an adversarial scenario where one agent seeks to
achieve an objective and its adversary seeks to learn the agent's intentions
and prevent the agent from achieving its objective. The agent has an incentive
to try to deceive the adversary about its intentions, while at the same time
working to achieve its objective. The primary contribution of this paper is to
introduce a mathematically rigorous framework for the notion of deception
within the context of optimal control. The central notion introduced in the
paper is that of a belief-induced reward: a reward dependent not only on the
agent's state and action, but also on the adversary's beliefs. Design of an optimal
deceptive strategy then becomes a question of optimal control design on the
product of the agent's state space and the adversary's belief space. The
proposed framework allows for deception to be defined in an arbitrary control
system endowed with a reward function, as well as with additional
specifications limiting the agent's control policy. In addition to defining
deception, we discuss design of optimally deceptive strategies under
uncertainties in the agent's knowledge about the adversary's learning process. In
the latter part of the paper, we focus on a setting where the agent's behavior
is governed by a Markov decision process, and show that the design of optimally
deceptive strategies under lack of knowledge about the adversary naturally
reduces to previously discussed problems in control design on partially
observable or uncertain Markov decision processes. Finally, we present two
examples of deceptive strategies: a "cops and robbers" scenario and an example
where an agent may use camouflage while moving. We show that optimally
deceptive strategies in such examples follow the intuitive idea of how to
deceive an adversary in the above settings.
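The belief space that the abstract augments the agent's state with can be made concrete with a small sketch of the adversary's inference step. This is an assumption for illustration, not the paper's model: the adversary holds a distribution over the agent's possible goals and applies Bayes' rule after each observed action, with made-up likelihoods.

```python
# Illustrative Bayesian belief update for an adversary inferring the
# agent's goal from observed actions. Goals and likelihoods are invented.

def update_belief(belief, likelihood, action):
    """belief: {goal: prob}; likelihood[goal][action]: P(action | goal)."""
    posterior = {g: belief[g] * likelihood[g][action] for g in belief}
    z = sum(posterior.values())
    return {g: p / z for g, p in posterior.items()}

belief = {"A": 0.5, "B": 0.5}
likelihood = {"A": {"left": 0.9, "right": 0.1},
              "B": {"left": 0.2, "right": 0.8}}
# Moving "left" is far more likely under goal A, so the belief shifts to A.
belief = update_belief(belief, likelihood, "left")   # P(A) = 0.45/0.55 ≈ 0.82
```

A deceptive agent in the paper's framework would plan over pairs (state, belief), trading off progress toward its true goal against keeping such a posterior uninformative or misdirected.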
From Uncertainty Data to Robust Policies for Temporal Logic Planning
We consider the problem of synthesizing robust disturbance feedback policies
for systems performing complex tasks. We formulate the tasks as linear temporal
logic specifications and encode them into an optimization framework via
mixed-integer constraints. Both the system dynamics and the specifications are
known but affected by uncertainty. The distribution of the uncertainty is
unknown; however, realizations can be obtained. We introduce a data-driven
approach where the constraints are fulfilled for a set of realizations and
provide probabilistic generalization guarantees as a function of the number of
considered realizations. We use separate chance constraints for the
satisfaction of the specification and operational constraints. This allows us
to quantify their violation probabilities independently. We compute disturbance
feedback policies as solutions of mixed-integer linear or quadratic
optimization problems. By using feedback we can exploit information of past
realizations and provide feasibility for a wider range of situations compared
to static input sequences. We demonstrate the proposed method on two robust
motion-planning case studies for autonomous driving.
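The idea of quantifying the two violation probabilities independently can be sketched with a simple Monte Carlo check over sampled disturbance realizations. The constraints and distribution below are placeholders, not the paper's mixed-integer encoding:

```python
# Estimate violation probabilities of two separate chance constraints from
# sampled disturbance realizations. Distribution and constraints are
# illustrative stand-ins for the paper's specification/operational split.
import random

def empirical_violation(samples, constraint):
    """Fraction of sampled realizations that violate `constraint`."""
    return sum(not constraint(w) for w in samples) / len(samples)

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

def spec_ok(w):    # specification constraint (hypothetical)
    return abs(w) <= 2.5

def oper_ok(w):    # operational constraint (hypothetical)
    return w <= 2.0

p_spec = empirical_violation(samples, spec_ok)
p_oper = empirical_violation(samples, oper_ok)
```

Because the two constraints are checked on the same realizations but counted separately, each gets its own empirical violation level, mirroring how the paper attaches separate probabilistic guarantees to specification satisfaction and operational constraints.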
Learning Task Specifications from Demonstrations
Real world applications often naturally decompose into several sub-tasks. In
many settings (e.g., robotics) demonstrations provide a natural way to specify
the sub-tasks. However, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the sub-tasks can be
safely recombined or limit the types of composition available. Motivated by
this deficit, we consider the problem of inferring Boolean non-Markovian
rewards (also known as logical trace properties or specifications) from
demonstrations provided by an agent operating in an uncertain, stochastic
environment. Crucially, specifications admit well-defined composition rules
that are typically easy to interpret. In this paper, we formulate the
specification inference task as a maximum a posteriori (MAP) probability
inference problem, apply the principle of maximum entropy to derive an analytic
demonstration likelihood model and give an efficient approach to search for the
most likely specification in a large candidate pool of specifications. In our
experiments, we demonstrate how learning specifications can help avoid common
problems that often arise due to ad-hoc reward composition.
Comment: NIPS 201
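The MAP inference step described above can be caricatured with a tiny sketch. The candidate specifications, priors, and the exponential-weight likelihood below are invented for illustration; the paper derives the actual likelihood from the principle of maximum entropy.

```python
# Toy MAP specification inference: score candidate Boolean trace properties
# against demonstrations and return the highest-posterior one. The
# exp(beta)-per-satisfying-demo likelihood is a crude illustrative stand-in.
import math

def map_specification(candidates, demos, beta=2.0):
    """candidates: list of (name, prior, predicate) triples;
    demos: list of traces. Returns the name of the MAP candidate."""
    def log_post(prior, pred):
        return math.log(prior) + sum(beta if pred(d) else 0.0 for d in demos)
    return max(candidates, key=lambda c: log_post(c[1], c[2]))[0]

# Demonstrations as lists of visited labels; purely illustrative.
demos = [["a", "b", "goal"], ["a", "goal"], ["b", "a", "goal"]]
candidates = [
    ("eventually goal", 0.5, lambda t: "goal" in t),
    ("avoid b",         0.5, lambda t: "b" not in t),
]
best = map_specification(candidates, demos)
# All three demos satisfy "eventually goal" but only one avoids "b",
# so "eventually goal" wins the posterior comparison.
```

The real method searches a large candidate pool efficiently rather than enumerating it, but the posterior comparison has this shape.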
Probabilistic Plan Synthesis for Coupled Multi-Agent Systems
This paper presents a fully automated procedure for controller synthesis for
multi-agent systems in the presence of uncertainties. We model the motion of
each of the agents in the environment as a Markov Decision Process (MDP)
and we assign to each agent one individual high-level formula given in
Probabilistic Computational Tree Logic (PCTL). Each agent may need to
collaborate with other agents in order to achieve a task. The collaboration is
imposed by sharing actions between the agents. We aim to design local control
policies such that each agent satisfies its individual PCTL formula. The
proposed algorithm builds on clustering the agents, MDP products construction
and controller policies design. We show that our approach has better
computational complexity than the centralized case, which traditionally suffers
from very high computational demands.
Comment: IFAC WC 2017, Toulouse, Franc
Control of Probabilistic Systems under Dynamic, Partially Known Environments with Temporal Logic Specifications
We consider the synthesis of control policies for probabilistic systems,
modeled by Markov decision processes, operating in partially known environments
with temporal logic specifications. The environment is modeled by a set of
Markov chains. Each Markov chain describes the behavior of the environment in
each mode. The mode of the environment, however, is not known to the system.
Two control objectives are considered: maximizing the expected probability and
maximizing the worst-case probability that the system satisfies a given
specification.
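The difference between the two control objectives can be shown with a minimal sketch. The policies, modes, and satisfaction probabilities below are invented; the point is only the expected-value versus worst-case comparison over the hidden environment mode.

```python
# Choose a policy when the environment's mode (which Markov chain it runs)
# is hidden: maximise either the expected or the worst-case probability of
# satisfying the specification. All numbers are illustrative.

def best_policy(sat_prob, mode_prior, worst_case=False):
    """sat_prob[policy][mode]: probability the policy satisfies the
    specification when the environment is in that mode."""
    def score(policy):
        probs = sat_prob[policy]
        if worst_case:
            return min(probs.values())
        return sum(mode_prior[m] * p for m, p in probs.items())
    return max(sat_prob, key=score)

sat_prob = {"cautious": {"calm": 0.7, "hostile": 0.6},
            "greedy":   {"calm": 0.9, "hostile": 0.2}}
prior = {"calm": 0.8, "hostile": 0.2}

best_expected = best_policy(sat_prob, prior)                  # "greedy": 0.76 vs 0.68
best_robust = best_policy(sat_prob, prior, worst_case=True)   # "cautious": 0.6 vs 0.2
```

The two objectives can disagree, as here: averaging over a mode prior rewards the policy tuned to the likely mode, while the worst-case criterion rewards the policy that never does badly in any mode.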
- …