Robust Control of Uncertain Markov Decision Processes with Temporal Logic Specifications
We present a method for designing robust controllers for dynamical systems with linear temporal logic specifications. We abstract the original system by a finite Markov Decision Process (MDP) that has transition probabilities in a specified uncertainty set. A robust control policy for the MDP is generated that maximizes the worst-case probability of satisfying the specification over all transition probabilities in the uncertainty set. To do this, we use a procedure from probabilistic model checking to combine the system model with an automaton representing the specification. This new MDP is then transformed into an equivalent form that satisfies assumptions for stochastic shortest path dynamic programming. A robust version of dynamic programming allows us to solve for a κ-suboptimal robust control policy with time complexity O(log(1/κ)) times that for the non-robust case. We then implement this control policy on the original dynamical system.
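The worst-case maximization described above can be illustrated with a minimal robust value iteration sketch. Everything here is illustrative and not from the paper: the uncertainty set is modeled crudely as a finite list of candidate transition distributions per state-action pair, and nature picks the one that minimizes the agent's probability of reaching a goal state.

```python
# Toy robust value iteration: V(s) = worst-case probability of eventually
# reaching `goal` when transition probabilities are only known to lie in a
# finite uncertainty set. All states, actions, and numbers are illustrative.

def robust_value_iteration(states, actions, uncertainty, goal, iters=100):
    """uncertainty[(s, a)] is a list of candidate transition distributions,
    each a dict {next_state: probability}. Nature (the adversary) picks the
    distribution that minimises the agent's value; the agent maximises."""
    V = {s: 1.0 if s == goal else 0.0 for s in states}
    for _ in range(iters):
        new_V = {}
        for s in states:
            if s == goal:
                new_V[s] = 1.0
                continue
            best = 0.0
            for a in actions:
                if (s, a) not in uncertainty:
                    continue
                # Inner minimisation over the uncertainty set.
                worst = min(
                    sum(p * V[t] for t, p in dist.items())
                    for dist in uncertainty[(s, a)]
                )
                best = max(best, worst)
            new_V[s] = best
        V = new_V
    return V

# From "s0", action "go" reaches the goal with probability 0.8 or 0.6,
# depending on which distribution nature chooses; otherwise a trap absorbs.
states = ["s0", "goal", "trap"]
uncertainty = {("s0", "go"): [{"goal": 0.8, "trap": 0.2},
                              {"goal": 0.6, "trap": 0.4}]}
V = robust_value_iteration(states, ["go"], uncertainty, goal="goal")
# The robust (worst-case) satisfaction probability from "s0" is 0.6.
```

The paper's actual procedure works on the product with a specification automaton and handles richer uncertainty sets; this sketch only shows the max-min structure of the backup.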
Deception in Optimal Control
In this paper, we consider an adversarial scenario where one agent seeks to
achieve an objective and its adversary seeks to learn the agent's intentions
and prevent the agent from achieving its objective. The agent has an incentive
to try to deceive the adversary about its intentions, while at the same time
working to achieve its objective. The primary contribution of this paper is to
introduce a mathematically rigorous framework for the notion of deception
within the context of optimal control. The central notion introduced in the
paper is that of a belief-induced reward: a reward dependent not only on the
agent's state and action, but also on the adversary's beliefs. Design of an optimal
deceptive strategy then becomes a question of optimal control design on the
product of the agent's state space and the adversary's belief space. The
proposed framework allows for deception to be defined in an arbitrary control
system endowed with a reward function, as well as with additional
specifications limiting the agent's control policy. In addition to defining
deception, we discuss design of optimally deceptive strategies under
uncertainties in the agent's knowledge about the adversary's learning process. In
the latter part of the paper, we focus on a setting where the agent's behavior
is governed by a Markov decision process, and show that the design of optimally
deceptive strategies under lack of knowledge about the adversary naturally
reduces to previously discussed problems in control design on partially
observable or uncertain Markov decision processes. Finally, we present two
examples of deceptive strategies: a "cops and robbers" scenario and an example
where an agent may use camouflage while moving. We show that optimally
deceptive strategies in such examples follow the intuitive idea of how to
deceive an adversary in the above settings.
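The belief space that the abstract augments the agent's state with can be made concrete with a small sketch of the adversary's inference step. This is an assumption for illustration, not the paper's model: the adversary holds a distribution over the agent's possible goals and applies Bayes' rule after each observed action, with made-up likelihoods.

```python
# Illustrative Bayesian belief update for an adversary inferring the
# agent's goal from observed actions. Goals and likelihoods are invented.

def update_belief(belief, likelihood, action):
    """belief: {goal: prob}; likelihood[goal][action]: P(action | goal)."""
    posterior = {g: belief[g] * likelihood[g][action] for g in belief}
    z = sum(posterior.values())
    return {g: p / z for g, p in posterior.items()}

belief = {"A": 0.5, "B": 0.5}
likelihood = {"A": {"left": 0.9, "right": 0.1},
              "B": {"left": 0.2, "right": 0.8}}
# Moving "left" is far more likely under goal A, so the belief shifts to A.
belief = update_belief(belief, likelihood, "left")   # P(A) = 0.45/0.55 ≈ 0.82
```

A deceptive agent in the paper's framework would plan over pairs (state, belief), trading off progress toward its true goal against keeping such a posterior uninformative or misdirected.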
From Uncertainty Data to Robust Policies for Temporal Logic Planning
We consider the problem of synthesizing robust disturbance feedback policies
for systems performing complex tasks. We formulate the tasks as linear temporal
logic specifications and encode them into an optimization framework via
mixed-integer constraints. Both the system dynamics and the specifications are
known but affected by uncertainty. The distribution of the uncertainty is
unknown; however, realizations can be obtained. We introduce a data-driven
approach where the constraints are fulfilled for a set of realizations and
provide probabilistic generalization guarantees as a function of the number of
considered realizations. We use separate chance constraints for the
satisfaction of the specification and operational constraints. This allows us
to quantify their violation probabilities independently. We compute disturbance
feedback policies as solutions of mixed-integer linear or quadratic
optimization problems. By using feedback we can exploit information of past
realizations and provide feasibility for a wider range of situations compared
to static input sequences. We demonstrate the proposed method on two robust
motion-planning case studies for autonomous driving.
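The idea of quantifying the two violation probabilities independently can be sketched with a simple Monte Carlo check over sampled disturbance realizations. The constraints and distribution below are placeholders, not the paper's mixed-integer encoding:

```python
# Estimate violation probabilities of two separate chance constraints from
# sampled disturbance realizations. Distribution and constraints are
# illustrative stand-ins for the paper's specification/operational split.
import random

def empirical_violation(samples, constraint):
    """Fraction of sampled realizations that violate `constraint`."""
    return sum(not constraint(w) for w in samples) / len(samples)

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

def spec_ok(w):    # specification constraint (hypothetical)
    return abs(w) <= 2.5

def oper_ok(w):    # operational constraint (hypothetical)
    return w <= 2.0

p_spec = empirical_violation(samples, spec_ok)
p_oper = empirical_violation(samples, oper_ok)
```

Because the two constraints are checked on the same realizations but counted separately, each gets its own empirical violation level, mirroring how the paper attaches separate probabilistic guarantees to specification satisfaction and operational constraints.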
Learning Task Specifications from Demonstrations
Real world applications often naturally decompose into several sub-tasks. In
many settings (e.g., robotics) demonstrations provide a natural way to specify
the sub-tasks. However, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the sub-tasks can be
safely recombined or limit the types of composition available. Motivated by
this deficit, we consider the problem of inferring Boolean non-Markovian
rewards (also known as logical trace properties or specifications) from
demonstrations provided by an agent operating in an uncertain, stochastic
environment. Crucially, specifications admit well-defined composition rules
that are typically easy to interpret. In this paper, we formulate the
specification inference task as a maximum a posteriori (MAP) probability
inference problem, apply the principle of maximum entropy to derive an analytic
demonstration likelihood model and give an efficient approach to search for the
most likely specification in a large candidate pool of specifications. In our
experiments, we demonstrate how learning specifications can help avoid common
problems that often arise due to ad-hoc reward composition.
Comment: NIPS 201
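The MAP inference step described above can be caricatured with a tiny sketch. The candidate specifications, priors, and the exponential-weight likelihood below are invented for illustration; the paper derives the actual likelihood from the principle of maximum entropy.

```python
# Toy MAP specification inference: score candidate Boolean trace properties
# against demonstrations and return the highest-posterior one. The
# exp(beta)-per-satisfying-demo likelihood is a crude illustrative stand-in.
import math

def map_specification(candidates, demos, beta=2.0):
    """candidates: list of (name, prior, predicate) triples;
    demos: list of traces. Returns the name of the MAP candidate."""
    def log_post(prior, pred):
        return math.log(prior) + sum(beta if pred(d) else 0.0 for d in demos)
    return max(candidates, key=lambda c: log_post(c[1], c[2]))[0]

# Demonstrations as lists of visited labels; purely illustrative.
demos = [["a", "b", "goal"], ["a", "goal"], ["b", "a", "goal"]]
candidates = [
    ("eventually goal", 0.5, lambda t: "goal" in t),
    ("avoid b",         0.5, lambda t: "b" not in t),
]
best = map_specification(candidates, demos)
# All three demos satisfy "eventually goal" but only one avoids "b",
# so "eventually goal" wins the posterior comparison.
```

The real method searches a large candidate pool efficiently rather than enumerating it, but the posterior comparison has this shape.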
Probabilistic Plan Synthesis for Coupled Multi-Agent Systems
This paper presents a fully automated procedure for controller synthesis for
multi-agent systems in the presence of uncertainties. We model the motion of
each of the agents in the environment as a Markov Decision Process (MDP)
and we assign to each agent one individual high-level formula given in
Probabilistic Computational Tree Logic (PCTL). Each agent may need to
collaborate with other agents in order to achieve a task. The collaboration is
imposed by sharing actions between the agents. We aim to design local control
policies such that each agent satisfies its individual PCTL formula. The
proposed algorithm builds on clustering the agents, MDP products construction
and controller policies design. We show that our approach has better
computational complexity than the centralized case, which traditionally suffers
from very high computational demands.
Comment: IFAC WC 2017, Toulouse, Franc
Control of Probabilistic Systems under Dynamic, Partially Known Environments with Temporal Logic Specifications
We consider the synthesis of control policies for probabilistic systems,
modeled by Markov decision processes, operating in partially known environments
with temporal logic specifications. The environment is modeled by a set of
Markov chains. Each Markov chain describes the behavior of the environment in
each mode. The mode of the environment, however, is not known to the system.
Two control objectives are considered: maximizing the expected probability and
maximizing the worst-case probability that the system satisfies a given
specification.
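The difference between the two control objectives can be shown with a minimal sketch. The policies, modes, and satisfaction probabilities below are invented; the point is only the expected-value versus worst-case comparison over the hidden environment mode.

```python
# Choose a policy when the environment's mode (which Markov chain it runs)
# is hidden: maximise either the expected or the worst-case probability of
# satisfying the specification. All numbers are illustrative.

def best_policy(sat_prob, mode_prior, worst_case=False):
    """sat_prob[policy][mode]: probability the policy satisfies the
    specification when the environment is in that mode."""
    def score(policy):
        probs = sat_prob[policy]
        if worst_case:
            return min(probs.values())
        return sum(mode_prior[m] * p for m, p in probs.items())
    return max(sat_prob, key=score)

sat_prob = {"cautious": {"calm": 0.7, "hostile": 0.6},
            "greedy":   {"calm": 0.9, "hostile": 0.2}}
prior = {"calm": 0.8, "hostile": 0.2}

best_expected = best_policy(sat_prob, prior)                  # "greedy": 0.76 vs 0.68
best_robust = best_policy(sat_prob, prior, worst_case=True)   # "cautious": 0.6 vs 0.2
```

The two objectives can disagree, as here: averaging over a mode prior rewards the policy tuned to the likely mode, while the worst-case criterion rewards the policy that never does badly in any mode.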
- …