113,474 research outputs found
Deception in Optimal Control
In this paper, we consider an adversarial scenario where one agent seeks to
achieve an objective and its adversary seeks to learn the agent's intentions
and prevent the agent from achieving its objective. The agent has an incentive
to try to deceive the adversary about its intentions, while at the same time
working to achieve its objective. The primary contribution of this paper is to
introduce a mathematically rigorous framework for the notion of deception
within the context of optimal control. The central notion introduced in the
paper is that of a belief-induced reward: a reward dependent not only on the
agent's state and action, but also adversary's beliefs. Design of an optimal
deceptive strategy then becomes a question of optimal control design on the
product of the agent's state space and the adversary's belief space. The
proposed framework allows for deception to be defined in an arbitrary control
system endowed with a reward function, as well as with additional
specifications limiting the agent's control policy. In addition to defining
deception, we discuss design of optimally deceptive strategies under
uncertainties in agent's knowledge about the adversary's learning process. In
the latter part of the paper, we focus on a setting where the agent's behavior
is governed by a Markov decision process, and show that the design of optimally
deceptive strategies under lack of knowledge about the adversary naturally
reduces to previously discussed problems in control design on partially
observable or uncertain Markov decision processes. Finally, we present two
examples of deceptive strategies: a "cops and robbers" scenario and an example
where an agent may use camouflage while moving. We show that optimally
deceptive strategies in such examples follow the intuitive idea of how to
deceive an adversary in the above settings
Correct-by-synthesis reinforcement learning with temporal logic constraints
We consider a problem on the synthesis of reactive controllers that optimize
some a priori unknown performance criterion while interacting with an
uncontrolled environment such that the system satisfies a given temporal logic
specification. We decouple the problem into two subproblems. First, we extract
a (maximally) permissive strategy for the system, which encodes multiple
(possibly all) ways in which the system can react to the adversarial
environment and satisfy the specifications. Then, we quantify the a priori
unknown performance criterion as a (still unknown) reward function and compute
an optimal strategy for the system within the operating envelope allowed by the
permissive strategy by using the so-called maximin-Q learning algorithm. We
establish both correctness (with respect to the temporal logic specifications)
and optimality (with respect to the a priori unknown performance criterion) of
this two-step technique for a fragment of temporal logic specifications. For
specifications beyond this fragment, correctness can still be preserved, but
the learned strategy may be sub-optimal. We present an algorithm to the overall
problem, and demonstrate its use and computational requirements on a set of
robot motion planning examples.Comment: 8 pages, 3 figures, 2 tables, submitted to IROS 201
Formal methods for resilient control
Many systems operate in uncertain, possibly adversarial environments, and their successful operation is contingent upon satisfying specific requirements, optimal performance, and ability to recover from unexpected situations. Examples are prevalent in many engineering disciplines such as transportation, robotics, energy, and biological systems. This thesis studies designing correct, resilient, and optimal controllers for discrete-time complex systems from elaborate, possibly vague, specifications.
The first part of the contributions of this thesis is a framework for optimal control of non-deterministic hybrid systems from specifications described by signal temporal logic (STL), which can express a broad spectrum of interesting properties. The method is optimization-based and has several advantages over the existing techniques. When satisfying the specification is impossible, the degree of violation - characterized by STL quantitative semantics - is minimized. The computational limitations are discussed.
The focus of second part is on specific types of systems and specifications for which controllers are synthesized efficiently. A class of monotone systems is introduced for which formal synthesis is scalable and almost complete. It is shown that hybrid macroscopic traffic models fall into this class. Novel techniques in modular verification and synthesis are employed for distributed optimal control, and their usefulness is shown for large-scale traffic management. Apart from monotone systems, a method is introduced for robust constrained control of networked linear systems with communication constraints. Case studies on longitudinal control of vehicular platoons are presented.
The third part is about learning-based control with formal guarantees. Two approaches are studied. First, a formal perspective on adaptive control is provided in which the model is represented by a parametric transition system, and the specification is captured by an automaton. A correct-by-construction framework is developed such that the controller infers the actual parameters and plans accordingly for all possible future transitions and inferences. The second approach is based on hybrid model identification using input-output data. By assuming some limited knowledge of the range of system behaviors, theoretical performance guarantees are provided on implementing the controller designed for the identified model on the original unknown system
Learning Task Specifications from Demonstrations
Real world applications often naturally decompose into several sub-tasks. In
many settings (e.g., robotics) demonstrations provide a natural way to specify
the sub-tasks. However, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the sub-tasks can be
safely recombined or limit the types of composition available. Motivated by
this deficit, we consider the problem of inferring Boolean non-Markovian
rewards (also known as logical trace properties or specifications) from
demonstrations provided by an agent operating in an uncertain, stochastic
environment. Crucially, specifications admit well-defined composition rules
that are typically easy to interpret. In this paper, we formulate the
specification inference task as a maximum a posteriori (MAP) probability
inference problem, apply the principle of maximum entropy to derive an analytic
demonstration likelihood model and give an efficient approach to search for the
most likely specification in a large candidate pool of specifications. In our
experiments, we demonstrate how learning specifications can help avoid common
problems that often arise due to ad-hoc reward composition.Comment: NIPS 201
Sciduction: Combining Induction, Deduction, and Structure for Verification and Synthesis
Even with impressive advances in automated formal methods, certain problems
in system verification and synthesis remain challenging. Examples include the
verification of quantitative properties of software involving constraints on
timing and energy consumption, and the automatic synthesis of systems from
specifications. The major challenges include environment modeling,
incompleteness in specifications, and the complexity of underlying decision
problems.
This position paper proposes sciduction, an approach to tackle these
challenges by integrating inductive inference, deductive reasoning, and
structure hypotheses. Deductive reasoning, which leads from general rules or
concepts to conclusions about specific problem instances, includes techniques
such as logical inference and constraint solving. Inductive inference, which
generalizes from specific instances to yield a concept, includes algorithmic
learning from examples. Structure hypotheses are used to define the class of
artifacts, such as invariants or program fragments, generated during
verification or synthesis. Sciduction constrains inductive and deductive
reasoning using structure hypotheses, and actively combines inductive and
deductive reasoning: for instance, deductive techniques generate examples for
learning, and inductive reasoning is used to guide the deductive engines.
We illustrate this approach with three applications: (i) timing analysis of
software; (ii) synthesis of loop-free programs, and (iii) controller synthesis
for hybrid systems. Some future applications are also discussed
- …