113,474 research outputs found

    Deception in Optimal Control

    Full text link
    In this paper, we consider an adversarial scenario where one agent seeks to achieve an objective and its adversary seeks to learn the agent's intentions and prevent the agent from achieving its objective. The agent has an incentive to try to deceive the adversary about its intentions, while at the same time working to achieve its objective. The primary contribution of this paper is to introduce a mathematically rigorous framework for the notion of deception within the context of optimal control. The central notion introduced in the paper is that of a belief-induced reward: a reward dependent not only on the agent's state and action, but also adversary's beliefs. Design of an optimal deceptive strategy then becomes a question of optimal control design on the product of the agent's state space and the adversary's belief space. The proposed framework allows for deception to be defined in an arbitrary control system endowed with a reward function, as well as with additional specifications limiting the agent's control policy. In addition to defining deception, we discuss design of optimally deceptive strategies under uncertainties in agent's knowledge about the adversary's learning process. In the latter part of the paper, we focus on a setting where the agent's behavior is governed by a Markov decision process, and show that the design of optimally deceptive strategies under lack of knowledge about the adversary naturally reduces to previously discussed problems in control design on partially observable or uncertain Markov decision processes. Finally, we present two examples of deceptive strategies: a "cops and robbers" scenario and an example where an agent may use camouflage while moving. We show that optimally deceptive strategies in such examples follow the intuitive idea of how to deceive an adversary in the above settings

    Correct-by-synthesis reinforcement learning with temporal logic constraints

    Full text link
    We consider a problem on the synthesis of reactive controllers that optimize some a priori unknown performance criterion while interacting with an uncontrolled environment such that the system satisfies a given temporal logic specification. We decouple the problem into two subproblems. First, we extract a (maximally) permissive strategy for the system, which encodes multiple (possibly all) ways in which the system can react to the adversarial environment and satisfy the specifications. Then, we quantify the a priori unknown performance criterion as a (still unknown) reward function and compute an optimal strategy for the system within the operating envelope allowed by the permissive strategy by using the so-called maximin-Q learning algorithm. We establish both correctness (with respect to the temporal logic specifications) and optimality (with respect to the a priori unknown performance criterion) of this two-step technique for a fragment of temporal logic specifications. For specifications beyond this fragment, correctness can still be preserved, but the learned strategy may be sub-optimal. We present an algorithm to the overall problem, and demonstrate its use and computational requirements on a set of robot motion planning examples.Comment: 8 pages, 3 figures, 2 tables, submitted to IROS 201

    Formal methods for resilient control

    Get PDF
    Many systems operate in uncertain, possibly adversarial environments, and their successful operation is contingent upon satisfying specific requirements, optimal performance, and ability to recover from unexpected situations. Examples are prevalent in many engineering disciplines such as transportation, robotics, energy, and biological systems. This thesis studies designing correct, resilient, and optimal controllers for discrete-time complex systems from elaborate, possibly vague, specifications. The first part of the contributions of this thesis is a framework for optimal control of non-deterministic hybrid systems from specifications described by signal temporal logic (STL), which can express a broad spectrum of interesting properties. The method is optimization-based and has several advantages over the existing techniques. When satisfying the specification is impossible, the degree of violation - characterized by STL quantitative semantics - is minimized. The computational limitations are discussed. The focus of second part is on specific types of systems and specifications for which controllers are synthesized efficiently. A class of monotone systems is introduced for which formal synthesis is scalable and almost complete. It is shown that hybrid macroscopic traffic models fall into this class. Novel techniques in modular verification and synthesis are employed for distributed optimal control, and their usefulness is shown for large-scale traffic management. Apart from monotone systems, a method is introduced for robust constrained control of networked linear systems with communication constraints. Case studies on longitudinal control of vehicular platoons are presented. The third part is about learning-based control with formal guarantees. Two approaches are studied. First, a formal perspective on adaptive control is provided in which the model is represented by a parametric transition system, and the specification is captured by an automaton. A correct-by-construction framework is developed such that the controller infers the actual parameters and plans accordingly for all possible future transitions and inferences. The second approach is based on hybrid model identification using input-output data. By assuming some limited knowledge of the range of system behaviors, theoretical performance guarantees are provided on implementing the controller designed for the identified model on the original unknown system

    Learning Task Specifications from Demonstrations

    Full text link
    Real world applications often naturally decompose into several sub-tasks. In many settings (e.g., robotics) demonstrations provide a natural way to specify the sub-tasks. However, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely recombined or limit the types of composition available. Motivated by this deficit, we consider the problem of inferring Boolean non-Markovian rewards (also known as logical trace properties or specifications) from demonstrations provided by an agent operating in an uncertain, stochastic environment. Crucially, specifications admit well-defined composition rules that are typically easy to interpret. In this paper, we formulate the specification inference task as a maximum a posteriori (MAP) probability inference problem, apply the principle of maximum entropy to derive an analytic demonstration likelihood model and give an efficient approach to search for the most likely specification in a large candidate pool of specifications. In our experiments, we demonstrate how learning specifications can help avoid common problems that often arise due to ad-hoc reward composition.Comment: NIPS 201

    Sciduction: Combining Induction, Deduction, and Structure for Verification and Synthesis

    Full text link
    Even with impressive advances in automated formal methods, certain problems in system verification and synthesis remain challenging. Examples include the verification of quantitative properties of software involving constraints on timing and energy consumption, and the automatic synthesis of systems from specifications. The major challenges include environment modeling, incompleteness in specifications, and the complexity of underlying decision problems. This position paper proposes sciduction, an approach to tackle these challenges by integrating inductive inference, deductive reasoning, and structure hypotheses. Deductive reasoning, which leads from general rules or concepts to conclusions about specific problem instances, includes techniques such as logical inference and constraint solving. Inductive inference, which generalizes from specific instances to yield a concept, includes algorithmic learning from examples. Structure hypotheses are used to define the class of artifacts, such as invariants or program fragments, generated during verification or synthesis. Sciduction constrains inductive and deductive reasoning using structure hypotheses, and actively combines inductive and deductive reasoning: for instance, deductive techniques generate examples for learning, and inductive reasoning is used to guide the deductive engines. We illustrate this approach with three applications: (i) timing analysis of software; (ii) synthesis of loop-free programs, and (iii) controller synthesis for hybrid systems. Some future applications are also discussed
    • …
    corecore