
4,581 research outputs found

Stochastic Shortest Path with Energy Constraints in POMDPs

Author: Brázdil Tomáš
Chatterjee Krishnendu
Chmelík Martin
Gupta Anchit
Novotný Petr
Publication date: 01/01/2016

We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level must remain positive in all steps until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its main method. Our second contribution is related to policy representation. For larger POMDP instances the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that extracts important decisions of the policy, allowing us to compute succinct human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels. Comment: Technical report accompanying a paper published in proceedings of AAMAS 201
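As a rough illustration of the setting described in this abstract, the sketch below pairs a standard Bayesian belief update with an energy counter that must stay positive. It is not the authors' RTDP-based algorithm. The names `belief_update`, `step`, `T`, `O`, and `energy_change`, the toy numbers, and the assumption that the energy change depends only on the action are all hypothetical.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes update: b'(s') is proportional to O[s'][o] * sum_s T[s][a][s'] * b(s)."""
    n = len(belief)
    unnormalised = []
    for s_next in range(n):
        # Probability of landing in s_next before the observation is taken into account.
        predicted = sum(belief[s] * T[s][action][s_next] for s in range(n))
        unnormalised.append(O[s_next][observation] * predicted)
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalised]


def step(belief, energy, action, observation, T, O, energy_change):
    """Advance belief and energy; return None if the hard energy constraint is violated."""
    new_energy = energy + energy_change[action]
    if new_energy <= 0:  # the energy level must remain strictly positive
        return None
    return belief_update(belief, action, observation, T, O), new_energy


# Toy model with two states, one action, two observations (hypothetical numbers).
T = [[[0.5, 0.5]], [[0.0, 1.0]]]   # T[s][a][s']
O = [[0.9, 0.1], [0.2, 0.8]]       # O[s'][o]
energy_change = [-1]               # assumed: energy change depends only on the action

result = step([1.0, 0.0], 2, 0, 1, T, O, energy_change)
depleted = step([1.0, 0.0], 1, 0, 1, T, O, energy_change)
```

Here `result` holds the updated belief together with the remaining energy, while `depleted` is `None` because the same action would drop the energy level to zero.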

arXiv.org e-Print Archive

IST Austria: PubRep (Institute of Science and Technology)

Deception in Optimal Control

Author: Ornik Melkior
Topcu Ufuk
Publication date: 08/05/2018

In this paper, we consider an adversarial scenario where one agent seeks to achieve an objective and its adversary seeks to learn the agent's intentions and prevent the agent from achieving its objective. The agent has an incentive to try to deceive the adversary about its intentions, while at the same time working to achieve its objective. The primary contribution of this paper is to introduce a mathematically rigorous framework for the notion of deception within the context of optimal control. The central notion introduced in the paper is that of a belief-induced reward: a reward dependent not only on the agent's state and action, but also on the adversary's beliefs. Design of an optimal deceptive strategy then becomes a question of optimal control design on the product of the agent's state space and the adversary's belief space. The proposed framework allows for deception to be defined in an arbitrary control system endowed with a reward function, as well as with additional specifications limiting the agent's control policy. In addition to defining deception, we discuss design of optimally deceptive strategies under uncertainties in the agent's knowledge about the adversary's learning process. In the latter part of the paper, we focus on a setting where the agent's behavior is governed by a Markov decision process, and show that the design of optimally deceptive strategies under lack of knowledge about the adversary naturally reduces to previously discussed problems in control design on partially observable or uncertain Markov decision processes. Finally, we present two examples of deceptive strategies: a "cops and robbers" scenario and an example where an agent may use camouflage while moving. We show that optimally deceptive strategies in such examples follow the intuitive idea of how to deceive an adversary in the above settings.

arXiv.org e-Print Archive

Crossref

Verification and Control of Partially Observable Probabilistic Real-Time Systems

Author: B Finkbeiner
C Baier
F Cassez
G Behrmann
G Norman
G Shani
M Kang
M Kwiatkowska
O Madani
P Bouyer
P Černý
R Alur
S Giro
TA Henzinger
W Lovejoy
Publication date: 01/01/2015

We propose automated techniques for the verification and control of probabilistic real-time systems that are only partially observable. To formally model such systems, we define an extension of probabilistic timed automata in which local states are partially visible to an observer or controller. We give a probabilistic temporal logic that can express a range of quantitative properties of these models, relating to the probability of an event's occurrence or the expected value of a reward measure. We then propose techniques to either verify that such a property holds or to synthesise a controller for the model which makes it true. Our approach is based on an integer discretisation of the model's dense-time behaviour and a grid-based abstraction of the uncountable belief space induced by partial observability. The latter is necessarily approximate since the underlying problem is undecidable; however, we show how both lower and upper bounds on numerical results can be generated. We illustrate the effectiveness of the approach by implementing it in the PRISM model checker and applying it to several case studies, from the domains of computer security and task scheduling.

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Birmingham Research Portal

Enlighten

Coverage and Field Estimation on Bounded Domains by Diffusive Swarms

Author: Adams Chase
Berman Spring
Elamvazhuthi Karthik
Publication date: 01/10/2016

In this paper, we consider stochastic coverage of bounded domains by a diffusing swarm of robots that take local measurements of an underlying scalar field. We introduce three control methodologies with diffusion, advection, and reaction as independent control inputs. We analyze the diffusion-based control strategy using standard operator semigroup-theoretic arguments. We show that the diffusion coefficient can be chosen to be dependent only on the robots' local measurements to ensure that the swarm density converges to a function proportional to the scalar field. The boundedness of the domain precludes the need to impose assumptions on decaying properties of the scalar field at infinity. Moreover, exponential convergence of the swarm density to the equilibrium follows from properties of the spectrum of the semigroup generator. In addition, we use the proposed coverage method to construct a time-inhomogeneous diffusion process and apply the observability of the heat equation to reconstruct the scalar field over the entire domain from observations of the robots' random motion over a small subset of the domain. We verify our results through simulations of the coverage scenario on a 2D domain and the field estimation scenario on a 1D domain. Comment: To appear in the proceedings of the 55th IEEE Conference on Decision and Control (CDC 2016)

arXiv.org e-Print Archive

Crossref

Verification and control of partially observable probabilistic systems

Author: Norman Gethin
Parker David
Zou Xueyi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

We present automated techniques for the verification and control of partially observable, probabilistic systems for both discrete and dense models of time. For the discrete-time case, we formally model these systems using partially observable Markov decision processes; for dense time, we propose an extension of probabilistic timed automata in which local states are partially visible to an observer or controller. We give probabilistic temporal logics that can express a range of quantitative properties of these models, relating to the probability of an event’s occurrence or the expected value of a reward measure. We then propose techniques to either verify that such a property holds or synthesise a controller for the model which makes it true. Our approach is based on a grid-based abstraction of the uncountable belief space induced by partial observability and, for dense-time models, an integer discretisation of real-time behaviour. The former is necessarily approximate since the underlying problem is undecidable; however, we show how both lower and upper bounds on numerical results can be generated. We illustrate the effectiveness of the approach by implementing it in the PRISM model checker and applying it to several case studies from the domains of task and network scheduling, computer security and planning.
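To make the idea of a grid over the belief space concrete, the sketch below enumerates one common kind of grid: all beliefs whose entries are multiples of 1/M for a chosen resolution M. The abstract does not say which grid the authors use, so the fixed-resolution regular grid, the function name `belief_grid`, and its parameters are assumptions for illustration only.

```python
from fractions import Fraction


def belief_grid(num_states, resolution):
    """Return every belief over num_states states whose entries are multiples of 1/resolution."""

    def compositions(parts, total):
        # All ways to write `total` as an ordered sum of `parts` non-negative integers.
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(parts - 1, total - first):
                yield (first,) + rest

    # Exact fractions avoid floating-point drift in the "sums to 1" invariant.
    return [
        tuple(Fraction(k, resolution) for k in counts)
        for counts in compositions(num_states, resolution)
    ]


grid = belief_grid(3, 2)  # three states, entries in {0, 1/2, 1}
```

The number of grid points grows quickly with the number of states and the resolution, which is one reason such abstractions trade precision for tractability and yield bounds instead of exact values.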

Springer - Publisher Connector

University of Birmingham Research Portal

Enlighten