4,581 research outputs found
Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
asks to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy levels may increase and decrease with
transitions, and the hard constraint requires that the energy level must remain
positive in all steps till the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, developing on existing POMDP
solvers and using RTDP as its main method. Our second contribution is related
to policy representation. For larger POMDP instances the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure based on machine learning techniques that automatically extracts
important decisions of the policy allowing us to compute succinct human
readable policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally enhanced with energy levels.Comment: Technical report accompanying a paper published in proceedings of
AAMAS 201
Deception in Optimal Control
In this paper, we consider an adversarial scenario where one agent seeks to
achieve an objective and its adversary seeks to learn the agent's intentions
and prevent the agent from achieving its objective. The agent has an incentive
to try to deceive the adversary about its intentions, while at the same time
working to achieve its objective. The primary contribution of this paper is to
introduce a mathematically rigorous framework for the notion of deception
within the context of optimal control. The central notion introduced in the
paper is that of a belief-induced reward: a reward dependent not only on the
agent's state and action, but also adversary's beliefs. Design of an optimal
deceptive strategy then becomes a question of optimal control design on the
product of the agent's state space and the adversary's belief space. The
proposed framework allows for deception to be defined in an arbitrary control
system endowed with a reward function, as well as with additional
specifications limiting the agent's control policy. In addition to defining
deception, we discuss design of optimally deceptive strategies under
uncertainties in agent's knowledge about the adversary's learning process. In
the latter part of the paper, we focus on a setting where the agent's behavior
is governed by a Markov decision process, and show that the design of optimally
deceptive strategies under lack of knowledge about the adversary naturally
reduces to previously discussed problems in control design on partially
observable or uncertain Markov decision processes. Finally, we present two
examples of deceptive strategies: a "cops and robbers" scenario and an example
where an agent may use camouflage while moving. We show that optimally
deceptive strategies in such examples follow the intuitive idea of how to
deceive an adversary in the above settings
Verification and Control of Partially Observable Probabilistic Real-Time Systems
We propose automated techniques for the verification and control of
probabilistic real-time systems that are only partially observable. To formally
model such systems, we define an extension of probabilistic timed automata in
which local states are partially visible to an observer or controller. We give
a probabilistic temporal logic that can express a range of quantitative
properties of these models, relating to the probability of an event's
occurrence or the expected value of a reward measure. We then propose
techniques to either verify that such a property holds or to synthesise a
controller for the model which makes it true. Our approach is based on an
integer discretisation of the model's dense-time behaviour and a grid-based
abstraction of the uncountable belief space induced by partial observability.
The latter is necessarily approximate since the underlying problem is
undecidable, however we show how both lower and upper bounds on numerical
results can be generated. We illustrate the effectiveness of the approach by
implementing it in the PRISM model checker and applying it to several case
studies, from the domains of computer security and task scheduling
Coverage and Field Estimation on Bounded Domains by Diffusive Swarms
In this paper, we consider stochastic coverage of bounded domains by a
diffusing swarm of robots that take local measurements of an underlying scalar
field. We introduce three control methodologies with diffusion, advection, and
reaction as independent control inputs. We analyze the diffusion-based control
strategy using standard operator semigroup-theoretic arguments. We show that
the diffusion coefficient can be chosen to be dependent only on the robots'
local measurements to ensure that the swarm density converges to a function
proportional to the scalar field. The boundedness of the domain precludes the
need to impose assumptions on decaying properties of the scalar field at
infinity. Moreover, exponential convergence of the swarm density to the
equilibrium follows from properties of the spectrum of the semigroup generator.
In addition, we use the proposed coverage method to construct a
time-inhomogenous diffusion process and apply the observability of the heat
equation to reconstruct the scalar field over the entire domain from
observations of the robots' random motion over a small subset of the domain. We
verify our results through simulations of the coverage scenario on a 2D domain
and the field estimation scenario on a 1D domain.Comment: To appear in the proceedings of the 55th IEEE Conference on Decision
and Control (CDC 2016
Verification and control of partially observable probabilistic systems
We present automated techniques for the verification and control of partially observable, probabilistic systems for both discrete and dense models of time. For the discrete-time case, we formally model these systems using partially observable Markov decision processes; for dense time, we propose an extension of probabilistic timed automata in which local states are partially visible to an observer or controller. We give probabilistic temporal logics that can express a range of quantitative properties of these models, relating to the probability of an event’s occurrence or the expected value of a reward measure. We then propose techniques to either verify that such a property holds or synthesise a controller for the model which makes it true. Our approach is based on a grid-based abstraction of the uncountable belief space induced by partial observability and, for dense-time models, an integer discretisation of real-time behaviour. The former is necessarily approximate since the underlying problem is undecidable, however we show how both lower and upper bounds on numerical results can be generated. We illustrate the effectiveness of the approach by implementing it in the PRISM model checker and applying it to several case studies from the domains of task and network scheduling, computer security and planning
- …