Sensor Synthesis for POMDPs with Reachability Objectives
Partially observable Markov decision processes (POMDPs) are widely used in
probabilistic planning problems in which an agent interacts with an environment
using noisy and imprecise sensors. We study a setting in which the sensors are
only partially defined and the goal is to synthesize "weakest" additional
sensors, such that in the resulting POMDP, there is a small-memory policy for
the agent that almost surely (with probability 1) satisfies a reachability
objective. We show that the problem is NP-complete, and present a symbolic
algorithm by encoding the problem into SAT instances. We illustrate trade-offs
between the amount of memory of the policy and the number of additional sensors
on a simple example. We have implemented our approach and consider three
classical POMDP examples from the literature, and show that in all the examples
the number of sensors can be significantly decreased (as compared to the
existing solutions in the literature) without increasing the complexity of the
policies. Comment: arXiv admin note: text overlap with arXiv:1511.0845
Barrier Functions for Multiagent-POMDPs with DTL Specifications
Multi-agent partially observable Markov decision processes (MPOMDPs) provide a framework to represent heterogeneous autonomous agents subject to uncertainty and partial observation. In this paper, given a nominal policy provided by a human operator or a conventional planning method, we propose a technique based on barrier functions to design a minimally interfering safety-shield ensuring satisfaction of high-level specifications in terms of linear distribution temporal logic (LDTL). To this end, we use sufficient and necessary conditions for the invariance of a given set based on discrete-time barrier functions (DTBFs) and formulate sufficient conditions for finite-time DTBFs to study finite-time convergence to a set. We then show that different LDTL mission/safety specifications can be cast as a set of invariance or finite-time reachability problems. We show that the proposed method for safety-shield synthesis can be implemented online by a sequence of one-step greedy algorithms, and we demonstrate its efficacy using experiments involving a team of robots.
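The one-step greedy shield can be illustrated with a minimal Python sketch. It assumes a single agent (the paper treats multi-agent POMDPs with LDTL specifications), a finite action set, a standard Bayesian belief update, and a DTBF condition of the common form h(b') - h(b) >= -gamma * h(b) with 0 < gamma <= 1, required here for every observation that can occur. The functions `belief_update`, `dtbf_holds` and `shield`, the toy model and the barrier `h_safe` are all hypothetical; this is not the authors' implementation.

```python
from typing import Callable, Optional, Sequence, Tuple

Belief = Tuple[float, ...]
# Hypothetical model layout: T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a).
Model = Sequence[Sequence[Sequence[float]]]


def belief_update(b: Belief, a: int, o: int, T: Model, O: Model) -> Tuple[Optional[Belief], float]:
    """Bayes filter: b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s).

    Returns the posterior belief and the probability of observing o (None, 0.0 if o is impossible).
    """
    n = len(b)
    unnorm = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    p_o = sum(unnorm)
    if p_o == 0.0:
        return None, 0.0
    return tuple(x / p_o for x in unnorm), p_o


def dtbf_holds(b: Belief, a: int, h: Callable[[Belief], float], gamma: float,
               T: Model, O: Model, n_obs: int) -> bool:
    """Check h(b') - h(b) >= -gamma * h(b) for every next belief b' reachable under action a."""
    hb = h(b)
    for o in range(n_obs):
        b2, p_o = belief_update(b, a, o, T, O)
        if p_o > 0.0 and h(b2) - hb < -gamma * hb:
            return False
    return True


def shield(b: Belief, nominal: int, actions: Sequence[int], h: Callable[[Belief], float],
           gamma: float, T: Model, O: Model, n_obs: int) -> Optional[int]:
    """One-step greedy filter: keep the nominal action if admissible, else the first admissible one."""
    if dtbf_holds(b, nominal, h, gamma, T, O, n_obs):
        return nominal
    for a in actions:
        if dtbf_holds(b, a, h, gamma, T, O, n_obs):
            return a
    return None  # no action satisfies the barrier condition at this belief


# Toy example: state 0 is safe, state 1 is unsafe; observations reveal the state exactly.
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0 ("stay"): the state never changes
    [[0.5, 0.5], [0.0, 1.0]],  # action 1 ("risky"): leaves the safe state with prob. 0.5
]
O = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]


def h_safe(b: Belief) -> float:
    # Safe set of beliefs: probability of being in state 0 is at least 0.9.
    return b[0] - 0.9


print(shield((1.0, 0.0), nominal=1, actions=[0, 1], h=h_safe, gamma=0.5, T=T, O=O, n_obs=2))  # prints 0
```

Falling back to the first admissible action in a fixed preference order is only a crude stand-in for "minimal interference"; a closer analogue would pick the admissible action nearest to the nominal one under some distance.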
Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
asks to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy levels may increase and decrease with
transitions, and the hard constraint requires that the energy level remain
positive at every step until the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, building on existing POMDP
solvers and using RTDP (real-time dynamic programming) as its main method. Our second contribution is related
to policy representation. For larger POMDP instances the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure based on machine learning techniques that automatically extracts
important decisions of the policy, allowing us to compute succinct,
human-readable policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally extended with energy levels. Comment: Technical report accompanying a paper published in the proceedings of
AAMAS 201
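The energy constraint can be made concrete by tracking the energy level as part of the state. The sketch below is a simplification, not the paper's algorithm: it assumes a fully observable MDP rather than a POMDP, swaps RTDP for plain value iteration, caps energy at a hypothetical maximum `e_max`, and requires the energy to be at least 1 in every visited state, including on arrival at the target. A greatest-fixpoint pass first removes energy-augmented states from which no action keeps the energy positive on every outcome; value iteration then minimizes expected total cost using only the remaining actions. The function `solve_energy_ssp` and the toy model are illustrative.

```python
from typing import Dict, Tuple

State = Tuple[int, int]  # (model state, energy level)


def solve_energy_ssp(states, actions, trans, cost, energy, targets, e_max, iters=10_000):
    """Value iteration for stochastic shortest path on an energy-augmented, fully observable MDP.

    trans[s][a]: list of (s2, prob); cost[s][a]: positive integer cost;
    energy[s][a]: integer energy change; energy is capped at e_max and must stay >= 1.
    """
    def succ(s, e, a):
        e2 = min(e + energy[s][a], e_max)
        return [((s2, e2), p) for s2, p in trans[s][a] if p > 0.0]

    # Greatest fixpoint: keep (s, e) if some action keeps every successor inside the safe set.
    # Successors with energy <= 0 are never in the set, so the hard constraint is enforced.
    safe = {(s, e) for s in states for e in range(1, e_max + 1)}
    changed = True
    while changed:
        changed = False
        for s, e in list(safe):
            if s in targets:
                continue
            if not any(all(x in safe for x, _ in succ(s, e, a)) for a in actions[s]):
                safe.discard((s, e))
                changed = True

    V: Dict[State, float] = {x: 0.0 for x in safe}
    policy: Dict[State, str] = {}
    for _ in range(iters):
        diff = 0.0
        for s, e in safe:
            if s in targets:
                continue
            # Only actions whose successors all stay in the safe set are allowed.
            allowed = [a for a in actions[s] if all(x in safe for x, _ in succ(s, e, a))]
            q = {a: cost[s][a] + sum(p * V[x] for x, p in succ(s, e, a)) for a in allowed}
            best = min(q, key=q.get)
            diff = max(diff, abs(q[best] - V[(s, e)]))
            V[(s, e)], policy[(s, e)] = q[best], best
        if diff < 1e-9:
            break
    return V, policy


# Toy model: state 0 = start, 1 = charger, 2 = target.
actions = {0: ["go", "charge"], 1: ["back"], 2: []}
trans = {0: {"go": [(2, 0.5), (0, 0.5)], "charge": [(1, 1.0)]}, 1: {"back": [(0, 1.0)]}, 2: {}}
cost = {0: {"go": 1, "charge": 2}, 1: {"back": 1}, 2: {}}
energy = {0: {"go": -1, "charge": -1}, 1: {"back": 3}, 2: {}}
V, policy = solve_energy_ssp([0, 1, 2], actions, trans, cost, energy, {2}, e_max=3)
print(round(V[(0, 3)], 6), policy[(0, 3)], (0, 1) in V)  # prints: 5.0 go False
```

Safe states from which the target cannot be reached would accumulate ever-growing values until `iters` runs out; a fuller treatment would detect them separately.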
Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes
Autonomous systems are often required to operate in partially observable
environments. They must reliably execute a specified objective even with
incomplete information about the state of the environment. We propose a
methodology to synthesize policies that satisfy a linear temporal logic formula
in a partially observable Markov decision process (POMDP). By formulating a
planning problem, we show how to use point-based value iteration methods to
efficiently approximate the maximum probability of satisfying a desired logical
formula and compute the associated belief state policy. We demonstrate that our
method scales to large POMDP domains and provides strong bounds on the
performance of the resulting policy. Comment: 8 pages, 3 figures, AAAI 202
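Point-based value iteration restricts Bellman backups to a finite set of belief points. The sketch below shows one such backup on alpha-vectors, applied to a plain reachability objective under assumed simplifications: target and trap states are absorbing, rewards are zero, the discount is 1, and the initial alpha-vector is the indicator of the target. After k backups each alpha-vector is then the probability that some k-step conditional plan reaches the target, which gives a lower bound at every belief. The paper handles full LTL formulas, which would additionally require a product with an automaton; its point selection and bounds are not reproduced here. The names `pbvi_backup` and `value` and the toy model are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def pbvi_backup(B, Gamma, n_states, actions, observations, T, O, R, discount):
    """One point-based backup: returns one new alpha-vector per belief point in B.

    T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a), R[a][s] = immediate reward.
    """
    S = range(n_states)
    new_gamma: List[Vector] = []
    for b in B:
        best_alpha, best_val = None, float("-inf")
        for a in actions:
            alpha_a = list(R[a])
            for o in observations:
                # g(s) = sum_{s2} T(s, a, s2) * O(s2, a, o) * alpha(s2), for every alpha in Gamma
                gs = [[sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in S) for s in S]
                      for alpha in Gamma]
                g_best = max(gs, key=lambda g: dot(b, g))  # best continuation for (a, o) at b
                for s in S:
                    alpha_a[s] += discount * g_best[s]
            val = dot(b, alpha_a)
            if val > best_val:
                best_alpha, best_val = alpha_a, val
        new_gamma.append(best_alpha)
    return new_gamma


def value(b, Gamma) -> float:
    return max(dot(b, alpha) for alpha in Gamma)


# Reachability as a special case: target state 1 and trap state 2 are absorbing, R = 0,
# discount = 1, and Gamma starts as the indicator of the target.
T = [
    [[0.0, 0.6, 0.4], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # action 0: one risky attempt
    [[0.7, 0.3, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # action 1: cautious attempt
]
O = [[[1.0], [1.0], [1.0]]] * 2  # a single, uninformative observation
R = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
B = [(1.0, 0.0, 0.0)]
Gamma = [[0.0, 1.0, 0.0]]
for _ in range(2):
    Gamma = pbvi_backup(B, Gamma, 3, [0, 1], [0], T, O, R, discount=1.0)
print(round(value(B[0], Gamma), 6))  # prints 0.72
```

With two backups the best plan is the cautious action followed by the risky one, giving 0.3 + 0.7 * 0.6 = 0.72.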
Safe Policy Synthesis in Multi-Agent POMDPs via Discrete-Time Barrier Functions
A multi-agent partially observable Markov decision process (MPOMDP) is a
modeling paradigm used for high-level planning of heterogeneous autonomous
agents subject to uncertainty and partial observation. Despite their modeling
efficiency, MPOMDPs have not received significant attention in safety-critical
settings. In this paper, we use barrier functions to design policies for
MPOMDPs that ensure safety. Notably, our method does not rely on discretization
of the belief space or on finite memory. To this end, we formulate sufficient and
necessary conditions for the safety of a given set based on discrete-time
barrier functions (DTBFs) and we demonstrate that our formulation also allows
for Boolean compositions of DTBFs for representing more complicated safe sets.
We show that the proposed method can be implemented online by a sequence of
one-step greedy algorithms as a standalone safe controller or as a
safety-filter given a nominal planning policy. We illustrate the efficiency of
the proposed methodology based on DTBFs using a high-fidelity simulation of
heterogeneous robots. Comment: 8 pages and 4 figures
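Boolean compositions of barrier functions are commonly expressed with pointwise min and max: the intersection of the safe sets {b : h_1(b) >= 0} and {b : h_2(b) >= 0} is the superlevel set of min(h_1, h_2), and their union is the superlevel set of max(h_1, h_2). The sketch below shows only this set-level composition, under that assumption; whether the composed function satisfies the DTBF condition is the question the paper's formulation addresses, and it is not checked here. The helper names and the two constraints are hypothetical.

```python
from typing import Callable, Sequence

Barrier = Callable[[Sequence[float]], float]


def h_and(*hs: Barrier) -> Barrier:
    """Intersection: {b : min_i h_i(b) >= 0} equals the intersection of the sets {b : h_i(b) >= 0}."""
    return lambda b: min(h(b) for h in hs)


def h_or(*hs: Barrier) -> Barrier:
    """Union: {b : max_i h_i(b) >= 0} equals the union of the sets {b : h_i(b) >= 0}."""
    return lambda b: max(h(b) for h in hs)


# Hypothetical constraints on a belief over three states.
def h_obstacle(b: Sequence[float]) -> float:
    return 0.2 - b[2]  # probability of being in state 2 is at most 0.2


def h_region(b: Sequence[float]) -> float:
    return b[0] - 0.5  # probability of being in state 0 is at least 0.5


both = h_and(h_obstacle, h_region)
either = h_or(h_obstacle, h_region)
b = (0.4, 0.5, 0.1)
print(both(b) >= 0, either(b) >= 0)  # prints False True
```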
Distributed Synthesis in Continuous Time
We introduce a formalism modelling communication of distributed agents
strictly in continuous time. Within this framework, we study the problem of
synthesising local strategies for individual agents such that a specified set
of goal states is reached, or reached with at least a given probability. The
flow of time is modelled explicitly based on continuous-time randomness, with
two natural implications: First, the non-determinism stemming from interleaving
disappears. Second, when we restrict to a subclass of non-urgent models, the
quantitative value problem for two players can be solved in EXPTIME. Indeed,
the explicit continuous time enables players to communicate their states by
delaying synchronisation (which is unrestricted for non-urgent models). In
general, the problems are undecidable already for two players in the
quantitative case and three players in the qualitative case. The qualitative
undecidability is shown by a reduction to decentralized POMDPs for which we
provide the strongest (and rather surprising) undecidability result so far.
Decision-Making Under Uncertainty: Beyond Probabilities
This position paper reflects on the state-of-the-art in decision-making under
uncertainty. A classical assumption is that probabilities can sufficiently
capture all uncertainty in a system. In this paper, the focus is on the
uncertainty that goes beyond this classical interpretation, particularly by
employing a clear distinction between aleatoric and epistemic uncertainty. The
paper features an overview of Markov decision processes (MDPs) and extensions
to account for partial observability and adversarial behavior. These models
sufficiently capture aleatoric uncertainty but fail to account for epistemic
uncertainty robustly. Consequently, we present a thorough overview of so-called
uncertainty models that exhibit uncertainty in a more robust interpretation. We
cover several solution techniques for both discrete and continuous models,
ranging from formal verification and control-based abstractions to
reinforcement learning. As an integral part of this paper, we list and discuss
several key challenges that arise when dealing with rich types of uncertainty
in a model-based fashion.
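Interval MDPs, in which each transition probability is only known to lie in an interval, are one commonly studied example of such uncertainty models; choosing them here is an assumption, since the abstract does not name specific models. The sketch computes a pessimistic (robust) maximal reachability probability by value iteration in which the agent maximizes over actions and an adversarial nature picks, for each state-action pair, the distribution within the intervals that minimizes the expected value. The inner minimization uses the standard greedy approach: start from the lower bounds and assign the remaining probability mass to the lowest-valued successors first. The function names and the toy model are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Intervals = Dict[Hashable, Tuple[float, float]]  # successor -> (lower, upper) probability bound


def worst_case_expectation(intervals: Intervals, V: Dict) -> float:
    """Nature picks a distribution within the intervals that minimizes the expected value of V.

    Assumes the intervals are feasible: sum of lower bounds <= 1 <= sum of upper bounds.
    """
    p = {s2: lo for s2, (lo, hi) in intervals.items()}
    budget = 1.0 - sum(p.values())
    for s2 in sorted(intervals, key=lambda x: V[x]):  # fill the lowest-valued successors first
        lo, hi = intervals[s2]
        add = min(hi - lo, budget)
        p[s2] += add
        budget -= add
    return sum(p[s2] * V[s2] for s2 in p)


def robust_reach(states, actions, imdp, targets, iters=10_000, tol=1e-10):
    """Robust value iteration: max over the agent's actions, min over nature's distributions."""
    V = {s: 1.0 if s in targets else 0.0 for s in states}
    for _ in range(iters):
        diff = 0.0
        for s in states:
            if s in targets or not actions[s]:
                continue
            v = max(worst_case_expectation(imdp[s][a], V) for a in actions[s])
            diff = max(diff, abs(v - V[s]))
            V[s] = v
        if diff < tol:
            break
    return V


# Toy interval MDP: state 1 is the target, state 2 is an absorbing failure state.
actions = {0: ["a", "b"], 1: [], 2: []}
imdp = {0: {"a": {1: (0.5, 0.8), 2: (0.2, 0.5)}, "b": {1: (0.3, 0.9), 2: (0.1, 0.7)}}}
V = robust_reach([0, 1, 2], actions, imdp, targets={1})
print(round(V[0], 6))  # prints 0.5
```

Action "b" has the higher best-case probability (0.9) but the lower worst case (0.3), so the robust choice is "a".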
- …