Expectations or Guarantees? I Want It All! A crossroad between games and MDPs
When reasoning about the strategic capabilities of an agent, it is important
to consider the nature of its adversaries. In the particular context of
controller synthesis for quantitative specifications, the usual problem is to
devise a strategy for a reactive system which yields some desired performance,
taking into account the possible impact of the environment of the system. There
are at least two ways to look at this environment. In the classical analysis of
two-player quantitative games, the environment is purely antagonistic and the
problem is to provide strict performance guarantees. In Markov decision
processes, the environment is seen as purely stochastic: the aim is then to
optimize the expected payoff, with no guarantee on individual outcomes.
In this expository work, we report on recent results introducing the beyond
worst-case synthesis problem, which is to construct strategies that guarantee
some quantitative requirement in the worst case while providing a higher
expected value against a particular stochastic model of the environment given
as input. This problem is relevant to producing system controllers that provide
good expected performance in everyday situations while ensuring a strict
(but relaxed) performance threshold even in the event of very bad (albeit
unlikely) circumstances. It has been studied for both the mean-payoff and the
shortest-path quantitative measures.
Comment: In Proceedings SR 2014, arXiv:1404.041
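To make the trade-off concrete, the sketch below evaluates every memoryless strategy of a small, hypothetical commuting model (states, costs and probabilities are invented for illustration) under both views of the environment: an antagonistic one that may pick any successor with positive probability, and the given stochastic model. It then keeps the strategy with the lowest expected cost among those whose worst-case cost stays within a threshold. This brute-force search over an acyclic toy is an assumption made for readability, not the synthesis algorithm of the reported work; it ignores cycles and strategies with memory.

```python
from itertools import product

# Hypothetical commute model (not from the paper):
# state -> {action: (cost, {successor: probability})}. The graph is acyclic.
MDP = {
    "home": {
        "bike": (45, {"work": 1.0}),
        "car": (5, {"light": 0.9, "jam": 0.1}),
        "train": (10, {"on_time": 0.9, "delayed": 0.1}),
    },
    "light": {"drive": (20, {"work": 1.0})},
    "jam": {"drive": (70, {"work": 1.0})},
    "on_time": {"ride": (20, {"work": 1.0})},
    "delayed": {"ride": (40, {"work": 1.0})},
}
TARGET = "work"


def worst_case(strategy, state="home"):
    """Total cost to TARGET when an antagonistic environment picks any successor in the support."""
    if state == TARGET:
        return 0.0
    cost, succ = MDP[state][strategy[state]]
    return cost + max(worst_case(strategy, s) for s, p in succ.items() if p > 0)


def expected(strategy, state="home"):
    """Expected total cost to TARGET under the stochastic model of the environment."""
    if state == TARGET:
        return 0.0
    cost, succ = MDP[state][strategy[state]]
    return cost + sum(p * expected(strategy, s) for s, p in succ.items())


def memoryless_strategies():
    """Enumerate every deterministic memoryless strategy (one action per state)."""
    states = list(MDP)
    for choice in product(*(list(MDP[s]) for s in states)):
        yield dict(zip(states, choice))


def beyond_worst_case(threshold):
    """Best expected cost among strategies whose worst-case cost is at most threshold."""
    best = None
    for strategy in memoryless_strategies():
        if worst_case(strategy) <= threshold:
            value = expected(strategy)
            if best is None or value < best[1]:
                best = (strategy, value)
    return best
```

With a threshold of 60, the search returns the train strategy (worst case 50, expected 32): the car has the best expectation (30), but a traffic jam pushes its cost to 75, while the bike is worst-case optimal (45 in every outcome) but worse on average. Raising the threshold to 80 admits the car.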
Multi-Objective Model Checking of Markov Decision Processes
We study and provide efficient algorithms for multi-objective model checking
problems for Markov Decision Processes (MDPs). Given an MDP $M$, multiple
linear-time ($\omega$-regular or LTL) properties $\varphi_i$, and
probabilities $r_i \in [0,1]$ for $i = 1, \dots, k$, we ask whether there exists a
strategy $\sigma$ for the controller such that, for all $i$, the probability that a
trajectory of $M$ controlled by $\sigma$ satisfies $\varphi_i$ is at least $r_i$. We
provide an algorithm that decides whether such a strategy exists and, if so,
produces it, in time polynomial in the size of the MDP. Such
a strategy may require the use of both randomization and memory. We also
consider more general multi-objective $\omega$-regular queries, which we
motivate with an application to assume-guarantee compositional reasoning for
probabilistic systems.
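The need for randomization can be seen on a minimal, hypothetical MDP (not taken from the paper): a single decision state whose action `a` leads surely to an absorbing target `T1` and whose action `b` leads surely to an absorbing target `T2`. For the query $r_1 = r_2 = 1/2$, neither deterministic choice works, and since the decision state is visited only once, adding memory does not help; a fair coin flip does. The sketch only covers the randomization half of the claim, and uses exact `Fraction` arithmetic to avoid rounding noise in the comparisons.

```python
from fractions import Fraction

# Hypothetical MDP with one decision state s0: action "a" moves to the absorbing
# target T1, action "b" to the absorbing target T2 (both with probability 1).
TARGET_OF = {"a": "T1", "b": "T2"}


def reach_probabilities(mix):
    """Return (P[reach T1], P[reach T2]) when s0 plays action x with probability mix[x]."""
    p = {"T1": Fraction(0), "T2": Fraction(0)}
    for action, weight in mix.items():
        p[TARGET_OF[action]] += weight
    return p["T1"], p["T2"]


def meets(mix, r1, r2):
    """Check the multi-objective query P[reach T1] >= r1 and P[reach T2] >= r2."""
    p1, p2 = reach_probabilities(mix)
    return p1 >= r1 and p2 >= r2


half = Fraction(1, 2)
deterministic = [{"a": Fraction(1)}, {"b": Fraction(1)}]
print(any(meets(m, half, half) for m in deterministic))  # False
print(meets({"a": half, "b": half}, half, half))  # True
```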
Note that there can be trade-offs between different properties: satisfying
property $\varphi_1$ with high probability may necessitate satisfying $\varphi_2$
with low probability. Viewing this as a multi-objective optimization problem,
we want information about the "trade-off curve" or Pareto curve for maximizing
the probabilities of different properties. We show that one can compute an
approximate Pareto curve with respect to a set of $\omega$-regular properties in
time polynomial in the size of the MDP.
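For intuition about the Pareto curve, the sketch below takes a hypothetical decision state with three actions whose (P[reach T1], P[reach T2]) vectors are fixed, invented numbers, enumerates randomized choices whose weights are multiples of $1/n$, and keeps the non-dominated vectors. This naive grid is exponential in the number of actions and only illustrates the notion of a trade-off curve; it is not the polynomial-time approximation procedure described above.

```python
from fractions import Fraction
from itertools import product

# Hypothetical single decision state: each action reaches two disjoint absorbing
# targets with the probabilities (P[T1], P[T2]) listed here.
ACTION_VECTORS = {
    "a": (Fraction(1), Fraction(0)),
    "b": (Fraction(0), Fraction(1)),
    "c": (Fraction(3, 5), Fraction(3, 5)),
}


def grid_points(n):
    """Probability vectors of randomized choices whose weights are multiples of 1/n."""
    actions = list(ACTION_VECTORS)
    for counts in product(range(n + 1), repeat=len(actions)):
        if sum(counts) != n:
            continue
        yield tuple(
            sum(Fraction(k, n) * ACTION_VECTORS[a][i] for k, a in zip(counts, actions))
            for i in range(2)
        )


def dominates(u, v):
    """u is at least as good as v in every objective and differs from it."""
    return all(x >= y for x, y in zip(u, v)) and u != v


def approximate_pareto(n=10):
    """Non-dominated points among the grid points (a naive approximation)."""
    points = set(grid_points(n))
    return sorted(p for p in points if not any(dominates(q, p) for q in points))
```

With these numbers, mixing `a` and `b` evenly gives (1/2, 1/2), which is dominated by playing `c`, so the computed curve bends through (3/5, 3/5).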
Our quantitative upper bounds use LP methods. We also study qualitative
multi-objective model checking problems, and we show that these can be analysed
by purely graph-theoretic methods, even though the strategies may still require
both randomization and memory.
Comment: 21 pages, 2 figures
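As an example of a purely graph-theoretic qualitative analysis, the sketch below implements the classic nested fixed point for single-objective almost-sure reachability: it only inspects which successors have positive probability, never the probabilities themselves. It is a standard building block offered as an illustration, not the paper's procedure for multi-objective qualitative queries. The MDP is given as a hypothetical `state -> {action: set of successors}` map in which every state, including the targets, appears as a key.

```python
def almost_sure_reach(mdp, targets):
    """States from which some strategy reaches `targets` with probability 1.

    mdp maps each state to {action: set of successors with positive probability};
    only these supports are used, so the computation is purely graph-based.
    """
    candidates = set(mdp)
    while True:
        # Backward search: states that can move towards `targets` with positive
        # probability using actions whose successors all stay inside `candidates`.
        good = set(targets) & candidates
        changed = True
        while changed:
            changed = False
            for state in candidates - good:
                if any(succ <= candidates and succ & good for succ in mdp[state].values()):
                    good.add(state)
                    changed = True
        if good == candidates:
            return good
        # Drop states that cannot stay safe, then recompute on the smaller set.
        candidates = good


# Hypothetical example: s1 is the target, s3 is a losing sink.
MDP = {
    "s0": {"a": {"s1", "s2"}, "b": {"s3"}},
    "s1": {"stay": {"s1"}},
    "s2": {"retry": {"s0"}},
    "s3": {"stay": {"s3"}},
    "s4": {"c": {"s1", "s3"}},
}
print(sorted(almost_sure_reach(MDP, {"s1"})))  # ['s0', 's1', 's2']
```

State `s4` can reach `s1` with positive probability, but its only action risks the sink `s3`, so the outer loop removes it; `s0` qualifies because repeatedly playing `a` (returning through `s2`) reaches `s1` with probability 1.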
Multiple-Environment Markov Decision Processes
We introduce Multi-Environment Markov Decision Processes (MEMDPs) which are
MDPs with a set of probabilistic transition functions. The goal in a MEMDP is
to synthesize a single controller with guaranteed performance against all
environments even though the environment is unknown a priori. While MEMDPs can
be seen as a special class of partially observable MDPs, we show that several
verification problems that are undecidable for partially observable MDPs are
decidable for MEMDPs, and some even have efficient solutions.
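A small, hypothetical MEMDP (two transition functions over the same states, with invented numbers) shows what guaranteed performance against all environments means: a strategy is scored by its worst reachability probability over the environments. The sketch enumerates memoryless strategies by brute force, which is only an illustration and not one of the decision procedures from the paper; the state and action names are assumptions.

```python
from itertools import product

# Hypothetical MEMDP: same states and actions, two possible transition functions.
GOAL_PROB = {
    "env1": {"a": 0.9, "b": 0.2, "c": 0.6},
    "env2": {"a": 0.1, "b": 0.8, "c": 0.6},
}
# A "probe" action from the start state lands in x under env1 and in y under env2.
PROBE_RESULT = {"env1": "x", "env2": "y"}
ACTIONS = {"start": ["a", "b", "c", "probe"], "x": ["a", "b", "c"], "y": ["a", "b", "c"]}


def transitions(env, state, action):
    """Successor distribution of (state, action) in environment env."""
    if state == "start" and action == "probe":
        return {PROBE_RESULT[env]: 1.0}
    p = GOAL_PROB[env][action]
    return {"goal": p, "fail": 1.0 - p}


def reach(env, strategy, state="start"):
    """Probability of reaching "goal" from state under a memoryless strategy."""
    if state in ("goal", "fail"):
        return 1.0 if state == "goal" else 0.0
    dist = transitions(env, state, strategy[state])
    return sum(p * reach(env, strategy, s) for s, p in dist.items())


def guaranteed(strategy):
    """Performance that holds whichever environment is the real one."""
    return min(reach(env, strategy) for env in GOAL_PROB)


def best_strategy():
    """Memoryless strategy maximizing the guaranteed reachability probability."""
    states = list(ACTIONS)
    candidates = (dict(zip(states, c)) for c in product(*(ACTIONS[s] for s in states)))
    return max(candidates, key=guaranteed)
```

No single action from `start` does better than `c` (0.6 in both environments), but `probe` lands in `x` under `env1` and in `y` under `env2`; since the controller observes the state, it can then play `a` in `x` and `b` in `y`, guaranteeing 0.8 whichever environment is real.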
Feature Markov Decision Processes
General-purpose intelligent learning agents cycle through (complex, non-MDP)
sequences of observations, actions, and rewards. On the other hand,
reinforcement learning is well developed for small finite-state Markov Decision
Processes (MDPs). So far, extracting the right state representation from the
bare observations, i.e., reducing the agent setup to the MDP framework, is an
art performed by human designers. Before we can think of mechanizing this
search for suitable MDPs, we need a formal objective criterion. The main
contribution of this article is to develop such a criterion. I also integrate
the various parts into one learning algorithm. Extensions to more realistic
dynamic Bayesian networks are developed in a companion article.
Comment: 7 pages
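The article's criterion is not reproduced in this abstract, so the sketch below uses a toy stand-in, which is our assumption: score a candidate feature map `phi` (history prefix to state) by the number of bits an adaptive add-one (Laplace) code needs for the induced state sequence and for the rewards given those states. Adaptive codes pay for every context they must learn, so maps that are too coarse and maps that are too fine both score poorly. All names and the history below are hypothetical.

```python
from collections import defaultdict
from math import log2


def sequential_code_length(symbols, contexts):
    """Bits to encode symbols[t] given contexts[t] with an adaptive add-one (Laplace) estimator."""
    alphabet = len(set(symbols))  # simplification: alphabet size read off the data
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for sym, ctx in zip(symbols, contexts):
        bits -= log2((counts[ctx][sym] + 1) / (totals[ctx] + alphabet))
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


def feature_cost(history, phi):
    """Toy score of a feature map phi (history prefix -> state); lower is better."""
    states = [phi(history[: t + 1]) for t in range(len(history))]
    actions = [a for _, a, _ in history]
    rewards = [r for _, _, r in history]
    # Code each next state given the previous state and the action taken there.
    state_bits = sequential_code_length(states[1:], list(zip(states[:-1], actions[:-1])))
    # Code each reward given the state it was received in.
    reward_bits = sequential_code_length(rewards, [(s,) for s in states])
    return state_bits + reward_bits


def phi_last(h):
    return h[-1][0]  # last observation


def phi_const(h):
    return 0  # forgets everything


def phi_full(h):
    return tuple(o for o, _, _ in h)  # the whole observation history


# Hypothetical history of (observation, action, reward): observations alternate
# 0, 1, 0, ... and the reward equals the current observation.
history = [(t % 2, "go", t % 2) for t in range(100)]
```

On this alternating history, `phi_last` costs roughly 23 bits, `phi_const` about 103 (it cannot predict the alternating rewards), and `phi_full` several hundred (every state is new, so nothing can be learned).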
Equilibria, Fixed Points, and Complexity Classes
Many models from a variety of areas involve the computation of an equilibrium
or fixed point of some kind. Examples include Nash equilibria in games; market
equilibria; computing optimal strategies and the values of competitive games
(stochastic and other games); stable configurations of neural networks;
analysing basic stochastic models for evolution like branching processes and
for language like stochastic context-free grammars; and models that incorporate
the basic primitives of probability and recursion like recursive Markov chains.
It is not known whether these problems can be solved in polynomial time. There
are certain common computational principles underlying different types of
equilibria, which are captured by the complexity classes PLS, PPAD, and FIXP.
Representative complete problems for these classes are, respectively, pure Nash
equilibria in games where they are guaranteed to exist, (mixed) Nash equilibria
in 2-player normal form games, and (mixed) Nash equilibria in normal form games
with 3 (or more) players. This paper reviews the underlying computational
principles and the corresponding classes.
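Branching processes, listed above, give a compact example of such a fixed point: the extinction probability of a branching process is the least fixed point in $[0,1]$ of the offspring generating function $f(x) = \sum_k p_k x^k$. The sketch below approximates it by iterating $f$ from $0$ (Kleene iteration), a simple method chosen for illustration; in critical cases (mean offspring exactly 1) this iteration converges very slowly, and the sketch says nothing about the complexity questions the paper reviews. The function name and tolerance are assumptions.

```python
def extinction_probability(offspring, tol=1e-14, max_iter=100_000):
    """Approximate the least fixed point of f(x) = sum_k offspring[k] * x**k on [0, 1].

    offspring[k] is the probability that an individual has exactly k children.
    Starting from x = 0, the iterates f(0), f(f(0)), ... increase towards the least fixed point.
    """

    def f(x):
        return sum(p * x**k for k, p in enumerate(offspring))

    x = 0.0
    for _ in range(max_iter):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x


print(round(extinction_probability([0.25, 0.25, 0.5]), 6))  # 0.5
```

For offspring probabilities $p_0 = 0.25$, $p_1 = 0.25$, $p_2 = 0.5$, the fixed points of $f$ are $1/2$ and $1$, and the iteration converges to the smaller one, $1/2$.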