Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control
We consider the problem of finding a control policy for a Markov Decision
Process (MDP) to maximize the probability of reaching some states while
avoiding some other states. This problem is motivated by applications in
robotics, where such problems naturally arise when probabilistic models of
robot motion are required to satisfy temporal logic task specifications. We
transform this problem into a Stochastic Shortest Path (SSP) problem and
develop a new approximate dynamic programming algorithm to solve it. This
algorithm is of the actor-critic type and uses a least-squares temporal
difference learning method. It operates on sample paths of the system and
optimizes the policy within a pre-specified class parameterized by a
parsimonious set of parameters. We show its convergence to a policy
corresponding to a stationary point in the parameters' space. Simulation
results confirm the effectiveness of the proposed solution.
Comment: Technical report accompanying an accepted paper to CDC 201
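The reach-avoid objective described in this abstract can be illustrated on a small explicit model. The sketch below is not the paper's LSTD actor-critic method; it is plain value iteration on a hypothetical four-state MDP (the states, actions, and probabilities are invented for illustration), computing the maximal probability of reaching the goal state before hitting the state to be avoided.

```python
# Hypothetical toy MDP (not from the paper): state 2 is the goal, state 3 must be avoided.
# P[state][action] is a list of (next_state, probability) pairs.
P = {
    0: {"direct": [(2, 0.5), (3, 0.5)], "detour": [(1, 1.0)]},
    1: {"go": [(2, 0.8), (1, 0.1), (3, 0.1)], "rush": [(2, 0.7), (3, 0.3)]},
}
GOAL, AVOID = {2}, {3}


def expected(v, successors):
    """Expected value of v over a list of (next_state, probability) pairs."""
    return sum(prob * v[nxt] for nxt, prob in successors)


def max_reach_avoid(P, goal, avoid, max_sweeps=1000, tol=1e-12):
    """Value iteration for the maximal probability of reaching `goal` before `avoid`."""
    states = set(P) | goal | avoid
    # Goal states are worth 1, avoid states 0; both are treated as absorbing.
    v = {s: (1.0 if s in goal else 0.0) for s in states}
    for _ in range(max_sweeps):
        delta = 0.0
        for s in P:
            best = max(expected(v, succ) for succ in P[s].values())
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: expected(v, P[s][a])) for s in P}
    return v, policy


values, policy = max_reach_avoid(P, GOAL, AVOID)
print(values[0], policy)
```

In this toy model the self-loop under "go" makes the value of state 1 the solution of v = 0.8 + 0.1 v, which is why an iterative or linear-system solve is needed rather than a single backward pass.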
A tutorial on recursive models for analyzing and predicting path choice behavior
The problem at the heart of this tutorial consists in modeling the path
choice behavior of network users. This problem has been extensively studied in
transportation science, where it is known as the route choice problem. In this
literature, individuals' path choices are typically predicted using discrete
choice models. This article is a tutorial on a specific category of discrete
choice models called recursive, and it makes three main contributions: First,
for the purpose of assisting future research on route choice, we provide a
comprehensive background on the problem, linking it to different fields
including inverse optimization and inverse reinforcement learning. Second, we
formally introduce the problem and the recursive modeling idea along with an
overview of existing models, their properties and applications. Third, we
extensively analyze illustrative examples from different angles so that a
novice reader can gain intuition on the problem and the advantages provided by
recursive models in comparison to path-based ones.
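As a rough illustration of the recursive idea, the sketch below assumes a logit-style recursive model with scale parameter 1 on a small acyclic network. The network, the arc utilities, and the function names are hypothetical, and the abstract itself does not fix a specific model. Each node's value is the log-sum of arc utility plus downstream value, and link choice probabilities follow from differences of values.

```python
import math
from functools import lru_cache

# Hypothetical acyclic network: ARCS[node][next_node] = deterministic arc utility.
ARCS = {
    "A": {"B": -1.0, "C": -1.5},
    "B": {"C": -0.5, "D": -2.0},
    "C": {"D": -1.0},
}
DEST = "D"


@lru_cache(maxsize=None)
def value(node):
    """Expected maximum utility from `node` to DEST (log-sum recursion)."""
    if node == DEST:
        return 0.0
    return math.log(sum(math.exp(u + value(nxt)) for nxt, u in ARCS[node].items()))


def link_prob(node, nxt):
    """Probability of choosing arc node -> nxt when standing at `node`."""
    return math.exp(ARCS[node][nxt] + value(nxt) - value(node))


def path_prob(path):
    """Probability of a whole path as a product of link probabilities."""
    return math.prod(link_prob(a, b) for a, b in zip(path, path[1:]))


def all_paths(node):
    """Enumerate all paths from `node` to DEST (only viable on tiny acyclic networks)."""
    if node == DEST:
        yield [node]
        return
    for nxt in ARCS[node]:
        for rest in all_paths(nxt):
            yield [node] + rest


for p in all_paths("A"):
    print(" -> ".join(p), round(path_prob(p), 4))
```

Under these assumptions the product of link probabilities along a path equals the path-based logit probability over all enumerated paths, so the recursive form gives the same prediction without listing the paths.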
Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes
When treating Markov decision processes (MDPs) with large state spaces, using
explicit representations quickly becomes infeasible. Lately, Wimmer et al. have
proposed a so-called symblicit algorithm for the synthesis of optimal
strategies in MDPs, in the quantitative setting of expected mean-payoff. This
algorithm, based on the strategy iteration algorithm of Howard and Veinott,
efficiently combines symbolic and explicit data structures, and uses binary
decision diagrams as symbolic representation. The aim of this paper is to show
that the new data structure of pseudo-antichains (an extension of antichains)
provides another interesting alternative, especially for the class of monotonic
MDPs. We design efficient pseudo-antichain based symblicit algorithms (with
open source implementations) for two quantitative settings: the expected
mean-payoff and the stochastic shortest path. For two practical applications
coming from automated planning and LTL synthesis, we report promising
experimental results w.r.t. both the run time and the memory consumption.
Comment: In Proceedings SYNT 2014, arXiv:1407.493
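For orientation, strategy iteration on a fully explicit model can be sketched as follows. This is not the symblicit or pseudo-antichain algorithm of the paper: it is a minimal explicit-state version for the stochastic shortest path setting, on a hypothetical two-state MDP with invented costs, and it assumes every strategy reaches the target with probability 1 so that iterative evaluation converges.

```python
# Hypothetical SSP instance: "T" is the absorbing, cost-free target.
# M[state][action] = (cost, [(next_state, probability), ...])
M = {
    "s0": {"slow": (4.0, [("T", 1.0)]), "fast": (1.0, [("s1", 1.0)])},
    "s1": {"try": (1.0, [("T", 0.5), ("s1", 0.5)]), "pay": (3.0, [("T", 1.0)])},
}


def q_value(M, v, s, a):
    """Expected cost of playing action a in state s, then following values v."""
    cost, succ = M[s][a]
    return cost + sum(prob * v[nxt] for nxt, prob in succ)


def evaluate(M, policy, max_sweeps=10_000, tol=1e-12):
    """Iteratively compute the expected cost to target under a fixed strategy."""
    v = {s: 0.0 for s in M}
    v["T"] = 0.0
    for _ in range(max_sweeps):
        delta = 0.0
        for s in M:
            new = q_value(M, v, s, policy[s])
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            break
    return v


def strategy_iteration(M, policy):
    """Alternate evaluation and strict greedy improvement until the strategy is stable."""
    while True:
        v = evaluate(M, policy)
        improved = dict(policy)
        for s in M:
            best = min(M[s], key=lambda a: q_value(M, v, s, a))
            # Switch only on strict improvement to avoid cycling between ties.
            if q_value(M, v, s, best) < q_value(M, v, s, policy[s]) - 1e-9:
                improved[s] = best
        if improved == policy:
            return policy, v
        policy = improved


policy, v = strategy_iteration(M, {"s0": "slow", "s1": "pay"})
print(policy, v)
```

The explicit dictionaries used here are exactly what becomes too large for big state spaces, which is the motivation the abstract gives for symbolic representations.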
Expectations or Guarantees? I Want It All! A crossroad between games and MDPs
When reasoning about the strategic capabilities of an agent, it is important
to consider the nature of its adversaries. In the particular context of
controller synthesis for quantitative specifications, the usual problem is to
devise a strategy for a reactive system which yields some desired performance,
taking into account the possible impact of the environment of the system. There
are at least two ways to look at this environment. In the classical analysis of
two-player quantitative games, the environment is purely antagonistic and the
problem is to provide strict performance guarantees. In Markov decision
processes, the environment is seen as purely stochastic: the aim is then to
optimize the expected payoff, with no guarantee on individual outcomes.
In this expository work, we report on recent results introducing the beyond
worst-case synthesis problem, which is to construct strategies that guarantee
some quantitative requirement in the worst case while providing a higher
expected value against a particular stochastic model of the environment given
as input. This problem is relevant for producing system controllers that provide
good expected performance in everyday situations while ensuring a strict
(but relaxed) performance threshold even in very bad (albeit
unlikely) circumstances. It has been studied for both the mean-payoff and the
shortest path quantitative measures.
Comment: In Proceedings SR 2014, arXiv:1404.041