An Exponential Lower Bound for the Latest Deterministic Strategy Iteration Algorithms
This paper presents a new exponential lower bound for the two most popular
deterministic variants of the strategy improvement algorithms for solving
parity, mean payoff, discounted payoff and simple stochastic games. The first
variant improves every node in each step by locally maximizing the current
valuation, whereas the second variant computes the globally optimal improvement
in each step. We outline families of games on which both variants require
exponentially many strategy iterations.
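The switching scheme the abstract contrasts can be sketched in a simplified one-player (Max-only) mean-payoff setting, where a positional strategy picks one successor per vertex and its valuation is the mean weight of the cycle it eventually reaches. This is an illustrative stand-in, not the two-player algorithms analyzed in the paper: all names and the example graph are assumptions, and the local-switch rule here ranks successors by cycle value alone, omitting the bias/potential term a full implementation would need.

```python
def cycle_value(strategy, weights):
    """Mean weight of the cycle eventually reached from each vertex.
    For mean-payoff objectives the finite prefix does not matter, so
    every vertex on a path inherits the mean of the cycle it reaches."""
    n = len(strategy)
    vals = [None] * n
    for start in range(n):
        seen, path = {}, []
        v = start
        while v not in seen:          # follow the strategy until a vertex repeats
            seen[v] = len(path)
            path.append(v)
            v = strategy[v]
        cyc = path[seen[v]:]          # the cycle begins at the repeated vertex
        mean = sum(weights[(u, strategy[u])] for u in cyc) / len(cyc)
        for u in path:
            vals[u] = mean
    return vals

def strategy_improvement(edges, weights, n):
    """edges: dict vertex -> list of successors; weights: dict (u, v) -> weight."""
    strategy = [edges[v][0] for v in range(n)]   # arbitrary initial strategy
    while True:
        vals = cycle_value(strategy, weights)
        improved = False
        for v in range(n):
            # all-switches rule: move to the successor with the best valuation
            best = max(edges[v], key=lambda w: vals[w])
            if vals[best] > vals[strategy[v]]:
                strategy[v], improved = best, True
        if not improved:
            return vals, strategy
```

On a toy graph where vertex 0 can reach either a mean-1 or a mean-5 self-loop, a single switch suffices; the paper's lower-bound families are built precisely so that such switches happen exponentially often.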
Multigrid methods for two-player zero-sum stochastic games
We present a fast numerical algorithm for large scale zero-sum stochastic
games with perfect information, which combines policy iteration and algebraic
multigrid methods. This algorithm can be applied either to a true finite state
space zero-sum two player game or to the discretization of an Isaacs equation.
We present numerical tests on discretizations of Isaacs equations or
variational inequalities. We also present a full multi-level policy iteration,
similar to FMG, which substantially improves the computation time for
solving some variational inequalities.
Value Iteration for Long-run Average Reward in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Long-run average rewards provide a
mathematically elegant formalism for expressing long term performance. Value
iteration (VI) is one of the simplest and most efficient algorithmic approaches
to MDPs with other objectives, such as reachability. Unfortunately,
a naive extension of VI does not work for MDPs with long-run average rewards,
as there is no known stopping criterion. In this work our contributions are
threefold. (1) We refute a conjecture related to stopping criteria for MDPs
with long-run average rewards. (2) We present two practical algorithms for MDPs
with long-run average rewards based on VI. First, we show that a combination of
applying VI locally for each maximal end-component (MEC) and VI for
reachability objectives can provide approximation guarantees. Second, extending
the above approach with a simulation-guided on-demand variant of VI, we present
an anytime algorithm that is able to deal with very large models. (3) Finally,
we present experimental results showing that our methods significantly
outperform the standard approaches on several benchmarks.
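The reachability building block mentioned in contribution (2) can be sketched as plain value iteration on a toy MDP. The transition encoding and the fixed iteration cap below are illustrative assumptions; the paper's point is precisely that such a fixed cap gives no error guarantee by itself, which is why it combines per-MEC analysis with reachability VI to obtain approximation bounds.

```python
def value_iteration(transitions, target, n, iters=1000):
    """Maximal reachability probability in a small MDP.
    transitions: dict state -> list of actions, each a list of (prob, next_state).
    target: set of absorbing goal states."""
    v = [1.0 if s in target else 0.0 for s in range(n)]
    for _ in range(iters):
        new = list(v)
        for s in range(n):
            if s in target:
                continue
            # Bellman update: best action by expected value of successors
            new[s] = max(sum(p * v[t] for p, t in action)
                         for action in transitions[s])
        v = new
    return v
```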
Using Strategy Improvement to Stay Alive
We design a novel algorithm for solving Mean-Payoff Games (MPGs). Besides
solving an MPG in the usual sense, our algorithm computes more information
about the game, information that is important with respect to applications. The
weights of the edges of an MPG can be thought of as a gained/consumed energy --
depending on the sign. For each vertex, our algorithm computes the minimum
amount of initial energy that is sufficient for player Max to ensure that in a
play starting from the vertex, the energy level never goes below zero. Our
algorithm is not the first to compute the minimum sufficient initial
energies, but according to our experimental study it is the fastest
algorithm that computes them, because it utilizes the strategy
improvement technique, which is very efficient in practice.
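For intuition, the quantity computed above, the minimum sufficient initial energy per vertex, satisfies a simple fixed-point equation. Below is a hedged sketch of the classical Kleene-style fixed-point computation on a one-player (Max-only) energy graph, not the strategy-improvement algorithm of the paper; the names, the ceiling bound, and the example are illustrative assumptions.

```python
def min_initial_energy(edges, weights, n, ceiling):
    """f[v] = least initial credit such that some path from v never lets the
    running energy drop below zero. Values above `ceiling` (e.g. the sum of
    absolute negative weights) are cut off to infinity: Max cannot win there."""
    INF = float('inf')
    f = [0] * n
    changed = True
    while changed:
        changed = False
        for v in range(n):
            # Max controls every vertex here, so he picks the cheapest edge:
            # taking (v, u) with weight w requires max(f[u] - w, 0) energy at v
            best = min(
                (max(f[u] - weights[(v, u)], 0) if f[u] != INF else INF)
                for u in edges[v]
            )
            if best > ceiling:
                best = INF
            if best != f[v]:
                f[v], changed = best, True
    return f
```

For example, a vertex whose only edge loses 3 energy before reaching a sustainable (weight +1) self-loop needs initial credit 3.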
The level set method for the two-sided eigenproblem
We consider the max-plus analogue of the eigenproblem for matrix pencils
Ax=lambda Bx. We show that the spectrum of (A,B) (i.e., the set of possible
values of lambda), which is a finite union of intervals, can be computed
in a pseudo-polynomial number of operations, namely by a (pseudo-polynomial) number of
calls to an oracle that computes the value of a mean payoff game. The proof
relies on the introduction of a spectral function, which we interpret in terms
of the least Chebyshev distance between Ax and lambda Bx. The spectrum is
obtained as the zero level set of this function.
Tropical polyhedra are equivalent to mean payoff games
We show that several decision problems originating from max-plus or tropical
convexity are equivalent to zero-sum two player game problems. In particular,
we set up an equivalence between the external representation of tropical convex
sets and zero-sum stochastic games, in which tropical polyhedra correspond to
deterministic games with finite action spaces. Then, we show that the winning
initial positions can be determined from the associated tropical polyhedron. We
obtain as a corollary a game theoretical proof of the fact that the tropical
rank of a matrix, defined as the maximal size of a submatrix for which the
optimal assignment problem has a unique solution, coincides with the maximal
number of rows (or columns) of the matrix which are linearly independent in the
tropical sense. Our proofs rely on techniques from non-linear Perron-Frobenius
theory.
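The external representation referred to above describes a tropical polyhedron by max-plus linear inequalities of the form A (x) x >= B (x) x, where (x) denotes the max-plus matrix product. A minimal sketch of that product and the resulting membership test follows (names are illustrative; the game-solving side of the equivalence is not shown):

```python
NEG_INF = float('-inf')  # the max-plus "zero" element

def maxplus_matvec(M, x):
    """Max-plus product: (M (x) x)_i = max_j (M[i][j] + x[j])."""
    return [max(m + xi for m, xi in zip(row, x)) for row in M]

def in_tropical_polyhedron(A, B, x):
    """Does x satisfy every tropical inequality (A (x) x)_i >= (B (x) x)_i?"""
    lhs, rhs = maxplus_matvec(A, x), maxplus_matvec(B, x)
    return all(l >= r for l, r in zip(lhs, rhs))
```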
Faster Algorithm for Mean-Payoff Games
We study some existing techniques for solving mean-payoff games (MPGs),
improve them, and design a randomized algorithm for solving MPGs with
currently the best expected complexity.
Tropical Fourier-Motzkin elimination, with an application to real-time verification
We introduce a generalization of tropical polyhedra able to express both
strict and non-strict inequalities. Such inequalities are handled by means of a
semiring of germs (encoding infinitesimal perturbations). We develop a tropical
analogue of Fourier-Motzkin elimination from which we derive geometrical
properties of these polyhedra. In particular, we show that they coincide with
the tropically convex union of (not necessarily closed) cells that are convex
both classically and tropically. We also prove that the redundant inequalities
produced when performing successive elimination steps can be dynamically
deleted by reduction to mean payoff game problems. As a complement, we provide
a coarser (polynomial time) deletion procedure which is enough to arrive at a
simply exponential bound for the total execution time. These algorithms are
illustrated by an application to real-time systems (reachability analysis of
timed automata).
Optimal market making under partial information and numerical methods for impulse control games with applications
The topics treated in this thesis are inherently two-fold. The first part considers the problem of a market maker who wants to optimally set bid/ask quotes over a finite time horizon, to maximize her expected utility. The intensities of the orders she receives depend not only on the spreads she quotes, but also on unobservable factors modelled by a hidden Markov chain. This stochastic control problem under partial information is solved by means of stochastic filtering, control and piecewise-deterministic Markov processes theory. The value function is characterized as the unique continuous viscosity solution of its dynamic programming equation. Afterwards, the analogous full information problem is solved and results are compared numerically through a concrete example. The optimal full information spreads are shown to be biased when the exact market regime is unknown, as the market maker needs to adjust for additional regime uncertainty in terms of P&L sensitivity and observable order flow volatility.
The second part deals with numerically solving nonzero-sum stochastic differential games with impulse controls. These offer a realistic and far-reaching modelling framework for applications within finance, energy markets and other areas, but the difficulty in solving such problems has hindered their proliferation. Semi-analytical approaches make strong assumptions pertaining to very particular cases. To the author's best knowledge, there are no numerical methods available in the literature. A policy-iteration-type solver is proposed to solve an underlying system of quasi-variational inequalities, and it is validated numerically with reassuring results. In particular, it is observed that the algorithm does not enjoy global convergence, and a heuristic methodology is proposed to construct initial guesses.
Eventually, the focus is put on games with a symmetric structure, and a substantially improved version of the former algorithm is put forward. A rigorous convergence analysis is undertaken with natural assumptions on the players' strategies, which admit graph-theoretic interpretations in the context of weakly chained diagonally dominant matrices. A provably convergent single-player impulse control solver, often outperforming classical policy iteration, is also provided. The main algorithm is used to compute with high precision equilibrium payoffs and Nash equilibria of otherwise too challenging problems, and even some for which results go beyond the scope of all the currently available theory.
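As a loose, single-player analogue of the policy-iteration-type solvers described above, the sketch below runs Howard-style policy iteration on a tiny discrete optimal-stopping problem (stopping being the simplest kind of impulse). Everything here, the fixed-point evaluation loop, the data and the names, is an illustrative assumption, far simpler than the quasi-variational systems treated in the thesis.

```python
def policy_iteration(P, reward_stop, gamma):
    """Discrete optimal stopping: in each state either stop and collect
    reward_stop[s], or continue and earn the discounted expected value.
    P is a row-stochastic continuation matrix, 0 < gamma < 1."""
    n = len(P)
    policy = ["stop"] * n                     # initial guess: stop everywhere
    while True:
        # policy evaluation by fixed-point iteration (fine for tiny n)
        v = list(reward_stop)
        for _ in range(10_000):
            v = [reward_stop[s] if policy[s] == "stop"
                 else gamma * sum(P[s][t] * v[t] for t in range(n))
                 for s in range(n)]
        # policy improvement: stop wherever stopping beats continuing
        new = ["stop" if reward_stop[s] >= gamma * sum(P[s][t] * v[t] for t in range(n))
               else "continue" for s in range(n)]
        if new == policy:
            return v, policy
        policy = new
```

The convergence concerns raised in the thesis already show up in miniature: the improvement step is only guaranteed to converge under structural assumptions on the problem data.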