189 research outputs found
Computing the smallest fixed point of order-preserving nonexpansive mappings arising in positive stochastic games and static analysis of programs
The problem of computing the smallest fixed point of an order-preserving map
arises in the study of zero-sum positive stochastic games. It also arises in
static analysis of programs by abstract interpretation. In this context, the
discount rate may be negative. We characterize the minimality of a fixed point
in terms of the nonlinear spectral radius of a certain semidifferential. We
apply this characterization to design a policy iteration algorithm, which
applies to the case of finite state and action spaces. The algorithm returns a
locally minimal fixed point, which turns out to be globally minimal when the
discount rate is nonnegative.
Comment: 26 pages, 3 figures. We add new results, improvements and two
examples of positive stochastic games. Note that an initial version of the
paper has appeared in the proceedings of the Eighteenth International
Symposium on Mathematical Theory of Networks and Systems (MTNS2008),
Blacksburg, Virginia, July 200
The Lions-Mercier splitting algorithm and the alternating direction method are instances of the proximal point algorithm
Cover title. Includes bibliographical references. Supported by the Army Research Office. DAAL03-86-K-0171. By Johnathan Eckstein.
Accelerating Value Iteration with Anchoring
Value Iteration (VI) is foundational to the theory and practice of modern
reinforcement learning, and it is known to converge at a
$\mathcal{O}(\gamma^k)$-rate, where $\gamma$ is the discount factor.
Surprisingly, however, the optimal rate for the VI setup was not known, and
finding a general acceleration mechanism has been an open problem. In this
paper, we present the first accelerated VI for both the Bellman consistency and
optimality operators. Our method, called Anc-VI, is based on an
\emph{anchoring} mechanism (distinct from Nesterov's acceleration), and it
reduces the Bellman error faster than standard VI. In particular, Anc-VI
exhibits a $\mathcal{O}(1/k)$-rate for $\gamma\approx 1$ or even $\gamma=1$,
while standard VI has rate $\mathcal{O}(1)$ for $\gamma\ge 1-1/k$, where $k$ is
the iteration count. We also provide a complexity lower bound matching the
upper bound up to a constant factor of $4$, thereby establishing optimality of
the accelerated rate of Anc-VI. Finally, we show that the anchoring mechanism
provides the same benefit in the approximate VI and Gauss--Seidel VI setups as
well.
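As a rough illustration of the anchoring idea in the abstract above, the following Python sketch compares standard VI with an anchored variant on a tiny two-state Markov reward process. The rewards, the transition matrix, and all function names are hypothetical. The anchoring weights beta_k = 1 / sum_{i=0}^{k} gamma^(-2i) are also an assumption made for illustration, since the abstract does not state the update rule; for gamma = 1 they reduce to the Halpern-style choice 1/(k+1).

```python
def bellman(u, rewards, trans, gamma):
    """Bellman consistency operator T(u) = r + gamma * P u for a fixed policy."""
    return [r + gamma * sum(p * x for p, x in zip(row, u))
            for r, row in zip(rewards, trans)]


def sup_norm_diff(a, b):
    """Sup-norm distance between two value vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def standard_vi(u0, rewards, trans, gamma, iters):
    """Plain value iteration: u <- T(u)."""
    u = list(u0)
    for _ in range(iters):
        u = bellman(u, rewards, trans, gamma)
    return u


def anchored_vi(u0, rewards, trans, gamma, iters):
    """Anchored iteration: u_k = beta_k * u0 + (1 - beta_k) * T(u_{k-1}).

    Assumed weights: beta_k = 1 / sum_{i=0}^{k} gamma^(-2i).
    """
    u = list(u0)
    weight_sum = 1.0  # the i = 0 term of the sum
    for k in range(1, iters + 1):
        weight_sum += gamma ** (-2 * k)
        beta = 1.0 / weight_sum
        tu = bellman(u, rewards, trans, gamma)
        # Pull the iterate back toward the starting point (the "anchor").
        u = [beta * a + (1.0 - beta) * t for a, t in zip(u0, tu)]
    return u


if __name__ == "__main__":
    rewards = [1.0, 0.0]                  # hypothetical rewards
    trans = [[0.5, 0.5], [0.2, 0.8]]      # hypothetical transition matrix
    for gamma in (0.9, 0.999):
        for name, solver in (("standard", standard_vi), ("anchored", anchored_vi)):
            u = solver([0.0, 0.0], rewards, trans, gamma, 50)
            err = sup_norm_diff(bellman(u, rewards, trans, gamma), u)
            print(f"gamma={gamma} {name:8s} Bellman error after 50 steps: {err:.3e}")
```

Running the script prints the Bellman error of both schemes for a moderate and a near-one discount factor, which is the regime where the abstract claims the anchored rate differs most from the standard one.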
Fitted Value Function Iteration With Probability One Contractions
This paper studies a value function iteration algorithm that can be applied to almost all stationary dynamic programming problems. Using nonexpansive function approximation and Monte Carlo integration, we develop a randomized fitted Bellman operator and a corresponding algorithm that is globally convergent with probability one. When additional restrictions are imposed, an $O_P(n^{-1/2})$ rate of convergence for Monte Carlo error is obtained.
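The combination described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: the toy consumption model, the action set, the shock distribution, and all function names are hypothetical. Piecewise-linear interpolation on a grid serves as the nonexpansive approximator, and the shocks are drawn once and reused, so the sampled operator stays a sup-norm contraction of modulus gamma.

```python
import random

ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical: fraction of the state consumed


def interpolate(grid, values, x):
    """Piecewise-linear interpolation on a sorted grid, clamped at both ends."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    for i in range(1, len(grid)):
        if x <= grid[i]:
            w = (x - grid[i - 1]) / (grid[i] - grid[i - 1])
            return (1.0 - w) * values[i - 1] + w * values[i]
    return values[-1]


def fitted_bellman(values, grid, shocks, gamma):
    """One step of the sampled Bellman operator, evaluated on the grid points."""
    new_values = []
    for x in grid:
        best = float("-inf")
        for a in ACTIONS:
            reward = (a * x) ** 0.5
            # Monte Carlo estimate of the expected continuation value,
            # using the same fixed shock sample at every iteration.
            cont = sum(
                interpolate(grid, values, min(1.0, (1.0 - a) * x + z))
                for z in shocks
            ) / len(shocks)
            best = max(best, reward + gamma * cont)
        new_values.append(best)
    return new_values


def fitted_value_iteration(grid, shocks, gamma, iters):
    """Iterate the fitted operator from zero; return values and the last sup-norm change."""
    values = [0.0] * len(grid)
    change = float("inf")
    for _ in range(iters):
        new_values = fitted_bellman(values, grid, shocks, gamma)
        change = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
    return values, change


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [i / 10 for i in range(11)]
    shocks = [rng.uniform(0.0, 0.5) for _ in range(30)]
    values, change = fitted_value_iteration(grid, shocks, 0.9, 200)
    print("last sup-norm change:", change)
    print("fitted values on the grid:", [round(v, 3) for v in values])
```

Because interpolation never increases sup-norm distances, successive iterates contract at rate gamma regardless of which shock sample was drawn, which is the intuition behind convergence with probability one.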
Tropical polyhedra are equivalent to mean payoff games
We show that several decision problems originating from max-plus or tropical
convexity are equivalent to zero-sum two player game problems. In particular,
we set up an equivalence between the external representation of tropical convex
sets and zero-sum stochastic games, in which tropical polyhedra correspond to
deterministic games with finite action spaces. Then, we show that the winning
initial positions can be determined from the associated tropical polyhedron. We
obtain as a corollary a game theoretical proof of the fact that the tropical
rank of a matrix, defined as the maximal size of a submatrix for which the
optimal assignment problem has a unique solution, coincides with the maximal
number of rows (or columns) of the matrix which are linearly independent in the
tropical sense. Our proofs rely on techniques from non-linear Perron-Frobenius
theory.
Comment: 28 pages, 5 figures; v2: updated references, added background
materials and illustrations; v3: minor improvements, references update
Proximal point algorithm in mathematical programming
Issued as Progress report, and Final report, Project no. G-37-61
The Operator Approach to Entropy Games
Entropy games and matrix multiplication games have been recently introduced by Asarin et al. They model the situation in which one player (Despot) wishes to minimize the growth rate of a matrix product, whereas the other player (Tribune) wishes to maximize it. We develop an operator approach to entropy games. This allows us to show that entropy games can be cast as stochastic mean payoff games in which some action spaces are simplices and payments are given by a relative entropy (Kullback-Leibler divergence). In this way, we show that entropy games with a fixed number of states belonging to Despot can be solved in polynomial time. This approach also allows us to solve these games by a policy iteration algorithm, which we compare with the spectral simplex algorithm developed by Protasov
Convergence Analysis and Improvements for Projection Algorithms and Splitting Methods
Non-smooth convex optimization problems occur in all fields of engineering. A common approach to solving this class of problems is proximal algorithms, or splitting methods. These first-order optimization algorithms are often simple, well suited to solve large-scale problems and have a low computational cost per iteration. Essentially, they encode the solution to an optimization problem as a fixed point of some operator, and iterating this operator eventually results in convergence to an optimal point. However, as for other first order methods, the convergence rate is heavily dependent on the conditioning of the problem. Even though the per-iteration cost is usually low, the number of iterations can become prohibitively large for ill-conditioned problems, especially if a high accuracy solution is sought. In this thesis, a few methods for alleviating this slow convergence are studied, which can be divided into two main approaches. The first are heuristic methods that can be applied to a range of fixed-point algorithms. They are based on understanding typical behavior of these algorithms. While these methods are shown to converge, they come with no guarantees on improved convergence rates. The other approach studies the theoretical rates of a class of projection methods that are used to solve convex feasibility problems. These are problems where the goal is to find a point in the intersection of two, or possibly more, convex sets. A study of how the parameters in the algorithm affect the theoretical convergence rate is presented, as well as how they can be chosen to optimize this rate.
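A convex feasibility problem of the kind mentioned above can be illustrated with the plain method of alternating projections. The Python sketch below is an assumption-laden toy, not the thesis's algorithm: the two sets (the unit disc and the half-plane x + y >= 1), the starting point, and the function names are all hypothetical, and no relaxation parameters are used.

```python
import math


def project_ball(p, radius=1.0):
    """Euclidean projection onto the closed disc of the given radius centred at 0."""
    norm = math.hypot(p[0], p[1])
    if norm <= radius:
        return p
    scale = radius / norm
    return (p[0] * scale, p[1] * scale)


def project_halfspace(p, normal=(1.0, 1.0), offset=1.0):
    """Euclidean projection onto the half-plane {x : <normal, x> >= offset}."""
    dot = normal[0] * p[0] + normal[1] * p[1]
    if dot >= offset:
        return p
    step = (offset - dot) / (normal[0] ** 2 + normal[1] ** 2)
    return (p[0] + step * normal[0], p[1] + step * normal[1])


def alternating_projections(start, iters):
    """Iterate the composed operator P_halfspace(P_ball(.)) from the start point."""
    p = start
    for _ in range(iters):
        p = project_halfspace(project_ball(p))
    return p


if __name__ == "__main__":
    p = alternating_projections((3.0, -2.0), 100)
    print("approximate point in the intersection:", p)
    print("distance from origin:", math.hypot(p[0], p[1]))
    print("x + y:", p[0] + p[1])
```

The composed map is the fixed-point operator being iterated; any point in both sets is a fixed point, and how fast the iterates approach one depends on the geometry of the two sets near the limit.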