189 research outputs found

    Computing the smallest fixed point of order-preserving nonexpansive mappings arising in positive stochastic games and static analysis of programs

    The problem of computing the smallest fixed point of an order-preserving map arises in the study of zero-sum positive stochastic games. It also arises in static analysis of programs by abstract interpretation. In this context, the discount rate may be negative. We characterize the minimality of a fixed point in terms of the nonlinear spectral radius of a certain semidifferential. We apply this characterization to design a policy iteration algorithm, which applies to the case of finite state and action spaces. The algorithm returns a locally minimal fixed point, which turns out to be globally minimal when the discount rate is nonnegative. Comment: 26 pages, 3 figures. We add new results, improvements and two examples of positive stochastic games. Note that an initial version of the paper has appeared in the proceedings of the Eighteenth International Symposium on Mathematical Theory of Networks and Systems (MTNS2008), Blacksburg, Virginia, July 200
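
To make the notion of a smallest fixed point concrete, here is a minimal Python sketch. It does not implement the policy iteration algorithm described in the abstract; it uses plain repeated application of the map, started from the zero vector. The map `F`, the helper `iterate`, and the iteration count are hypothetical, chosen only so that `F` is order-preserving, nonexpansive in the sup norm, and has more than one fixed point.

```python
def F(x):
    """Hypothetical order-preserving, sup-norm nonexpansive map on R^2.

    Its fixed points are exactly the vectors (t, t) with t >= 1,
    so the smallest fixed point is (1, 1).
    """
    x0, x1 = x
    return (max(x0, 1.0), 0.5 * x0 + 0.5 * x1)


def iterate(f, start, num_iters=100):
    """Apply f repeatedly, starting from `start`."""
    x = start
    for _ in range(num_iters):
        x = f(x)
    return x


# Starting from the bottom element (0, 0), the iterates increase
# towards the smallest fixed point (1, 1).
smallest = iterate(F, (0.0, 0.0))

# Starting from another fixed point, the iteration never moves,
# so the starting point matters when the map is not a strict contraction.
other = iterate(F, (3.0, 3.0))
```

Because `F` has a whole ray of fixed points, iteration alone cannot certify minimality; that is the kind of question the spectral-radius characterization in the abstract addresses.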

    The Lions-Mercier splitting algorithm and the alternating direction method are instances of the proximal point algorithm

    Cover title. Includes bibliographical references. Supported by the Army Research Office, DAAL03-86-K-0171. By Johnathan Eckstein

    Accelerating Value Iteration with Anchoring

    Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at an $\mathcal{O}(\gamma^k)$ rate, where $\gamma$ is the discount factor. Surprisingly, however, the optimal rate for the VI setup was not known, and finding a general acceleration mechanism has been an open problem. In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators. Our method, called Anc-VI, is based on an anchoring mechanism (distinct from Nesterov's acceleration), and it reduces the Bellman error faster than standard VI. In particular, Anc-VI exhibits an $\mathcal{O}(1/k)$ rate for $\gamma \approx 1$ or even $\gamma = 1$, while standard VI has rate $\mathcal{O}(1)$ for $\gamma \ge 1 - 1/k$, where $k$ is the iteration count. We also provide a complexity lower bound matching the upper bound up to a constant factor of $4$, thereby establishing optimality of the accelerated rate of Anc-VI. Finally, we show that the anchoring mechanism provides the same benefit in the approximate VI and Gauss-Seidel VI setups as well.
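
The general shape of an anchored update can be sketched in Python as follows. This is an illustration only: the weight `beta = 1 / (k + 1)` is an assumed Halpern-style choice, not the coefficient schedule of Anc-VI (which the abstract does not state), and the two-state reward process, the function names, and the iteration count are all hypothetical.

```python
def bellman_consistency(v, rewards, transitions, gamma):
    """Policy-evaluation operator: (T v)(s) = r(s) + gamma * sum_t P(s, t) * v(t)."""
    return [
        rewards[s] + gamma * sum(p * v[t] for t, p in enumerate(transitions[s]))
        for s in range(len(v))
    ]


def anchored_value_iteration(operator, v0, num_iters):
    """Anchored iteration: v_k = beta_k * v_0 + (1 - beta_k) * T(v_{k-1}).

    beta_k = 1 / (k + 1) is an ASSUMED Halpern-style weight chosen for
    illustration; it is not taken from the abstract.
    """
    v = list(v0)
    for k in range(1, num_iters + 1):
        beta = 1.0 / (k + 1)
        tv = operator(v)
        # Pull the plain VI step back towards the starting point v0.
        v = [beta * a + (1.0 - beta) * b for a, b in zip(v0, tv)]
    return v


def bellman_error(operator, v):
    """Sup-norm Bellman error ||T v - v||."""
    return max(abs(a - b) for a, b in zip(operator(v), v))


# Hypothetical two-state Markov reward process.
REWARDS = [1.0, 0.0]
TRANSITIONS = [[0.5, 0.5], [0.2, 0.8]]
GAMMA = 0.9


def T(v):
    return bellman_consistency(v, REWARDS, TRANSITIONS, GAMMA)


v_anchored = anchored_value_iteration(T, [0.0, 0.0], num_iters=500)
print(bellman_error(T, v_anchored))
```

With a discount factor this far below 1, standard VI is already fast, so the example shows only the form of the update; the regime the abstract highlights is $\gamma$ close or equal to 1.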

    Fitted Value Function Iteration With Probability One Contractions

    This paper studies a value function iteration algorithm that can be applied to almost all stationary dynamic programming problems. Using nonexpansive function approximation and Monte Carlo integration, we develop a randomized fitted Bellman operator and a corresponding algorithm that is globally convergent with probability one. When additional restrictions are imposed, an $O_P(n^{-1/2})$ rate of convergence for Monte Carlo error is obtained.

    Tropical polyhedra are equivalent to mean payoff games

    We show that several decision problems originating from max-plus or tropical convexity are equivalent to zero-sum two player game problems. In particular, we set up an equivalence between the external representation of tropical convex sets and zero-sum stochastic games, in which tropical polyhedra correspond to deterministic games with finite action spaces. Then, we show that the winning initial positions can be determined from the associated tropical polyhedron. We obtain as a corollary a game theoretical proof of the fact that the tropical rank of a matrix, defined as the maximal size of a submatrix for which the optimal assignment problem has a unique solution, coincides with the maximal number of rows (or columns) of the matrix which are linearly independent in the tropical sense. Our proofs rely on techniques from non-linear Perron-Frobenius theory. Comment: 28 pages, 5 figures; v2: updated references, added background materials and illustrations; v3: minor improvements, references update

    Proximal point algorithm in mathematical programming

    Issued as Progress report, and Final report, Project no. G-37-61

    The Operator Approach to Entropy Games

    Entropy games and matrix multiplication games have been recently introduced by Asarin et al. They model the situation in which one player (Despot) wishes to minimize the growth rate of a matrix product, whereas the other player (Tribune) wishes to maximize it. We develop an operator approach to entropy games. This allows us to show that entropy games can be cast as stochastic mean payoff games in which some action spaces are simplices and payments are given by a relative entropy (Kullback-Leibler divergence). In this way, we show that entropy games with a fixed number of states belonging to Despot can be solved in polynomial time. This approach also allows us to solve these games by a policy iteration algorithm, which we compare with the spectral simplex algorithm developed by Protasov.

    Convergence Analysis and Improvements for Projection Algorithms and Splitting Methods

    Non-smooth convex optimization problems occur in all fields of engineering. A common approach to solving this class of problems is proximal algorithms, or splitting methods. These first-order optimization algorithms are often simple, well suited to solve large-scale problems and have a low computational cost per iteration. Essentially, they encode the solution to an optimization problem as a fixed point of some operator, and iterating this operator eventually results in convergence to an optimal point. However, as for other first-order methods, the convergence rate is heavily dependent on the conditioning of the problem. Even though the per-iteration cost is usually low, the number of iterations can become prohibitively large for ill-conditioned problems, especially if a high accuracy solution is sought. In this thesis, a few methods for alleviating this slow convergence are studied, which can be divided into two main approaches. The first are heuristic methods that can be applied to a range of fixed-point algorithms. They are based on understanding typical behavior of these algorithms. While these methods are shown to converge, they come with no guarantees on improved convergence rates. The other approach studies the theoretical rates of a class of projection methods that are used to solve convex feasibility problems. These are problems where the goal is to find a point in the intersection of two, or possibly more, convex sets. A study of how the parameters in the algorithm affect the theoretical convergence rate is presented, as well as how they can be chosen to optimize this rate.
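
A convex feasibility problem of the kind mentioned above can be illustrated with a minimal Python sketch of plain alternating projections. This is the basic unrelaxed method, not the parameterized projection algorithms analyzed in the thesis; the two sets (a unit disk and a half-plane), the starting point, and all function names are hypothetical.

```python
import math


def project_onto_disk(p, radius=1.0):
    """Euclidean projection onto the disk {x : ||x|| <= radius}."""
    norm = math.hypot(p[0], p[1])
    if norm <= radius:
        return p
    scale = radius / norm
    return (p[0] * scale, p[1] * scale)


def project_onto_halfplane(p, bound=0.5):
    """Euclidean projection onto the half-plane {x : x[0] >= bound}."""
    return (max(p[0], bound), p[1])


def alternating_projections(start, num_iters=100):
    """Iterate the composed operator x -> P_disk(P_halfplane(x)).

    A point in the intersection of the two sets is a fixed point of
    this operator, which is how the feasibility problem is encoded.
    """
    x = start
    for _ in range(num_iters):
        x = project_onto_disk(project_onto_halfplane(x))
    return x


point = alternating_projections((-3.0, 2.0))
print(point)
```

The fixed iteration count is a simplification; a practical loop would usually stop once the distance between successive iterates falls below a tolerance.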