    Improved and generalized upper bounds on the complexity of policy iteration

    Markov decision processes; Dynamic Programming; Analysis of Algorithms
    Given a Markov Decision Process (MDP) with $n$ states and a total number $m$ of actions, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal $\gamma$-discounted policy. We consider two variations of PI: Howard's PI, which changes the actions in all states with a positive advantage, and Simplex-PI, which only changes the action in the state with maximal advantage. We show that Howard's PI terminates after at most $O\left(\frac{m}{1-\gamma}\log\left(\frac{1}{1-\gamma}\right)\right)$ iterations, improving by a factor $O(\log n)$ a result by Hansen et al., while Simplex-PI terminates after at most $O\left(\frac{nm}{1-\gamma}\log\left(\frac{1}{1-\gamma}\right)\right)$ iterations, improving by a factor $O(\log n)$ a result by Ye. Under some structural properties of the MDP, we then consider bounds that are independent of the discount factor $\gamma$: the quantities of interest are bounds $\tau_t$ and $\tau_r$, uniform over all states and policies, on the expected time spent in transient states and on the inverse of the frequency of visits in recurrent states, respectively, given that the process starts from the uniform distribution. Indeed, we show that Simplex-PI terminates after at most $\tilde O\left(n^3 m^2 \tau_t \tau_r\right)$ iterations. This extends a recent result for deterministic MDPs by Post & Ye, in which $\tau_t \le 1$ and $\tau_r \le n$; in particular, it shows that Simplex-PI is strongly polynomial for a much larger class of MDPs. We explain why similar results seem hard to derive for Howard's PI. Finally, under the additional (restrictive) assumption that the state space is partitioned into two sets of states that are respectively transient and recurrent for all policies, we show that both Howard's PI and Simplex-PI terminate after at most $\tilde O(m(n^2 \tau_t + n \tau_r))$ iterations.
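
    As a concrete illustration of the two variants compared in this abstract, here is a minimal sketch of Howard's PI and Simplex-PI on a tabular MDP. The array layout (`P`, `R`), the tolerance `tol`, and the iteration cap are assumptions made for the example, not details from the paper.

```python
import numpy as np

def evaluate(policy, P, R, gamma):
    """Solve (I - gamma * P_pi) v = r_pi for the value of a fixed policy."""
    n = len(policy)
    P_pi = P[np.arange(n), policy]            # (n, n) transitions under pi
    r_pi = R[np.arange(n), policy]            # (n,) rewards under pi
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def policy_iteration(P, R, gamma, variant="howard", tol=1e-10):
    """P: (n, m, n) transition tensor, R: (n, m) rewards.
    Returns the final policy and the number of iterations used."""
    n, m, _ = P.shape
    policy = np.zeros(n, dtype=int)
    for it in range(1, 100_000):
        v = evaluate(policy, P, R, gamma)
        adv = R + gamma * P @ v - v[:, None]  # advantage of each (state, action)
        if adv.max() <= tol:                  # no positive advantage: optimal
            return policy, it
        if variant == "howard":
            # Howard's PI: switch in every state with a positive advantage.
            improvable = adv.max(axis=1) > tol
            policy[improvable] = adv[improvable].argmax(axis=1)
        else:
            # Simplex-PI: switch only in the state with maximal advantage.
            s = adv.max(axis=1).argmax()
            policy[s] = adv[s].argmax()
    return policy, it

# Example on a small random MDP (illustrative only):
rng = np.random.default_rng(0)
P = rng.random((4, 3, 4)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((4, 3))
print(policy_iteration(P, R, gamma=0.9, variant="simplex"))
```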

    Toward the Rectilinear Crossing Number of $K_n$: New Drawings, Upper Bounds, and Asymptotics

    Scheinerman and Wilf (1994) assert that `an important open problem in the study of graph embeddings is to determine the rectilinear crossing number of the complete graph $K_n$.' A rectilinear drawing of $K_n$ is an arrangement of $n$ vertices in the plane, every pair of which is connected by an edge that is a line segment. We assume that no three vertices are collinear, and that no three edges intersect in a point unless that point is an endpoint of all three. The rectilinear crossing number of $K_n$ is the fewest edge crossings attainable over all rectilinear drawings of $K_n$. For each $n$ we construct a rectilinear drawing of $K_n$ that has the fewest edge crossings and the best asymptotics known to date. Moreover, we give some alternative infinite families of drawings of $K_n$ with good asymptotics. Finally, we mention some old and new open problems.
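
    The crossing count of a given rectilinear drawing is straightforward to compute, which makes constructions like the ones described here easy to verify. Below is a short sketch (with an illustrative point set of my own choosing) that uses the fact that, in general position, a 4-point subset contributes exactly one crossing iff it is in convex position.

```python
from itertools import combinations

def orient(a, b, c):
    """True iff a, b, c make a counterclockwise turn (general position)."""
    return (b[0]-a[0]) * (c[1]-a[1]) - (b[1]-a[1]) * (c[0]-a[0]) > 0

def segments_cross(p, q, r, s):
    """True iff segments pq and rs properly intersect."""
    return (orient(p, q, r) != orient(p, q, s)
            and orient(r, s, p) != orient(r, s, q))

def rectilinear_crossings(points):
    """Crossings of the rectilinear drawing of K_n on the given vertex set."""
    total = 0
    for a, b, c, d in combinations(points, 4):
        # A 4-subset contributes one crossing iff it is in convex position,
        # in which case exactly one of the three pairings crosses.
        total += (segments_cross(a, b, c, d)
                  or segments_cross(a, c, b, d)
                  or segments_cross(a, d, b, c))
    return total

# Illustrative 5-point drawing; prints 1, the rectilinear crossing number of K_5.
print(rectilinear_crossings([(0, 0), (4, 0), (2, 3), (1, 1), (3, 1)]))
```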

    Automatic identification of embedded network rows in large-scale optimization models

    The solution of a contemporary large-scale linear, integer, or mixed-integer programming problem is often facilitated by the exploitation of intrinsic special structure in the model. This paper deals with the problem of identifying embedded pure network rows within the coefficient matrix of such models and presents two heuristic algorithms for identifying such structure. The problem of identifying the maximum-size embedded pure network is shown to be among the class of NP-hard problems; therefore, the polynomially bounded, efficient algorithms presented here do not guarantee network sets of maximum size. However, upper bounds on the size of the maximum network set are developed and used to evaluate the algorithms. Computational tests with large-scale, real-world models are presented.
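
    The paper's two heuristics are not reproduced here, but the flavor of the problem can be sketched: a set of rows forms a pure network if every entry is 0 or ±1 and each column, restricted to the set, has at most one +1 and at most one -1. The greedy routine below is a hypothetical simplification for illustration (it ignores row reflections, i.e. scaling rows by -1, which a real detector would consider), so, like the paper's algorithms, it offers no maximality guarantee.

```python
import numpy as np

def greedy_network_rows(A):
    """A: 2-D coefficient matrix; returns indices of rows forming a pure network."""
    kept = []
    pos = np.zeros(A.shape[1], dtype=int)    # +1 count per column so far
    neg = np.zeros(A.shape[1], dtype=int)    # -1 count per column so far
    # Try sparser rows first: they constrain fewer columns (a heuristic choice).
    for i in sorted(range(A.shape[0]), key=lambda i: np.count_nonzero(A[i])):
        row = A[i]
        if not np.all(np.isin(row, (-1, 0, 1))):
            continue                         # entries outside {0, +1, -1}
        if np.all(pos + (row == 1) <= 1) and np.all(neg + (row == -1) <= 1):
            kept.append(i)
            pos += (row == 1)
            neg += (row == -1)
    return sorted(kept)

# Tiny example: rows 0 and 1 are node-arc-incidence-like, row 2 is not.
A = np.array([[1, -1, 0, 0],
              [0, 1, -1, 0],
              [2, 0, 0, 1]])
print(greedy_network_rows(A))                # -> [0, 1]
```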

    Beyond the One Step Greedy Approach in Reinforcement Learning

    The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., $n$-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has, to our knowledge, not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions, and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
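
    As a rough illustration of the multiple-step lookahead idea (a sketch, not the paper's exact formulation), the snippet below computes an h-step greedy policy on a tabular MDP by backing the current value estimate up through h applications of the optimal Bellman operator before taking the argmax.

```python
import numpy as np

def h_step_greedy(v, P, R, gamma, h):
    """Policy that is greedy w.r.t. h-step lookahead on top of the estimate v.
    P: (n, m, n) transitions, R: (n, m) rewards, v: (n,) current values."""
    for _ in range(h - 1):
        v = (R + gamma * P @ v).max(axis=1)    # one optimal Bellman backup
    return (R + gamma * P @ v).argmax(axis=1)  # greedy first action
```

    With h = 1 this reduces to the usual one-step greedy improvement of classic Policy Iteration.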

    A Tight Upper Bound on the Number of Candidate Patterns

    In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing a tight upper bound, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful for reducing the number of database scans.
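
    A hedged reconstruction of how a Kruskal-Katona-style bound of this kind can be evaluated in practice: write the number f of frequent k-patterns in its cascade (binomial) representation and sum the shifted binomials. The exact form of the paper's bound may differ in details.

```python
from math import comb

def cascade(f, k):
    """Greedy cascade representation f = C(m_k, k) + C(m_{k-1}, k-1) + ..."""
    terms = []
    while f > 0 and k > 0:
        m = k
        while comb(m + 1, k) <= f:            # largest m with C(m, k) <= f
            m += 1
        terms.append((m, k))
        f -= comb(m, k)
        k -= 1
    return terms

def max_candidates(f, k):
    """Upper bound on candidate (k+1)-patterns given f frequent k-patterns."""
    return sum(comb(m, i + 1) for m, i in cascade(f, k))

# E.g., 100 frequent 2-itemsets: 100 = C(14,2) + C(9,1), so at most
# C(14,3) + C(9,2) = 400 candidate 3-itemsets under this bound.
print(max_candidates(100, 2))                 # -> 400
```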