    Improved and generalized upper bounds on the complexity of policy iteration

    Markov decision processes; Dynamic Programming; Analysis of Algorithms
    Given a Markov Decision Process (MDP) with $n$ states and a total number $m$ of actions, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal $\gamma$-discounted policy. We consider two variations of PI: Howard's PI, which changes the actions in all states with a positive advantage, and Simplex-PI, which only changes the action in the state with maximal advantage. We show that Howard's PI terminates after at most $O\left(\frac{m}{1-\gamma}\log\left(\frac{1}{1-\gamma}\right)\right)$ iterations, improving by a factor $O(\log n)$ a result by Hansen et al., while Simplex-PI terminates after at most $O\left(\frac{nm}{1-\gamma}\log\left(\frac{1}{1-\gamma}\right)\right)$ iterations, improving by a factor $O(\log n)$ a result by Ye. Under some structural properties of the MDP, we then consider bounds that are independent of the discount factor $\gamma$: the quantities of interest are bounds $\tau_t$ and $\tau_r$, uniform over all states and policies, on the expected time spent in transient states and on the inverse of the frequency of visits in recurrent states, respectively, given that the process starts from the uniform distribution. Indeed, we show that Simplex-PI terminates after at most $\tilde O\left(n^3 m^2 \tau_t \tau_r\right)$ iterations. This extends a recent result for deterministic MDPs by Post & Ye, in which $\tau_t \le 1$ and $\tau_r \le n$; in particular, it shows that Simplex-PI is strongly polynomial for a much larger class of MDPs. We explain why similar results seem hard to derive for Howard's PI. Finally, under the additional (restrictive) assumption that the state space is partitioned into two sets of states that are respectively transient and recurrent for all policies, we show that both Howard's PI and Simplex-PI terminate after at most $\tilde O(m(n^2 \tau_t + n \tau_r))$ iterations.
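
    As a concrete illustration of the two variants compared in this abstract, here is a minimal sketch of Howard's PI and Simplex-PI on a tabular MDP. The array layout (`P`, `R`), the tolerance `tol`, and the iteration cap are assumptions made for the example, not details from the paper.

```python
import numpy as np

def evaluate(policy, P, R, gamma):
    """Solve (I - gamma * P_pi) v = r_pi for the value of a fixed policy."""
    n = len(policy)
    P_pi = P[np.arange(n), policy]            # (n, n) transitions under pi
    r_pi = R[np.arange(n), policy]            # (n,) rewards under pi
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def policy_iteration(P, R, gamma, variant="howard", tol=1e-10):
    """P: (n, m, n) transition tensor, R: (n, m) rewards.
    Returns the final policy and the number of iterations used."""
    n, m, _ = P.shape
    policy = np.zeros(n, dtype=int)
    for it in range(1, 100_000):
        v = evaluate(policy, P, R, gamma)
        adv = R + gamma * P @ v - v[:, None]  # advantage of each (state, action)
        if adv.max() <= tol:                  # no positive advantage: optimal
            return policy, it
        if variant == "howard":
            # Howard's PI: switch in every state with a positive advantage.
            improvable = adv.max(axis=1) > tol
            policy[improvable] = adv[improvable].argmax(axis=1)
        else:
            # Simplex-PI: switch only in the state with maximal advantage.
            s = adv.max(axis=1).argmax()
            policy[s] = adv[s].argmax()
    return policy, it

# Example on a small random MDP (illustrative only):
rng = np.random.default_rng(0)
P = rng.random((4, 3, 4)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((4, 3))
print(policy_iteration(P, R, gamma=0.9, variant="simplex"))
```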

    Toward the Rectilinear Crossing Number of $K_n$: New Drawings, Upper Bounds, and Asymptotics

    Scheinerman and Wilf (1994) assert that `an important open problem in the study of graph embeddings is to determine the rectilinear crossing number of the complete graph $K_n$.' A rectilinear drawing of $K_n$ is an arrangement of $n$ vertices in the plane, every pair of which is connected by an edge that is a line segment. We assume that no three vertices are collinear, and that no three edges intersect in a point unless that point is an endpoint of all three. The rectilinear crossing number of $K_n$ is the fewest edge crossings attainable over all rectilinear drawings of $K_n$. For each $n$ we construct a rectilinear drawing of $K_n$ that has the fewest edge crossings and the best asymptotics known to date. Moreover, we give some alternative infinite families of drawings of $K_n$ with good asymptotics. Finally, we mention some old and new open problems.
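
    The crossing count of a given rectilinear drawing is straightforward to compute, which makes constructions like the ones described here easy to verify. Below is a short sketch (with an illustrative point set of my own choosing) that uses the fact that, in general position, a 4-point subset contributes exactly one crossing iff it is in convex position.

```python
from itertools import combinations

def orient(a, b, c):
    """True iff a, b, c make a counterclockwise turn (general position)."""
    return (b[0]-a[0]) * (c[1]-a[1]) - (b[1]-a[1]) * (c[0]-a[0]) > 0

def segments_cross(p, q, r, s):
    """True iff segments pq and rs properly intersect."""
    return (orient(p, q, r) != orient(p, q, s)
            and orient(r, s, p) != orient(r, s, q))

def rectilinear_crossings(points):
    """Crossings of the rectilinear drawing of K_n on the given vertex set."""
    total = 0
    for a, b, c, d in combinations(points, 4):
        # A 4-subset contributes one crossing iff it is in convex position,
        # in which case exactly one of the three pairings crosses.
        total += (segments_cross(a, b, c, d)
                  or segments_cross(a, c, b, d)
                  or segments_cross(a, d, b, c))
    return total

# Illustrative 5-point drawing; prints 1, the rectilinear crossing number of K_5.
print(rectilinear_crossings([(0, 0), (4, 0), (2, 3), (1, 1), (3, 1)]))
```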

    Automatic identification of embedded network rows in large-scale optimization models

    The solution of a contemporary large-scale linear, integer, or mixed-integer programming problem is often facilitated by the exploitation of intrinsic special structure in the model. This paper deals with the problem of identifying embedded pure network rows within the coefficient matrix of such models and presents two heuristic algorithms for identifying such structure. The problem of identifying the maximum-size embedded pure network is shown to be among the class of NP-hard problems; therefore, the polynomially bounded, efficient algorithms presented here do not guarantee network sets of maximum size. However, upper bounds on the size of the maximum network set are developed and used to evaluate the algorithms. Computational tests with large-scale, real-world models are presented.
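
    The paper's two heuristics are not reproduced here, but the flavor of the problem can be sketched: a set of rows forms a pure network if every entry is 0 or ±1 and each column, restricted to the set, has at most one +1 and at most one -1. The greedy routine below is a hypothetical simplification for illustration (it ignores row reflections, i.e. scaling rows by -1, which a real detector would consider), so, like the paper's algorithms, it offers no maximality guarantee.

```python
import numpy as np

def greedy_network_rows(A):
    """A: 2-D coefficient matrix; returns indices of rows forming a pure network."""
    kept = []
    pos = np.zeros(A.shape[1], dtype=int)    # +1 count per column so far
    neg = np.zeros(A.shape[1], dtype=int)    # -1 count per column so far
    # Try sparser rows first: they constrain fewer columns (a heuristic choice).
    for i in sorted(range(A.shape[0]), key=lambda i: np.count_nonzero(A[i])):
        row = A[i]
        if not np.all(np.isin(row, (-1, 0, 1))):
            continue                         # entries outside {0, +1, -1}
        if np.all(pos + (row == 1) <= 1) and np.all(neg + (row == -1) <= 1):
            kept.append(i)
            pos += (row == 1)
            neg += (row == -1)
    return sorted(kept)

# Tiny example: rows 0 and 1 are node-arc-incidence-like, row 2 is not.
A = np.array([[1, -1, 0, 0],
              [0, 1, -1, 0],
              [2, 0, 0, 1]])
print(greedy_network_rows(A))                # -> [0, 1]
```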

    Beyond the One Step Greedy Approach in Reinforcement Learning

    The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., $n$-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has, to our knowledge, not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions, and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
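
    As a rough illustration of the multiple-step lookahead idea (a sketch, not the paper's exact formulation), the snippet below computes an h-step greedy policy on a tabular MDP by backing the current value estimate up through h applications of the optimal Bellman operator before taking the argmax.

```python
import numpy as np

def h_step_greedy(v, P, R, gamma, h):
    """Policy that is greedy w.r.t. h-step lookahead on top of the estimate v.
    P: (n, m, n) transitions, R: (n, m) rewards, v: (n,) current values."""
    for _ in range(h - 1):
        v = (R + gamma * P @ v).max(axis=1)    # one optimal Bellman backup
    return (R + gamma * P @ v).argmax(axis=1)  # greedy first action
```

    With h = 1 this reduces to the usual one-step greedy improvement of classic Policy Iteration.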

    A Tight Upper Bound on the Number of Candidate Patterns

    In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing a tight upper bound, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful for reducing the number of database scans.
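
    A hedged reconstruction of how a Kruskal-Katona-style bound of this kind can be evaluated in practice: write the number f of frequent k-patterns in its cascade (binomial) representation and sum the shifted binomials. The exact form of the paper's bound may differ in details.

```python
from math import comb

def cascade(f, k):
    """Greedy cascade representation f = C(m_k, k) + C(m_{k-1}, k-1) + ..."""
    terms = []
    while f > 0 and k > 0:
        m = k
        while comb(m + 1, k) <= f:            # largest m with C(m, k) <= f
            m += 1
        terms.append((m, k))
        f -= comb(m, k)
        k -= 1
    return terms

def max_candidates(f, k):
    """Upper bound on candidate (k+1)-patterns given f frequent k-patterns."""
    return sum(comb(m, i + 1) for m, i in cascade(f, k))

# E.g., 100 frequent 2-itemsets: 100 = C(14,2) + C(9,1), so at most
# C(14,3) + C(9,2) = 400 candidate 3-itemsets under this bound.
print(max_candidates(100, 2))                 # -> 400
```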