Improved and generalized upper bounds on the complexity of policy iteration
Markov decision processes ; Dynamic Programming ; Analysis of Algorithms

Given a Markov Decision Process (MDP) with n states and a total number of m actions, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted policy. We consider two variations of PI: Howard's PI, which changes the actions in all states with a positive advantage, and Simplex-PI, which only changes the action in the state with maximal advantage. We show that Howard's PI terminates after a number of iterations that improves by a factor O(log n) a bound by Hansen et al., while Simplex-PI terminates after a number of iterations that improves by a factor O(log n) a bound by Ye. Under some structural properties of the MDP, we then consider bounds that are independent of the discount factor γ: the quantities of interest are bounds τ_t and τ_r, uniform over all states and policies, on, respectively, the expected time spent in transient states and the inverse of the frequency of visits to recurrent states, given that the process starts from the uniform distribution. Indeed, we show that Simplex-PI terminates after a number of iterations polynomial in n, m, τ_t, and τ_r. This extends a recent result for deterministic MDPs by Post & Ye; in particular, it shows that Simplex-PI is strongly polynomial for a much larger class of MDPs. We explain why similar results seem hard to derive for Howard's PI. Finally, under the additional (restrictive) assumption that the state space is partitioned into two sets of states, respectively transient and recurrent for all policies, we show that both Howard's PI and Simplex-PI terminate after a strongly polynomial number of iterations.
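The two variants differ only in how many actions they switch per iteration. A minimal sketch on a tiny hand-made MDP (the 3-state, 2-action MDP, the discount factor, and all names below are illustrative assumptions, not taken from the paper):

```python
# Sketch: Howard's PI vs Simplex-PI on a tiny hand-made MDP (illustrative).
GAMMA = 0.9
N, A = 3, 2
# P[s][a] = [(next_state, prob), ...], R[s][a] = immediate reward
P = [
    [[(0, 0.5), (1, 0.5)], [(2, 1.0)]],
    [[(0, 1.0)],           [(1, 0.2), (2, 0.8)]],
    [[(2, 1.0)],           [(0, 1.0)]],
]
R = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.0]]

def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation: V <- R_pi + GAMMA * P_pi V."""
    V = [0.0] * N
    for _ in range(sweeps):
        V = [R[s][policy[s]]
             + GAMMA * sum(p * V[t] for t, p in P[s][policy[s]])
             for s in range(N)]
    return V

def advantage(s, a, V):
    """Gain of playing a in s over the current policy's value."""
    return R[s][a] + GAMMA * sum(p * V[t] for t, p in P[s][a]) - V[s]

def policy_iteration(switch_all):
    """switch_all=True: Howard's PI (switch every state with a positive
    advantage); False: Simplex-PI (switch only the maximal-advantage state)."""
    policy = [0] * N
    for it in range(1, 1000):
        V = evaluate(policy)
        improvable = [(advantage(s, a, V), s, a)
                      for s in range(N) for a in range(A)
                      if advantage(s, a, V) > 1e-9]
        if not improvable:
            return policy, it  # no positive advantage: policy is optimal
        if switch_all:
            for s in range(N):
                adv, best = max((advantage(s, a, V), a) for a in range(A))
                if adv > 1e-9:
                    policy[s] = best
        else:
            _, s, a = max(improvable)
            policy[s] = a
```

On this instance both variants reach the same optimal policy; Simplex-PI changes at most one action per iteration, so it typically needs at least as many iterations as Howard's PI.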
Toward the Rectilinear Crossing Number of K_n: New Drawings, Upper Bounds, and Asymptotics
Scheinerman and Wilf (1994) assert that `an important open problem in the
study of graph embeddings is to determine the rectilinear crossing number of
the complete graph K_n.' A rectilinear drawing of K_n is an arrangement of n
vertices in the plane, every pair of which is connected by an edge that is a
line segment. We assume that no three vertices are collinear, and that no three
edges intersect in a point unless that point is an endpoint of all three. The
rectilinear crossing number of K_n is the minimum number of edge crossings
attainable over all rectilinear drawings of K_n.
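This definition can be checked directly by brute force: under the general-position assumption above, two vertex-disjoint segments cross exactly when each one straddles the line through the other. A sketch (function names are ours):

```python
from itertools import combinations

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): >0 ccw, <0 cw."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def crossings(points):
    """Count edge crossings in the rectilinear drawing of K_n whose
    vertices are `points`, assumed in general position as in the text
    (no three collinear, so every intersection is a proper crossing)."""
    def cross(p1, p2, p3, p4):
        # p1p2 and p3p4 cross iff each segment straddles the other's line.
        return ((orient(p3, p4, p1) > 0) != (orient(p3, p4, p2) > 0)
                and (orient(p1, p2, p3) > 0) != (orient(p1, p2, p4) > 0))

    edges = list(combinations(range(len(points)), 2))
    return sum(cross(points[a], points[b], points[c], points[d])
               for (a, b), (c, d) in combinations(edges, 2)
               if len({a, b, c, d}) == 4)
```

For example, n points in convex position give C(n, 4) crossings (every 4-subset contributes one), which is the worst case; the rectilinear crossing number is the minimum of this count over all placements.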
For each n we construct a rectilinear drawing of K_n that has the fewest
edge crossings and the best asymptotics known to date. Moreover, we
give some alternative infinite families of drawings of K_n with good
asymptotics. Finally, we mention some old and new open problems.

Comment: 13 pages
Automatic identification of embedded network rows in large-scale optimization models
The solution of a contemporary large-scale linear, integer, or mixed-integer programming problem is often facilitated by the exploitation of intrinsic special structure in the model. This paper deals with the problem of identifying embedded pure network rows within the coefficient matrix of such models and presents two heuristic algorithms for identifying such structure. The problem of identifying the maximum-size embedded pure network is shown to be among the class of NP-hard problems; therefore, the polynomially bounded, efficient algorithms presented here do not guarantee network sets of maximum size. However, upper bounds on the size of the maximum network set are developed and used to evaluate the algorithms. Computational tests with large-scale, real-world models are presented.

Office of Naval Research, Code 434, Arlington, VA

Approved for public release; distribution is unlimited
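The structure being sought can be illustrated with a much simpler first-fit heuristic than the paper's algorithms (a sketch under our own simplifying conventions, not the paper's method: entries are 0/±1 and row scaling/reflection is ignored). A set of rows forms a pure network if, restricted to those rows, every column looks like a node-arc incidence column, i.e. has at most one +1 and at most one -1:

```python
def greedy_network_rows(matrix):
    """First-fit greedy heuristic (illustrative, not the paper's method):
    keep a row if, together with the rows kept so far, every column still
    has at most one +1 and at most one -1 -- the node-arc incidence
    property of an embedded pure network, ignoring row scaling."""
    kept, pos, neg = [], set(), set()  # columns already holding a +1 / -1
    for i, row in enumerate(matrix):
        entries = [(j, v) for j, v in enumerate(row) if v]
        if all((j not in pos) if v > 0 else (j not in neg)
               for j, v in entries):
            kept.append(i)
            for j, v in entries:
                (pos if v > 0 else neg).add(j)
    return kept
```

Since finding the maximum-size embedded network is NP-hard (as the abstract notes), a polynomial heuristic like this carries no optimality guarantee; even the scan order of the rows changes which set is found.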
Beyond the One Step Greedy Approach in Reinforcement Learning
The famous Policy Iteration algorithm alternates between policy improvement
and policy evaluation. Implementations of this algorithm with several variants
of the latter evaluation stage, e.g., n-step and trace-based returns, have
been analyzed in previous works. However, the case of multiple-step lookahead
policy improvement, despite the recent increase in empirical evidence of its
strength, has to our knowledge not been carefully analyzed yet. In this work,
we introduce the first such analysis. Namely, we formulate variants of
multiple-step policy improvement, derive new algorithms using these definitions
and prove their convergence. Moreover, we show that recent prominent
Reinforcement Learning algorithms are, in fact, instances of our framework. We
thus shed light on their empirical success and give a recipe for deriving new
algorithms for future study.

Comment: ICML 2018
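The idea of multiple-step greedy improvement can be illustrated as follows (a sketch on our own toy MDP with our own names, not the paper's formulation): instead of being greedy with respect to a single Bellman backup, the improved policy takes the first action of an h-horizon optimal plan whose leaves are scored by the current value estimate.

```python
# Sketch of h-step lookahead greedy improvement on a toy 3-state MDP
# (the MDP, GAMMA, and all names are illustrative assumptions).
GAMMA = 0.9
P = [  # P[s][a] = [(next_state, prob), ...]
    [[(2, 1.0)], [(1, 1.0)]],
    [[(2, 1.0)], [(2, 1.0)]],
    [[(2, 1.0)], [(2, 1.0)]],
]
R = [[1.0, 0.0], [10.0, 0.0], [0.0, 0.0]]

def lookahead(s, depth, V):
    """Optimal depth-step return from s, bootstrapping with V at the leaves."""
    if depth == 0:
        return V[s]
    return max(R[s][a] + GAMMA * sum(p * lookahead(t, depth - 1, V)
                                     for t, p in P[s][a])
               for a in range(len(P[s])))

def h_greedy(V, h):
    """h-step greedy policy w.r.t. V; h=1 is the classic one-step improvement."""
    return [max(range(len(P[s])),
                key=lambda a: R[s][a] + GAMMA * sum(
                    p * lookahead(t, h - 1, V) for t, p in P[s][a]))
            for s in range(len(P))]
```

With V = [0, 0, 0], the one-step greedy policy grabs the myopic reward 1 in state 0, while two-step lookahead sees the delayed reward 10 behind the other action, which is the kind of behavioral difference a multiple-step analysis must account for.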
A Tight Upper Bound on the Number of Candidate Patterns
In the context of mining for frequent patterns using the standard levelwise
algorithm, the following question arises: given the current level and the
current set of frequent patterns, what is the maximal number of candidate
patterns that can be generated on the next level? We answer this question by
providing a tight upper bound, derived from a combinatorial result from the
sixties by Kruskal and Katona. Our result is useful for reducing the number of
database scans.
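The flavor of such a bound can be sketched via the cascade (combinatorial) representation used in Kruskal-Katona-type arguments; this is our own sketch of the idea, not necessarily the paper's exact tight statement. Write the number m of frequent k-patterns as m = C(a_k, k) + C(a_{k-1}, k-1) + ... with a_k > a_{k-1} > ...; then at most C(a_k, k+1) + C(a_{k-1}, k) + ... candidates can be generated at level k+1.

```python
from math import comb

def cascade(m, k):
    """Greedy k-cascade representation: m = C(a_k,k) + C(a_{k-1},k-1) + ..."""
    parts = []
    while m > 0 and k > 0:
        a = k
        while comb(a + 1, k) <= m:   # largest a with C(a, k) <= m
            a += 1
        parts.append((a, k))
        m -= comb(a, k)
        k -= 1
    return parts

def max_candidates(num_frequent, k):
    """Upper bound on the number of (k+1)-candidates generable from
    num_frequent frequent k-patterns (Kruskal-Katona-style sketch;
    see the paper for the exact tight bound)."""
    return sum(comb(a, i + 1) for a, i in cascade(num_frequent, k))
```

For example, 6 frequent pairs over 4 items admit at most C(4, 3) = 4 triple candidates, so a levelwise miner can decide in advance whether the next database scan is worthwhile.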