    A central limit theorem for temporally non-homogenous Markov chains with applications to dynamic programming

    We prove a central limit theorem for a class of additive processes that arise naturally in the theory of finite horizon Markov decision problems. The main theorem generalizes a classic result of Dobrushin (1956) for temporally non-homogeneous Markov chains, and the principal innovation is that here the summands are permitted to depend on both the current state and a bounded number of future states of the chain. We show through several examples that this added flexibility gives one a direct path to asymptotic normality of the optimal total reward of finite horizon Markov decision problems. The same examples also explain why such results are not easily obtained by alternative Markovian techniques such as enlargement of the state space. Comment: 27 pages, 1 figure
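
    As a rough illustration of the kind of statement above, the following Monte Carlo sketch simulates a temporally non-homogeneous two-state chain and an additive functional whose summands depend on the current state and the next state, then checks that the standardized sum looks approximately Gaussian. The kernel, the summand, and all constants are illustrative assumptions, not the paper's construction.

```python
# Hedged Monte Carlo sketch: an additive functional of a temporally
# non-homogeneous Markov chain, with summands depending on the current state
# and one future state. All model choices below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def transition_matrix(k, n):
    # Time-varying (non-homogeneous) two-state kernel, chosen arbitrarily.
    p = 0.3 + 0.4 * k / n
    return np.array([[p, 1.0 - p],
                     [1.0 - p, p]])

def summand(k, x, x_next):
    # Depends on the current state and the next state of the chain.
    return (x + 1) * (x_next - 0.5) + 0.1 * np.sin(k)

def sample_total(n):
    x, total = 0, 0.0
    for k in range(n):
        x_next = rng.choice(2, p=transition_matrix(k, n)[x])
        total += summand(k, x, x_next)
        x = x_next
    return total

n, reps = 500, 1000
S = np.array([sample_total(n) for _ in range(reps)])
Z = (S - S.mean()) / S.std()
# For an approximately normal Z, skewness and excess kurtosis are near 0.
print(f"skewness {np.mean(Z**3):.3f}, excess kurtosis {np.mean(Z**4) - 3:.3f}")
```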

    Sporadic Overtaking Optimality in Markov Decision Problems

    Optimality Criteria for Deterministic Discrete-Time Infinite Horizon Optimization

    We consider the problem of selecting an optimality criterion, when total costs diverge, in deterministic infinite horizon optimization over discrete time. Our formulation allows for both discrete and continuous state and action spaces, as well as time-varying, that is, nonstationary, data. The task is to choose a criterion that is neither too overselective, so that no policy is optimal, nor too underselective, so that most policies are optimal. We contrast and compare the following optimality criteria: strong, overtaking, weakly overtaking, efficient, and average. However, our focus is on the optimality criterion of efficiency. (A solution is efficient if it is optimal to each of the states through which it passes.) Under mild regularity conditions, we show that efficient solutions always exist and thus are not overselective. As to underselectivity, we provide weak state reachability conditions which assure that every efficient solution is also average optimal, thus providing a sufficient condition for average optima to exist. Our main result concerns the case where the discounted per-period costs converge to zero, while the discounted total costs diverge to infinity. Under the assumption that we can reach from any feasible state any feasible sequence of states in bounded time, we show that every efficient solution is also overtaking, thus providing a sufficient condition for overtaking optima to exist.
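
    For orientation, here is one standard way to write the criteria being compared, for a cost stream c_t(x_t, x_{t+1}) along a feasible path (x_0, x_1, ...); the notation is illustrative and not quoted from the paper.

```latex
% Hedged, standard-form statements of the criteria; x^* denotes the candidate
% optimal path and (x_t) ranges over feasible paths from the same initial state.
\begin{align*}
\text{average optimal: } &
  \limsup_{N\to\infty} \frac{1}{N}\sum_{t=0}^{N-1} c_t(x_t^*, x_{t+1}^*)
  \;\le\;
  \limsup_{N\to\infty} \frac{1}{N}\sum_{t=0}^{N-1} c_t(x_t, x_{t+1}), \\
\text{overtaking optimal: } &
  \limsup_{N\to\infty} \sum_{t=0}^{N-1}
  \bigl[c_t(x_t^*, x_{t+1}^*) - c_t(x_t, x_{t+1})\bigr] \;\le\; 0, \\
\text{efficient: } &
  \text{for every } N,\ (x_0^*,\dots,x_N^*) \text{ minimises }
  \sum_{t=0}^{N-1} c_t(x_t, x_{t+1})
  \text{ over feasible paths from } x_0 \text{ to } x_N^*.
\end{align*}
```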

    Idempotent structures in optimization

    Consider the set A = R ∪ {+∞} with the binary operations ⊕ = max and ⊙ = +, and denote by A^n the set of vectors v = (v1,...,vn) with entries in A. Let the generalised sum u ⊕ v of two vectors denote the vector with entries uj ⊕ vj, and the product a ⊙ v of an element a ∈ A and a vector v ∈ A^n denote the vector with entries a ⊙ vj. With these operations, the set A^n provides the simplest example of an idempotent semimodule. The study of idempotent semimodules and their morphisms is the subject of idempotent linear algebra, which has been developing for about 40 years as a useful tool in a number of problems of discrete optimisation. Idempotent analysis studies infinite-dimensional idempotent semimodules and is aimed at applications to optimisation problems with general (not necessarily finite) state spaces. We review here the main facts of idempotent analysis and its major areas of application in optimisation theory, namely in multicriteria optimisation, in turnpike theory and mathematical economics, in the theory of generalised solutions of the Hamilton-Jacobi-Bellman (HJB) equation, in the theory of games and controlled Markov processes, and in financial mathematics.
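
    The finite-dimensional arithmetic described above is easy to make concrete. The sketch below implements the max-plus matrix-vector product and shows that a finite-horizon dynamic programming recursion is linear over this semiring; it uses −∞ to encode forbidden transitions (a common convention), and the reward matrix and terminal values are arbitrary illustrations.

```python
# Minimal sketch of idempotent (max-plus) arithmetic: "addition" is max and
# "multiplication" is +. A finite-horizon dynamic programming (Bellman)
# recursion is then an ordinary linear recursion over this semiring.
import numpy as np

FORBIDDEN = -np.inf  # encodes an impossible transition

def maxplus_matvec(A, v):
    # (A "times" v)_i = max_j (A[i, j] + v[j])
    return np.max(A + v[None, :], axis=1)

# A[i, j] = one-step reward for moving from state i to state j.
A = np.array([[1.0, 3.0, FORBIDDEN],
              [2.0, FORBIDDEN, 4.0],
              [FORBIDDEN, 1.0, 0.0]])

V = np.array([0.0, 0.0, 5.0])    # terminal rewards
for _ in range(3):               # three steps of V_k = A "times" V_{k+1}
    V = maxplus_matvec(A, V)
print(V)  # maximal 3-step total reward from each starting state
```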

    Discrete-time controlled Markov processes with average cost criterion: a survey

    This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. The authors have also identified several important questions that are still open to investigation.
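
    The criterion surveyed here, and the optimality equation around which much of the theory is organised, can be written in standard notation as follows (an illustration, not quoted from the survey):

```latex
% Long-run average cost of a policy \pi from state x, and the average cost
% optimality equation (ACOE); standard notation, used here for illustration.
\begin{align*}
J(\pi, x) &= \limsup_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}_x^{\pi}\!\left[\sum_{t=0}^{N-1} c(X_t, A_t)\right], \\
\rho + h(x) &= \min_{a \in A(x)} \Bigl\{ c(x, a)
  + \sum_{y} p(y \mid x, a)\, h(y) \Bigr\},
\end{align*}
% where \rho is the optimal average cost and h is a relative value function;
% under suitable conditions, any stationary policy selecting a minimising
% action above is average cost optimal.
```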

    Unbeatable Imitation

    We show that for many classes of symmetric two-player games, the simple decision rule "imitate-the-best" can hardly be beaten by any other decision rule. We provide necessary and sufficient conditions for imitation to be unbeatable and show that it can only be beaten by much in games that are of the rock-scissors-paper variety. Thus, in many interesting examples, like 2x2 games, Cournot duopoly, price competition, rent seeking, public goods games, common pool resource games, minimum effort coordination games, arms race, search, bargaining, etc., imitation cannot be beaten by much even by a very clever opponent.
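
    A quick way to see the flavour of the result is to simulate one of the listed examples. The sketch below pits an imitate-the-best player against a myopic best responder in a linear Cournot duopoly; the demand and cost parameters and the opponent's rule are assumptions for illustration, not the paper's exact setup.

```python
# Hedged simulation sketch: "imitate-the-best" versus a myopic best responder
# in a linear Cournot duopoly. Parameters are illustrative assumptions.
a, c = 10.0, 1.0          # inverse demand P = a - (q1 + q2), unit cost c

def profit(q_own, q_other):
    return (a - q_own - q_other - c) * q_own

def best_response(q_other):
    return max(0.0, (a - c - q_other) / 2.0)

q_imit, q_opp = 1.0, best_response(1.0)
gap = 0.0                  # cumulative (opponent payoff - imitator payoff)
for t in range(200):
    p_imit = profit(q_imit, q_opp)
    p_opp = profit(q_opp, q_imit)
    gap += p_opp - p_imit
    # Imitate-the-best: copy the opponent's action if it earned strictly more.
    next_imit = q_opp if p_opp > p_imit else q_imit
    # Opponent: myopic best response to the imitator's current action.
    next_opp = best_response(q_imit)
    q_imit, q_opp = next_imit, next_opp

print(f"cumulative payoff difference after 200 rounds: {gap:.3f}")
```

    In this particular parameterisation the clever opponent never pulls ahead: after an initial gain it falls behind the imitator, which is in the spirit of the claim that imitation cannot be beaten by much.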

    A Relative Value Iteration Algorithm for Non-degenerate Controlled Diffusions

    The ergodic control problem for a non-degenerate diffusion controlled through its drift is considered under a uniform stability condition that ensures the well-posedness of the associated Hamilton-Jacobi-Bellman (HJB) equation. A nonlinear parabolic evolution equation is then proposed as a continuous-time, continuous-state-space analog of White's 'relative value iteration' algorithm for solving the ergodic dynamic programming equation in the finite state, finite action case. Its convergence to the solution of the HJB equation is established using the theory of monotone dynamical systems and also, alternatively, by using the theory of reverse martingales. Comment: 17 pages
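
    For reference, White's relative value iteration in the finite state, finite action case that the abstract refers to looks as follows; the 2-state, 2-action model is an arbitrary illustration, not taken from the paper.

```python
# Hedged sketch of White's relative value iteration for a finite MDP: iterate
# the Bellman operator and subtract the value at a reference state so that the
# iterates stay bounded and the subtracted constant estimates the optimal
# average cost. The model below is an arbitrary illustration.
import numpy as np

# P[a, x, y] = transition probability, C[x, a] = one-step cost.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
C = np.array([[1.0, 2.0],
              [4.0, 0.5]])

h = np.zeros(2)
ref = 0                      # reference state whose value is pinned to 0
for _ in range(500):
    # Bellman update: (T h)(x) = min_a [ C[x, a] + sum_y P[a, x, y] h(y) ]
    Th = np.min(C + np.einsum('axy,y->xa', P, h), axis=1)
    rho = Th[ref]            # current estimate of the optimal average cost
    h = Th - rho             # subtract the reference value to keep h bounded

print(f"estimated optimal average cost: {rho:.4f}")
print(f"relative value function: {h}")
```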