Search CORE

A central limit theorem for temporally non-homogenous Markov chains with applications to dynamic programming

Author: Arlotto Alessandro
Steele J. Michael
Publication venue: 'Institute for Operations Research and the Management Sciences (INFORMS)'
Publication date: 06/12/2015

We prove a central limit theorem for a class of additive processes that arise naturally in the theory of finite-horizon Markov decision problems. The main theorem generalizes a classic result of Dobrushin (1956) for temporally non-homogeneous Markov chains, and the principal innovation is that here the summands are permitted to depend on both the current state and a bounded number of future states of the chain. We show through several examples that this added flexibility gives one a direct path to asymptotic normality of the optimal total reward of finite-horizon Markov decision problems. The same examples also explain why such results are not easily obtained by alternative Markovian techniques such as enlargement of the state space.
Comment: 27 pages, 1 figure
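
As a rough illustration of the kind of additive process described in the abstract, the Monte Carlo sketch below simulates a two-state chain whose switching probability changes over time (so the chain is temporally non-homogeneous), with summands f(X_t, X_{t+1}) that look one step ahead. The switching schedule `jump_prob` and the summand `reward` are hypothetical choices, not taken from the paper; the switching probability stays in [0.2, 0.8], which keeps the chain uniformly mixing. The simulation only suggests approximate normality of the standardized sum. It does not reproduce the theorem or its conditions.

```python
import random
import statistics


def jump_prob(t: int, n: int) -> float:
    """Hypothetical time-varying probability of switching state at step t."""
    return 0.2 + 0.6 * t / n  # drifts from 0.2 towards 0.8 over the horizon


def reward(x: int, y: int) -> float:
    """Hypothetical summand f(x, y): depends on the current state x and the next state y."""
    return x + 2.0 * y - (1.0 if x != y else 0.0)


def additive_sum(n: int, rng: random.Random) -> float:
    """One sample of S_n = sum_{t<n} f(X_t, X_{t+1}) along a simulated path."""
    x = rng.randint(0, 1)
    total = 0.0
    for t in range(n):
        y = 1 - x if rng.random() < jump_prob(t, n) else x
        total += reward(x, y)
        x = y
    return total


def standardized_sums(n: int, reps: int, seed: int = 0) -> list[float]:
    """Centre and scale the simulated sums by their sample mean and standard deviation."""
    rng = random.Random(seed)
    sums = [additive_sum(n, rng) for _ in range(reps)]
    mu, sigma = statistics.fmean(sums), statistics.stdev(sums)
    return [(s - mu) / sigma for s in sums]


z = standardized_sums(n=200, reps=2000)
inside = sum(abs(v) <= 1.96 for v in z) / len(z)
print(f"fraction within +/-1.96: {inside:.3f}")  # near 0.95 when the normal approximation is good
```

Time-dependent summands f_t, or summands that look more than one step ahead, can be substituted into `reward` without changing the rest of the simulation.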

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Sporadic Overtaking Optimality in Markov Decision Problems

Author: Flesch Janos
Predtetchinski Arkadi
Solan E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2017

Maastricht University Research Portal

Optimality Criteria for Deterministic Discrete-Time Infinite Horizon Optimization

Author: Irwin E. Schochetman
Robert L. Smith

We consider the problem of selecting an optimality criterion, when total costs diverge, in deterministic infinite horizon optimization over discrete time. Our formulation allows for both discrete and continuous state and action spaces, as well as time-varying, that is, nonstationary, data. The task is to choose a criterion that is neither too overselective, so that no policy is optimal, nor too underselective, so that most policies are optimal. We contrast and compare the following optimality criteria: strong, overtaking, weakly overtaking, efficient, and average. Our focus, however, is on the optimality criterion of efficiency. (A solution is efficient if it is optimal to each of the states through which it passes.) Under mild regularity conditions, we show that efficient solutions always exist, so the efficiency criterion is not overselective. As to underselectivity, we provide weak state reachability conditions which ensure that every efficient solution is also average optimal, thus providing a sufficient condition for average optima to exist. Our main result concerns the case where the discounted per-period costs converge to zero while the discounted total costs diverge to infinity. Under the assumption that any feasible sequence of states can be reached from any feasible state in bounded time, we show that every efficient solution is also overtaking, thus providing a sufficient condition for overtaking optima to exist.
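
A small numerical illustration of why the average criterion can be underselective when total costs diverge: the two hypothetical cost streams below have the same long-run average cost, yet one overtakes the other. The sketch assumes the usual cost form of overtaking (a overtakes b when lim sup over N of C_N(a) - C_N(b) is at most 0, with C_N the cost of the first N periods) and checks it only on a finite horizon as a stand-in for the limit. The streams and helper names are illustrative, not from the paper.

```python
from itertools import accumulate


def partial_sums(costs: list[float]) -> list[float]:
    """C_N = c_0 + ... + c_{N-1} for N = 1, 2, ..., len(costs)."""
    return list(accumulate(costs))


def average_cost(costs: list[float]) -> float:
    """Average cost per period over the sampled horizon."""
    return sum(costs) / len(costs)


def overtakes_on_horizon(a: list[float], b: list[float]) -> bool:
    """Finite-horizon proxy for 'a overtakes b': C_N(a) - C_N(b) <= 0 for every sampled N."""
    return all(x - y <= 0 for x, y in zip(partial_sums(a), partial_sums(b)))


horizon = 1_000
a = [1.0] * horizon                  # pay 1 every period
b = [2.0] + [1.0] * (horizon - 1)    # pay one extra unit up front, then the same as a

print(average_cost(a), average_cost(b))                        # 1.0 1.001 (both tend to 1)
print(overtakes_on_horizon(a, b), overtakes_on_horizon(b, a))  # True False
```

The average criterion ranks both streams as optimal in the limit, while overtaking separates them, which is the distinction between criteria that the abstract weighs.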

CiteSeerX

Idempotent structures in optimization

Author: Kolokoltsov V. N. (Vasiliĭ Nikitich)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2001

Consider the set $A = \mathbb{R} \cup \{+\infty\}$ with the binary operations $\oplus = \max$ and $\odot = +$, and denote by $A^n$ the set of vectors $v = (v_1, \dots, v_n)$ with entries in $A$. Let the generalised sum $u \oplus v$ of two vectors denote the vector with entries $u_j \oplus v_j$, and let the product $a \odot v$ of an element $a \in A$ and a vector $v \in A^n$ denote the vector with entries $a \odot v_j$. With these operations, the set $A^n$ provides the simplest example of an idempotent semimodule. The study of idempotent semimodules and their morphisms is the subject of idempotent linear algebra, which has been developing for about 40 years as a useful tool in a number of problems of discrete optimisation. Idempotent analysis studies infinite-dimensional idempotent semimodules and is aimed at applications to optimisation problems with general (not necessarily finite) state spaces. We review here the main facts of idempotent analysis and its major areas of application in optimisation theory, namely in multicriteria optimisation, in turnpike theory and mathematical economics, in the theory of generalised solutions of the Hamilton-Jacobi-Bellman (HJB) equation, in the theory of games and controlled Markov processes, and in financial mathematics.
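
The generalised sum (componentwise maximum) and the scalar product (adding a scalar to every entry) can be written out directly. In this minimal sketch, vectors are Python tuples of floats with `math.inf` standing in for +∞; the function names `oplus` and `odot` are our own labels, not notation from the paper. The usage lines check idempotency of the sum and distributivity of the scalar product over it.

```python
import math
from typing import Sequence

Vec = tuple[float, ...]


def oplus(u: Sequence[float], v: Sequence[float]) -> Vec:
    """Generalised sum: entry j is max(u_j, v_j)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return tuple(max(x, y) for x, y in zip(u, v))


def odot(a: float, v: Sequence[float]) -> Vec:
    """Scalar product: entry j is a + v_j."""
    return tuple(a + x for x in v)


u = (1.0, 5.0, math.inf)
v = (3.0, 2.0, 0.0)
print(oplus(u, v))       # (3.0, 5.0, inf)
print(oplus(u, u) == u)  # True: the sum is idempotent
print(odot(2.0, v))      # (5.0, 4.0, 2.0)
print(odot(2.0, oplus(u, v)) == oplus(odot(2.0, u), odot(2.0, v)))  # True: distributivity
```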

Warwick Research Archives Portal Repository

Discrete-time controlled Markov processes with average cost criterion: a survey

Author: Arapostathis Aristotle
Borkar Vivek S.
Fernandez-Gaucherand Emmanuel
Ghosh Mrinal K.
Marcus Steven I.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/03/1993

This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. The authors have also identified several important questions that are still open to investigation.

Unbeatable Imitation

Author: Peter Duersch
Jörg Oechssler
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010

We show that for many classes of symmetric two-player games, the simple decision rule "imitate-the-best" can hardly be beaten by any other decision rule. We provide necessary and sufficient conditions for imitation to be unbeatable and show that it can only be beaten by much in games that are of the rock-scissors-paper variety. Thus, in many interesting examples, like 2×2 games, Cournot duopoly, price competition, rent seeking, public goods games, common pool resource games, minimum effort coordination games, arms race, search, bargaining, etc., imitation cannot be beaten by much even by a very clever opponent.
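
A toy simulation of the contrast drawn in the abstract, comparing a prisoner's dilemma with rock-paper-scissors. The sketch assumes one simple reading of imitate-the-best in a two-player game: after each round, switch to the opponent's action if it earned strictly more, otherwise keep the current action. The payoff numbers and the two opponent strategies are illustrative assumptions, not the paper's model. Because the imitator is deterministic, the opponent is allowed to condition on the imitator's current action.

```python
from typing import Callable, Sequence

# payoff[i][j] = payoff to a player choosing action i when the other player chooses j.
# In a symmetric game the other player's payoff is then payoff[j][i].
Payoff = Sequence[Sequence[float]]


def gap_against_imitator(payoff: Payoff, opponent: Callable[[int], int],
                         start: int, rounds: int) -> float:
    """Return the opponent's cumulative payoff minus the imitator's."""
    mine_action = start
    gap = 0.0
    for _ in range(rounds):
        theirs_action = opponent(mine_action)
        mine = payoff[mine_action][theirs_action]
        theirs = payoff[theirs_action][mine_action]
        gap += theirs - mine
        if theirs > mine:  # assumed rule: copy the opponent only if it did strictly better
            mine_action = theirs_action
    return gap


# Prisoner's dilemma (0 = cooperate, 1 = defect) against an opponent who always defects.
pd = [[3, 0], [5, 1]]
pd_gap = gap_against_imitator(pd, lambda _: 1, start=0, rounds=100)

# Rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors): i beats j when (i - j) % 3 == 1.
rps = [[0 if i == j else (1 if (i - j) % 3 == 1 else -1) for j in range(3)] for i in range(3)]
rps_gap = gap_against_imitator(rps, lambda mine: (mine + 1) % 3, start=0, rounds=100)

print(pd_gap, rps_gap)  # 5.0 200.0
```

In the prisoner's dilemma the imitator loses once and then matches its opponent, so the gap stays at 5 however long the game runs. In rock-paper-scissors an opponent who always plays the action that beats the imitator's current one gains 2 every round, so the gap grows without bound.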

arXiv.org e-Print Archive

Munich RePEc Personal Archive

CiteSeerX

Heidelberger Dokumentenserver

A Relative Value Iteration Algorithm for Non-degenerate Controlled Diffusions

Author: Ari Arapostathis
Stannat W.
Vivek S. Borkar
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 06/10/2011

The ergodic control problem for a non-degenerate diffusion controlled through its drift is considered under a uniform stability condition that ensures the well-posedness of the associated Hamilton-Jacobi-Bellman (HJB) equation. A nonlinear parabolic evolution equation is then proposed as a continuous-time, continuous-state-space analog of White's 'relative value iteration' algorithm for solving the ergodic dynamic programming equation in the finite-state, finite-action case. Its convergence to the solution of the HJB equation is established using the theory of monotone dynamical systems and, alternatively, using the theory of reverse martingales.
Comment: 17 pages
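
For orientation, here is a minimal sketch of the discrete, finite-state, finite-action relative value iteration that the abstract takes as its starting point. It is not the paper's continuous-time parabolic scheme. The two-state model, its costs and its transition probabilities are hypothetical. For this model the answer can be checked by hand: under the policy "move in state 0, cheap action in state 1" the stationary distribution is (1/6, 5/6), so the average cost is 5·(1/6) + 1·(5/6) = 5/3.

```python
# Hypothetical two-state, two-action model.
# P[s][a] lists (next_state, probability) pairs; C[s][a] is the one-step cost.
P = [
    [[(0, 1.0)], [(1, 1.0)]],            # state 0: action 0 stays put, action 1 moves to state 1
    [[(1, 0.8), (0, 0.2)], [(1, 1.0)]],  # state 1: action 0 is cheap but may fall back to state 0
]
C = [[3.0, 5.0], [1.0, 2.0]]


def q_values(h: list[float], s: int) -> list[float]:
    """c(s, a) + sum_t p(t | s, a) * h(t) for each action a in state s."""
    return [C[s][a] + sum(p * h[t] for t, p in P[s][a]) for a in range(len(C[s]))]


def relative_value_iteration(ref: int = 0, tol: float = 1e-10, max_iter: int = 10_000):
    """Iterate h <- T h - (T h)(ref); the subtracted offset estimates the optimal average cost."""
    h = [0.0] * len(C)
    for _ in range(max_iter):
        th = [min(q_values(h, s)) for s in range(len(C))]  # Bellman operator T applied to h
        g = th[ref]
        new_h = [x - g for x in th]  # normalise so that h[ref] stays 0
        if max(abs(x - y) for x, y in zip(new_h, h)) < tol:
            return g, new_h
        h = new_h
    raise RuntimeError("relative value iteration did not converge")


def greedy_policy(h: list[float]) -> list[int]:
    """Choose a cost-minimising action in each state with respect to h."""
    return [min(range(len(C[s])), key=lambda a: q_values(h, s)[a]) for s in range(len(C))]


g, h = relative_value_iteration()
print(round(g, 6), [round(x, 6) for x in h], greedy_policy(h))
# 1.666667 [0.0, -3.333333] [1, 0]
```

Subtracting the value at a reference state keeps the iterates bounded. Without it, plain value iteration for an average-cost problem grows by roughly g per step.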

arXiv.org e-Print Archive

Crossref

Dspace at IIT Bombay

Game-Theoretic Safety Assurance for Human-Centered Robotic Systems

Author: Fernandez Fisac Jaime
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

In order for autonomous systems like robots, drones, and self-driving cars to be reliably introduced into our society, they must be able to actively account for safety during their operation. While safety analysis has traditionally been conducted offline for controlled environments like cages on factory floors, the much higher complexity of open, human-populated spaces like our homes, cities, and roads makes it unviable to rely on common design-time assumptions, since these may be violated once the system is deployed. Instead, the next generation of robotic technologies will need to reason about safety online, constructing high-confidence assurances informed by ongoing observations of the environment and other agents, even though their models of these are necessarily fallible.

This dissertation aims to lay down the necessary foundations to enable autonomous systems to ensure their own safety in complex, changing, and uncertain environments, by explicitly reasoning about the gap between their models and the real world. It first introduces a suite of novel robust optimal control formulations and algorithmic tools that permit tractable safety analysis in time-varying, multi-agent systems, as well as safe real-time robotic navigation in partially unknown environments; these approaches are demonstrated on large-scale unmanned air traffic simulation and physical quadrotor platforms. After this, it draws on Bayesian machine learning methods to translate model-based guarantees into high-confidence assurances, monitoring the reliability of predictive models in light of changing evidence about the physical system and surrounding agents. This principle is first applied to a general safety framework allowing the use of learning-based control (e.g., reinforcement learning) for safety-critical robotic systems such as drones, and then combined with insights from cognitive science and dynamic game theory to enable safe human-centered navigation and interaction; these techniques are showcased on physical quadrotors (flying in unmodeled wind and among human pedestrians) and simulated highway driving. The dissertation ends with a discussion of challenges and opportunities ahead, including the bridging of safety analysis and reinforcement learning and the need to "close the loop" around learning and adaptation in order to deploy increasingly advanced autonomous systems with confidence.

eScholarship - University of California