A central limit theorem for temporally non-homogenous Markov chains with applications to dynamic programming
We prove a central limit theorem for a class of additive processes that arise
naturally in the theory of finite horizon Markov decision problems. The main
theorem generalizes a classic result of Dobrushin (1956) for temporally
non-homogeneous Markov chains, and the principal innovation is that here the
summands are permitted to depend on both the current state and a bounded number
of future states of the chain. We show through several examples that this added
flexibility gives one a direct path to asymptotic normality of the optimal
total reward of finite horizon Markov decision problems. The same examples also
explain why such results are not easily obtained by alternative Markovian
techniques such as enlargement of the state space. Comment: 27 pages, 1 figure
OPTIMALITY CRITERIA FOR DETERMINISTIC DISCRETE-TIME INFINITE HORIZON OPTIMIZATION
We consider the problem of selecting an optimality criterion, when total costs diverge, in deterministic infinite horizon optimization over discrete time. Our formulation allows for both discrete and continuous state and action spaces, as well as time-varying, that is, nonstationary, data. The task is to choose a criterion that is neither too overselective, so that no policy is optimal, nor too underselective, so that most policies are optimal. We contrast and compare the following optimality criteria: strong, overtaking, weakly overtaking, efficient, and average. However, our focus is on the optimality criterion of efficiency. (A solution is efficient if it is optimal to each of the states through which it passes.) Under mild regularity conditions, we show that efficient solutions always exist and thus are not overselective. As to underselectivity, we provide weak state reachability conditions which assure that every efficient solution is also average optimal, thus providing a sufficient condition for average optima to exist. Our main result concerns the case where the discounted per-period costs converge to zero, while the discounted total costs diverge to infinity. Under the assumption that we can reach from any feasible state any feasible sequence of states in bounded time, we show that every efficient solution is also overtaking, thus providing a sufficient condition for overtaking optima to exist.
Idempotent structures in optimization
Consider the set $A = \mathbb{R} \cup \{+\infty\}$ with the binary operations $\oplus = \max$
and $\odot = +$, and denote by $A^n$ the set of vectors $v = (v_1, \ldots, v_n)$ with entries
in $A$. Let the generalised sum $u \oplus v$ of two vectors denote the vector with
entries $u_j \oplus v_j$, and the product $a \odot v$ of an element $a \in A$ and a vector
$v \in A^n$ denote the vector with the entries $a \odot v_j$. With these operations,
the set $A^n$ provides the simplest example of an idempotent semimodule.
The study of idempotent semimodules and their morphisms is the subject
of idempotent linear algebra, which has been developing for about
40 years already as a useful tool in a number of problems of discrete optimisation.
Idempotent analysis studies infinite dimensional idempotent
semimodules and is aimed at applications to optimisation problems
with general (not necessarily finite) state spaces. We review here
the main facts of idempotent analysis and its major areas of applications
in optimisation theory, namely in multicriteria optimisation, in turnpike
theory and mathematical economics, in the theory of generalised solutions
of the Hamilton-Jacobi-Bellman (HJB) equation, in the theory of games
and controlled Markov processes, in financial mathematics
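The two vector operations defined at the start of this abstract (entrywise max as the generalised sum, and adding a scalar to every entry as the product) can be illustrated with a minimal sketch. The function names `oplus` and `odot` and the sample vectors are hypothetical, chosen only for illustration; they do not come from the paper.

```python
import math

def oplus(u, v):
    """Generalised sum of two vectors: the entrywise maximum."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [max(a, b) for a, b in zip(u, v)]

def odot(a, v):
    """Product of a scalar a and a vector v: a is added to every entry."""
    return [a + x for x in v]

# Hypothetical sample vectors with entries in R ∪ {+inf}.
u = [1.0, 5.0, math.inf]
v = [3.0, 2.0, 0.0]

print(oplus(u, v))      # [3.0, 5.0, inf]
print(odot(2.0, v))     # [5.0, 4.0, 2.0]

# Idempotency of the generalised sum: u (+) u equals u.
print(oplus(u, u) == u)  # True

# The product distributes over the generalised sum.
print(odot(2.0, oplus(u, v)) == oplus(odot(2.0, u), odot(2.0, v)))  # True
```

The idempotency check is what distinguishes this structure from an ordinary vector space: adding a vector to itself returns the same vector rather than doubling it.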
Discrete-time controlled Markov processes with average cost criterion: a survey
This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. The authors have also identified several important questions that are still open to investigation.
Unbeatable Imitation
We show that for many classes of symmetric two-player games, the simple
decision rule "imitate-the-best" can hardly be beaten by any other decision
rule. We provide necessary and sufficient conditions for imitation to be
unbeatable and show that it can only be beaten by much in games that are of the
rock-scissors-paper variety. Thus, in many interesting examples, like 2x2
games, Cournot duopoly, price competition, rent seeking, public goods games,
common pool resource games, minimum effort coordination games, arms race,
search, bargaining, etc., imitation cannot be beaten by much even by a very
clever opponent.
A Relative Value Iteration Algorithm for Non-degenerate Controlled Diffusions
The ergodic control problem for a non-degenerate controlled diffusion
controlled through its drift is considered under a uniform stability condition
that ensures the well-posedness of the associated Hamilton-Jacobi-Bellman (HJB)
equation. A nonlinear parabolic evolution equation is then proposed as a
continuous time continuous state space analog of White's `relative value
iteration' algorithm for solving the ergodic dynamic programming equation for
the finite state finite action case. Its convergence to the solution of the HJB
equation is established using the theory of monotone dynamical systems and
also, alternatively, by using the theory of reverse martingales. Comment: 17 pages
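For orientation, the finite state, finite action relative value iteration that the abstract takes as its starting point can be sketched as follows. This is a standard textbook form of the discrete algorithm under a cost-minimisation convention, not the paper's continuous-time scheme; the two-state model (`P`, `c`), the reference state, and the tolerance are all hypothetical choices made for illustration.

```python
def bellman(h, P, c):
    """Apply the dynamic programming operator: min over actions of cost plus expected h."""
    return [
        min(
            c[s][a] + sum(p * hj for p, hj in zip(P[s][a], h))
            for a in range(len(c[s]))
        )
        for s in range(len(h))
    ]

def relative_value_iteration(P, c, ref=0, tol=1e-10, max_iter=10_000):
    """Iterate h <- T(h) - T(h)[ref] until successive iterates agree within tol.

    Returns (rho, h): an estimate of the optimal average cost and a relative
    value function normalised so that h[ref] == 0.
    """
    h = [0.0] * len(P)
    for _ in range(max_iter):
        Th = bellman(h, P, c)
        rho = Th[ref]                      # value at the reference state
        h_new = [x - rho for x in Th]      # subtract it to keep iterates bounded
        if max(abs(a - b) for a, b in zip(h_new, h)) < tol:
            return rho, h_new
        h = h_new
    raise RuntimeError("relative value iteration did not converge")

# Hypothetical model: P[s][a] is a row of transition probabilities, c[s][a] a cost.
P = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.5, 0.5], [0.2, 0.8]],
]
c = [
    [1.0, 2.0],
    [3.0, 0.5],
]

rho, h = relative_value_iteration(P, c)

# At convergence, rho + h[s] should match the Bellman operator applied to h.
residual = max(abs(rho + hs - ths) for hs, ths in zip(h, bellman(h, P, c)))
print(rho, h, residual)
```

Subtracting the value at a fixed reference state at each step is what keeps the iterates bounded when the average cost is nonzero. All transition probabilities in this sample model are positive, which is a convenient sufficient condition for the iteration to converge here.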
Game-Theoretic Safety Assurance for Human-Centered Robotic Systems
In order for autonomous systems like robots, drones, and self-driving cars to be reliably introduced into our society, they must have the ability to actively account for safety during their operation. While safety analysis has traditionally been conducted offline for controlled environments like cages on factory floors, the much higher complexity of open, human-populated spaces like our homes, cities, and roads makes it unviable to rely on common design-time assumptions, since these may be violated once the system is deployed. Instead, the next generation of robotic technologies will need to reason about safety online, constructing high-confidence assurances informed by ongoing observations of the environment and other agents, in spite of models of them being necessarily fallible. This dissertation aims to lay down the necessary foundations to enable autonomous systems to ensure their own safety in complex, changing, and uncertain environments, by explicitly reasoning about the gap between their models and the real world. It first introduces a suite of novel robust optimal control formulations and algorithmic tools that permit tractable safety analysis in time-varying, multi-agent systems, as well as safe real-time robotic navigation in partially unknown environments; these approaches are demonstrated on large-scale unmanned air traffic simulation and physical quadrotor platforms. After this, it draws on Bayesian machine learning methods to translate model-based guarantees into high-confidence assurances, monitoring the reliability of predictive models in light of changing evidence about the physical system and surrounding agents. This principle is first applied to a general safety framework allowing the use of learning-based control (e.g. reinforcement learning) for safety-critical robotic systems such as drones, and then combined with insights from cognitive science and dynamic game theory to enable safe human-centered navigation and interaction; these techniques are showcased on physical quadrotors, flying in unmodeled wind and among human pedestrians, and simulated highway driving. The dissertation ends with a discussion of challenges and opportunities ahead, including the bridging of safety analysis and reinforcement learning and the need to "close the loop" around learning and adaptation in order to deploy increasingly advanced autonomous systems with confidence.