On adaptive control of Markov processes

Author: Mandl Petr
Romera Ayllón M. Rosario
Publication venue: Institute of Information Theory and Automation AS CR
Publication date: 01/01/1987

Institute of Mathematics AS CR, v. v. i.

Almost Sure Stabilization for Adaptive Controls of Regime-switching LQ Systems with A Hidden Markov Chain

Author: Bercu Bernard
Dufour Francois
Yin G. George
Publication date: 09/07/2008

This work is devoted to the almost sure stabilization of adaptive control systems that involve an unknown Markov chain. The control system displays continuous dynamics represented by differential equations and discrete events given by a hidden Markov chain. Unlike previous work on stabilization of adaptively controlled systems with a hidden Markov chain, where average criteria were considered, this work focuses on the almost sure stabilization, or sample path stabilization, of the underlying processes. Under simple conditions, it is shown that as long as the feedback controls have linear growth in the continuous component, the resulting process is regular. Moreover, by an appropriate choice of Lyapunov functions, it is shown that the adaptive system is stabilizable almost surely. As a by-product, it is also established that the controlled process is positive recurrent.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Oskar Bordeaux

"Illusion of control" in Minority and Parrondo Games

Author: J. B. Satinover
D. Sornette
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/04/2007

Human beings like to believe they are in control of their destiny. This ubiquitous trait seems to increase motivation and persistence, and is probably evolutionarily adaptive. But how good really is our ability to control? How successful is our track record in these areas? There is little understanding of when and under what circumstances we may over-estimate or even lose our ability to control and optimize outcomes, especially when they are the result of aggregations of individual optimization processes. Here, we demonstrate analytically, using the theory of Markov chains, and by numerical simulations in two classes of games, the Minority game and the Parrondo games, that agents who optimize their strategy based on past information actually perform worse than non-optimizing agents. In other words, low-entropy (more informative) strategies under-perform high-entropy (or random) strategies. This provides a precise definition of the "illusion of control" in set-ups a priori defined to emphasize the importance of optimization. Comment: 17 pages, four figures, 1 table.
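The Markov-chain analysis mentioned in the abstract can be illustrated on the classic Parrondo setting. The sketch below is not taken from the paper: the win probabilities (bias 0.005, a capital-modulo-3 rule for game B) are the parameters commonly quoted for Parrondo games and are an assumption here, as are the names `stationary`, `drift`, `GAME_A`, `GAME_B` and `MIXED`. It only shows the basic effect that two losing games can combine into a winning one when mixed at random; it does not reproduce the paper's result about optimizing agents.

```python
# Assumed setting: capital changes by +1 (win) or -1 (loss) each round, and the
# win probability depends only on capital modulo 3, so that residue is a
# 3-state Markov chain.

def stationary(win_probs, iters=2000):
    """Stationary distribution of the capital-mod-n chain, by power iteration."""
    n = len(win_probs)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for i, p in enumerate(win_probs):
            nxt[(i + 1) % n] += pi[i] * p          # win: capital + 1
            nxt[(i - 1) % n] += pi[i] * (1.0 - p)  # loss: capital - 1
        pi = nxt
    return pi

def drift(win_probs):
    """Long-run expected gain per round under the stationary distribution."""
    pi = stationary(win_probs)
    return sum(w * (2.0 * p - 1.0) for w, p in zip(pi, win_probs))

EPS = 0.005                                    # assumed bias against the player
GAME_A = [0.5 - EPS] * 3                       # same coin in every state
GAME_B = [0.1 - EPS, 0.75 - EPS, 0.75 - EPS]   # bad coin when capital % 3 == 0
# Picking A or B with probability 1/2 each round averages the win probabilities.
MIXED = [0.5 * (a + b) for a, b in zip(GAME_A, GAME_B)]

for name, game in (("A", GAME_A), ("B", GAME_B), ("random mix", MIXED)):
    print(name, drift(game))
```

With these assumed parameters the drift is negative for each game played alone and positive for the random mixture, which matches the sign of p0·p1·p2 − (1−p0)(1−p1)(1−p2) for each set of win probabilities.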

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

LEARNING ALGORITHMS FOR MARKOV DECISION PROCESSES

Author: Thomas Abraham
Publication date: 01/01/2009

We propose various computational schemes for solving Partially Observable Markov Decision Processes with the finite stage additive cost and infinite horizon discounted cost criteria. Error bounds for the corresponding algorithms are given, and it is further shown that, at the expense of more computational effort, the Partially Observable Markov Decision Problem (POMDP) can be solved as closely to the optimal as desired. It is well known that a sufficient statistic for taking the best action at any time for the POMDP is the a posteriori probability distribution on the underlying states, given all the past history, and that this can be updated recursively. We prove that the finite stage optimal costs as well as the optimal cost for the infinite horizon discounted cost problem are both Lipschitz continuous (with domain the unit simplex of probability distributions over the underlying states) and give bounds for the Lipschitz constant. We use these bounds to provide error bounds for computational algorithms for solving POMDPs. We extend the almost sure convergence result of a very general stochastic approximation algorithm to the case when the underlying Markov process exhibits periodicity. This result is used to extend the proof of convergence of Temporal Difference (TD) reinforcement learning schemes with linear function approximation for Markov Cost processes in order to estimate the cost-to-go function for the discounted cost criterion and the differential cost function for the average cost criterion, respectively. Adaptive control of Markov Decision Problems (MDPs) is a problem in which full knowledge of the system parameters, namely the transition probabilities as well as the distribution of the immediate costs, is not available a priori. We give direct adaptive control schemes for infinite horizon discounted cost and average cost MDPs. Approximate Policy Iteration using on-line TD schemes for policy evaluation is detailed for the discounted cost and average cost criteria. Possible extensions of direct adaptive control schemes to the POMDP framework are discussed. Auxiliary results relevant to the core results of the dissertation are stated and proved in the appendices. In particular, an efficient discretization scheme for the finite dimensional unit simplex is given. Some general error bounds for MDPs are also given. TD schemes for learning in Stochastic Shortest Path (SSP) problems are also discussed.
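The recursive update of the posterior distribution over states, which the abstract cites as the sufficient statistic for a POMDP, is the standard Bayes filter. The sketch below is a minimal illustration and not code from the dissertation: the function name `belief_update`, the table layout of `T` and `O`, the assumption that observations depend only on the new state, and the two-state numbers are all hypothetical.

```python
def belief_update(belief, action, observation, T, O):
    """One Bayes-filter step for a finite POMDP.

    belief[s]       : current probability of hidden state s
    T[action][s][t] : assumed transition probability s -> t under `action`
    O[t][o]         : assumed probability of observing o in new state t
    """
    n = len(belief)
    # Prediction: push the belief through the transition model.
    predicted = [sum(belief[s] * T[action][s][t] for s in range(n))
                 for t in range(n)]
    # Correction: weight each state by the likelihood of the observation.
    unnormalized = [predicted[t] * O[t][observation] for t in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]

# Hypothetical two-state, one-action, two-observation model.
T = {0: [[0.9, 0.1],
         [0.2, 0.8]]}
O = [[0.8, 0.2],
     [0.3, 0.7]]

posterior = belief_update([0.5, 0.5], action=0, observation=0, T=T, O=O)
print(posterior)
```

Because the updated belief always lies on the unit simplex, value functions for the POMDP can be studied as functions on that simplex, which is the domain used for the Lipschitz bounds described in the abstract.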

Digital Repository at the University of Maryland

Adaptive Continuous time Markov Chain Approximation Model to General Jump-Diffusions

Author: Cerrato Mario
Lo Chia Chun
Skindilias Konstantinos
Publication venue: Centre for EMEA Banking, Finance and Economics, London Metropolitan University
Publication date: 01/01/2011

We propose a non-equidistant Q rate matrix formula and an adaptive numerical algorithm for a continuous time Markov chain to approximate jump-diffusions with affine or non-affine functional specifications. Our approach also accommodates state-dependent jump intensity and jump distribution, a flexibility that is very hard to achieve with other numerical methods. The Kolmogorov-Smirnov test shows that the proposed Markov chain transition density converges to the one given by the likelihood expansion formula as in Ait-Sahalia (2008). We provide numerical examples for European stock option pricing in Black and Scholes (1973), Merton (1976) and Kou (2002).
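The abstract does not state the authors' Q rate matrix formula, so the following is only a generic sketch of how a generator on a non-equidistant grid can be built for the diffusion part of such a model. It uses the textbook moment-matching construction (nearest-neighbour rates chosen so that the local mean and variance agree with the drift and squared volatility) and omits jumps entirely. The names `build_generator`, `GRID`, `drift_fn` and `vol_fn`, the absorbing boundaries, and the geometric Brownian motion coefficients are assumptions made for illustration.

```python
def build_generator(grid, mu, sigma):
    """Rate matrix of a birth-death chain on `grid` matching drift and variance.

    At each interior node x with spacings h_up and h_dn the rates solve
        q_up * h_up    - q_dn * h_dn    = mu(x)
        q_up * h_up**2 + q_dn * h_dn**2 = sigma(x)**2
    Boundary rows are left at zero (absorbing states), an assumption here.
    """
    n = len(grid)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        x = grid[i]
        h_up = grid[i + 1] - x
        h_dn = x - grid[i - 1]
        var = sigma(x) ** 2
        q_up = (var + mu(x) * h_dn) / (h_up * (h_up + h_dn))
        q_dn = (var - mu(x) * h_up) / (h_dn * (h_up + h_dn))
        if q_up < 0.0 or q_dn < 0.0:
            raise ValueError("grid too coarse relative to drift at x=%r" % x)
        Q[i][i + 1] = q_up
        Q[i][i - 1] = q_dn
        Q[i][i] = -(q_up + q_dn)
    return Q

# Hypothetical example: geometric Brownian motion on an uneven price grid.
GRID = [80.0, 90.0, 95.0, 100.0, 105.0, 110.0, 120.0]
drift_fn = lambda x: 0.05 * x   # assumed drift
vol_fn = lambda x: 0.2 * x      # assumed volatility

Q = build_generator(GRID, drift_fn, vol_fn)
for row in Q:
    print(["%.3f" % q for q in row])
```

Each row of the resulting matrix sums to zero and the off-diagonal entries are non-negative, which is what makes it a valid generator. The check on `q_up` and `q_dn` flags grids where the spacing is too wide for the drift to be matched with non-negative rates.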

London Met Repository

Parameter estimation in stochastic systems: some recent results and applications

Author: Borkar Vivek S.
Publication venue: North-Holland
Publication date: 01/01/1982

Some recent work on the characterization of almost sure limit sets for maximum likelihood estimates for stochastic systems is reviewed. Applications to allied topics such as input selection for identification, model selection and self-tuning are briefly discussed.

University of Twente Research Information