    On adaptive control of Markov processes

    Almost Sure Stabilization for Adaptive Controls of Regime-switching LQ Systems with A Hidden Markov Chain

    This work is devoted to the almost sure stabilization of adaptive control systems that involve an unknown Markov chain. The control system displays continuous dynamics represented by differential equations and discrete events given by a hidden Markov chain. In contrast to previous work on stabilization of adaptively controlled systems with a hidden Markov chain, where average criteria were considered, this work focuses on almost sure (sample path) stabilization of the underlying processes. Under simple conditions, it is shown that as long as the feedback controls have linear growth in the continuous component, the resulting process is regular. Moreover, by appropriate choice of Lyapunov functions, it is shown that the adaptive system is stabilizable almost surely. As a by-product, it is also established that the controlled process is positive recurrent.
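    The linear-growth feedback idea is easy to visualize in simulation. Below is a minimal Euler-Maruyama sketch, assuming two regimes, illustrative drift matrices, and a single regime-independent gain K (so the controller never observes the hidden chain); it is not the paper's construction, only an instance of linear feedback stabilizing a regime-switching linear system.

```python
# Minimal Euler-Maruyama sketch (illustrative, not the paper's construction):
# a two-regime linear system dx = (A(r) x + B(r) u) dt + 0.1 dW with a hidden
# Markov regime r and linear-growth feedback u = -K x. The single gain K is
# regime-independent, so the controller never observes the hidden chain.
import numpy as np

rng = np.random.default_rng(0)

A = [np.array([[0.5, 1.0], [0.0, 0.3]]),    # regime 0: unstable drift
     np.array([[-0.2, 0.8], [0.0, -0.1]])]  # regime 1: mildly stable drift
B = [np.eye(2), np.eye(2)]
K = np.array([[2.0, 1.0], [0.0, 2.0]])      # assumed feedback gain
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])    # assumed chain generator

dt, T = 1e-3, 10.0
x, r = np.array([5.0, -3.0]), 0
for _ in range(int(T / dt)):
    if rng.random() < -Q[r, r] * dt:        # switch at rate -Q[r, r]
        r = 1 - r
    u = -K @ x                              # feedback, linear in x
    x = x + (A[r] @ x + B[r] @ u) * dt \
        + 0.1 * rng.normal(scale=np.sqrt(dt), size=2)

print("final |x|:", np.linalg.norm(x))      # driven near zero by the feedback
```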

    "Illusion of control" in Minority and Parrondo Games

    Human beings like to believe they are in control of their destiny. This ubiquitous trait seems to increase motivation and persistence, and is probably evolutionarily adaptive. But how good really is our ability to control? How successful is our track record in these areas? There is little understanding of when and under what circumstances we may over-estimate or even lose our ability to control and optimize outcomes, especially when they are the result of aggregations of individual optimization processes. Here, we demonstrate analytically, using the theory of Markov chains, and by numerical simulations in two classes of games, the Minority game and the Parrondo games, that agents who optimize their strategy based on past information actually perform worse than non-optimizing agents. In other words, low-entropy (more informative) strategies under-perform high-entropy (or random) strategies. This provides a precise definition of the "illusion of control" in set-ups a priori defined to emphasize the importance of optimization.
    Comment: 17 pages, four figures, 1 table
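    The Minority game half of the claim can be reproduced with a toy simulation. The sketch below uses illustrative parameter choices (51 strategy-optimizing agents with two lookup-table strategies each, 50 coin-flipping agents, memory 3), not the paper's exact setup; the optimizers' greedy strategy switching is the mechanism the abstract says backfires.

```python
# Toy Minority game (illustrative parameters, not the paper's exact setup):
# 51 "optimizing" agents each hold two fixed lookup-table strategies and
# greedily play the one with the better running virtual score; 50 "random"
# agents flip coins. Agents on the minority side of each round win.
import numpy as np

rng = np.random.default_rng(1)
N_OPT, N_RAND, m, T = 51, 50, 3, 5000
n_hist = 2 ** m                          # number of encoded m-bit histories

strats = rng.integers(0, 2, size=(N_OPT, 2, n_hist))  # history -> action
scores = np.zeros((N_OPT, 2))            # virtual scores of each strategy
payoff_opt = payoff_rand = 0
hist = 0                                 # last m minority outcomes, as bits

for _ in range(T):
    best = scores.argmax(axis=1)         # greedy strategy choice
    a_opt = strats[np.arange(N_OPT), best, hist]
    a_rand = rng.integers(0, 2, size=N_RAND)
    ones = a_opt.sum() + a_rand.sum()
    minority = int(ones < (N_OPT + N_RAND) / 2)   # the winning action
    payoff_opt += (a_opt == minority).sum()
    payoff_rand += (a_rand == minority).sum()
    scores += (strats[:, :, hist] == minority)    # reward would-be winners
    hist = ((hist << 1) | minority) % n_hist

print("mean payoff per round: optimizers %.3f, random %.3f"
      % (payoff_opt / (N_OPT * T), payoff_rand / (N_RAND * T)))
```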

    Learning Algorithms for Markov Decision Processes

    We propose various computational schemes for solving Partially Observable Markov Decision Processes with the finite stage additive cost and infinite horizon discounted cost criteria. Error bounds for the corresponding algorithms are given, and it is further shown that, at the expense of more computational effort, the Partially Observable Markov Decision Problem (POMDP) can be solved as close to the optimum as desired. It is well known that a sufficient statistic for taking the best action at any time in a POMDP is the a posteriori probability distribution on the underlying states given all the past history, and that this can be updated recursively. We prove that the finite stage optimal costs as well as the optimal cost for the infinite horizon discounted cost problem are both Lipschitz continuous (with domain the unit simplex of probability distributions over the underlying states) and give bounds for the Lipschitz constants. We use these bounds to provide error bounds for computational algorithms for solving POMDPs. We extend the almost sure convergence result of a very general stochastic approximation algorithm to the case when the underlying Markov process exhibits periodicity. This result is used to extend the proof of convergence of Temporal Difference (TD) reinforcement learning schemes with linear function approximation for Markov cost processes, in order to estimate the cost-to-go function for the discounted cost criterion and the differential cost function for the average cost criterion, respectively. Adaptive control of Markov Decision Problems (MDPs) is a problem in which full knowledge of the system parameters, namely the transition probabilities as well as the distribution of the immediate costs, is not available a priori. We give direct adaptive control schemes for infinite horizon discounted cost and average cost MDPs. Approximate Policy Iteration using on-line TD schemes for policy evaluation is detailed for the discounted cost and average cost criteria. Possible extensions of direct adaptive control schemes to the POMDP framework are discussed. Auxiliary results relevant to the core results of the dissertation are stated and proved in the appendices; in particular, an efficient discretization scheme for the finite dimensional unit simplex is given, some general error bounds for MDPs are given, and TD schemes for learning in Stochastic Shortest Path (SSP) problems are discussed.
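    The recursively updated a posteriori distribution mentioned above is the standard Bayes filter on the unit simplex. A minimal sketch follows, assuming a two-state chain with action-independent dynamics for brevity (a full POMDP indexes the transition and observation matrices by the chosen action); the matrices P and O are illustrative.

```python
# The sufficient statistic as a recursive Bayes filter: predict the belief
# through the transition matrix, reweight by the observation likelihood,
# renormalize. P and O are illustrative; dynamics are action-independent
# here for brevity, while a full POMDP indexes them by the chosen action.
import numpy as np

def belief_update(b, P, O, obs):
    """One filter step on the unit simplex of beliefs b."""
    b_pred = b @ P              # prediction: prior over the next state
    b_new = b_pred * O[:, obs]  # correction: multiply by Pr(obs | state)
    return b_new / b_new.sum()

P = np.array([[0.9, 0.1],       # P[i, j] = Pr(next = j | current = i)
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],       # O[i, o] = Pr(obs = o | state = i)
              [0.3, 0.7]])

b = np.array([0.5, 0.5])        # uniform prior
for obs in [0, 0, 1, 0]:
    b = belief_update(b, P, O, obs)
    print(b)                    # stays on the simplex at every step
```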

    Adaptive Continuous Time Markov Chain Approximation Model to General Jump-Diffusions

    We propose a non-equidistant Q rate matrix formula and an adaptive numerical algorithm for a continuous time Markov chain to approximate jump-diffusions with affine or non-affine functional specifications. Our approach also accommodates state-dependent jump intensity and jump distribution, a flexibility that is very hard to achieve with other numerical methods. The Kolmogorov-Smirnov test shows that the proposed Markov chain transition density converges to the one given by the likelihood expansion formula as in Aït-Sahalia (2008). We provide numerical examples for European stock option pricing in Black and Scholes (1973), Merton (1976) and Kou (2002).
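    The paper's exact rate formula is not reproduced here, but the moment-matching recipe behind such constructions is standard: on a non-equidistant grid, choose the up and down jump rates at each interior node so the chain's local mean and variance match the diffusion's drift and volatility. A sketch for a pure diffusion (no jump part), with an illustrative Ornstein-Uhlenbeck drift and a grid refined near zero:

```python
# Moment-matching generator on a non-equidistant grid (a standard textbook
# recipe, not necessarily the paper's formula): at each interior node pick
# up/down rates so the chain's local mean and variance equal mu and sigma^2.
import numpy as np

def generator(grid, mu, sigma2):
    """Q rate matrix of a CTMC locally consistent with
    dX = mu(X) dt + sigma(X) dW on the given (non-uniform) grid.
    Rates stay nonnegative only if the grid is fine where |mu| is large."""
    n = len(grid)
    Q = np.zeros((n, n))
    for i in range(1, n - 1):
        hm = grid[i] - grid[i - 1]          # spacing below node i
        hp = grid[i + 1] - grid[i]          # spacing above node i
        up = (sigma2(grid[i]) + hm * mu(grid[i])) / (hp * (hm + hp))
        dn = (sigma2(grid[i]) - hp * mu(grid[i])) / (hm * (hm + hp))
        Q[i, i + 1], Q[i, i - 1] = up, dn
        Q[i, i] = -(up + dn)                # generator rows sum to zero
    return Q

# Illustrative Ornstein-Uhlenbeck drift, grid refined near zero.
grid = np.concatenate([np.linspace(-1.0, -0.3, 8),
                       np.linspace(-0.25, 0.25, 11),
                       np.linspace(0.3, 1.0, 8)])
Q = generator(grid, mu=lambda x: -x, sigma2=lambda x: 0.25)
print(np.abs(Q.sum(axis=1)).max())          # ~0: valid generator rows
```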

    Parameter estimation in stochastic systems: some recent results and applications

    Some recent work on the characterization of almost sure limit sets for maximum likelihood estimates in stochastic systems is reviewed. Applications to allied topics such as input selection for identification, model selection, and self-tuning are briefly discussed.
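    As a minimal instance of maximum likelihood estimation in a stochastic system, the sketch below (with illustrative numbers) estimates a two-state Markov chain's transition matrix from a single sample path; the MLE is just the matrix of normalized transition counts, and on a recurrent chain it converges almost surely to the true matrix, a baby case of the limit-set results reviewed here.

```python
# A minimal instance of MLE in a stochastic system (illustrative numbers):
# estimate a two-state Markov chain's transition matrix from one sample path.
# The MLE is the matrix of normalized transition counts, and on a recurrent
# chain it converges almost surely to the true matrix as the path grows.
import numpy as np

rng = np.random.default_rng(2)
P_true = np.array([[0.7, 0.3],
                   [0.4, 0.6]])

x, path = 0, []
for _ in range(20000):                  # simulate one long trajectory
    path.append(x)
    x = rng.choice(2, p=P_true[x])

counts = np.zeros((2, 2))
for a, b in zip(path, path[1:]):        # tally observed transitions
    counts[a, b] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)
print(P_hat)                            # close to P_true
```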