Almost Sure Stabilization for Adaptive Controls of Regime-switching LQ Systems with A Hidden Markov Chain
This work is devoted to the almost sure stabilization of adaptive control
systems that involve an unknown Markov chain. The control system displays
continuous dynamics represented by differential equations and discrete events
given by a hidden Markov chain. Unlike previous work on the stabilization
of adaptively controlled systems with a hidden Markov chain, where average
criteria were considered, this work focuses on the almost sure (sample path)
stabilization of the underlying processes. Under simple conditions,
it is shown that as long as the feedback controls have linear growth in the
continuous component, the resulting process is regular. Moreover, by
appropriate choice of the Lyapunov functions, it is shown that the adaptive
system is stabilizable almost surely. As a by-product, it is also established
that the controlled process is positive recurrent.
"Illusion of control" in Minority and Parrondo Games
Human beings like to believe they are in control of their destiny. This
ubiquitous trait seems to increase motivation and persistence, and is probably
evolutionarily adaptive. But how good really is our ability to control? How
successful is our track record in these areas? There is little understanding of
when and under what circumstances we may over-estimate or even lose our ability
to control and optimize outcomes, especially when they are the result of
aggregations of individual optimization processes. Here, we demonstrate
analytically using the theory of Markov Chains and by numerical simulations in
two classes of games, the Minority game and the Parrondo Games, that agents who
optimize their strategy based on past information actually perform worse than
non-optimizing agents. In other words, low-entropy (more informative)
strategies under-perform high-entropy (or random) strategies. This provides a
precise definition of the "illusion of control" in set-ups a priori defined to
emphasize the importance of optimization. Comment: 17 pages, four figures, 1 table
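As a rough illustration of the kind of Markov chain analysis the abstract mentions, the sketch below computes the long-run expected gain per round for two Parrondo games and for their random mixture. It does not reproduce the paper's optimizing agents. The win probabilities (1/2 - eps for game A; 1/10 - eps or 3/4 - eps for game B, depending on whether the capital is divisible by 3) are the commonly used capital-dependent parameters and are an assumption here, as are all function and variable names.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi P = pi together with sum(pi) = 1 by least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def expected_gain(win_prob):
    """Long-run expected gain per round when capital moves by +1 or -1.

    win_prob[s] is the probability of winning when capital mod 3 == s.
    """
    win_prob = np.asarray(win_prob, dtype=float)
    P = np.zeros((3, 3))
    for s in range(3):
        P[s, (s + 1) % 3] = win_prob[s]        # win: capital goes up by one
        P[s, (s - 1) % 3] = 1.0 - win_prob[s]  # loss: capital goes down by one
    pi = stationary_distribution(P)
    return float(pi @ (2.0 * win_prob - 1.0))


EPS = 0.005  # assumed bias that makes each game losing on its own
GAME_A = np.full(3, 0.5 - EPS)
GAME_B = np.array([0.1 - EPS, 0.75 - EPS, 0.75 - EPS])
RANDOM_MIX = 0.5 * (GAME_A + GAME_B)  # pick A or B with probability 1/2 each round

gain_a = expected_gain(GAME_A)
gain_b = expected_gain(GAME_B)
gain_mix = expected_gain(RANDOM_MIX)
print(gain_a, gain_b, gain_mix)
```

Under these assumed parameters each game has a negative expected gain on its own, while the random mixture has a positive one. This is the baseline against which a strategy that optimizes on past information would be compared.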
Learning Algorithms for Markov Decision Processes
We propose various computational schemes for solving Partially Observable
Markov Decision Processes (POMDPs) under the finite-stage additive cost and
infinite-horizon discounted cost criteria. Error bounds for the corresponding
algorithms are given, and it is further shown that, at the expense of more
computational effort, the POMDP can be solved as close to optimality as
desired.
It is well known that a sufficient statistic for taking the best action at any time for
the POMDP is the a posteriori probability distribution on the underlying states, given
all the past history, and that this distribution can be updated recursively. We prove that the
finite-stage optimal costs and the optimal cost for the infinite-horizon discounted
cost problem are Lipschitz continuous (with domain the unit simplex of probability
distributions over the underlying states) and give bounds for the Lipschitz constant.
We use these bounds to provide error bounds for computational algorithms for solving
POMDPs.
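The recursive update of the a posteriori distribution can be sketched as a single Bayes filter step. This is a minimal illustration, not the dissertation's algorithm: the array layout `T[a][s, s2] = P(s2 | s, a)` and `O[a][s2, o] = P(o | s2, a)`, and the function name `belief_update`, are assumptions made for the example.

```python
import numpy as np


def belief_update(belief, T, O, action, observation):
    """Return the posterior over hidden states after one action and observation.

    belief: current distribution over states, shape (n_states,)
    T:      assumed layout T[a][s, s2] = P(next state s2 | state s, action a)
    O:      assumed layout O[a][s2, o] = P(observation o | next state s2, action a)
    """
    predicted = belief @ T[action]                        # prediction step
    unnormalized = predicted * O[action][:, observation]  # weight by likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total                           # stays on the unit simplex


# Hypothetical two-state, one-action model used only to exercise the function.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
posterior = belief_update(np.array([0.5, 0.5]), T, O, action=0, observation=0)
print(posterior)
```

Because the update maps the unit simplex into itself, a cost function defined on beliefs can be approximated on a finite grid of simplex points, which is where Lipschitz bounds of the kind described above become useful.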
We extend the almost sure convergence result of a very general stochastic approximation
algorithm to the case when the underlying Markov process exhibits periodicity. This result
is used to extend the proof of convergence of Temporal Difference (TD) reinforcement learning
schemes with linear function approximation for Markov Cost processes in order to estimate the
cost to go function for the discounted cost criterion, and the differential cost function for the
average cost criterion, respectively.
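For the discounted cost criterion, a TD(0) scheme with linear function approximation can be sketched as follows. This is a generic textbook form under stated assumptions, not the dissertation's exact scheme: the step-size schedule `10 / (10 + k)`, the function name `td0_linear`, and the two-state periodic example are all choices made for illustration.

```python
import numpy as np


def td0_linear(P, cost, features, gamma, steps, seed=0):
    """Estimate the discounted cost-to-go J(s) ~ features[s] @ theta with TD(0).

    P:        transition matrix of the Markov cost process (fixed policy)
    cost:     one-stage cost of each state
    features: feature matrix, one row per state
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    n_states, n_features = features.shape
    theta = np.zeros(n_features)
    state = 0
    for k in range(steps):
        next_state = rng.choice(n_states, p=P[state])
        # Temporal difference: one-stage cost plus discounted estimate at the
        # next state, minus the current estimate.
        td_error = (cost[state]
                    + gamma * features[next_state] @ theta
                    - features[state] @ theta)
        step_size = 10.0 / (10.0 + k)  # assumed diminishing step-size schedule
        theta += step_size * td_error * features[state]
        state = next_state
    return theta


# Hypothetical periodic chain: the state alternates deterministically 0 -> 1 -> 0.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
cost = np.array([1.0, 0.0])
theta = td0_linear(P, cost, features=np.eye(2), gamma=0.5, steps=5000)
exact = np.linalg.solve(np.eye(2) - 0.5 * P, cost)
print(theta, exact)
```

With one-hot features the approximation is exact, so the estimate can be compared directly with the solution of the linear system (I - gamma P) J = c. The example chain has period 2, the situation the extended convergence result is meant to cover.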
Adaptive control of Markov Decision Problems (MDPs) is a problem in which full knowledge
of the system parameters, namely the transition probabilities and the distribution of the
immediate costs, is not available a priori. We give direct adaptive control schemes for
infinite horizon discounted cost and average cost MDPs. Approximate Policy Iteration
using on-line TD schemes for policy evaluation is detailed for the discounted cost and
average cost criteria.
Possible extensions of direct adaptive control schemes to the POMDP framework are
discussed.
Auxiliary results relevant to the core results of the dissertation are stated
and proved in the appendices. In particular, an efficient discretization scheme
for the finite-dimensional unit simplex is given. Some general error bounds for
MDPs are also given, and TD schemes for learning in Stochastic Shortest Path
(SSP) problems are discussed.
Adaptive Continuous time Markov Chain Approximation Model to General Jump-Diffusions
We propose a non-equidistant Q rate matrix formula and an adaptive numerical algorithm for a continuous time Markov chain to approximate jump-diffusions with affine or non-affine functional specifications. Our approach also accommodates state-dependent jump intensity and jump distribution, a flexibility that is very hard to achieve with other numerical methods. The Kolmogorov-Smirnov test shows that the proposed Markov chain transition density converges to the one given by the likelihood expansion formula as in Ait-Sahalia (2008). We provide numerical examples for European stock option pricing in Black and Scholes (1973), Merton (1976) and Kou (2002).
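The abstract does not state the Q rate matrix formula, so the sketch below shows only one generic way to build a generator on a non-equidistant grid: pick the up and down rates at each interior point so that the chain's local mean and variance match the drift and squared diffusion coefficient. It covers the diffusion part only (no jumps), and the function name `ctmc_generator`, the grid, and the mean-reverting example are assumptions made for illustration.

```python
import numpy as np


def ctmc_generator(grid, drift, diffusion):
    """Build a tridiagonal generator Q on a non-equidistant grid.

    At each interior point the rates to the two neighbours solve
        q_up * k_up    - q_down * k_down    = drift
        q_up * k_up**2 + q_down * k_down**2 = diffusion**2
    Boundary rows are left at zero (absorbing), an assumption of this sketch.
    """
    n = len(grid)
    Q = np.zeros((n, n))
    for i in range(1, n - 1):
        k_down = grid[i] - grid[i - 1]
        k_up = grid[i + 1] - grid[i]
        mu = drift(grid[i])
        var = diffusion(grid[i]) ** 2
        Q[i, i - 1] = (var - mu * k_up) / (k_down * (k_down + k_up))
        Q[i, i + 1] = (var + mu * k_down) / (k_up * (k_down + k_up))
        Q[i, i] = -(Q[i, i - 1] + Q[i, i + 1])  # rows of a generator sum to zero
    return Q


# Hypothetical example: mean-reverting drift, constant volatility, grid that is
# finer near zero and coarser in the tails.
grid = 0.1 * np.sinh(np.linspace(-2.0, 2.0, 41))
Q = ctmc_generator(grid, drift=lambda x: -0.5 * x, diffusion=lambda x: 0.2)
local_mean = (Q @ grid)[1:-1]
local_var = (Q @ grid**2 - 2.0 * grid * (Q @ grid))[1:-1]
print(local_mean[:3], local_var[:3])
```

The off-diagonal rates are non-negative only when the grid is fine enough relative to the drift, so a coarser grid or stronger drift than in this example would need a different treatment.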
Parameter estimation in stochastic systems: some recent results and applications
Some recent work on the characterization of almost sure limit sets for maximum likelihood estimates for stochastic systems is reviewed. Applications to allied topics such as input selection for identification, model selection, and self-tuning are briefly discussed.