Q-Learning for Continuous State and Action MDPs under Average Cost Criteria
For infinite-horizon average-cost criterion problems, we present several
approximation and reinforcement learning results for Markov Decision Processes
with standard Borel spaces. Toward this end, (i) we first provide a
discretization based approximation method for fully observed Markov Decision
Processes (MDPs) with continuous spaces under average cost criteria, and we
provide error bounds for the approximations when the dynamics are only weakly
continuous under certain ergodicity assumptions. In particular, we relax the
total variation condition given in prior work to weak continuity as well as
Wasserstein continuity conditions. (ii) We provide synchronous and asynchronous
Q-learning algorithms for continuous spaces via quantization, and establish
their convergence. (iii) We show that the convergence is to the optimal Q
values of the finite approximate models constructed via quantization. Our
Q-learning convergence results and their near optimality are new for
continuous spaces, and the proof method is new even for finite spaces, to our
knowledge.
Comment: 3 figures
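As a rough illustration of the quantization idea, the following minimal sketch runs asynchronous Q-learning on a toy one-dimensional MDP over quantized states and actions. Everything here (the dynamics, cost, grid sizes, exploration rate, step sizes) is a hypothetical example, and it uses a discounted update as a simple proxy rather than the paper's average-cost formulation.

```python
import numpy as np

# Hypothetical 1-D continuous MDP: state in [0, 1], action in [-1, 1],
# dynamics x' = clip(x + 0.1*a + noise), cost = x^2 + 0.01*a^2.
rng = np.random.default_rng(0)

N_S, N_A = 20, 5                       # quantization levels (illustrative)
s_grid = np.linspace(0.0, 1.0, N_S)    # representative states
a_grid = np.linspace(-1.0, 1.0, N_A)   # representative actions

def nearest(grid, x):
    """Map a continuous point to its nearest quantization cell index."""
    return int(np.argmin(np.abs(grid - x)))

def step(x, a):
    """One transition of the toy MDP: next state and incurred cost."""
    x_next = np.clip(x + 0.1 * a + 0.05 * rng.standard_normal(), 0.0, 1.0)
    return x_next, x**2 + 0.01 * a**2

Q = np.zeros((N_S, N_A))
gamma = 0.95                           # discounted proxy for the average-cost case
x = 0.5
for t in range(50_000):
    i = nearest(s_grid, x)
    # epsilon-greedy exploration over the quantized actions
    j = int(rng.integers(N_A)) if rng.random() < 0.1 else int(np.argmin(Q[i]))
    x_next, c = step(x, a_grid[j])
    i_next = nearest(s_grid, x_next)
    alpha = 0.5 / (1 + 0.001 * t)      # diminishing step size
    # asynchronous update of the single visited (state, action) cell
    Q[i, j] += alpha * (c + gamma * Q[i_next].min() - Q[i, j])
    x = x_next
```

The learned table approximates the optimal Q-values of the finite model induced by the quantizer, which is exactly the limit the abstract identifies.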
An Optimal Transmission Strategy for Kalman Filtering over Packet Dropping Links with Imperfect Acknowledgements
This paper presents a novel design methodology for optimal transmission
policies at a smart sensor to remotely estimate the state of a stable linear
stochastic dynamical system. The sensor makes measurements of the process and
forms estimates of the state using a local Kalman filter. The sensor transmits
quantized information over a packet dropping link to the remote receiver. The
receiver sends packet receipt acknowledgments back to the sensor via an
erroneous feedback communication channel which is itself packet dropping. The
key novelty of this formulation is that the smart sensor decides, at each
discrete time instant, whether to transmit a quantized version of either its
local state estimate or its local innovation. The objective is to design
optimal transmission policies in order to minimize a long term average cost
function as a convex combination of the receiver's expected estimation error
covariance and the energy needed to transmit the packets. The optimal
transmission policy is obtained by the use of dynamic programming techniques.
Using the concept of submodularity, the optimality of a threshold policy in the
case of scalar systems with perfect packet receipt acknowledgments is proved.
Suboptimal solutions and their structural results are also discussed. Numerical
results are presented illustrating the performance of the optimal and
suboptimal transmission policies.
Comment: Conditionally accepted in IEEE Transactions on Control of Network Systems
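The threshold structure proved for scalar systems can be sketched as follows. This is a minimal illustration assuming the sensor transmits only its local state estimate (not innovations) and receives perfect acknowledgments; all system parameters are invented for the example.

```python
import numpy as np

# Scalar stable system x_{k+1} = a*x_k + w_k, measurement y_k = x_k + v_k.
# All numerical values are illustrative assumptions, not from the paper.
rng = np.random.default_rng(1)
a, q, r = 0.9, 0.1, 0.2        # dynamics gain, process / measurement noise variances
p_drop = 0.3                   # packet drop probability on the forward link
threshold = 0.5                # transmit when receiver covariance exceeds this

x, x_hat, p = 0.0, 0.0, 1.0    # true state, sensor estimate, sensor covariance
p_rx = 1.0                     # receiver-side error covariance (known to the
                               # sensor because acknowledgments are perfect)
energy = 0                     # number of transmissions (unit energy each)

for k in range(1000):
    # --- plant and the sensor's local Kalman filter ---
    x = a * x + np.sqrt(q) * rng.standard_normal()
    y = x + np.sqrt(r) * rng.standard_normal()
    x_pred, p_pred = a * x_hat, a * a * p + q        # predict
    kgain = p_pred / (p_pred + r)                    # Kalman gain
    x_hat = x_pred + kgain * (y - x_pred)            # update
    p = (1 - kgain) * p_pred

    # --- threshold transmission policy ---
    p_rx = a * a * p_rx + q                          # receiver covariance grows
    if p_rx > threshold:                             # transmit the local estimate
        energy += 1
        if rng.random() > p_drop:                    # packet got through
            p_rx = p                                 # receiver now holds x_hat
```

Raising `threshold` trades estimation accuracy for energy, which is the convex combination the dynamic program optimizes over.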
Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity
Reinforcement learning algorithms often require finiteness of state and
action spaces in Markov decision processes (MDPs) and various efforts have been
made in the literature towards the applicability of such algorithms for
continuous state and action spaces. In this paper, we show that under very mild
regularity conditions (in particular, involving only weak continuity of the
transition kernel of an MDP), Q-learning for standard Borel MDPs via
quantization of states and actions converges to a limit, and furthermore this
limit satisfies an optimality equation whose solutions are near optimal, either
with explicit performance bounds or with guaranteed asymptotic optimality. Our
approach builds on (i) viewing quantization as a measurement
kernel and thus a quantized MDP as a POMDP, (ii) utilizing near optimality and
convergence results of Q-learning for POMDPs, and (iii) finally,
near-optimality of finite state model approximations for MDPs with weakly
continuous kernels which we show to correspond to the fixed point of the
constructed POMDP. Thus, our paper presents a very general convergence and
approximation result for the applicability of Q-learning for continuous MDPs
- …
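The "quantization as a measurement kernel" view in (i) can be made concrete with a tiny sketch: a uniform quantizer on [0, 1] acts as a deterministic observation channel, so a learner that sees only the cell index is effectively facing a POMDP. The quantizer and cell count below are illustrative.

```python
# Hedged illustration: a uniform quantizer on [0, 1] viewed as a deterministic
# measurement kernel. The cell count N is an arbitrary example value.
N = 8  # number of quantization cells

def measurement_kernel(x: float) -> int:
    """Observation channel: continuous state -> finite cell index."""
    return min(int(x * N), N - 1)

# A quantized policy may depend only on this observation, never on x itself,
# which is precisely the partial observability that makes the model a POMDP.
obs = measurement_kernel(0.37)
```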