3 research outputs found
Linear Programming Formulation of Long Run Average Optimal Control Problem
We introduce and study an infinite-dimensional linear programming problem
which, along with its dual, characterizes the optimal value of the
deterministic long-run average optimal control problem in the general case,
where the optimal value may depend on the initial conditions of the system.
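A standard occupation-measure formulation of this kind of LP (a generic sketch of the approach, not necessarily the exact program studied in the paper) for deterministic dynamics $x_{t+1} = f(x_t, u_t)$ with stage cost $c$ reads:

```latex
\min_{\mu \in \mathcal{P}(X \times U)} \int_{X \times U} c(x,u)\,\mu(dx,du)
\quad \text{s.t.} \quad
\int_{X \times U} \bigl(\phi(f(x,u)) - \phi(x)\bigr)\,\mu(dx,du) = 0
\;\; \forall\, \phi \in C(X),
```

where the constraints single out invariant occupation measures. The dual program maximizes a scalar $\rho$ over pairs $(\rho, \phi)$ subject to $\rho + \phi(x) \le c(x,u) + \phi(f(x,u))$ for all $(x,u)$, which is how the long-run average value is bounded from below.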
Convex Q-Learning, Part 1: Deterministic Optimal Control
It is well known that the extension of Watkins' algorithm to general function
approximation settings is challenging: does the projected Bellman equation have
a solution? If so, is the solution useful in the sense of generating a good
policy? And, if the preceding questions are answered in the affirmative, is the
algorithm consistent? These questions are unanswered even in the special case
of Q-function approximations that are linear in the parameter. The challenge
seems paradoxical, given the long history of convex analytic approaches to
dynamic programming. The paper begins with a brief survey of linear programming
approaches to optimal control, leading to a particular over-parameterization
that lends itself to applications in reinforcement learning. The main
conclusions are summarized as follows:
(i) A new class of convex Q-learning algorithms is introduced, based on a
convex relaxation of the Bellman equation. Convergence is established under
general conditions, including a linear function approximation for the
Q-function.
(ii) A batch implementation appears similar to the famed DQN algorithm (one
engine behind AlphaZero). It is shown that in fact the algorithms are very
different: while convex Q-learning solves a convex program that approximates
the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm
with function approximation: (a) it is shown that both seek solutions to the
same fixed point equation, and (b) the ODE approximations for the two
algorithms coincide, and little is known about the stability of this ODE.
These results are obtained for deterministic nonlinear systems with total
cost criterion. Many extensions are proposed, including kernel implementation,
and extension to MDP models.
Comment: This pre-print is written in a tutorial style so it is accessible to
newcomers. It will be part of a handout for upcoming short courses on RL.
A more compact version suitable for journal submission is in preparation.
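For a finite state-action space, the LP underlying the convex-relaxation idea can be sketched directly: maximize the sum of Q-values subject to the Bellman inequality Q(x,u) ≤ c(x,u) + γ min_u' Q(f(x,u),u'), whose optimizer is Q*. The two-state deterministic system, costs, and discount factor below are illustrative inventions (discounting stands in for the paper's total-cost setting), and the tabular parameterization is the simplest case of a linear function approximation.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action deterministic system (illustration only).
f = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}          # next state f(x,u)
cost = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.0, (1, 1): 3.0}  # stage cost c(x,u)
gamma = 0.9  # discount factor, assumed for a well-posed finite LP

# Tabular Q: one variable per (x,u) pair.
idx = {(x, u): 2 * x + u for x in (0, 1) for u in (0, 1)}

# Q(x,u) - gamma * Q(f(x,u), u') <= c(x,u) for every u' is equivalent to
# Q(x,u) <= c(x,u) + gamma * min_{u'} Q(f(x,u), u').
A, b = [], []
for (x, u), i in idx.items():
    for up in (0, 1):
        row = np.zeros(4)
        row[i] += 1.0
        row[idx[(f[(x, u)], up)]] -= gamma
        A.append(row)
        b.append(cost[(x, u)])

# Maximize sum of Q-values (linprog minimizes, hence the sign flip).
res = linprog(c=-np.ones(4), A_ub=np.array(A), b_ub=np.array(b),
              bounds=[(None, None)] * 4, method="highs")
Q = res.x.reshape(2, 2)  # Q[x, u], the optimal Q-function of this toy system
```

The same structure survives linear function approximation Q^θ = θᵀψ: the constraints remain linear in θ, which is the convexity exploited by the algorithms in the paper.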
LP Based Upper and Lower Bounds for Cesàro and Abel Limits of the Optimal Values in Problems of Control of Stochastic Discrete Time Systems
In this paper, we study asymptotic properties of problems of control of
stochastic discrete time systems with time averaging and time discounting
optimality criteria, and we establish that the Cesàro and Abel limits of the
optimal values in such problems can be evaluated with the help of a certain
infinite-dimensional linear programming problem and its dual.
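The two limits in question are the time-averaged (Cesàro) and discounted (Abel) averages of a cost sequence, lim_T (1/T)Σ_{t<T} c_t and lim_{δ→1} (1-δ)Σ_t δ^t c_t. A small numerical sketch on an invented periodic cost sequence shows the two averages approaching the same value:

```python
import numpy as np

# Toy periodic stage-cost sequence (illustration only): 0, 2, 0, 2, ...
cost = np.array([0.0, 2.0] * 50000)
T = len(cost)

# Cesaro average: (1/T) * sum_{t<T} c_t; for this sequence the limit is 1.0.
cesaro = cost.mean()

# Abel average: (1 - delta) * sum_t delta^t c_t, with delta close to 1.
delta = 0.9999
abel = (1 - delta) * np.sum(delta ** np.arange(T) * cost)
```

For general (non-periodic) sequences the two limits can differ, which is why the paper develops LP-based upper and lower bounds covering both criteria.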