Convex Q Learning in a Stochastic Environment: Extended Version
The paper introduces the first formulation of convex Q-learning for Markov
decision processes with function approximation. The algorithms and theory rest
on a relaxation of a dual of Manne's celebrated linear programming
characterization of optimal control. The first set of contributions concerns
properties of the relaxation, described as a deterministic convex program: we
identify conditions for a bounded solution and a significant relationship
between the solution to the new convex program and the solution to standard
Q-learning. The second set of contributions concerns algorithm design and
analysis: (i) A direct model-free method for approximating the convex program
for Q-learning shares properties with its ideal. In particular, a bounded
solution is ensured subject to a simple property of the basis functions; (ii)
The proposed algorithms are convergent and new techniques are introduced to
obtain the rate of convergence in a mean-square sense; (iii) The approach can
be generalized to a range of performance criteria, and it is found that
variance can be reduced by considering "relative" dynamic programming
equations; (iv) The theory is illustrated with an application to a classical
inventory control problem.
Comment: Extended version of "Convex Q-learning in a stochastic environment",
IEEE Conference on Decision and Control, 2023 (to appear).
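
To give a sense of the object the abstract refers to, the following is a minimal sketch of a Q-function analogue of Manne's linear program with a linear function class; the notation (cost c, discount factor gamma, basis psi, weighting mu) is an assumption here, and the precise relaxation studied in the paper may differ.

% Sketch under assumed notation: Q_theta(x,u) = theta^T psi(x,u).
\begin{align*}
\max_{\theta}\quad & \langle \mu,\, Q_\theta \rangle \\
\text{s.t.}\quad & Q_\theta(x,u) \le c(x,u) + \gamma\, \mathbb{E}\bigl[\underline{Q}_\theta(X_{k+1}) \mid X_k = x,\ U_k = u\bigr] \quad \text{for all } (x,u), \\
& \underline{Q}_\theta(x) := \min_{u} Q_\theta(x,u).
\end{align*}
% Since \underline{Q}_\theta is a minimum of functions affine in \theta, each
% constraint is convex in \theta, which is what makes the program convex.

With a linear parameterization the objective is linear in theta, so boundedness of the solution hinges on the constraint set, which is presumably where the abstract's condition on the basis functions enters.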
Manifold Representations for Continuous-State Reinforcement Learning
Reinforcement learning (RL) has shown itself to be an effective paradigm for solving optimal control problems with a finite number of states. Generalizing RL techniques to problems with a continuous state space has proven a difficult task. We present an approach to modeling the RL value function using a manifold representation. By explicitly modeling the topology of the value function domain, traditional problems with discontinuities and resolution can be addressed without resorting to complex function approximators. We describe how manifold techniques can be applied to value-function approximation, and present methods for constructing manifold representations in both batch and online settings. We present empirical results demonstrating the effectiveness of our approach.
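
The abstract does not spell out the construction, but the general idea of approximating a value function with overlapping local models can be sketched as follows; the chart-based scheme, the Gaussian blending weights, and the class name ChartValueFunction are illustrative assumptions, not the representation used in the paper.

# Minimal sketch (not the paper's construction): represent a value function
# over a continuous 1-D state space with overlapping local "charts", each
# holding a local linear model, blended by normalized Gaussian weights.
import numpy as np

class ChartValueFunction:
    def __init__(self, centers, radius):
        self.centers = np.asarray(centers, dtype=float)  # chart centers
        self.radius = radius                             # chart width
        # each chart stores (intercept, slope) of a local linear model
        self.params = np.zeros((len(self.centers), 2))

    def _weights(self, x):
        # partition-of-unity blending weights across charts
        w = np.exp(-((x - self.centers) / self.radius) ** 2)
        return w / w.sum()

    def value(self, x):
        w = self._weights(x)
        local = self.params[:, 0] + self.params[:, 1] * (x - self.centers)
        return float(np.dot(w, local))

    def update(self, x, target, lr=0.1):
        # gradient step on squared error; each chart moves in proportion to
        # how much it "owns" the sampled point x
        w = self._weights(x)
        err = target - self.value(x)
        self.params[:, 0] += lr * err * w
        self.params[:, 1] += lr * err * w * (x - self.centers)

# toy usage: learn V(x) ~ sin(x) from samples on [0, 2*pi]
rng = np.random.default_rng(0)
vf = ChartValueFunction(centers=np.linspace(0.0, 2 * np.pi, 8), radius=0.6)
for _ in range(2000):
    x = rng.uniform(0.0, 2 * np.pi)
    vf.update(x, np.sin(x))
print(vf.value(np.pi / 2))  # expected to be close to sin(pi/2) = 1

The blending step is what gives the scheme its locality: each update only moves the charts that cover the sampled state, so refining the fit in one region does not disturb the approximation elsewhere.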
On the Convergence of Bounded Agents
When has an agent converged? Standard models of the reinforcement learning
problem give rise to a straightforward definition of convergence: An agent
converges when its behavior or performance in each environment state stops
changing. However, as we shift the focus of our learning problem from the
environment's state to the agent's state, the concept of an agent's convergence
becomes significantly less clear. In this paper, we propose two complementary
accounts of agent convergence in a framing of the reinforcement learning
problem that centers around bounded agents. The first view says that a bounded
agent has converged when the minimal number of states needed to describe the
agent's future behavior cannot decrease. The second view says that a bounded
agent has converged just when the agent's performance only changes if the
agent's internal state changes. We establish basic properties of these two
definitions, show that they accommodate typical views of convergence in
standard settings, and prove several facts about their nature and relationship.
We take these perspectives, definitions, and analysis to bring clarity to a
central idea of the field.
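
As a concrete, informal reading of the second definition, one can check a logged trajectory of (internal state, performance) pairs for the property that performance only moves when the internal state does; the helper converged_from below is a hypothetical illustration, not the paper's formal criterion.

def converged_from(trace, t, tol=1e-9):
    """Informal check: from step t on, performance only changes when the
    agent's internal state changes (an illustrative reading, not the
    paper's formal definition)."""
    for (s_prev, p_prev), (s_next, p_next) in zip(trace[t:], trace[t + 1:]):
        if s_prev == s_next and abs(p_next - p_prev) > tol:
            return False  # performance moved without an internal-state change
    return True

# toy trace of (internal state, performance) pairs
trace = [("s0", 0.2), ("s1", 0.5), ("s1", 0.7), ("s2", 0.9), ("s2", 0.9)]
print(converged_from(trace, 0))  # False: performance moved while staying in s1
print(converged_from(trace, 3))  # True from step 3 onward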