Max-Plus Matching Pursuit for Deterministic Markov Decision Processes
We consider deterministic Markov decision processes (MDPs) and apply max-plus
algebra tools to approximate the value iteration algorithm by a
smaller-dimensional iteration based on a representation on dictionaries of
value functions. The setup naturally leads to novel theoretical results which
are simply formulated due to the max-plus algebra structure. For example, when
considering a fixed (non-adaptive) finite basis, the computational complexity
of approximating the optimal value function is not directly related to the
number of states, but to notions of covering numbers of the state space. In
order to break the curse of dimensionality in factored state-spaces, we
consider adaptive bases that can adapt to particular problems, leading to an
algorithm similar to matching pursuit from signal processing. These adaptive
methods currently come with no theoretical guarantees but work well
empirically on simple deterministic MDPs derived from low-dimensional
continuous control problems. We focus primarily on deterministic MDPs but note
that the framework can be applied to all MDPs by considering measure-based
formulations.
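
To make the max-plus view of value iteration concrete, here is a minimal sketch of the exact (full-dimensional) iteration that the abstract says is being approximated. It assumes a discounted deterministic MDP encoded as a transition-reward matrix `T`, where `T[s, s2]` is the best one-step reward for moving from `s` to `s2` and `-inf` marks an impossible move. The names `maxplus_matvec`, `value_iteration`, `T`, and `gamma`, as well as the three-state example, are illustrative assumptions and do not come from the paper; the dictionary-based reduction and the matching-pursuit step are not shown.

```python
import numpy as np

def maxplus_matvec(T, v):
    """Max-plus matrix-vector product: result[s] = max over s2 of T[s, s2] + v[s2]."""
    return np.max(T + v[None, :], axis=1)

def value_iteration(T, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate v <- T (max-plus) (gamma * v) until the sup-norm change is below tol."""
    v = np.zeros(T.shape[0])
    for _ in range(max_iter):
        v_next = maxplus_matvec(T, gamma * v)
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    return v

# Hypothetical 3-state chain: 0 -> 1 -> 2, and state 2 loops on itself with reward 1.
T = np.full((3, 3), -np.inf)  # -inf means "no such transition"
T[0, 1] = 0.0
T[1, 2] = 0.0
T[2, 2] = 1.0

v = value_iteration(T, gamma=0.5)
print(v)
```

In this toy chain the fixed point can be checked by hand: state 2 satisfies v = 1 + 0.5 v, and the other two values follow by discounting along the chain. Each sweep costs time proportional to the number of entries of `T`, which is the dependence on the number of states that a smaller dictionary-based iteration is meant to avoid.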
Finite-Time Analysis of Asynchronous Stochastic Approximation and Q-Learning
We consider a general asynchronous Stochastic Approximation (SA) scheme featuring a weighted infinity-norm contractive operator, and prove a bound on its finite-time convergence rate on a single trajectory. Additionally, we specialize the result to asynchronou