This paper aims at the algorithmic/theoretical core of reinforcement learning
(RL) by introducing the novel class of proximal Bellman mappings. These
mappings are defined in reproducing kernel Hilbert spaces (RKHSs) to benefit
from the rich approximation properties and the inner product of RKHSs. They are
shown to belong to the powerful Hilbertian family of (firmly) nonexpansive
mappings, regardless of the values of their discount factors, and to possess
ample degrees of design freedom, enough to reproduce attributes of the
classical Bellman mappings and to pave the way for novel RL designs.
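For context, the following are the standard Hilbert-space definitions behind
this claim (a reminder of textbook notions, not a construction taken from the
paper): a mapping $T\colon \mathcal{H} \to \mathcal{H}$ on a Hilbert space
$\mathcal{H}$ is nonexpansive if
$$\|T\mathbf{x} - T\mathbf{y}\| \le \|\mathbf{x} - \mathbf{y}\|, \quad \forall \mathbf{x}, \mathbf{y} \in \mathcal{H},$$
and firmly nonexpansive if
$$\|T\mathbf{x} - T\mathbf{y}\|^2 \le \langle \mathbf{x} - \mathbf{y}, T\mathbf{x} - T\mathbf{y} \rangle, \quad \forall \mathbf{x}, \mathbf{y} \in \mathcal{H}.$$
By contrast, the classical Bellman mappings are contractions only when the
discount factor is strictly less than one.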
An approximate policy-iteration scheme is built on the proposed class of
mappings to solve the problem of selecting online, at each time instant, the
"optimal" exponent p in a p-norm loss to combat outliers in linear adaptive
filtering, without training data and without any knowledge of the statistical
properties of the outliers.
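To make the p-norm task concrete, here is a standard least-mean-p-power (LMP)
formulation of the underlying adaptive-filtering update (an illustrative
sketch with generic symbols, not the paper's exact algorithm): with input
vector $\mathbf{x}_n$, desired response $y_n$, current estimate
$\boldsymbol{\theta}_n$, and error $e_n = y_n - \boldsymbol{\theta}_n^\top \mathbf{x}_n$,
the instantaneous p-norm loss $|e_n|^p$ leads to the stochastic-gradient
recursion
$$\boldsymbol{\theta}_{n+1} = \boldsymbol{\theta}_n + \mu\, p\, |e_n|^{p-1} \operatorname{sign}(e_n)\, \mathbf{x}_n, \qquad \mu > 0.$$
Smaller exponents $p$ down-weight large, outlier-induced errors, which is what
makes the online selection of $p$ consequential.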
Numerical tests on synthetic data showcase the superior performance of the
proposed framework over several non-RL and kernel-based RL schemes.