Linear Bandits with Memory: from Rotting to Rising

Cesa-Bianchi, Nicolò; Clerici, Giulia; Laforgue, Pierre

Publication date: 16 February 2023

Abstract

Nonstationary phenomena, such as satiation effects in recommendation, are a common feature of sequential decision-making problems. While these phenomena have mostly been studied in the framework of bandits with finitely many arms, in many practically relevant cases linear bandits provide a more effective modeling choice. In this work, we introduce a general framework for the study of nonstationary linear bandits, where current rewards are influenced by the learner's past actions in a fixed-size window. In particular, our model includes stationary linear bandits as a special case. After showing that the best sequence of actions is NP-hard to compute in our model, we focus on cyclic policies and prove a regret bound for a variant of the OFUL algorithm that balances approximation and estimation errors. Our theoretical findings are supported by experiments (which also include misspecified settings) where our algorithm performs well against natural baselines.
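
To make the idea of window-dependent rewards concrete, here is a minimal sketch of one possible toy instantiation. The abstract does not give the reward formula, so everything below is an assumption made for illustration: the function names `memory_reward` and `play`, the window size `m`, the exponent `gamma`, and the specific form of the reward (a linear reward rescaled by how much the current action overlaps with the actions in the window). Under this assumed form, a negative `gamma` gives "rotting" rewards (repeating an action pays less), a positive `gamma` gives "rising" rewards, and `gamma = 0` or `m = 0` recovers a stationary linear bandit.

```python
from collections import deque


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def memory_reward(action, window, theta, gamma):
    """Expected reward of `action` given the past actions in `window`.

    Hypothetical form (an assumption, not taken from the abstract):
        <a, theta> * (1 + sum_s <a, a_s>^2) ** gamma
    """
    overlap = sum(dot(action, past) ** 2 for past in window)
    return dot(action, theta) * (1.0 + overlap) ** gamma


def play(actions, theta, gamma, m):
    """Play a fixed sequence of actions and return the expected rewards.

    Only the last `m` actions are remembered (fixed-size window).
    """
    window = deque(maxlen=m)  # older actions fall out automatically
    rewards = []
    for a in actions:
        rewards.append(memory_reward(a, window, theta, gamma))
        window.append(a)
    return rewards


if __name__ == "__main__":
    theta = (1.0, 0.0)
    repeated = [(1.0, 0.0)] * 3
    print("rotting   :", play(repeated, theta, gamma=-1.0, m=2))
    print("rising    :", play(repeated, theta, gamma=1.0, m=2))
    print("stationary:", play(repeated, theta, gamma=0.0, m=2))
```

In this toy setting the reward of an action depends on the order in which actions are played, which is why one would compare sequences (for instance, short cycles) of actions rather than single actions.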

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2302.08345

Last time updated on 06/03/2023