Autoregressive Bandits

Bacchiocchi, Francesco; Gatti, Nicola; Genalti, Gianmarco; Maran, Davide; Metelli, Alberto Maria; Mussi, Marco; Restelli, Marcello

Authors: Francesco Bacchiocchi, Nicola Gatti, Gianmarco Genalti, Davide Maran, Alberto Maria Metelli, Marco Mussi, Marcello Restelli
Publication date: 12 December 2022

Abstract

Autoregressive processes naturally arise in a large variety of real-world scenarios, including stock markets, sales forecasting, weather prediction, advertising, and pricing. When addressing a sequential decision-making problem in such a context, the temporal dependence between consecutive observations should be properly accounted for in order to converge to the optimal decision policy. In this work, we propose a novel online learning setting, named Autoregressive Bandits (ARBs), in which the observed reward follows an autoregressive process of order $k$, whose parameters depend on the action the agent chooses within a finite set of $n$ actions. Then, we devise an optimistic regret minimization algorithm, AutoRegressive Upper Confidence Bounds (AR-UCB), that suffers regret of order $\widetilde{\mathcal{O}} \left( \frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2} \right)$, where $T$ is the optimization horizon and $\Gamma < 1$ is an index of the stability of the system. Finally, we present a numerical validation in several synthetic settings and one real-world setting, comparing against general-purpose and specific-purpose bandit baselines and showing the advantages of the proposed approach.
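
To make the setting concrete, the following is a minimal sketch of a reward process of the kind the abstract describes, not the authors' implementation. It assumes a linear AR(k) form, x_t = g0(a_t) + sum_i g_i(a_t) * x_{t-i} + noise, with Gaussian noise and a zero initial state. It also assumes that the stability index is the largest sum of absolute autoregressive coefficients over the actions; the abstract only states that $\Gamma < 1$. The names `simulate_arb`, `stability_index`, and `regret_rate` are hypothetical, and `regret_rate` evaluates only the leading term of the stated bound, ignoring constants and logarithmic factors.

```python
import math
import random


def simulate_arb(gamma, actions, noise_std=0.1, seed=0):
    """Simulate rewards of an action-dependent AR(k) process (illustrative form).

    gamma[a] = [g0, g1, ..., gk] holds hypothetical parameters for action a.
    actions is the sequence of action indices played, one per round.
    """
    rng = random.Random(seed)
    k = len(gamma[0]) - 1
    history = [0.0] * k  # most recent reward first; zero start is an assumption
    rewards = []
    for a in actions:
        g = gamma[a]
        # Intercept plus weighted past rewards plus Gaussian noise.
        x = g[0] + sum(g[i + 1] * history[i] for i in range(k))
        x += rng.gauss(0.0, noise_std)
        rewards.append(x)
        history = ([x] + history)[:k]  # shift the window of past rewards
    return rewards


def stability_index(gamma):
    """Assumed stability index: largest sum of |AR coefficients| over actions."""
    return max(sum(abs(c) for c in g[1:]) for g in gamma)


def regret_rate(k, n, T, Gamma):
    """Leading term of the stated bound, without constants or log factors."""
    return (k + 1) ** 1.5 * math.sqrt(n * T) / (1.0 - Gamma) ** 2


if __name__ == "__main__":
    gamma = [[1.0, 0.5], [0.2, 0.3]]  # n = 2 actions, order k = 1
    print(simulate_arb(gamma, actions=[0, 0, 1, 1, 0]))
    print(stability_index(gamma))
    print(regret_rate(k=1, n=2, T=50, Gamma=stability_index(gamma)))
```

Under these assumptions, the `(1 - Gamma) ** 2` denominator shows why the bound degrades as the process approaches instability: moving the index from 0.5 to 0.9 multiplies the leading term by 25.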

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2212.06251

Last time updated on 08/01/2023