427 research outputs found
A Definition of Non-Stationary Bandits
Despite the subject of non-stationary bandit learning having attracted much
recent attention, we have yet to identify a formal definition of
non-stationarity that can consistently distinguish non-stationary bandits from
stationary ones. Prior work has characterized non-stationary bandits as bandits
for which the reward distribution changes over time. We demonstrate that this
definition can ambiguously classify the same bandit as both stationary and
non-stationary; this ambiguity arises in the existing definition's dependence
on the latent sequence of reward distributions. Moreover, the definition has
given rise to two widely used notions of regret: the dynamic regret and the
weak regret. These notions are not indicative of qualitative agent performance
in some bandits. Additionally, this definition of non-stationary bandits has
led to the design of agents that explore excessively. We introduce a formal
definition of non-stationary bandits that resolves these issues. Our new
definition provides a unified approach, applicable seamlessly to both Bayesian
and frequentist formulations of bandits. Furthermore, our definition ensures
consistent classification of two bandits offering agents indistinguishable
experiences, categorizing them as either both stationary or both
non-stationary. This advancement provides a more robust framework for
non-stationary bandit learning
- …