Multi-Armed Bandits with Abstention
We introduce a novel extension of the canonical multi-armed bandit problem
that incorporates an additional strategic element: abstention. In this enhanced
framework, the agent is not only tasked with selecting an arm at each time
step, but also has the option to abstain from accepting the stochastic
instantaneous reward before observing it. When opting for abstention, the agent
either suffers a fixed regret or gains a guaranteed reward. Given this added
layer of complexity, we ask whether we can develop efficient algorithms that
are both asymptotically and minimax optimal. We answer this question
affirmatively by designing and analyzing algorithms whose regrets meet their
corresponding information-theoretic lower bounds. Our results offer valuable
quantitative insights into the benefits of the abstention option, laying the
groundwork for further exploration in other online decision-making problems
with such an option. Numerical results further corroborate our theoretical
findings.
Comment: Preprint
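To make the setting concrete, here is a minimal Python sketch of the fixed-regret variant described in the abstract. It is an illustration under stated assumptions, not the authors' algorithm: the names `step_regret`, `simulate`, and the two toy policies are hypothetical, rewards are assumed Gaussian with unit variance, and the agent is assumed to observe the sampled reward even when it abstains (the abstract does not specify either point).

```python
import random

def step_regret(means, arm, abstain, c):
    """Expected instantaneous regret in the fixed-regret setting.

    Assumed form: abstaining costs a fixed regret c; otherwise the regret
    is the gap between the best mean and the chosen arm's mean.
    """
    if abstain:
        return c
    return max(means) - means[arm]

def simulate(means, c, horizon, policy, seed=0):
    """Run a policy for `horizon` steps and return cumulative expected regret."""
    rng = random.Random(seed)
    counts = [0] * len(means)   # number of pulls per arm
    sums = [0.0] * len(means)   # sum of observed rewards per arm
    total = 0.0
    for t in range(horizon):
        # The policy commits to an arm and an abstain decision
        # before the reward is drawn.
        arm, abstain = policy(t, counts, sums)
        reward = rng.gauss(means[arm], 1.0)  # assumption: unit-variance Gaussian
        counts[arm] += 1
        sums[arm] += reward  # assumption: the sample is observed either way
        total += step_regret(means, arm, abstain, c)
    return total

def round_robin_never_abstain(t, counts, sums):
    """Toy baseline: cycle through the arms and always accept the reward."""
    return t % len(counts), False

def round_robin_always_abstain(t, counts, sums):
    """Toy baseline: cycle through the arms and always abstain."""
    return t % len(counts), True

if __name__ == "__main__":
    means = [0.0, 1.0]
    print(simulate(means, c=0.1, horizon=100, policy=round_robin_never_abstain))
    print(simulate(means, c=0.1, horizon=100, policy=round_robin_always_abstain))
```

With a small abstention cost, the always-abstain baseline accumulates less regret than uniform exploration without abstention, which hints at why abstaining can pay off while the arms are still being learned. A sensible policy would abstain early and stop once the best arm is identified with enough confidence.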