Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits
Motivated by applications in energy management, this paper presents the
Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the
exploration of risky arms, MARAB takes as arm quality its conditional value at
risk. When the user-supplied risk level goes to 0, the arm quality tends toward
the essential infimum of the arm distribution density, and MARAB tends toward
the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal
value. As a first contribution, this paper presents a theoretical analysis of
the MIN algorithm under mild assumptions, establishing its robustness
relative to UCB. The analysis is supported by extensive experimental
validation of MIN and MARAB compared to UCB and state-of-the-art risk-aware MAB
algorithms on artificial and real-world problems.
Comment: 16 pages
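The arm-quality idea described in this abstract can be illustrated with a minimal sketch. It assumes the conditional value at risk is estimated empirically as the mean of the worst fraction alpha of an arm's observed rewards. The names empirical_cvar, best_arm_by_cvar, and history are hypothetical, the reward values are made up for illustration, and the sketch omits any exploration mechanism, so it is not the paper's algorithm.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards (lower tail).

    Assumption: this plain empirical estimator stands in for the arm
    quality; the paper's exact estimator may differ.
    """
    if not rewards:
        raise ValueError("need at least one observed reward")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def best_arm_by_cvar(history, alpha):
    """Return the arm whose empirical CVaR at level alpha is largest."""
    return max(history, key=lambda arm: empirical_cvar(history[arm], alpha))


# Hypothetical reward histories for two arms (illustrative values only).
history = {
    "safe": [0.4, 0.5, 0.45, 0.5, 0.42],
    "risky": [0.9, 0.0, 1.0, 0.95, 0.1],
}

risk_averse_choice = best_arm_by_cvar(history, alpha=0.4)
risk_neutral_choice = best_arm_by_cvar(history, alpha=1.0)
```

With alpha = 1 the score is the ordinary sample mean, so the higher-mean arm wins. As alpha shrinks toward 0 the score reduces to the smallest observed reward, which mirrors the abstract's remark that the criterion tends toward the arm with maximal minimal value.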
Lifelong Bandit Optimization: No Prior and No Regret
In practical applications, machine learning algorithms are often applied
repeatedly to problems with similar structure. We focus on
solving a sequence of bandit optimization tasks and develop LiBO, an algorithm
which adapts to the environment by learning from past experience and becoming
more sample-efficient in the process. We assume a kernelized structure where
the kernel is unknown but shared across all tasks. LiBO sequentially
meta-learns a kernel that approximates the true kernel and simultaneously
solves the incoming tasks with the latest kernel estimate. Our algorithm can be
paired with any kernelized bandit algorithm and guarantees oracle optimal
performance, meaning that as more tasks are solved, the regret of LiBO on each
task converges to the regret of the bandit algorithm with oracle knowledge of
the true kernel. Naturally, if paired with a sublinear bandit algorithm, LiBO
yields a sublinear lifelong regret. We also show that direct access to the data
from each task is not necessary for attaining sublinear regret. The lifelong
problem can thus be solved in a federated manner, while keeping the data of
each task private.
Comment: 32 pages, 6 figures, preprint