A Change-Detection based Framework for Piecewise-stationary Multi-Armed Bandit Problem
The multi-armed bandit problem has been extensively studied under the
stationary assumption. However, in reality, this assumption often does not hold
because the distributions of rewards themselves may change over time. In this
paper, we propose a change-detection (CD) based framework for multi-armed
bandit problems under the piecewise-stationary setting, and study a class of
change-detection based UCB (Upper Confidence Bound) policies, CD-UCB, that
actively detect change points and restart the UCB indices. We then develop
CUSUM-UCB and PHT-UCB, which belong to the CD-UCB class and use cumulative sum
(CUSUM) and Page-Hinkley Test (PHT) to detect changes. We show that CUSUM-UCB
obtains the best known regret upper bound under mild assumptions. We also
demonstrate the regret reduction of the CD-UCB policies over arbitrary
Bernoulli rewards and Yahoo! datasets of webpage click-through rates.
Comment: accepted by AAAI 201
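The detect-and-restart idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it pairs a textbook two-sided CUSUM test with a UCB1-style index, and the names `CusumDetector` and `run_cd_ucb` as well as the values of `warmup`, `eps`, `threshold`, and `explore_prob` are assumptions chosen for the example.

```python
import math
import random


class CusumDetector:
    """Two-sided CUSUM test for a shift in the mean of one arm's rewards."""

    def __init__(self, warmup=20, eps=0.05, threshold=5.0):
        self.warmup = warmup        # samples used to estimate the reference mean
        self.eps = eps              # drift tolerated without raising an alarm (assumed)
        self.threshold = threshold  # alarm level (assumed)
        self.reset()

    def reset(self):
        self.n = 0
        self.ref_sum = 0.0
        self.g_pos = 0.0  # accumulates evidence of an upward shift
        self.g_neg = 0.0  # accumulates evidence of a downward shift

    def update(self, y):
        """Feed one reward; return True if a change is flagged."""
        self.n += 1
        if self.n <= self.warmup:
            self.ref_sum += y
            return False
        ref_mean = self.ref_sum / self.warmup
        self.g_pos = max(0.0, self.g_pos + (y - ref_mean - self.eps))
        self.g_neg = max(0.0, self.g_neg + (ref_mean - y - self.eps))
        return self.g_pos > self.threshold or self.g_neg > self.threshold


def run_cd_ucb(reward_fn, n_arms, horizon, explore_prob=0.05, seed=0):
    """UCB with per-arm restarts triggered by a change detector.

    reward_fn(arm, t, rng) is a hypothetical callback returning a reward in [0, 1].
    Returns the list of pulled arms and the list of (time, arm) restarts.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    detectors = [CusumDetector() for _ in range(n_arms)]
    pulls, restarts = [], []
    for t in range(horizon):
        untried = [k for k in range(n_arms) if counts[k] == 0]
        if untried:
            arm = untried[0]
        elif rng.random() < explore_prob:
            # Forced exploration so that changes on rarely pulled arms are still seen.
            arm = rng.randrange(n_arms)
        else:
            total = sum(counts)
            arm = max(
                range(n_arms),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(total) / counts[k]),
            )
        reward = reward_fn(arm, t, rng)
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
        if detectors[arm].update(reward):
            # Change flagged: forget this arm's history and restart its index.
            counts[arm] = 0
            sums[arm] = 0.0
            detectors[arm].reset()
            restarts.append((t, arm))
    return pulls, restarts
```

Only the arm whose detector fires is restarted, so statistics for the other arms are kept. Swapping in a Page-Hinkley test would only require replacing the detector class.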
Dynamic Ensemble Active Learning: A Non-Stationary Bandit with Expert Advice
Active learning aims to reduce annotation cost by predicting which samples
are useful for a human teacher to label. However, it has become clear that there is
no single best active learning algorithm. Inspired by various philosophies about what
constitutes a good criterion, different algorithms perform well on different
datasets. This has motivated research into ensembles of active learners that
learn what constitutes a good criterion in a given scenario, typically via
multi-armed bandit algorithms. Though algorithm ensembles can lead to better
results, they overlook the fact that algorithm efficacy varies not only
across datasets but also within a single active learning session. That is, the
best criterion is non-stationary. This breaks existing algorithms' guarantees
and hampers their performance in practice. In this paper, we propose dynamic
ensemble active learning as a more general and promising research direction. We
develop a dynamic ensemble active learner based on a non-stationary multi-armed
bandit with expert advice algorithm. Our dynamic ensemble selects the right
criterion at each step of active learning. It has theoretical guarantees and
shows encouraging results on popular datasets.
Comment: This work has been accepted at ICPR2018 and won Piero Zamperoni Best
Student Paper Awar
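The abstract does not spell out the bandit algorithm. One common way to make an expert-advice bandit track a moving target is to run Exp4 and periodically reset its expert weights. The sketch below illustrates that idea only and should not be read as the authors' method: `exp4_with_restarts`, `advice_fn`, `reward_fn`, `gamma`, and `restart_every` are all hypothetical names and values, and the "experts" stand in for active learning criteria.

```python
import math
import random


def exp4_with_restarts(advice_fn, reward_fn, n_experts, n_arms, horizon,
                       gamma=0.1, restart_every=200, seed=0):
    """Exp4-style expert weighting with periodic weight resets.

    advice_fn(t) returns one probability vector over arms per expert.
    reward_fn(arm, t) returns a reward in [0, 1].
    """
    rng = random.Random(seed)
    weights = [1.0] * n_experts
    history = []
    for t in range(horizon):
        if t > 0 and t % restart_every == 0:
            # Forget past evidence so a newly useful expert can catch up.
            weights = [1.0] * n_experts
        advice = advice_fn(t)
        total_w = sum(weights)
        # Mix the experts' advice by weight, plus uniform exploration.
        probs = [
            (1.0 - gamma)
            * sum(weights[i] * advice[i][j] for i in range(n_experts)) / total_w
            + gamma / n_arms
            for j in range(n_arms)
        ]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, t)
        # Importance-weighted reward estimate for the arm that was played.
        est = reward / probs[arm]
        for i in range(n_experts):
            weights[i] *= math.exp(gamma * advice[i][arm] * est / n_arms)
        # Rescale to avoid overflow; the mixture probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
        history.append((arm, reward))
    return weights, history
```

A fixed restart period is the simplest choice. It trades slower convergence within each block for the ability to recover when the best expert changes.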