3 research outputs found
Sparsity, variance and curvature in multi-armed bandits
In (online) learning theory the concepts of sparsity, variance and curvature
are well-understood and are routinely used to obtain refined regret and
generalization bounds. In this paper we further our understanding of these
concepts in the more challenging limited feedback scenario. We consider the
adversarial multi-armed bandit and linear bandit settings and solve several
open problems pertaining to the existence of algorithms with favorable regret
bounds under the following assumptions: (i) sparsity of the individual losses,
(ii) small variation of the loss sequence, and (iii) curvature of the action
set. Specifically, we show that (i) for $s$-sparse losses one can obtain
$\tilde{O}(\sqrt{sT})$-regret (solving an open problem by Kwon and Perchet),
(ii) for loss sequences with variation bounded by $Q$ one can obtain
$\tilde{O}(\sqrt{Q})$-regret (solving an open problem by Kale and Hazan), and
(iii) for linear bandits on an $\ell_p^n$ ball one can obtain $\tilde{O}(\sqrt{nT})$-regret for $p \in [1,2]$ and one has $\tilde{\Omega}(n\sqrt{T})$-regret
for $p > 2$ (solving an open problem by Bubeck, Cesa-Bianchi and Kakade). A key
new insight to obtain these results is to use regularizers satisfying more
refined conditions than general self-concordance.
Comment: 18 pages
On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits
We make three contributions to the theory of k-armed adversarial bandits.
First, we prove a first-order bound for a modified variant of the INF strategy
by Audibert and Bubeck [2009], without sacrificing worst-case optimality or
modifying the loss estimators. Second, we provide a variance analysis for
algorithms based on follow the regularised leader, showing that without
adaptation the variance of the regret is typically $\Omega(n^2)$ where n is the
horizon. Finally, we study bounds that depend on the degree of separation of
the arms, generalising the results by Cowan and Katehakis [2015] from the
stochastic setting to the adversarial and improving the result of Seldin and
Slivkins [2014] by a factor of log(n)/log(log(n)).
Comment: 14 pages
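As a rough illustration of the follow-the-regularised-leader family mentioned in this abstract, the sketch below computes sampling probabilities from cumulative loss estimates under a 1/2-Tsallis-entropy-style regularizer, which is commonly associated with INF-type strategies. It is not the modified strategy analysed in the paper. The function name `tsallis_ftrl_probs`, the scaling of the regularizer, and the fixed learning rate `eta` are all assumptions made for illustration.

```python
import numpy as np

def tsallis_ftrl_probs(cum_loss_est, eta, iters=60):
    """Sampling distribution p_i = (eta * (L_i - x))^(-2), normalized.

    The scalar x < min_i L_i is found by bisection so that the
    probabilities sum to one. The constant in front depends on how the
    regularizer is scaled; this particular scaling is an assumption.
    """
    L = np.asarray(cum_loss_est, dtype=float)
    K = L.size
    m = L.min()
    # At x = m - sqrt(K)/eta every term is <= 1/K, so the sum is <= 1.
    lo = m - np.sqrt(K) / eta
    # At x = m - 1/eta the smallest-loss arm alone contributes 1.
    hi = m - 1.0 / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = np.sum((eta * (L - mid)) ** -2.0)
        if total > 1.0:
            hi = mid
        else:
            lo = mid
    p = (eta * (L - lo)) ** -2.0
    return p / p.sum()  # renormalize to absorb bisection error
```

With equal cumulative loss estimates the result is the uniform distribution, and arms with larger estimated loss receive polynomially rather than exponentially smaller probability.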
More Adaptive Algorithms for Adversarial Bandits
We develop a novel and generic algorithm for the adversarial multi-armed
bandit problem (or more generally the combinatorial semi-bandit problem). When
instantiated differently, our algorithm achieves various new data-dependent
regret bounds improving previous work. Examples include: 1) a regret bound
depending on the variance of only the best arm; 2) a regret bound depending on
the first-order path-length of only the best arm; 3) a regret bound depending
on the sum of first-order path-lengths of all arms as well as an important
negative term, which together lead to faster convergence rates for some normal
form games with partial feedback; 4) a regret bound that simultaneously implies
small regret when the best arm has small loss and logarithmic regret when there
exists an arm whose expected loss is always smaller than those of others by a
fixed gap (e.g. the classic i.i.d. setting). In some cases, such as the last
two results, our algorithm is completely parameter-free.
The main idea of our algorithm is to apply the optimism and adaptivity
techniques to the well-known Online Mirror Descent framework with a special
log-barrier regularizer. The challenges are to come up with appropriate
optimistic predictions and correction terms in this framework. Some of our
results also crucially rely on using a sophisticated increasing learning rate
schedule.
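To make the log-barrier Online Mirror Descent building block concrete, here is a minimal sketch of the plain update for a K-armed bandit with importance-weighted loss estimates. It deliberately omits the optimistic predictions, correction terms, and increasing learning rate schedule that the abstract describes, so it should not be read as the paper's algorithm. The names `log_barrier_omd_step` and `run_log_barrier_bandit` and the single fixed learning rate `eta` are assumptions.

```python
import numpy as np

def log_barrier_omd_step(p, loss_est, eta, iters=60):
    """One OMD step with regularizer (1/eta) * sum_i -ln(p_i).

    The update satisfies 1/q_i = 1/p_i + eta * (loss_est_i - lam), where
    lam is chosen by bisection so that q sums to one. Assumes
    loss_est >= 0 entrywise and p in the interior of the simplex.
    """
    inv = 1.0 / p
    lo = 0.0                           # at lam = 0 the sum is <= 1
    hi = np.min(loss_est + inv / eta)  # denominators stay positive below hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = np.sum(1.0 / (inv + eta * (loss_est - mid)))
        if total > 1.0:
            hi = mid
        else:
            lo = mid
    q = 1.0 / (inv + eta * (loss_est - lo))
    return q / q.sum()  # renormalize to absorb bisection error

def run_log_barrier_bandit(losses, eta, rng):
    """Play a (T, K) loss matrix, observing only the pulled arm's loss."""
    T, K = losses.shape
    p = np.full(K, 1.0 / K)
    total_loss = 0.0
    for t in range(T):
        arm = rng.choice(K, p=p)
        total_loss += losses[t, arm]
        loss_est = np.zeros(K)
        loss_est[arm] = losses[t, arm] / p[arm]  # importance weighting
        p = log_barrier_omd_step(p, loss_est, eta)
    return total_loss, p
```

A call such as `run_log_barrier_bandit(losses, eta=0.1, rng=np.random.default_rng(0))` returns the cumulative loss incurred and the final sampling distribution. The bisection is used because the log-barrier update has no closed-form normalization, unlike the exponential-weights update.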