
Sparsity, variance and curvature in multi-armed bandits

Authors: Bubeck Sébastien; Cohen Michael B.; Li Yuanzhi
Publication date: 03/11/2017

In (online) learning theory the concepts of sparsity, variance and curvature are well-understood and are routinely used to obtain refined regret and generalization bounds. In this paper we further our understanding of these concepts in the more challenging limited feedback scenario. We consider the adversarial multi-armed bandit and linear bandit settings and solve several open problems pertaining to the existence of algorithms with favorable regret bounds under the following assumptions: (i) sparsity of the individual losses, (ii) small variation of the loss sequence, and (iii) curvature of the action set. Specifically we show that (i) for $s$-sparse losses one can obtain $\tilde{O}(\sqrt{s T})$-regret (solving an open problem by Kwon and Perchet), (ii) for loss sequences with variation bounded by $Q$ one can obtain $\tilde{O}(\sqrt{Q})$-regret (solving an open problem by Kale and Hazan), and (iii) for linear bandit on an $\ell_p^n$ ball one can obtain $\tilde{O}(\sqrt{n T})$-regret for $p \in [1,2]$ and one has $\tilde{\Omega}(n \sqrt{T})$-regret for $p>2$ (solving an open problem by Bubeck, Cesa-Bianchi and Kakade). A key new insight to obtain these results is to use regularizers satisfying more refined conditions than general self-concordance.

Comment: 18 pages

arXiv.org e-Print Archive

On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits

Authors: Lattimore Tor; Pogodin Roman
Publication date: 24/07/2019

We make three contributions to the theory of k-armed adversarial bandits. First, we prove a first-order bound for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators. Second, we provide a variance analysis for algorithms based on follow the regularised leader, showing that without adaptation the variance of the regret is typically $\Omega(n^2)$ where $n$ is the horizon. Finally, we study bounds that depend on the degree of separation of the arms, generalising the results by Cowan and Katehakis [2015] from the stochastic setting to the adversarial and improving the result of Seldin and Slivkins [2014] by a factor of $\log(n)/\log(\log(n))$.

Comment: 14 pages

arXiv.org e-Print Archive
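
The abstract above refers to follow the regularised leader (FTRL) and to the INF strategy. As a rough illustration only, the sketch below implements plain FTRL with the 1/2-Tsallis entropy regulariser, a form in which INF-style strategies are commonly written. It is not the modified variant analysed in the paper. The function names, the fixed learning rate `eta`, and the importance-weighted loss estimator are assumptions made for this example.

```python
import numpy as np

def tsallis_ftrl_probs(cum_loss_est, eta, iters=60):
    """FTRL distribution for the 1/2-Tsallis entropy regulariser (illustrative).

    Solves p_i = 4 / (eta * (L_i - x))^2 with the normaliser x chosen by
    bisection so that the probabilities sum to one.
    """
    K = cum_loss_est.size
    base = cum_loss_est.min()
    lo = base - 2.0 * np.sqrt(K) / eta  # here every term is <= 1/K, so the sum is <= 1
    hi = base - 2.0 / eta               # here the best arm alone contributes 1, so the sum is >= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = np.sum(4.0 / (eta * (cum_loss_est - mid)) ** 2)
        if total > 1.0:
            hi = mid
        else:
            lo = mid
    p = 4.0 / (eta * (cum_loss_est - lo)) ** 2
    return p / p.sum()  # remove the small residual bisection error

def run_tsallis_ftrl(losses, eta=0.5, seed=0):
    """Play a T x K loss matrix with bandit feedback (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    cum_loss_est = np.zeros(K)
    total_loss = 0.0
    for t in range(T):
        p = tsallis_ftrl_probs(cum_loss_est, eta)
        arm = rng.choice(K, p=p)
        total_loss += losses[t, arm]
        # Unbiased importance-weighted estimate: only the played arm is updated.
        cum_loss_est[arm] += losses[t, arm] / p[arm]
    return total_loss, tsallis_ftrl_probs(cum_loss_est, eta)
```

With all cumulative estimates equal, `tsallis_ftrl_probs` returns the uniform distribution, and arms with larger estimated cumulative loss receive polynomially (rather than exponentially) smaller probability.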

More Adaptive Algorithms for Adversarial Bandits

Authors: Luo Haipeng; Wei Chen-Yu
Publication date: 07/06/2018

We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or more generally the combinatorial semi-bandit problem). When instantiated differently, our algorithm achieves various new data-dependent regret bounds improving previous work. Examples include: 1) a regret bound depending on the variance of only the best arm; 2) a regret bound depending on the first-order path-length of only the best arm; 3) a regret bound depending on the sum of first-order path-lengths of all arms as well as an important negative term, which together lead to faster convergence rates for some normal form games with partial feedback; 4) a regret bound that simultaneously implies small regret when the best arm has small loss and logarithmic regret when there exists an arm whose expected loss is always smaller than those of others by a fixed gap (e.g. the classic i.i.d. setting). In some cases, such as the last two results, our algorithm is completely parameter-free. The main idea of our algorithm is to apply the optimism and adaptivity techniques to the well-known Online Mirror Descent framework with a special log-barrier regularizer. The challenges are to come up with appropriate optimistic predictions and correction terms in this framework. Some of our results also crucially rely on using a sophisticated increasing learning rate schedule.

arXiv.org e-Print Archive
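
The last abstract builds on Online Mirror Descent with a log-barrier regularizer. The sketch below shows only that basic building block for the multi-armed bandit case, assuming a single fixed learning rate `eta` and a standard importance-weighted loss estimator. It omits the optimistic predictions, correction terms, and increasing learning rate schedule that the abstract describes, so it should not be read as the paper's algorithm. All function and parameter names are hypothetical.

```python
import numpy as np

def log_barrier_omd_step(p, loss_est, eta, iters=60):
    """One mirror descent step with regularizer (1/eta) * sum_i log(1/p_i).

    The update has the form 1/p_new_i = 1/p_i + eta * (loss_est_i - lam),
    where lam is found by bisection so that p_new sums to one.
    Assumes loss_est is non-negative and p is strictly positive.
    """
    inv_p = 1.0 / p
    lo = loss_est.min()                  # at lam = lo the new weights sum to <= 1
    hi = (loss_est + inv_p / eta).min()  # the sum diverges as lam approaches hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = np.sum(1.0 / (inv_p + eta * (loss_est - mid)))
        if total > 1.0:
            hi = mid
        else:
            lo = mid
    new_p = 1.0 / (inv_p + eta * (loss_est - lo))
    return new_p / new_p.sum()  # remove the small residual bisection error

def run_log_barrier_bandit(losses, eta=0.2, seed=0):
    """Play a T x K matrix of losses in [0, 1] with bandit feedback."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    p = np.full(K, 1.0 / K)
    total_loss = 0.0
    for t in range(T):
        arm = rng.choice(K, p=p)
        total_loss += losses[t, arm]
        loss_est = np.zeros(K)
        loss_est[arm] = losses[t, arm] / p[arm]  # importance-weighted estimate
        p = log_barrier_omd_step(p, loss_est, eta)
    return total_loss, p
```

A step with an all-zero loss estimate leaves the distribution unchanged, while a positive estimate on the played arm lowers that arm's probability and raises the others.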