Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling
Stochastic Rank-One Bandits (Katariya et al., 2017a,b) are a simple framework for regret minimization over rank-one matrices of arms. The initially proposed algorithms have provably logarithmic regret but do not match the existing lower bound for this problem. We close this gap by first proving that rank-one bandits are a particular instance of unimodal bandits, and then providing a new analysis of Unimodal Thompson Sampling (UTS), initially proposed by Paladino et al. (2017). We prove an asymptotically optimal bound on the frequentist regret of UTS, and we support our claims with simulations showing the significant improvement of our method over the state of the art.
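For context, the vanilla Bernoulli Thompson Sampling scheme that UTS builds on can be sketched as follows. This is an illustrative sketch only, not the paper's UTS algorithm (UTS additionally restricts sampling to a unimodal neighborhood of the current leader); the function name and parameters are our own:

```python
import random

def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Play a Bernoulli bandit with vanilla Thompson Sampling.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior; at each
    round we draw one sample per posterior and pull the arm with the
    largest sample, then update that arm's posterior with the observed
    reward. Returns the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    wins = [0] * k      # observed successes per arm
    losses = [0] * k    # observed failures per arm
    total_reward = 0
    for _ in range(horizon):
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        total_reward += reward
    return total_reward

# With a clearly best arm (mean 0.8), play concentrates on it over time,
# so the total reward approaches 0.8 * horizon.
reward = thompson_sampling_bernoulli([0.2, 0.5, 0.8], horizon=2000)
```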
A Simple Unified Framework for High Dimensional Bandit Problems
Stochastic high-dimensional bandit problems with low-dimensional structure are useful in applications such as online advertising and drug discovery. In this work, we propose a simple unified algorithm for such problems and present a general framework for analyzing its regret upper bound. We show that, under mild unified assumptions, our algorithm can be applied to different high-dimensional bandit problems. Our framework uses the low-dimensional structure to guide parameter estimation; as a result, our algorithm achieves the best known regret bounds in the LASSO bandit, as well as novel bounds in the low-rank matrix bandit, the group-sparse matrix bandit, and a new problem: the multi-agent LASSO bandit.
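The sparsity-guided parameter estimation in LASSO-type bandit algorithms rests on the soft-thresholding operator (the proximal operator of the L1 norm), which zeroes out small coefficient estimates. A minimal sketch of this building block, illustrative only and not the paper's algorithm:

```python
def soft_threshold(z, lam):
    """Proximal operator of lam * |.|: shrinks z toward zero and maps
    any z with |z| <= lam exactly to zero. Applied coordinate-wise,
    this is the update that lets LASSO-style estimators recover sparse
    parameter vectors in high dimensions."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Small coordinates are zeroed out, large ones are shrunk by lam:
est = [soft_threshold(z, 0.5) for z in [0.1, -0.3, 2.0, -1.5]]
# est == [0.0, 0.0, 1.5, -1.0]
```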
Explorations in Stochastic Bandit Problems: Theory and Applications
The stochastic bandit problem~\citep{robbins1952some} is a type of decision-making problem in which an agent must repeatedly choose among multiple arms from a (possibly varying) arm set, where each arm is associated with a different, unknown reward distribution, and the objective is to maximize the cumulative reward over time. The problem gets its name from the analogy of a gambler choosing which arm of a row of slot machines to pull, where each machine provides a different and unknown probability of winning. This framework is widely applicable, and several of its sub-problems have been extensively studied in recent years, e.g., multi-armed bandits (MAB)~\citep{robbins1952some}, linear bandits~\citep{abbasi2011improved}, and Lipschitz bandits~\citep{agrawal1995continuum}. However, existing research on bandits faces certain limitations, both theoretical and, crucially, practical, and these challenges have become significant bottlenecks in advancing the field. To name a few: (1) robustness against adversarial attacks (Chapter 2); (2) automatic hyperparameter tuning (Chapter 3); (3) adaptivity to non-stationary environments (Chapter 3); (4) efficiency under high-dimensional structure with sparsity (Chapter 4); (5) resilience to heavy-tailed payoffs (Chapter 5). Given that these fundamental issues have rarely been explored, we have committed significant effort to addressing them both theoretically and practically. In Chapter 1, we present a brief introduction to the bandit problem along with some limitations of the existing literature, which motivate our research. In Chapter 2, we introduce the stochastic Lipschitz bandit problem in the presence of adversarial attacks, and we propose a line of novel algorithms for different types of adversaries, including algorithms agnostic to the total corruption level.
Subsequently, in Chapter 3, we study how to dynamically tune the hyperparameters of bandit algorithms when there are infinitely many hyperparameter value candidates. In Chapter 4, we investigate the recently popular low-rank matrix bandit problem and propose two types of algorithms with improved empirical performance and sound regret bounds. Then, in Chapter 5, we revisit the low-rank matrix bandit problem under a more sophisticated scenario in which the stochastic payoffs are infused with heavy-tailed noise, and we propose a novel framework that handles heavy-tailedness and sparsity simultaneously. All the algorithms and frameworks we propose are backed by rigorous theoretical guarantees, with proofs provided in the Appendix.
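The cumulative-regret objective described in this abstract can be made concrete with a short simulation of a Bernoulli multi-armed bandit under the classical UCB1 index policy. This is a generic textbook baseline, not one of the dissertation's algorithms; the function name and parameters are our own:

```python
import math
import random

def ucb1_regret(true_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit and return the cumulative
    pseudo-regret: the gap between always playing the best arm and
    the arms actually chosen."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k   # pulls per arm
    sums = [0.0] * k   # total reward per arm
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            # UCB1 index: empirical mean + exploration bonus
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]
    return regret

# Regret grows only logarithmically in the horizon, so it stays far
# below the linear regret of uniformly random play.
regret = ucb1_regret([0.3, 0.5, 0.7], horizon=5000)
```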