
    Context-lumpable stochastic bandits

    We consider a contextual bandit problem with $S$ contexts and $A$ actions. In each round $t = 1, 2, \dots$ the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into $r \le \min\{S, A\}$ groups such that the mean reward for the various actions is the same for any two contexts in the same group, we give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde{O}(r(S+A)/\epsilon^2)$ samples with high probability, and provide a matching $\widetilde{\Omega}(r(S+A)/\epsilon^2)$ lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde{O}(\sqrt{r^3(S+A)T})$. To the best of our knowledge, we are the first to show near-optimal sample complexity in the PAC setting and $\widetilde{O}(\sqrt{\mathrm{poly}(r)(S+A)T})$ minimax regret in the online setting for this problem. We also show that our algorithms can be applied to more general low-rank bandits and obtain improved regret bounds in some scenarios.
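    As a rough illustration of the setting (not the paper's algorithm), the sketch below simulates a context-lumpable bandit: $S$ contexts partitioned into $r$ hidden groups, with the mean reward of an action depending only on the group. The sizes, the uniform-exploration baseline, and all names are hypothetical.

```python
import numpy as np

# Minimal sketch of the context-lumpable bandit setting: S contexts fall into
# r groups, and an action's mean reward depends only on the context's group.
rng = np.random.default_rng(0)
S, A, r = 12, 5, 3                      # contexts, actions, groups (r <= min(S, A))
group_of = rng.integers(r, size=S)      # hidden lumping of contexts into groups
mu = rng.uniform(size=(r, A))           # mean reward per (group, action)

def pull(context: int, action: int) -> float:
    """Bernoulli reward whose mean depends only on the context's group."""
    return float(rng.random() < mu[group_of[context], action])

# Naive baseline: estimate each (context, action) mean separately, ignoring
# the lumpable structure -- this needs on the order of S*A/eps^2 samples,
# whereas the paper's algorithm exploits the grouping to get ~ r*(S+A)/eps^2.
counts = np.zeros((S, A))
sums = np.zeros((S, A))
for t in range(20000):
    c = rng.integers(S)                 # a random context arrives
    a = rng.integers(A)                 # uniform exploration for the sketch
    sums[c, a] += pull(c, a)
    counts[c, a] += 1

est = sums / np.maximum(counts, 1)
greedy_policy = est.argmax(axis=1)      # near-optimal once estimates converge
print(greedy_policy)
```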

    Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

    In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed but initially unknown $d_1 \times d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrictions of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea of \cite{jun2019bilinear} by using Stein's method for the subspace estimation and then leverages the estimated subspaces via a regularization idea. Furthermore, we markedly improve the efficiency of G-ESTT by instead using a novel exclusion idea on the estimated subspace, and propose the G-ESTS framework. We also show that, under mild assumptions and up to logarithmic terms, G-ESTT achieves a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ while G-ESTS achieves a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$, where $M$ is some problem-dependent value. Under a reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2}\sqrt{rT}/D_{rr})$ \citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments to illustrate that our proposed algorithms, especially G-ESTS, are computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods on a suite of simulations.
    Comment: Revision of the paper accepted by NeurIPS 2022
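    The following sketch illustrates the GLM low-rank reward model and a crude stand-in for the subspace-estimation stage: it averages reward-weighted feature matrices over Gaussian arms (a first-order Stein-type identity makes this average roughly proportional to $\Theta^*$) and reads off the top-$r$ singular subspaces. It is only a sketch under these Gaussian-design assumptions, not the paper's G-ESTT procedure; all sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2, r = 8, 6, 2                      # ambient dimensions and rank (hypothetical)

# Rank-r hidden parameter Theta* = U diag(s) V^T, as in the low-rank model.
U = np.linalg.qr(rng.normal(size=(d1, r)))[0]
V = np.linalg.qr(rng.normal(size=(d2, r)))[0]
Theta = U @ np.diag(rng.uniform(1.0, 2.0, size=r)) @ V.T

def expected_reward(X, link=lambda z: 1 / (1 + np.exp(-z))):
    """GLM reward model: mean reward is link(<X, Theta*>) (logistic link here)."""
    return float(link(np.sum(X * Theta)))

# Average reward-weighted Gaussian feature matrices; by a Stein-type identity
# this estimate is approximately proportional to Theta*, so its leading
# singular subspaces approximate those of Theta*.
n = 5000
M_hat = np.zeros((d1, d2))
for _ in range(n):
    X = rng.normal(size=(d1, d2)) / np.sqrt(d1 * d2)
    y = expected_reward(X) + 0.1 * rng.normal()   # noisy observed reward
    M_hat += y * X / n

U_hat, _, Vt_hat = np.linalg.svd(M_hat)
# The top-r subspaces would then feed a regularized regression stage.
print("estimated column subspace shape:", U_hat[:, :r].shape)
```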

    A Simple Unified Framework for High Dimensional Bandit Problems

    Stochastic high-dimensional bandit problems with low-dimensional structure are useful in applications such as online advertising and drug discovery. In this work, we propose a simple unified algorithm for such problems and present a general analysis framework for the regret upper bound of our algorithm. We show that, under some mild unified assumptions, our algorithm can be applied to different high-dimensional bandit problems. Our framework uses the low-dimensional structure to guide the parameter estimation, so our algorithm achieves the best regret bounds in the LASSO bandit, as well as novel bounds in the low-rank matrix bandit, the group-sparse matrix bandit, and in a new problem: the multi-agent LASSO bandit.
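    As a toy illustration of how low-dimensional structure guides estimation in the LASSO bandit instance (a simple explore-then-commit sketch, not the paper's unified algorithm; dimensions, the regularization weight, and all names are assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
d, s, K, n_explore = 100, 5, 10, 400     # ambient dim, sparsity, arms, exploration rounds

theta = np.zeros(d)                      # s-sparse hidden parameter of the LASSO bandit
theta[rng.choice(d, size=s, replace=False)] = rng.uniform(0.5, 1.0, size=s)

# Exploration phase: pull uniformly at random, record (feature, reward) pairs.
X_hist, y_hist = [], []
for _ in range(n_explore):
    arms = rng.normal(size=(K, d)) / np.sqrt(d)   # fresh random arm features
    a = rng.integers(K)
    X_hist.append(arms[a])
    y_hist.append(arms[a] @ theta + 0.1 * rng.normal())

# The L1 penalty exploits sparsity, recovering theta from far fewer samples
# than the ambient dimension d would otherwise require.
theta_hat = Lasso(alpha=0.02).fit(np.array(X_hist), np.array(y_hist)).coef_

# Commit phase: act greedily with the sparse estimate.
arms = rng.normal(size=(K, d)) / np.sqrt(d)
print("greedy arm:", int(np.argmax(arms @ theta_hat)))
```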