Context-lumpable stochastic bandits
We consider a contextual bandit problem with finitely many contexts and actions.
In each round the learner observes a random context and chooses
an action based on its past experience. The learner then observes a random
reward whose mean is a function of the context and the action for the round.
Under the assumption that the contexts can be lumped into a smaller number of
groups such that the mean reward for the various actions is the same for any
two contexts that are in the same group, we give an algorithm that, with high
probability, outputs an $\epsilon$-optimal policy using a near-optimal number of
samples, and we provide a matching lower bound. In the regret minimization
setting, we give an algorithm whose cumulative regret up to time $T$ is
near-minimax optimal. To the best of our
knowledge, we are the first to show the near-optimal sample complexity in the
PAC setting and minimax regret in the
online setting for this problem. We also show our algorithms can be applied to
more general low-rank bandits and get improved regret bounds in some scenarios.
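The lumpability assumption above can be sketched as a toy environment. Everything below (the numbers of contexts, actions, and groups, the group assignment, and the mean values) is an illustrative assumption, not taken from the paper:

```python
import random

# Hypothetical sketch of a context-lumpable bandit environment: the contexts
# are partitioned into groups, and the mean reward of each action depends
# only on the group of the observed context, not on the context itself.

random.seed(0)

S, K, n_groups = 6, 3, 2                        # contexts, actions, groups
group_of = {s: s % n_groups for s in range(S)}  # lump contexts into groups
group_means = [[0.2, 0.5, 0.8],                 # one mean vector per group
               [0.9, 0.1, 0.4]]

def mean_reward(context, action):
    # Two contexts in the same group share every action's mean reward.
    return group_means[group_of[context]][action]

def pull(context, action):
    # Observe a noisy (Bernoulli) reward around the lumped mean.
    return 1.0 if random.random() < mean_reward(context, action) else 0.0
```

A learner that identifies the grouping only needs to solve one bandit problem per group rather than one per context, which is the source of the sample-complexity savings.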
Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems
In the stochastic contextual low-rank matrix bandit problem, the expected
reward of an action is given by the inner product between the action's feature
matrix and some fixed, but initially unknown, low-rank matrix,
and an agent sequentially takes actions based
on past experience to maximize the cumulative reward. In this paper, we study
the generalized low-rank matrix bandit problem, which has been recently
proposed in \cite{lu2021low} under the Generalized Linear Model (GLM)
framework. To overcome the computational infeasibility and theoretical limitations
of existing algorithms on this problem, we first propose the G-ESTT framework
that modifies the idea from \cite{jun2019bilinear} by using Stein's method on
the subspace estimation and then leverages the estimated subspaces via a
regularization idea. Furthermore, we substantially improve the efficiency of
G-ESTT by instead using a novel exclusion idea on the estimated subspace, and
propose the G-ESTS framework. We also show that G-ESTT and G-ESTS each achieve
a regret bound, under mild assumptions and up to logarithmic terms, that
depends on a problem-dependent quantity.
Under a reasonable assumption on our problem setting, the regret of G-ESTT is
consistent with the current best regret bound of~\citep{lu2021low}. For
completeness, we conduct experiments to illustrate that
our proposed algorithms, especially G-ESTS, are also computationally tractable
and consistently outperform other state-of-the-art (generalized) linear matrix
bandit methods based on a suite of simulations.
Comment: Revision of the paper accepted by NeurIPS 202
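To make the reward model concrete, here is a minimal sketch of the low-rank inner-product reward described above, not the G-ESTT/G-ESTS algorithms themselves. The small dimensions and the rank-1 parameter are assumptions chosen for illustration:

```python
import random

# Sketch of the low-rank matrix bandit reward model: the expected reward of
# an action with feature matrix X is the Frobenius inner product
# <X, Theta> = trace(X^T Theta), where Theta = U V^T is an unknown
# low-rank matrix (rank 1 here for simplicity).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inner(X, Theta):
    # Frobenius inner product: sum_ij X_ij * Theta_ij.
    return sum(X[i][j] * Theta[i][j]
               for i in range(len(X)) for j in range(len(X[0])))

random.seed(1)
d1, d2, r = 4, 3, 1                      # illustrative dimensions and rank
U = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d1)]
V = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d2)]
Theta = matmul(U, transpose(V))          # unknown low-rank parameter

# An indicator-matrix action simply reads off one entry of Theta.
X = [[1.0 if (i, j) == (0, 0) else 0.0 for j in range(d2)]
     for i in range(d1)]
expected_reward = inner(X, Theta)        # equals Theta[0][0] for this X
```

The point the frameworks exploit is that Theta has far fewer than d1*d2 degrees of freedom when its rank is small, so subspace estimation can sharply reduce the effective dimension.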
A Simple Unified Framework for High Dimensional Bandit Problems
Stochastic high dimensional bandit problems with low dimensional structures
are useful in different applications such as online advertising and drug
discovery. In this work, we propose a simple unified algorithm for such
problems and present a general analysis framework for the regret upper bound of
our algorithm. We show that under some mild unified assumptions, our algorithm
can be applied to different high dimensional bandit problems. Our framework
utilizes the low dimensional structure to guide the parameter estimation in the
problem, therefore our algorithm achieves the best regret bounds in the LASSO
bandit, as well as novel bounds in the low-rank matrix bandit, the group sparse
matrix bandit, and in a new problem: the multi-agent LASSO bandit.
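As a toy illustration of the low dimensional structure such frameworks exploit, the sketch below pairs a sparse linear reward model with the soft-thresholding operator at the heart of LASSO-type estimation. The dimensions, the true parameter, and the one-step "estimate" are assumptions for illustration only:

```python
# Sketch of a sparse linear reward model: rewards are linear in a
# high dimensional parameter of which only a few coordinates are nonzero,
# and estimation shrinks small coordinates to exactly zero.

def soft_threshold(z, lam):
    # Proximal operator of the l1 norm; the core update in LASSO solvers.
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

d = 10
theta = [0.0] * d
theta[2], theta[7] = 1.5, -2.0           # sparse true parameter (2 of 10)

def expected_reward(x):
    # Linear reward <x, theta>; only two coordinates actually matter.
    return sum(xi * ti for xi, ti in zip(x, theta))

# Pretend least-squares estimate with a small uniform error, then shrink.
noisy_estimate = [t + 0.05 for t in theta]
sparse_estimate = [soft_threshold(t, 0.1) for t in noisy_estimate]
```

Soft-thresholding zeroes out the coordinates whose estimates are dominated by noise, which is how the low dimensional structure guides parameter estimation in the regret analysis.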
Explorations in Stochastic Bandit Problems: Theory and Applications
The stochastic bandit problem~\citep{robbins1952some} is a type of decision-making problem in which an agent must repeatedly choose among multiple arms from a (possibly varying) arm set, where each arm is associated with an unknown and distinct reward distribution, and the objective is to maximize the cumulative reward over time. The problem takes its name from the analogy of a gambler deciding which arm of a row of slot machines to pull, where each machine offers a different and unknown probability of winning. This framework is widely applicable, and several of its sub-problems have been extensively studied in recent years, e.g. multi-armed bandits (MAB)~\citep{robbins1952some}, linear bandits~\citep{abbasi2011improved}, Lipschitz bandits~\citep{agrawal1995continuum}, and so on. However, existing research on bandits faces certain limitations, both in theory and, crucially, in practical applications, and these challenges have become significant bottlenecks in advancing the field of stochastic bandit problems. To name a few: (1) robustness against adversarial attacks (Chapter 2); (2) automatic hyperparameter tuning (Chapter 3); (3) adaptivity to non-stationary environments (Chapter 3); (4) efficiency under high-dimensional structure with sparsity (Chapter 4); (5) resilience to heavy-tailed payoffs (Chapter 5). Given that these fundamental issues have rarely been explored in the past, we have committed significant effort to addressing and resolving them both theoretically and practically.

In Chapter 1, we present a brief introduction to the bandit problem along with some limitations of the existing literature, which motivates our research. In Chapter 2, we introduce the stochastic Lipschitz bandit problem in the presence of adversarial attacks, and we propose a line of novel algorithms for different types of adversaries, including ones agnostic to the total corruption level.
Subsequently, in Chapter 3 we study how to dynamically tune the hyperparameters of bandit algorithms with an infinite number of hyperparameter value candidates. In Chapter 4, we investigate the recently popular low-rank matrix bandit problem and propose two types of algorithms with improved empirical performance and decent regret bounds. Then, in Chapter 5, we revisit the low-rank matrix bandit problem under a more sophisticated scenario in which the stochastic payoffs are infused with heavy-tailed noise, and we propose a novel framework that handles heavy-tailedness and sparsity simultaneously. All the algorithms and frameworks we propose are backed by robust theoretical guarantees, with proofs provided in the Appendix.
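The choose-an-arm, observe-a-reward loop described in this abstract can be sketched with a classical UCB1-style rule. This is a standard textbook baseline, not one of the dissertation's algorithms, and the arm means and horizon below are illustrative assumptions:

```python
import math
import random

# Minimal UCB1-style sketch of the stochastic multi-armed bandit loop:
# pull the arm with the largest optimistic index (empirical mean plus an
# exploration bonus), observe a Bernoulli reward, update the statistics.

random.seed(2)
means = [0.3, 0.5, 0.8]                 # true arm means, unknown to the learner
K, T = len(means), 2000
counts, sums = [0] * K, [0.0] * K

for t in range(1, T + 1):
    if t <= K:
        a = t - 1                       # pull each arm once to initialize
    else:
        a = max(range(K),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]))
    r = 1.0 if random.random() < means[a] else 0.0
    counts[a] += 1
    sums[a] += r

most_pulled = max(range(K), key=lambda i: counts[i])
```

The exploration bonus shrinks as an arm is pulled more often, so play concentrates on empirically good arms while every arm keeps being tested occasionally; the limitations listed above (attacks, heavy tails, high dimension) are exactly the regimes where this vanilla loop breaks down.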