Approximation Algorithms for Correlated Knapsacks and Non-Martingale Bandits
In the stochastic knapsack problem, we are given a knapsack of size B, and a
set of jobs whose sizes and rewards are drawn from a known probability
distribution. However, we know the actual size and reward only when the job
completes. How should we schedule jobs to maximize the expected total reward?
We know O(1)-approximations when we assume that (i) rewards and sizes are
independent random variables, and (ii) we cannot prematurely cancel jobs. What
can we say when either or both of these assumptions are changed?
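To make the setting concrete, the following is a tiny Monte Carlo sketch of the baseline variant, with independent sizes and rewards and no cancellation; the instance and the greedy ordering heuristic are illustrative assumptions, not the paper's algorithm.

    import random

    # Toy instance: each job has a known distribution over (size, reward) pairs,
    # revealed only when the job completes (illustrative numbers).
    B = 10                                        # knapsack capacity
    jobs = [
        {"name": "a", "outcomes": [((2, 5), 0.5), ((6, 5), 0.5)]},
        {"name": "b", "outcomes": [((3, 4), 1.0)]},
        {"name": "c", "outcomes": [((1, 1), 0.9), ((9, 20), 0.1)]},
    ]

    def expected(job, i):                         # i = 0 for size, 1 for reward
        return sum(o[i] * p for o, p in job["outcomes"])

    def run_once(order):
        """Run one realization; a job earns its reward only if it fits."""
        remaining, total = B, 0.0
        for job in order:
            size, reward = random.choices(
                [o for o, _ in job["outcomes"]],
                weights=[p for _, p in job["outcomes"]],
            )[0]
            if size > remaining:                  # overflow: stop, no reward
                break
            remaining -= size
            total += reward
        return total

    # Greedy heuristic: sort by expected reward per unit of expected size.
    order = sorted(jobs, key=lambda j: expected(j, 1) / expected(j, 0), reverse=True)
    print(sum(run_once(order) for _ in range(10_000)) / 10_000)

Note that adaptive policies, which choose the next job based on what has been observed so far, can outperform any fixed ordering of this kind.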
The stochastic knapsack problem is of interest in its own right, but
techniques developed for it are applicable to other stochastic packing
problems. Indeed, ideas for this problem have been useful for budgeted learning
problems, where one is given several arms which evolve in a specified
stochastic fashion with each pull, and the goal is to pull the arms a total of
B times to maximize the reward obtained. Much recent work on this problem focuses
on the case when the evolution of the arms follows a martingale, i.e., when the
expected reward from the future is the same as the reward at the current state.
What can we say when the rewards do not form a martingale?
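As a concrete, made-up example of an arm violating the martingale condition:

    # An arm whose reward process is not a martingale (made-up numbers):
    # in its current state the arm pays 1.0 per pull, but after one more pull
    # it jumps to a state paying 3.0 with prob. 0.2, or 0.0 with prob. 0.8.
    current_reward = 1.0
    expected_future_reward = 0.2 * 3.0 + 0.8 * 0.0   # = 0.6 != 1.0
    print(current_reward, expected_future_reward)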
In this paper, we give constant-factor approximation algorithms for the
stochastic knapsack problem with correlations and/or cancellations, and also
for budgeted learning problems where the martingale condition is not satisfied.
Indeed, we can show that previously proposed LP relaxations have large
integrality gaps. We propose new time-indexed LP relaxations, convert the
fractional solutions into distributions over strategies, and then use the LP
values and the time ordering information from these strategies to devise a
randomized adaptive scheduling algorithm. We hope our LP formulation and
decomposition methods may provide a new way to address other correlated bandit
problems with more general contexts.
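A toy version of a time-indexed LP of this flavor is sketched below with scipy; the instance, the factor-2 capacity bound, and the start-time-independent rewards are simplifying assumptions for illustration, not the paper's exact relaxation.

    import numpy as np
    from scipy.optimize import linprog

    # Variable x[j, t]: probability that job j is started at time t.
    B = 4                                       # horizon = knapsack capacity
    sizes = [np.array([1, 3]), np.array([2, 2]), np.array([1, 4])]
    probs = [np.array([0.5, 0.5])] * 3          # each size equally likely
    rewards = [4.0, 3.0, 5.0]                   # expected reward on completion
    n = len(sizes)

    def e_min_size(j, t):
        """E[min(S_j, t)]: expected capacity job j occupies within t slots."""
        return float(np.minimum(sizes[j], t) @ probs[j])

    def col(j, t):                              # flatten (job, start time)
        return j * B + t

    c = np.zeros(n * B)
    for j in range(n):
        for t in range(B):
            c[col(j, t)] = -rewards[j]          # linprog minimizes, so negate

    A_ub, b_ub = [], []
    for j in range(n):                          # start each job at most once
        row = np.zeros(n * B)
        row[j * B:(j + 1) * B] = 1.0
        A_ub.append(row)
        b_ub.append(1.0)
    for t in range(1, B + 1):                   # capacity used by time t <= 2t
        row = np.zeros(n * B)
        for j in range(n):
            for s in range(t):
                row[col(j, s)] = e_min_size(j, t)
        A_ub.append(row)
        b_ub.append(2.0 * t)

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=(0, 1), method="highs")
    print("LP relaxation value:", round(-res.fun, 2))

The time-indexed variables are what make it possible to read off an ordering: rounding a fractional solution yields a distribution over start times for each job, which a randomized adaptive policy can then follow.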
Dynamic Ad Allocation: Bandits with Budgets
We consider an application of multi-armed bandits to internet advertising
(specifically, to dynamic ad allocation in the pay-per-click model, with
uncertainty on the click probabilities). We focus on an important practical
issue: advertisers are constrained in how much money they can spend on
their ad campaigns. This issue has not been considered in the prior work on
bandit-based approaches for ad allocation, to the best of our knowledge.
We define a simple, stylized model where an algorithm picks one ad to display
in each round, and each ad has a "budget": the maximal amount of money
that can be spent on this ad. This model admits a natural variant of UCB1, a
well-known algorithm for multi-armed bandits with stochastic rewards. We derive
strong provable guarantees for this algorithm.
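A plausible sketch of such a variant follows; the index is standard UCB1, while the rule that drops ads whose remaining budget cannot cover another click, and all parameter values, are assumptions rather than the paper's exact policy.

    import math
    import random

    def budgeted_ucb1(click_probs, budgets, cost_per_click=1.0, rounds=10_000):
        """UCB1-style ad allocation where each ad may spend at most its budget."""
        n = len(click_probs)
        pulls, clicks, spend = [0] * n, [0] * n, [0.0] * n
        revenue = 0.0
        for t in range(1, rounds + 1):
            # Only ads that can still pay for one more click are eligible.
            live = [i for i in range(n) if spend[i] + cost_per_click <= budgets[i]]
            if not live:
                break
            def ucb(i):                        # empirical mean + exploration bonus
                if pulls[i] == 0:
                    return float("inf")
                return clicks[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i])
            i = max(live, key=ucb)
            pulls[i] += 1
            if random.random() < click_probs[i]:   # pay-per-click: charge on click
                clicks[i] += 1
                spend[i] += cost_per_click
                revenue += cost_per_click
        return revenue

    print(budgeted_ucb1(click_probs=[0.05, 0.10, 0.02], budgets=[50.0, 20.0, 80.0]))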
Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback
We consider the linear contextual multi-class multi-period packing problem
(LMMP) where the goal is to pack items such that the total vector of
consumption is below a given budget vector and the total value is as large as
possible. We consider the setting where the reward and the consumption vector
associated with each action are class-dependent linear functions of the
context, and the decision-maker receives bandit feedback. LMMP includes linear
contextual bandits with knapsacks and online revenue management as special
cases. We develop a new estimator that guarantees a faster convergence rate,
and consequently lower regret, in such problems. We propose a bandit policy
that is a closed-form function of the estimated parameters. When the contexts
are non-degenerate, the regret of the proposed policy is sublinear in the
context dimension, the number of classes, and the time horizon T when the
budget grows at least as √T. We also resolve an open problem posed by
Agrawal & Devanur (2016) and extend the result to a multi-class setting. Our
numerical experiments clearly demonstrate that the performance of our policy is
superior to other benchmarks in the literature.
Comment: Accepted in ICML 2023, 44 pages including Appendix.
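The paper's estimator and closed-form policy are not reproduced here, but a generic least-squares plus dual-price sketch conveys the shape of such algorithms; the single resource, the collapsing of classes into per-action parameters, the noise levels, and the dual step size are all assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d, K, T = 3, 4, 2000            # context dim, actions, horizon (illustrative)
    budget = 150.0                  # single resource for simplicity
    theta_r = rng.normal(size=(K, d))           # toy ground-truth reward params
    theta_c = np.abs(rng.normal(size=(K, d)))   # toy ground-truth consumption params

    A = [np.eye(d) for _ in range(K)]           # ridge statistics per action
    b_r = [np.zeros(d) for _ in range(K)]
    b_c = [np.zeros(d) for _ in range(K)]
    dual, remaining, total = 0.0, budget, 0.0

    for _ in range(T):
        if remaining <= 1.0:                    # stop when nearly out of budget
            break
        x = rng.normal(size=d)
        # Plug-in estimates and a dual-price-adjusted greedy choice.
        scores = []
        for k in range(K):
            A_inv = np.linalg.inv(A[k])
            r_hat = x @ (A_inv @ b_r[k])
            c_hat = x @ (A_inv @ b_c[k])
            scores.append(r_hat - dual * c_hat)
        k = int(np.argmax(scores))
        # Bandit feedback: only the chosen action's reward/consumption is seen.
        r = x @ theta_r[k] + 0.1 * rng.normal()
        c = max(0.0, x @ theta_c[k] + 0.1 * rng.normal())
        A[k] += np.outer(x, x)
        b_r[k] += r * x
        b_c[k] += c * x
        remaining -= c
        total += r
        # Gradient-style dual update (step size 0.01 is an arbitrary choice).
        dual = max(0.0, dual + 0.01 * (c - budget / T))

    print(f"total reward {total:.1f}, budget left {remaining:.1f}")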