Overlapping Multi-Bandit Best Arm Identification
In the multi-armed bandit literature, the multi-bandit best-arm identification problem consists of determining the best arm in each of a number of disjoint groups of arms, with as few total arm pulls as possible. In this paper, we introduce a variant of the multi-bandit problem with overlapping groups, and present two algorithms for this problem based on successive elimination and lower/upper confidence bounds (LUCB). We bound the number of total arm pulls required for high-probability best-arm identification in every group, and we complement these bounds with a near-matching algorithm-independent lower bound. In addition, we show that a specific choice of the groups recovers the top-k ranking problem.
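To illustrate the successive-elimination idea mentioned in this abstract, here is a minimal sketch. It is a generic textbook-style variant, not the paper's algorithm: the `pull` callback, the Hoeffding-style confidence radius, the assumption that rewards lie in [0, 1], and the assumption that every group is non-empty are all choices made here for illustration.

```python
import math
import random


def successive_elimination(groups, pull, delta=0.05, max_rounds=100_000):
    """Return (best arm per group, total pulls).

    groups: list of non-empty lists of arm indices; groups may share arms.
    pull:   hypothetical callback, pull(arm) -> reward in [0, 1].
    """
    arms = sorted({a for g in groups for a in g})
    counts = {a: 0 for a in arms}
    sums = {a: 0.0 for a in arms}
    active = [set(g) for g in groups]

    def mean(a):
        return sums[a] / counts[a]

    def radius(a):
        # Hoeffding radius with a union bound over arms and pull counts
        # (an assumed choice, not taken from the paper).
        n = counts[a]
        return math.sqrt(math.log(4 * len(arms) * n * n / delta) / (2 * n))

    for _ in range(max_rounds):
        unresolved = [s for s in active if len(s) > 1]
        if not unresolved:
            break
        # An arm shared by several groups is pulled once and serves them all.
        for a in set().union(*unresolved):
            sums[a] += pull(a)
            counts[a] += 1
        # Within each group, drop arms whose upper bound falls below
        # the best lower bound in that group.
        for s in unresolved:
            best_lcb = max(mean(a) - radius(a) for a in s)
            s.difference_update({a for a in s if mean(a) + radius(a) < best_lcb})

    winners = [next(iter(s)) if len(s) == 1 else max(s, key=mean) for s in active]
    return winners, sum(counts.values())


# Usage with simulated Bernoulli arms (the means are made up for the demo).
means = [0.9, 0.5, 0.2, 0.7]
rng = random.Random(0)
winners, pulls = successive_elimination(
    [[0, 1, 2], [1, 2, 3]], lambda a: float(rng.random() < means[a])
)
```

Because arms 1 and 2 belong to both groups, each of their pulls informs both eliminations, which is where overlapping groups can save samples compared with treating the groups separately.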
Pure Exploration with Multiple Correct Answers
We determine the sample complexity of pure exploration bandit problems with
multiple good answers. We derive a lower bound using a new game equilibrium
argument. We show how continuity and convexity properties of single-answer
problems ensure that the Track-and-Stop algorithm has asymptotically optimal
sample complexity. However, that convexity is lost when going to the
multiple-answer setting. We present a new algorithm which extends
Track-and-Stop to the multiple-answer case and has asymptotic sample complexity
matching the lower bound.
Local Clustering in Contextual Multi-Armed Bandits
We study identifying user clusters in contextual multi-armed bandits (MAB).
Contextual MAB is an effective tool for many real applications, such as content
recommendation and online advertisement. In practice, user dependency plays an
essential role in the user's actions, and thus the rewards. Clustering similar
users can improve the quality of reward estimation, which in turn leads to more
effective content recommendation and targeted advertising. Different from
traditional clustering settings, we cluster users based on the unknown bandit
parameters, which will be estimated incrementally. In particular, we define the
problem of cluster detection in contextual MAB, and propose a bandit algorithm,
LOCB, embedded with a local clustering procedure. We also provide a theoretical
analysis of LOCB in terms of the correctness and efficiency of clustering
and its regret bound. Finally, we evaluate the proposed algorithm from various
aspects and find that it outperforms state-of-the-art baselines.
On the Complexity of Differentially Private Best-Arm Identification with Fixed Confidence
Best Arm Identification (BAI) problems are increasingly used for
data-sensitive applications, such as designing adaptive clinical trials, tuning
hyper-parameters, and conducting user studies, to name a few. Motivated by the
data privacy concerns invoked by these applications, we study the problem of
BAI with fixed confidence under ε-global Differential Privacy (DP).
First, to quantify the cost of privacy, we derive a lower bound on the sample
complexity of any δ-correct BAI algorithm satisfying ε-global
DP. Our lower bound suggests the existence of two privacy regimes depending on
the privacy budget ε. In the high-privacy regime (small ε),
the hardness depends on a coupled effect of privacy and a novel
information-theoretic quantity, called the Total Variation Characteristic Time.
In the low-privacy regime (large ε), the sample complexity lower bound
reduces to the classical non-private lower bound. Second, we propose AdaP-TT,
an ε-global DP variant of the Top Two algorithm. AdaP-TT runs in
arm-dependent adaptive episodes and adds Laplace noise to ensure a good
privacy-utility trade-off. We derive an asymptotic upper bound on the sample
complexity of AdaP-TT that matches the lower bound up to multiplicative
constants in the high-privacy regime. Finally, we provide an experimental
analysis of AdaP-TT that validates our theoretical results.
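As a rough illustration of the Laplace-noise step mentioned in this abstract, the sketch below releases a differentially private empirical mean. It is not AdaP-TT: the adaptive episodes and the Top Two sampling rule are omitted, and the function names, the assumption that rewards lie in [0, 1], and the single-release setting are all choices made here.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(rewards, epsilon, rng):
    """Release the mean of `rewards` with Laplace noise.

    Assumes each reward lies in [0, 1], so changing one reward moves the
    mean by at most 1/n (the sensitivity). Noise of scale sensitivity/epsilon
    is the standard Laplace mechanism for a single release.
    """
    n = len(rewards)
    sensitivity = 1.0 / n
    return sum(rewards) / n + laplace_noise(sensitivity / epsilon, rng)


# Usage: a smaller epsilon (stronger privacy) means a noisier estimate.
rng = random.Random(0)
noisy_estimate = private_mean([1.0, 0.0, 1.0, 1.0], epsilon=0.5, rng=rng)
```

The noise scale shrinks as 1/(n·epsilon), so for a fixed privacy budget the added noise matters less as more rewards are averaged.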
Online Evaluation of Audiences for Targeted Advertising via Bandit Experiments
Firms implementing digital advertising campaigns face a complex problem in
determining the right match between their advertising creatives and target
audiences. Typical solutions to the problem have leveraged non-experimental
methods, or used "split-testing" strategies that have not explicitly addressed
the complexities induced by targeted audiences that can potentially overlap
with one another. This paper presents an adaptive algorithm that addresses the
problem via online experimentation. The algorithm is set up as a contextual
bandit and addresses the overlap issue by partitioning the target audiences
into disjoint, non-overlapping sub-populations. It learns an optimal creative
display policy in the disjoint space, while assessing in parallel which
creative has the best match in the space of possibly overlapping target
audiences. Experiments show that the proposed method is more efficient than
naive "split-testing" or non-adaptive "A/B/n" testing based methods. We also
describe a testing product we built that uses the algorithm. The product is
currently deployed on the advertising platform of JD.com, an eCommerce company
and a publisher of digital ads in China.
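One way to picture the partitioning step described in this abstract is to group users by the exact set of audiences they belong to. The sketch below does that. The abstract does not say how the partition is built, so the signature-based construction and the `disjoint_cells` name are assumptions made here.

```python
from collections import defaultdict


def disjoint_cells(audiences):
    """Split possibly overlapping audiences into disjoint sub-populations.

    audiences: dict mapping an audience name to a set of user ids.
    Returns a dict mapping each membership signature (a frozenset of
    audience names) to the users with exactly that signature.
    """
    membership = defaultdict(set)
    for name, users in audiences.items():
        for user in users:
            membership[user].add(name)

    cells = defaultdict(set)
    for user, names in membership.items():
        cells[frozenset(names)].add(user)
    return dict(cells)


# Usage: user 3 is in both audiences, so it gets its own cell.
cells = disjoint_cells({"A": {1, 2, 3}, "B": {3, 4}})
```

Each user lands in exactly one cell, and an audience-level estimate can then be formed by combining the cells whose signature contains that audience.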