

Overlapping Multi-Bandit Best Arm Identification

Author: Bogunovic Ilija
Cevher Volkan
Scarlett Jonathan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2019

In the multi-armed bandit literature, the multi-bandit best-arm identification problem consists of determining the best arm in each of a number of disjoint groups of arms, with as few total arm pulls as possible. In this paper, we introduce a variant of the multi-bandit problem with overlapping groups, and present two algorithms for this problem based on successive elimination and lower/upper confidence bounds (LUCB). We bound the number of total arm pulls required for high-probability best-arm identification in every group, and we complement these bounds with a near-matching algorithm-independent lower bound. In addition, we show that a specific choice of the groups recovers the top-k ranking problem.

Infoscience - École polytechnique fédérale de Lausanne

Crossref
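The abstract does not spell out either algorithm. The following is a minimal Python sketch of generic successive elimination adapted to overlapping groups, assuming Bernoulli rewards and a Hoeffding-style confidence radius with a union bound over arms and pull counts. The function names, the radius, and the simulated means are illustrative assumptions, not the authors' algorithm or constants, and the LUCB variant is not shown.

```python
import math
import random


def radius(n, n_arms, delta):
    # Hoeffding radius for rewards in [0, 1] after n pulls; the log term
    # union-bounds over arms and pull counts so all intervals hold
    # simultaneously with probability at least 1 - delta.
    return math.sqrt(math.log(4 * n_arms * n * n / delta) / (2 * n))


def successive_elimination(means, groups, delta=0.05, max_rounds=200_000, seed=0):
    """Hypothetical sketch: best arm of every (possibly overlapping) group.

    means  -- Bernoulli means, used only to simulate arm pulls
    groups -- list of lists of arm indices; an arm may appear in several groups
    Returns (best arm per group, or None if unresolved; total number of pulls).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    # Arms that may still be the best arm of each group.
    candidates = [set(g) for g in groups]
    for _ in range(max_rounds):
        unresolved = [c for c in candidates if len(c) > 1]
        if not unresolved:
            break
        # Pull every arm that is still a candidate in some unresolved group;
        # a shared arm serves all of its groups with a single pull.
        for a in set().union(*unresolved):
            sums[a] += 1.0 if rng.random() < means[a] else 0.0
            counts[a] += 1
        for c in unresolved:
            lcb, ucb = {}, {}
            for a in c:
                mu, r = sums[a] / counts[a], radius(counts[a], n_arms, delta)
                lcb[a], ucb[a] = mu - r, mu + r
            best_lcb = max(lcb.values())
            # Eliminate arms whose upper bound is below the best lower bound.
            c.intersection_update([a for a in c if ucb[a] >= best_lcb])
    best = [next(iter(c)) if len(c) == 1 else None for c in candidates]
    return best, sum(counts)


# Two groups that share arm 2.
best, pulls = successive_elimination([0.9, 0.5, 0.8, 0.3, 0.6], [[0, 1, 2], [2, 3, 4]])
```

Because arm 2 belongs to both groups, each of its samples feeds both groups' elimination decisions, which is the kind of sharing that overlapping groups make possible.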

Pure Exploration with Multiple Correct Answers

Author: Degenne Rémy
Koolen Wouter M.
Publication date: 01/01/2019

We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, this convexity is lost in the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.

arXiv.org e-Print Archive

CWI's Institutional Repository

Local Clustering in Contextual Multi-Armed Bandits

Author: Ban Yikun
He Jingrui
Publication date: 26/02/2021

We study identifying user clusters in contextual multi-armed bandits (MAB). Contextual MAB is an effective tool for many real applications, such as content recommendation and online advertisement. In practice, user dependency plays an essential role in the user's actions, and thus the rewards. Clustering similar users can improve the quality of reward estimation, which in turn leads to more effective content recommendation and targeted advertising. Unlike in traditional clustering settings, we cluster users based on the unknown bandit parameters, which are estimated incrementally. In particular, we define the problem of cluster detection in contextual MAB, and propose a bandit algorithm, LOCB, that embeds a local clustering procedure. We also provide a theoretical analysis of LOCB in terms of the correctness and efficiency of clustering and its regret bound. Finally, we evaluate the proposed algorithm from various aspects and show that it outperforms state-of-the-art baselines.
Comment: 12 pages

arXiv.org e-Print Archive
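A minimal sketch of the general idea of clustering users by their incrementally estimated bandit parameters, assuming a linear reward model with per-user ridge-regression estimates and a fixed distance threshold `gamma`. The class `UserEstimate`, the function `local_cluster`, the threshold, and the simulated users are hypothetical; this is not LOCB itself, whose clustering procedure and thresholds are defined in the paper.

```python
import numpy as np


class UserEstimate:
    """Hypothetical per-user ridge-regression estimate of a linear bandit parameter."""

    def __init__(self, dim, lam=1.0):
        self.A = lam * np.eye(dim)  # regularized Gram matrix of observed contexts
        self.b = np.zeros(dim)      # reward-weighted sum of observed contexts

    def update(self, x, reward):
        # Incremental update after observing context x and its reward.
        self.A += np.outer(x, x)
        self.b += reward * x

    def theta(self):
        return np.linalg.solve(self.A, self.b)


def local_cluster(seed_user, estimates, gamma):
    """Users whose estimated parameters lie within gamma of the seed user's estimate."""
    center = estimates[seed_user].theta()
    return {u for u, est in estimates.items()
            if np.linalg.norm(est.theta() - center) <= gamma}


# Simulated users: 0-2 share one true parameter, 3-4 share another (assumed data).
rng = np.random.default_rng(0)
true_theta = {0: [1, 0], 1: [1, 0], 2: [1, 0], 3: [0, 1], 4: [0, 1]}
estimates = {u: UserEstimate(dim=2) for u in true_theta}
for u, theta in true_theta.items():
    theta = np.array(theta, dtype=float)
    for _ in range(300):
        x = rng.normal(size=2)
        estimates[u].update(x, float(x @ theta) + 0.1 * rng.normal())
```

Because the estimates are refined as rewards arrive, a cluster computed this way can change over time, which matches the abstract's point that clustering is based on parameters that are only estimated incrementally.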

On the Complexity of Differentially Private Best-Arm Identification with Fixed Confidence

Author: Azize Achraf
Basu Debabrota
Jourdan Marc
Marjani Aymen Al
Publication date: 05/09/2023

Best Arm Identification (BAI) problems are increasingly used for data-sensitive applications, such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user studies, to name a few. Motivated by the data privacy concerns invoked by these applications, we study the problem of BAI with fixed confidence under $\epsilon$-global Differential Privacy (DP). First, to quantify the cost of privacy, we derive a lower bound on the sample complexity of any $\delta$-correct BAI algorithm satisfying $\epsilon$-global DP. Our lower bound suggests the existence of two privacy regimes depending on the privacy budget $\epsilon$. In the high-privacy regime (small $\epsilon$), the hardness depends on a coupled effect of privacy and a novel information-theoretic quantity, called the Total Variation Characteristic Time. In the low-privacy regime (large $\epsilon$), the sample complexity lower bound reduces to the classical non-private lower bound. Second, we propose AdaP-TT, an $\epsilon$-global DP variant of the Top Two algorithm. AdaP-TT runs in arm-dependent adaptive episodes and adds Laplace noise to ensure a good privacy-utility trade-off. We derive an asymptotic upper bound on the sample complexity of AdaP-TT that matches the lower bound up to multiplicative constants in the high-privacy regime. Finally, we provide an experimental analysis of AdaP-TT that validates our theoretical results.

arXiv.org e-Print Archive
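The privatization step can be illustrated with the standard Laplace mechanism applied to the mean of one episode's rewards. This is a minimal sketch, assuming rewards in [0, 1] and that each episode uses only fresh samples; the function names and the doubling episode lengths are illustrative assumptions and do not reproduce AdaP-TT's episode schedule or its Top Two sampling rule.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_episode_mean(rewards, epsilon, rng):
    """Release the mean of one episode's rewards (each in [0, 1]) under epsilon-DP.

    Changing a single reward moves the mean by at most 1/n, so Laplace noise
    with scale 1/(n * epsilon) suffices (the standard Laplace mechanism).
    """
    n = len(rewards)
    return sum(rewards) / n + laplace(1.0 / (n * epsilon), rng)


# Hypothetical doubling episodes for one arm: longer episodes get less noise,
# and since each reward enters only one release, the releases compose in parallel.
rng = random.Random(0)
releases = []
for k in range(6):
    rewards = [1.0 if rng.random() < 0.7 else 0.0 for _ in range(2 ** k)]
    releases.append(private_episode_mean(rewards, epsilon=0.5, rng=rng))
```

The noise scale shrinks as $1/(n\epsilon)$, so small privacy budgets mostly hurt short episodes, which is one intuition for why the episode lengths matter for the privacy-utility trade-off.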

Online Evaluation of Audiences for Targeted Advertising via Bandit Experiments

Author: Geng Tong
Lin Xiliang
Nair Harikesh S.
Publication date: 04/09/2019

Firms implementing digital advertising campaigns face a complex problem in determining the right match between their advertising creatives and target audiences. Typical solutions to the problem have leveraged non-experimental methods, or used "split-testing" strategies that have not explicitly addressed the complexities induced by targeted audiences that can potentially overlap with one another. This paper presents an adaptive algorithm that addresses the problem via online experimentation. The algorithm is set up as a contextual bandit and addresses the overlap issue by partitioning the target audiences into disjoint, non-overlapping sub-populations. It learns an optimal creative display policy in the disjoint space, while assessing in parallel which creative has the best match in the space of possibly overlapping target audiences. Experiments show that the proposed method is more efficient than naive "split-testing" or non-adaptive "A/B/n" testing based methods. We also describe a testing product we built that uses the algorithm. The product is currently deployed on the advertising platform of JD.com, an eCommerce company and a publisher of digital ads in China.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
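The partitioning step described in the last abstract can be sketched by grouping users according to the exact set of audiences they belong to: the resulting cells are disjoint, and each audience is the union of the cells whose signature contains it. The function names and the size-weighted aggregation of per-cell estimates back to an audience are assumptions for illustration, not the deployed JD.com system.

```python
from collections import defaultdict


def disjoint_partition(audiences):
    """Split overlapping audiences into disjoint sub-populations.

    audiences -- dict mapping audience name -> set of user ids.
    Returns a dict mapping each membership signature (the frozenset of
    audiences a user belongs to) -> set of users with exactly that signature.
    """
    membership = defaultdict(set)
    for name, users in audiences.items():
        for u in users:
            membership[u].add(name)
    cells = defaultdict(set)
    for u, names in membership.items():
        cells[frozenset(names)].add(u)
    return dict(cells)


def audience_value(audience, cells, cell_value):
    # Hypothetical aggregation: size-weighted average of per-cell estimates
    # over the disjoint cells that make up the (possibly overlapping) audience.
    parts = [(len(users), cell_value[sig]) for sig, users in cells.items() if audience in sig]
    total = sum(n for n, _ in parts)
    return sum(n * v for n, v in parts) / total


audiences = {"A": {1, 2, 3}, "B": {3, 4}}
cells = disjoint_partition(audiences)
# Assumed per-cell estimates, e.g. click-through rates of one creative.
cell_value = {frozenset({"A"}): 0.1, frozenset({"A", "B"}): 0.4, frozenset({"B"}): 0.2}
```

User 3 belongs to both audiences, so it lands in its own cell; a bandit can then run on the disjoint cells while audience-level comparisons are recovered by aggregation.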