1 research outputs found
Combinatorial Pure Exploration of Dueling Bandit
In this paper, we study combinatorial pure exploration for dueling bandits
(CPE-DB): we have multiple candidates for multiple positions as modeled by a
bipartite graph, and in each round we sample a duel of two candidates on one
position and observe who wins in the duel, with the goal of finding the best
candidate-position matching with high probability after multiple rounds of
samples. CPE-DB is an adaptation of the original combinatorial pure exploration
for multi-armed bandit (CPE-MAB) problem to the dueling bandit setting.
We consider both the Borda winner and the Condorcet winner cases. For Borda
winner, we establish a reduction of the problem to the original CPE-MAB setting
and design PAC and exact algorithms that achieve both the sample complexity
similar to that in the CPE-MAB setting (which is nearly optimal for a subclass
of problems) and polynomial running time per round.
For Condorcet winner, we first design a fully polynomial time approximation
scheme (FPTAS) for the offline problem of finding the Condorcet winner with
known winning probabilities, and then use the FPTAS as an oracle to design a
novel pure exploration algorithm - with sample
complexity analysis. - is the first algorithm with
polynomial running time per round for identifying the Condorcet winner in
CPE-DB.Comment: Accepted to ICML 202