Copeland Dueling Bandits

de Rijke, Maarten; Karnin, Zohar; Whiteson, Shimon; Zoghi, Masrour

research

Copeland Dueling Bandits

Authors: Maarten de Rijke
Zohar Karnin
Shimon Whiteson
Masrour Zoghi
Publication date: 1 January 2015
Publisher

Abstract

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results. Such existing results either offer bounds of the form