Factored Bandits
We introduce the factored bandits model, which is a framework for learning
with limited (bandit) feedback, where actions can be decomposed into a
Cartesian product of atomic actions. Factored bandits incorporate rank-1
bandits as a special case, but significantly relax the assumptions on the form
of the reward function. We provide an anytime algorithm for stochastic factored
bandits and regret upper and lower bounds that match up to constant factors.
Furthermore, we show that with a slight modification the proposed algorithm can
be applied to utility-based dueling bandits. We obtain an improvement in the
additive terms of the regret bound compared to state-of-the-art algorithms (the
additive terms dominate up to time horizons that are exponential in the number
of arms).
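As an illustration of the factored action structure only, and not of the paper's anytime algorithm, the sketch below builds a toy action set as a Cartesian product of hypothetical atomic action sets and runs a naive explore-then-exploit baseline over the joint arms. The factor sets, reward table, and round counts are made up for the example.

```python
# Minimal sketch of the factored-bandit action structure (illustration only,
# not the algorithm from the paper). Actions are tuples drawn from a
# Cartesian product of atomic action sets; reward depends on the joint choice.
import itertools
import random

# Hypothetical atomic action sets: two factors with 3 and 2 atomic actions.
factor_sets = [["a0", "a1", "a2"], ["b0", "b1"]]

# The full action set is their Cartesian product.
actions = list(itertools.product(*factor_sets))

# Hypothetical expected-reward table over joint actions (unknown to the learner).
mu = {a: random.random() for a in actions}

def pull(action):
    """Return a noisy Bernoulli reward for one joint action."""
    return 1.0 if random.random() < mu[action] else 0.0

# Naive baseline: treat every tuple as an independent arm, explore uniformly
# for a while, then commit to the empirically best arm. This ignores the
# factored structure that the paper's algorithm exploits.
counts = {a: 0 for a in actions}
sums = {a: 0.0 for a in actions}
for t in range(1000):
    if t < 10 * len(actions):
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda x: sums[x] / max(counts[x], 1))
    r = pull(a)
    counts[a] += 1
    sums[a] += r
```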
Calibrated Fairness in Bandits
We study fairness within the stochastic, \emph{multi-armed bandit} (MAB)
decision making framework. We adapt the fairness framework of "treating similar
individuals similarly" to this setting. Here, an `individual' corresponds to an
arm and two arms are `similar' if they have a similar quality distribution.
First, we adopt a {\em smoothness constraint} that if two arms have a similar
quality distribution then the probability of selecting each arm should be
similar. In addition, we define the {\em fairness regret}, which corresponds to
the degree to which an algorithm is not calibrated, where perfect calibration
requires that the probability of selecting an arm is equal to the probability
with which the arm has the best quality realization. We show that a variation
on Thompson sampling satisfies smooth fairness for total variation distance,
and give a bound on fairness regret. This complements
prior work, which protects an on-average better arm from being less favored. We
also explain how to extend our algorithm to the dueling bandit setting.
Comment: To be presented at the FAT-ML'17 workshop.
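For context, the sketch below shows plain Bernoulli Thompson sampling with Beta posteriors; under this scheme the probability of selecting an arm equals the posterior probability that its sampled quality is highest, which is the calibration notion above. It is not the paper's modified variant, and the arm means and horizon are illustrative assumptions.

```python
# Plain Bernoulli Thompson sampling sketch (the paper analyzes a variation of
# this scheme; the exact modification is not reproduced here).
import random

K = 4
true_means = [0.3, 0.5, 0.55, 0.7]   # hypothetical arm qualities
alpha = [1.0] * K                    # Beta posterior parameters per arm
beta = [1.0] * K

for t in range(5000):
    # Sample a quality estimate for each arm from its Beta posterior;
    # the arm with the highest sample is played, so selection probability
    # tracks the posterior probability of being best (calibration).
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(K)]
    arm = max(range(K), key=lambda i: samples[i])
    reward = 1 if random.random() < true_means[arm] else 0
    # Conjugate Bernoulli/Beta posterior update.
    alpha[arm] += reward
    beta[arm] += 1 - reward
```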
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
We study the K-armed dueling bandit problem which is a variation of the
classical Multi-Armed Bandit (MAB) problem in which the learner receives only
relative feedback about the selected pairs of arms. We propose a new algorithm
called Relative Exponential-weight algorithm for Exploration and Exploitation
(REX3) to handle the adversarial utility-based formulation of this problem.
This algorithm is a non-trivial extension of the Exponential-weight algorithm
for Exploration and Exploitation (EXP3). We prove a finite-time expected regret
upper bound of order O(sqrt(K ln(K) T)) for this algorithm and a general lower
bound of order Omega(sqrt(KT)). At the end, we provide experimental results
using real data from information retrieval applications.
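The sketch below is a schematic exponential-weight loop for relative (pairwise) feedback in the spirit of EXP3 and REX3: two arms are drawn from the weight-induced distribution, only their duel outcome is observed, and importance-weighted gain estimates update the weights. The exact REX3 estimator, learning-rate tuning, and analysis are in the paper; the utilities, gamma value, and duel model here are assumptions for illustration.

```python
# Schematic exponential-weight sketch for utility-based dueling feedback,
# in the spirit of EXP3/REX3 (not the paper's exact estimator).
import math
import random

K = 5
gamma = 0.1                               # exploration rate (illustrative)
weights = [1.0] * K
true_means = [0.2, 0.4, 0.5, 0.6, 0.8]    # hypothetical arm utilities

def duel(i, j):
    """Return 1 if arm i beats arm j in a utility-based duel, else 0."""
    p_win = 0.5 + (true_means[i] - true_means[j]) / 2.0
    return 1 if random.random() < p_win else 0

for t in range(10000):
    total = sum(weights)
    probs = [(1 - gamma) * w / total + gamma / K for w in weights]
    # Draw both arms of the duel independently from the same distribution.
    a = random.choices(range(K), weights=probs)[0]
    b = random.choices(range(K), weights=probs)[0]
    outcome = duel(a, b)                  # only relative feedback is observed
    # Importance-weighted relative gain estimates (schematic).
    g = outcome - 0.5
    weights[a] *= math.exp(gamma * (g / probs[a]) / K)
    weights[b] *= math.exp(gamma * (-g / probs[b]) / K)
    # Rescale to keep the weights numerically stable.
    m = max(weights)
    weights = [w / m for w in weights]
```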