We tackle an emerging problem: finding an optimal monopartite
matching in a weighted graph. The semi-bandit version, where a full matching is
sampled at each iteration, has been addressed by \cite{ADMA}, whose
algorithm has an expected regret in $O(\frac{L\log(L)}{\Delta}\log(T))$
with $2L$ players, $T$ iterations and a minimum reward gap $\Delta$. We reduce
this bound in two steps. First, as in \cite{GRAB} and \cite{UniRank}, we use the
unimodality property of the expected reward on the appropriate graph to design
an algorithm with a regret in $O(L\frac{1}{\Delta}\log(T))$. Second, we show
that by shifting the focus to the main question `\emph{Is user $i$ better
than user $j$?}' this regret becomes
$O(L\frac{\Delta}{\tilde{\Delta}^2}\log(T))$, where $\tilde{\Delta} > \Delta$
derives from a better way of comparing users. Finally, experimental results
corroborate these theoretical results in practice.
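
As an informal reading of these bounds, and ignoring the constants hidden in the $O(\cdot)$ notation (an assumption made here only for illustration), the two improvements can be compared as ratios:
\[
\frac{L\frac{1}{\Delta}\log(T)}{\frac{L\log(L)}{\Delta}\log(T)} = \frac{1}{\log(L)},
\qquad
\frac{L\frac{\Delta}{\tilde{\Delta}^2}\log(T)}{L\frac{1}{\Delta}\log(T)} = \frac{\Delta^2}{\tilde{\Delta}^2} < 1 .
\]
The first step thus removes the $\log(L)$ factor, and the second step gains a further factor $(\Delta/\tilde{\Delta})^2$, which is smaller than one because $\tilde{\Delta} > \Delta$.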

Fromont, Elisa

Gaudel, Romaric

Gauthier, Camille-Sovanneary

English

arXiv

We tackle, in the multiple-play bandit setting, the online ranking problem of assigning $L$ items to $K$ predefined positions on a web page in order to maximize the number of user clicks. We propose a generic algorithm, UniRank, that tackles state-of-the-art click models. The regret bound of this algorithm is a direct consequence of the unimodality-like property of the bandit setting with respect to a graph where nodes are ordered sets of indistinguishable items. The main contribution of UniRank is its $O\left(L/\Delta \log T\right)$ regret for $T$ consecutive assignments, where $\Delta$ relates to the reward gap between two items. This regret bound is based on the usually implicit condition that two items may not have the same attractiveness. Experiments against state-of-the-art learning algorithms, whether or not specialized for particular click models, show that our method has better regret performance than other generic algorithms on real-life and synthetic datasets.

UniRank: Unimodal Bandit Algorithm for Online Ranking

