Given a set V of n objects, an online ranking system outputs at each time
step a full ranking of the set, observes a feedback of some form and suffers a
loss. We study the setting in which the (adversarial) feedback is an element in
V, and the loss is the position (0th, 1st, 2nd...) of the item in the
outputted ranking. More generally, we study a setting in which the feedback is
a subset U of at most k elements in V, and the loss is the sum of the
positions of those elements.
We present an algorithm of expected regret O(n3/2Tk) over a time
horizon of T steps with respect to the best single ranking in hindsight. This
improves previous algorithms and analyses either by a factor of either
Ω(k), a factor of Ω(logn) or by improving running
time from quadratic to O(nlogn) per round. We also prove a matching lower
bound. Our techniques also imply an improved regret bound for online rank
aggregation over the Spearman correlation measure, and to other more complex
ranking loss functions