Online Learning to Rank with Features
We introduce a new model for online ranking in which the click probability
factors into an examination and attractiveness function and the attractiveness
function is a linear function of a feature vector and an unknown parameter.
Only relatively mild assumptions are made on the examination function. A novel
algorithm for this setup is analysed, showing that the dependence on the number
of items is replaced by a dependence on the dimension, allowing the new
algorithm to handle a large number of items. When reduced to the orthogonal
case, the regret of the algorithm improves on the state of the art.
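The factored click model described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the specific examination function, the clipping of the attractiveness score, and all variable names here are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5                                  # feature dimension
theta = rng.normal(size=d)             # unknown parameter (simulated here)
theta /= np.linalg.norm(theta)

def attractiveness(x):
    """Linear attractiveness: inner product of the feature vector and the
    unknown parameter, clipped to [0, 1] so it is a valid probability."""
    return float(np.clip(x @ theta, 0.0, 1.0))

def examination(position):
    """Illustrative examination function; the paper only makes mild
    assumptions on it (here: decreasing in position)."""
    return 1.0 / (position + 1)

def click_probability(x, position):
    """Click probability factors into examination and attractiveness."""
    return examination(position) * attractiveness(x)

# Click probabilities for one item placed at positions 0..4
x = rng.normal(size=d)
probs = [click_probability(x, r) for r in range(5)]
```

Because the parameter is d-dimensional, the learner estimates theta rather than one attractiveness value per item, which is what replaces the dependence on the number of items by a dependence on the dimension.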
Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation
Online recommendation services recommend multiple commodities to users.
Nowadays, a considerable proportion of users visit e-commerce platforms via
mobile devices. Due to the limited screen size of mobile devices, the positions
of items have a significant influence on clicks: 1) higher positions lead to
more clicks for the same commodity; 2) the 'pseudo-exposure' issue: only a few
recommended items are visible at first glance, and users must slide the screen
to browse the rest. Items ranked lower are therefore not viewed by some users,
and it is not appropriate to treat them as negative samples. While many works
model online recommendation as contextual bandit
problems, they rarely take the influence of positions into consideration and
thus the estimation of the reward function may be biased. In this paper, we aim
at addressing these two issues to improve the performance of online mobile
recommendation. Our contributions are four-fold. First, since we concern the
reward of a set of recommended items, we model the online recommendation as a
contextual combinatorial bandit problem and define the reward of a recommended
set. Second, we propose a novel contextual combinatorial bandit method called
UBM-LinUCB to address two issues related to positions by adopting the User
Browsing Model (UBM), a click model for web search. Third, we provide a formal
regret analysis and prove that our algorithm achieves sublinear regret
independent of the number of items. Finally, we evaluate our algorithm on two
real-world datasets by a novel unbiased estimator. An online experiment is also
implemented in Taobao, one of the most popular e-commerce platforms in the
world. Results on two CTR metrics show that our algorithm outperforms the other
contextual bandit algorithms.
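The User Browsing Model mentioned in this abstract makes examination of a position depend on both the position itself and the position of the most recent click, which is what lets it account for pseudo-exposure. The sketch below computes the expected number of clicks on a ranked list under such a model; the transition probabilities in `gamma`, the list length, and all names are illustrative assumptions, not the authors' UBM-LinUCB implementation.

```python
import numpy as np

K = 4  # list length (number of positions)

# gamma[s][r]: probability that position r is examined given the last-click
# state s (s = 0: no click yet; s = r'+1: most recent click was at position
# r'). Illustrative values that decay with distance from the last click.
gamma = np.zeros((K + 1, K))
for s in range(K + 1):
    last = s - 1                       # -1 encodes "no click yet"
    for r in range(K):
        if last < r:
            gamma[s][r] = 1.0 / (r - last)

def expected_clicks(attract):
    """Expected clicks on a list under the UBM-style model above.
    attract[r] is the attractiveness of the item shown at position r."""
    p = np.zeros(K + 1)                # distribution over last-click states
    p[0] = 1.0                         # start: no click yet
    total = 0.0
    for r in range(K):
        exam = gamma[:, r]             # examination prob for each state
        click = p * exam * attract[r]  # joint prob of state and a click at r
        c = click.sum()
        total += c
        p = p - click                  # no click: state unchanged
        p[r + 1] += c                  # click at r: jump to state r+1
    return total
```

Ranking by a quantity of this form, with the attractiveness term replaced by a LinUCB-style optimistic linear estimate, is one way such a position-aware combinatorial bandit objective can be assembled.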