Search CORE

26,021 research outputs found

Sequential Selection of Correlated Ads by POMDPs

Author: Wang Jun
Yuan Shuai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

Online advertising has become a key source of revenue for both web search engines and online publishers. For them, the ability of allocating right ads to right webpages is critical because any mismatched ads would not only harm web users' satisfactions but also lower the ad income. In this paper, we study how online publishers could optimally select ads to maximize their ad incomes over time. The conventional offline, content-based matching between webpages and ads is a fine start but cannot solve the problem completely because good matching does not necessarily lead to good payoff. Moreover, with the limited display impressions, we need to balance the need of selecting ads to learn true ad payoffs (exploration) with that of allocating ads to generate high immediate payoffs based on the current belief (exploitation). In this paper, we address the problem by employing Partially observable Markov decision processes (POMDPs) and discuss how to utilize the correlation of ads to improve the efficiency of the exploration and increase ad incomes in a long run. Our mathematical derivation shows that the belief states of correlated ads can be naturally updated using a formula similar to collaborative filtering. To test our model, a real world ad dataset from a major search engine is collected and categorized. Experimenting over the data, we provide an analyse of the effect of the underlying parameters, and demonstrate that our algorithms significantly outperform other strong baselines

arXiv.org e-Print Archive

Crossref

Thompson Sampling in Dynamic Systems for Contextual Bandit Problems

Author: Regan Amelia
Turner John
Xu Tianbing
Yu Yaming
Publication venue
Publication date: 16/10/2013
Field of study

We consider the multiarm bandit problems in the timevarying dynamic system for rich structural features. For the nonlinear dynamic model, we propose the approximate inference for the posterior distributions based on Laplace Approximation. For the context bandit problems, Thompson Sampling is adopted based on the underlying posterior distributions of the parameters. More specifically, we introduce the discount decays on the previous samples impact and analyze the different decay rates with the underlying sample dynamics. Consequently, the exploration and exploitation is adaptively tradeoff according to the dynamics in the system.Comment: 22 pages, 10 figure

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

R-UCB: a Contextual Bandit Algorithm for Risk-Aware Recommender Systems

Author: Bouneffouf Djallel
Publication venue
Publication date: 05/08/2014
Field of study

Mobile Context-Aware Recommender Systems can be naturally modelled as an exploration/exploitation trade-off (exr/exp) problem, where the system has to choose between maximizing its expected rewards dealing with its current knowledge (exploitation) and learning more about the unknown user's preferences to improve its knowledge (exploration). This problem has been addressed by the reinforcement learning community but they do not consider the risk level of the current user's situation, where it may be dangerous to recommend items the user may not desire in her current situation if the risk level is high. We introduce in this paper an algorithm named R-UCB that considers the risk level of the user's situation to adaptively balance between exr and exp. The detailed analysis of the experimental results reveals several important discoveries in the exr/exp behaviour

arXiv.org e-Print Archive

CiteSeerX

HAL Descartes

Hal-Diderot

Freshness-Aware Thompson Sampling

Author: D. Bouneffouf
D. Bouneffouf
D. Mladenic
W. Li
Publication venue
Publication date: 01/01/2014
Field of study

To follow the dynamicity of the user's content, researchers have recently started to model interactions between users and the Context-Aware Recommender Systems (CARS) as a bandit problem where the system needs to deal with exploration and exploitation dilemma. In this sense, we propose to study the freshness of the user's content in CARS through the bandit problem. We introduce in this paper an algorithm named Freshness-Aware Thompson Sampling (FA-TS) that manages the recommendation of fresh document according to the user's risk of the situation. The intensive evaluation and the detailed analysis of the experimental results reveals several important discoveries in the exploration/exploitation (exr/exp) behaviour.Comment: 21st International Conference on Neural Information Processing. arXiv admin note: text overlap with arXiv:1409.772

arXiv.org e-Print Archive

Crossref