Sequential Selection of Correlated Ads by POMDPs
Online advertising has become a key source of revenue for both web search
engines and online publishers. For them, the ability to allocate the right ads to
the right webpages is critical because mismatched ads not only harm web
users' satisfaction but also lower ad income. In this paper, we study how
online publishers could optimally select ads to maximize their ad income over
time. The conventional offline, content-based matching between webpages and ads
is a fine start but cannot solve the problem completely because good matching
does not necessarily lead to good payoff. Moreover, with limited display
impressions, we need to balance selecting ads to learn their true
payoffs (exploration) against allocating ads to generate high immediate
payoffs based on the current belief (exploitation). We address
the problem by employing partially observable Markov decision processes
(POMDPs) and discuss how to utilize the correlation among ads to improve the
efficiency of exploration and increase ad income in the long run. Our
mathematical derivation shows that the belief states of correlated ads can be
naturally updated using a formula similar to collaborative filtering. To test
our model, we collected and categorized a real-world ad dataset from a major
search engine. Experimenting on the data, we provide an analysis of the effect
of the underlying parameters and demonstrate that our algorithms significantly
outperform other strong baselines.
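As a rough illustration of how correlation between ads can speed up exploration, the sketch below spreads each observed click or non-click across similar ads as a fractional observation. This is not the paper's derivation: the Beta pseudo-count model, the `similarity` matrix, and the function name `update_beliefs` are all assumptions made for the example.

```python
import numpy as np

def update_beliefs(alpha, beta, similarity, shown_ad, clicked):
    """Return updated Beta pseudo-counts after one impression of `shown_ad`.

    Hypothetical rule: every ad j receives a fractional observation weighted
    by similarity[shown_ad, j]; the shown ad itself has weight 1.
    """
    weights = similarity[shown_ad]
    if clicked:
        return alpha + weights, beta
    return alpha, beta + weights

# Three ads with uniform Beta(1, 1) priors; ads 0 and 1 are assumed correlated.
alpha = np.ones(3)
beta = np.ones(3)
similarity = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# One click on ad 0 also raises the belief about ad 1, but not about ad 2.
alpha, beta = update_beliefs(alpha, beta, similarity, shown_ad=0, clicked=True)
posterior_mean = alpha / (alpha + beta)
```

Under these assumptions a single impression informs several ads at once, which is the intuition behind using correlation to reduce the number of impressions spent on exploration.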
Thompson Sampling in Dynamic Systems for Contextual Bandit Problems
We consider multi-armed bandit problems in time-varying dynamic systems
with rich structural features. For the nonlinear dynamic model, we propose
approximate inference for the posterior distributions based on the Laplace
approximation. For contextual bandit problems, Thompson Sampling is adopted
based on the underlying posterior distributions of the parameters. More
specifically, we introduce a discounted decay on the impact of previous samples
and analyze different decay rates against the underlying sample dynamics.
Consequently, exploration and exploitation are adaptively traded off according
to the dynamics of the system.
Comment: 22 pages, 10 figures
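The idea of discounting the impact of older samples can be sketched on a much simpler model than the one in the abstract. The example below uses a non-contextual Bernoulli bandit with Beta posteriors instead of the nonlinear model with Laplace approximation; the class name, the discount factor `gamma`, and the simulated environment are assumptions for illustration only.

```python
import numpy as np

class DiscountedThompsonSampling:
    """Beta-Bernoulli Thompson Sampling with exponentially decayed counts."""

    def __init__(self, n_arms, gamma=0.95, seed=0):
        self.gamma = gamma                    # assumed discount factor in (0, 1]
        self.successes = np.zeros(n_arms)     # decayed count of rewards of 1
        self.failures = np.zeros(n_arms)      # decayed count of rewards of 0
        self.rng = np.random.default_rng(seed)

    def select_arm(self):
        # Draw one sample per arm from Beta(successes + 1, failures + 1).
        samples = self.rng.beta(self.successes + 1.0, self.failures + 1.0)
        return int(np.argmax(samples))

    def update(self, arm, reward):
        # Decay all past evidence, then add the new observation.
        self.successes *= self.gamma
        self.failures *= self.gamma
        self.successes[arm] += reward
        self.failures[arm] += 1.0 - reward

# Simulated environment whose best arm switches halfway through.
env_rng = np.random.default_rng(1)
agent = DiscountedThompsonSampling(n_arms=2, gamma=0.95)
for t in range(200):
    probs = [0.8, 0.2] if t < 100 else [0.2, 0.8]
    arm = agent.select_arm()
    reward = float(env_rng.random() < probs[arm])
    agent.update(arm, reward)
```

A smaller `gamma` forgets faster, so the posteriors stay wide and the sampler keeps exploring; `gamma = 1` recovers ordinary Thompson Sampling for a stationary system.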
R-UCB: a Contextual Bandit Algorithm for Risk-Aware Recommender Systems
Mobile Context-Aware Recommender Systems can be naturally modelled as an
exploration/exploitation trade-off (exr/exp) problem, where the system has to
choose between maximizing its expected rewards given its current
knowledge (exploitation) and learning more about the user's unknown preferences
to improve its knowledge (exploration). This problem has been addressed by the
reinforcement learning community, but existing approaches do not consider the
risk level of the user's current situation, where it may be dangerous to
recommend items the user may not desire if the risk level is high. We introduce
in this paper an algorithm named R-UCB that considers the risk level of the
user's situation to adaptively balance between exr and exp. The detailed
analysis of the experimental results reveals several important discoveries in
the exr/exp behaviour.
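One way to picture a risk-dependent balance between exploration and exploitation is to shrink a UCB exploration bonus as the risk level rises. The sketch below is not the R-UCB algorithm itself, whose details the abstract does not give; the scaling by `(1 - risk)`, the constant `c`, and the function name are hypothetical.

```python
import math

def risk_aware_ucb_choice(counts, mean_rewards, risk, c=1.0):
    """Pick an arm index; `risk` in [0, 1] shrinks the exploration bonus.

    Hypothetical rule: a UCB1-style bonus multiplied by (1 - risk), so a
    high-risk situation leads to near-greedy (exploitation-only) choices.
    """
    # Play every arm once first so each index is well defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scale = c * (1.0 - risk)
    scores = [
        mean + scale * math.sqrt(2.0 * math.log(total) / n)
        for mean, n in zip(mean_rewards, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

counts = [100, 2]          # arm 1 has barely been tried
mean_rewards = [0.6, 0.5]
low_risk_choice = risk_aware_ucb_choice(counts, mean_rewards, risk=0.0)
high_risk_choice = risk_aware_ucb_choice(counts, mean_rewards, risk=1.0)
```

With these numbers the low-risk call explores the rarely tried arm, while the high-risk call falls back to the arm with the best observed mean.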
Freshness-Aware Thompson Sampling
To keep up with changes in the user's content, researchers have recently
started to model interactions between users and Context-Aware Recommender
Systems (CARS) as a bandit problem where the system needs to deal with the
exploration/exploitation dilemma. In this sense, we propose to study the
freshness of the user's content in CARS through the bandit problem. We
introduce in this paper an algorithm named Freshness-Aware Thompson Sampling
(FA-TS) that manages the recommendation of fresh documents according to the
risk of the user's situation. The intensive evaluation and the detailed
analysis of the experimental results reveal several important discoveries in
the exploration/exploitation (exr/exp) behaviour.
Comment: 21st International Conference on Neural Information Processing. arXiv
admin note: text overlap with arXiv:1409.772