Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests
A contextual bandit problem is studied in a highly non-stationary
environment, which is ubiquitous in various recommender systems due to the
time-varying interests of users. Two models with disjoint and hybrid payoffs
are considered to characterize the phenomenon that users' preferences towards
different items vary differently over time. In the disjoint payoff model, the
reward of playing an arm is determined by an arm-specific preference vector,
which is piecewise-stationary with asynchronous and distinct changes across
different arms. An efficient learning algorithm that adapts to abrupt
reward changes is proposed, and a theoretical regret analysis shows that the
regret scales sublinearly with the time horizon. The
algorithm is further extended to a more general setting with hybrid payoffs
where the reward of playing an arm is determined by both an arm-specific
preference vector and a joint coefficient vector shared by all arms. Empirical
experiments are conducted on real-world datasets to verify the advantages of
the proposed learning algorithms against baseline ones in both settings.
Comment: Accepted by AAAI 2
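To make the disjoint-payoff setting concrete, below is a minimal sketch of a per-arm linear bandit that restarts an arm's estimator when its recent prediction errors suggest an abrupt reward change. This is only an illustration of the general idea, not the paper's algorithm; the class name, the error-window change detector, and all thresholds are assumptions.

```python
import numpy as np

class PiecewiseStationaryDisjointLinUCB:
    """Per-arm linear bandit with a simple restart-on-change heuristic (illustrative sketch)."""

    def __init__(self, n_arms, dim, alpha=1.0, window=50, drift_threshold=0.5):
        self.alpha = alpha                      # exploration width
        self.window = window                    # number of recent errors kept per arm
        self.drift_threshold = drift_threshold  # mean-error level that triggers a reset (assumed)
        self.dim = dim
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm ridge-regression Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward-weighted feature sums
        self.errors = [[] for _ in range(n_arms)]        # recent absolute prediction errors

    def select(self, contexts):
        """Pick the arm with the largest upper confidence bound."""
        scores = []
        for a, x in enumerate(contexts):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Update the chosen arm; reset it if recent errors suggest a change point."""
        A_inv = np.linalg.inv(self.A[arm])
        pred = (A_inv @ self.b[arm]) @ x
        self.errors[arm] = (self.errors[arm] + [abs(reward - pred)])[-self.window:]

        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

        # Crude change detector: persistent large errors -> restart this arm's model.
        if (len(self.errors[arm]) == self.window
                and np.mean(self.errors[arm]) > self.drift_threshold):
            self.A[arm] = np.eye(self.dim)
            self.b[arm] = np.zeros(self.dim)
            self.errors[arm] = []
```

In practice the detection statistic and reset rule matter a great deal in non-stationary settings; the sliding-window heuristic above is just one simple choice.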
A Contextual-Bandit Approach to Personalized News Article Recommendation
Personalized web services strive to adapt their services (advertisements,
news articles, etc) to individual users by making use of both content and user
information. Despite a few recent advances, this problem remains challenging
for at least two reasons. First, web services feature dynamically
changing pools of content, rendering traditional collaborative filtering
methods inapplicable. Second, the scale of most web services of practical
interest calls for solutions that are both fast in learning and computation.
In this work, we model personalized recommendation of news articles as a
contextual bandit problem, a principled approach in which a learning algorithm
sequentially selects articles to serve users based on contextual information
about the users and articles, while simultaneously adapting its
article-selection strategy based on user-click feedback to maximize total user
clicks.
The contributions of this work are three-fold. First, we propose a new,
general contextual bandit algorithm that is computationally efficient and well
motivated from learning theory. Second, we argue that any bandit algorithm can
be reliably evaluated offline using previously recorded random traffic.
Finally, using this offline evaluation method, we successfully applied our new
algorithm to a Yahoo! Front Page Today Module dataset containing over 33
million events. Results showed a 12.5% click lift compared to a standard
context-free bandit algorithm, and the advantage becomes even greater when data
gets more scarce.
Comment: 10 pages, 5 figures
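The offline evaluation idea, often called the replay method, can be sketched in a few lines: stream through uniformly random logged traffic and score the policy only on events where it selects the same arm that was logged. The event format and the policy interface (a select/update pair, as in the sketch further above) are assumptions for illustration, not the paper's exact protocol.

```python
def replay_evaluate(policy, logged_events):
    """Offline (replay) evaluation on uniformly-random logged traffic.

    Each logged event is assumed to be (contexts, logged_arm, reward); an event
    contributes only when the policy picks the arm that was actually logged.
    """
    total_reward, matched = 0.0, 0
    for contexts, logged_arm, reward in logged_events:
        chosen = policy.select(contexts)
        if chosen == logged_arm:            # keep only matching events
            total_reward += reward
            matched += 1
            policy.update(logged_arm, contexts[logged_arm], reward)
    return total_reward / max(matched, 1)   # average reward per matched event (e.g. CTR)
```

Because the logged traffic is uniformly random over arms, the matched events form an unbiased sample of what the evaluated policy would have observed online.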
HyperBandit: Contextual Bandit with Hypernetwork for Time-Varying User Preferences in Streaming Recommendation
In real-world streaming recommender systems, user preferences often
dynamically change over time (e.g., a user may have different preferences
during weekdays and weekends). Existing bandit-based streaming recommendation
models only consider time as a timestamp, without explicitly modeling the
relationship between time variables and time-varying user preferences. This
leads to recommendation models that cannot quickly adapt to dynamic scenarios.
To address this issue, we propose a contextual bandit approach using
hypernetwork, called HyperBandit, which takes time features as input and
dynamically adjusts the recommendation model for time-varying user preferences.
Specifically, HyperBandit maintains a neural network capable of generating the
parameters for estimating time-varying rewards, taking into account the
correlation between time features and user preferences. Using the estimated
time-varying rewards, a bandit policy is employed to make online
recommendations by learning the latent item contexts. To meet the real-time
requirements in streaming recommendation scenarios, we verify the
existence of a low-rank structure in the parameter matrix and utilize low-rank
factorization for efficient training. Theoretically, we demonstrate a sublinear
regret upper bound against the best policy. Extensive experiments on real-world
datasets show that the proposed HyperBandit consistently outperforms the
state-of-the-art baselines in terms of accumulated rewards.
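A minimal sketch of the hypernetwork idea follows, assuming a small MLP that maps time features to two low-rank factors whose product scores item contexts for a user. The layer sizes, the rank, and the user/item factorization are illustrative assumptions, not HyperBandit's actual architecture.

```python
import torch
import torch.nn as nn

class TimeHypernetScorer(nn.Module):
    """Hypernetwork mapping time features to a low-rank, time-conditioned scoring matrix (sketch)."""

    def __init__(self, time_dim, user_dim, item_dim, rank=8, hidden=64):
        super().__init__()
        self.user_dim, self.item_dim, self.rank = user_dim, item_dim, rank
        # Hypernetwork: time features -> factors U (user_dim x rank) and V (rank x item_dim).
        self.hyper = nn.Sequential(
            nn.Linear(time_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, user_dim * rank + rank * item_dim),
        )

    def forward(self, time_feat, user_feat, item_feats):
        """Score candidate items for one user at the given time."""
        params = self.hyper(time_feat)
        u = params[: self.user_dim * self.rank].view(self.user_dim, self.rank)
        v = params[self.user_dim * self.rank:].view(self.rank, self.item_dim)
        weight = u @ v                               # low-rank weight matrix conditioned on time
        return (user_feat @ weight) @ item_feats.T   # one estimated reward per candidate item
```

In a bandit loop, the estimated rewards produced here would be combined with an exploration term (for example a UCB-style bonus) before selecting which item to recommend.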
Diversify and Conquer: Bandits and Diversity for an Enhanced E-commerce Homepage Experience
In the realm of e-commerce, popular platforms utilize widgets to recommend
advertisements and products to their users. However, the prevalence of mobile
device usage on these platforms introduces a unique challenge due to the
limited screen real estate available. Consequently, the positioning of relevant
widgets becomes pivotal in capturing and maintaining customer engagement. Given
the restricted screen size of mobile devices, widgets placed at the top of the
interface are more prominently displayed and thus attract greater user
attention. Conversely, widgets positioned further down the page require users
to scroll, resulting in reduced visibility and subsequent lower impression
rates. Therefore, it becomes imperative to place relevant widgets at the top.
However, selecting which widgets to display is challenging: the widgets can be
heterogeneous, and widgets can be introduced to or removed from the platform at
any time. In this work, we model the vertical widget reordering
as a contextual multi-armed bandit problem with delayed batch feedback. The
objective is to rank the vertical widgets in a personalized manner. We present
a two-stage ranking framework that combines contextual bandits with a diversity
layer to improve the overall ranking. We demonstrate its effectiveness through
offline and online A/B results, conducted on proprietary data from Myntra, a
major fashion e-commerce platform in India.
Comment: Accepted in Proceedings of Fashionxrecys Workshop, 17th ACM
Conference on Recommender Systems, 202
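A rough sketch of how such a two-stage framework might look: relevance scores from a contextual bandit are re-ranked by a greedy, MMR-style diversity layer before widgets are placed top to bottom. The similarity measure, the trade-off weight, and the function name are assumptions; the abstract does not specify the diversity criterion.

```python
import numpy as np

def diversified_rerank(bandit_scores, widget_embeddings, k, trade_off=0.7):
    """Stage 2 of a two-stage ranking sketch: greedy diversity re-ranking of bandit scores."""
    # Normalize embeddings so dot products act as cosine similarities.
    emb = widget_embeddings / np.linalg.norm(widget_embeddings, axis=1, keepdims=True)
    selected, remaining = [], list(range(len(bandit_scores)))
    while remaining and len(selected) < k:
        best, best_val = None, -np.inf
        for i in remaining:
            # Penalize similarity to widgets already placed higher on the page.
            sim = max(emb[i] @ emb[j] for j in selected) if selected else 0.0
            val = trade_off * bandit_scores[i] - (1 - trade_off) * sim
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
        remaining.remove(best)
    return selected  # widget indices in top-to-bottom display order
```

The trade-off weight controls how much relevance is sacrificed for diversity; with delayed batch feedback, the bandit's parameters would be updated only once each batch of click feedback arrives.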