Diversify and Conquer: Bandits and Diversity for an Enhanced E-commerce Homepage Experience
In the realm of e-commerce, popular platforms utilize widgets to recommend
advertisements and products to their users. However, the prevalence of mobile
device usage on these platforms introduces a unique challenge due to the
limited screen real estate available. Consequently, the positioning of relevant
widgets becomes pivotal in capturing and maintaining customer engagement. Given
the restricted screen size of mobile devices, widgets placed at the top of the
interface are more prominently displayed and thus attract greater user
attention. Conversely, widgets positioned further down the page require users
to scroll, resulting in reduced visibility and subsequent lower impression
rates. Therefore, it becomes imperative to place relevant widgets at the top.
However, selecting which widgets to display is a challenging task: the widgets
are heterogeneous, and widgets can be introduced to or removed from the
platform at any time. In this work, we model vertical widget reordering
as a contextual multi-armed bandit problem with delayed batch feedback. The
objective is to rank the vertical widgets in a personalized manner. We present
a two-stage ranking framework that combines contextual bandits with a diversity
layer to improve the overall ranking. We demonstrate its effectiveness through
offline and online A/B results, conducted on proprietary data from Myntra, a
major fashion e-commerce platform in India.Comment: Accepted in Proceedings of Fashionxrecys Workshop, 17th ACM
Conference on Recommender Systems, 202
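The two-stage design described in this abstract, bandit scoring followed by a diversity layer, can be sketched roughly as follows. This is a minimal illustration, not the paper's actual system: the LinUCB-style arm, the widget-type penalty, and all names and constants are assumptions.

```python
import numpy as np

class LinUCBArm:
    """One arm (widget) of a contextual linear UCB bandit (illustrative)."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)        # regularized covariance of contexts
        self.b = np.zeros(dim)      # reward-weighted context sum
        self.alpha = alpha          # exploration strength

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        # mean estimate plus an upper-confidence exploration bonus
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        # delayed batch feedback: call once per logged (context, reward) pair
        self.A += np.outer(x, x)
        self.b += reward * x

def diversity_rerank(scores, widget_types, penalty=0.3):
    """Greedy second-stage re-rank: demote widgets whose type already
    appears higher on the page (a simple stand-in for the diversity layer)."""
    order, seen = [], set()
    remaining = set(range(len(scores)))
    while remaining:
        best = max(remaining,
                   key=lambda i: scores[i] - penalty * (widget_types[i] in seen))
        order.append(best)
        seen.add(widget_types[best])
        remaining.remove(best)
    return order
```

With a high penalty the second "ads" widget is pushed below the "prod" widget even though its bandit score is higher, which is the effect the diversity layer is after.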
Neural Interactive Collaborative Filtering
In this paper, we study collaborative filtering in an interactive setting, in
which the recommender agents iterate between making recommendations and
updating the user profile based on the interactive feedback. The most
challenging problem in this scenario is how to suggest items when the user
profile has not been well established, i.e., recommend for cold-start users or
warm-start users with drifting tastes. Existing approaches either rely on an
overly pessimistic linear exploration strategy or adopt meta-learning-based
algorithms in a fully exploitative way. In this work, to quickly catch up with
the user's
interests, we propose to represent the exploration policy with a neural network
and directly learn it from the feedback data. Specifically, the exploration
policy is encoded in the weights of multi-channel stacked self-attention neural
networks and trained with efficient Q-learning by maximizing users' overall
satisfaction in the recommender system. The key insight is that satisfied
recommendations triggered by an exploratory recommendation can be viewed as
an exploration bonus (delayed reward) for their contribution to improving the
quality of the user profile. Therefore, the proposed exploration policy, to
balance between learning the user profile and making accurate recommendations,
can be directly optimized by maximizing users' long-term satisfaction with
reinforcement learning. Extensive experiments and analysis conducted on three
benchmark collaborative filtering datasets have demonstrated the advantage of
our method over state-of-the-art methods.
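As a rough illustration of the core idea, a recommendation policy whose exploration behaviour lives in learned weights and is trained with Q-learning on delayed satisfaction, here is a toy sketch. It substitutes a linear Q-function and a simulated user for the paper's multi-channel stacked self-attention networks and real feedback data, and it uses a small epsilon-greedy rate during training; every number and name below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, state_dim = 5, 5

# Linear Q-function Q(s, a) = s @ W[:, a]: the recommendation behaviour is
# encoded in the learned weights W (a crude stand-in for the paper's
# self-attention network).
W = np.zeros((state_dim, n_items))

def q_values(state):
    return state @ W

def simulate_user(state, item, true_prefs):
    """Toy user model (assumption): Bernoulli click; satisfied clicks
    enrich the user-profile state."""
    reward = float(rng.random() < true_prefs[item])
    next_state = 0.9 * state
    next_state[item] += reward
    return reward, next_state

gamma, lr, eps = 0.9, 0.1, 0.1
true_prefs = np.array([0.9, 0.1, 0.1, 0.1, 0.1])  # item 0 is the good one
for _ in range(300):
    state = np.full(state_dim, 0.1)               # fresh (cold-start) user
    for _ in range(10):
        if rng.random() < eps:
            item = int(rng.integers(n_items))
        else:
            item = int(np.argmax(q_values(state)))
        reward, next_state = simulate_user(state, item, true_prefs)
        # Q-learning update: the delayed part of the reward arrives through
        # the bootstrapped value of the improved user profile.
        target = reward + gamma * q_values(next_state).max()
        W[:, item] += lr * (target - q_values(state)[item]) * state
        state = next_state
```

After training, the learned Q-values on a cold-start state prefer the item with high long-run satisfaction, which is the behaviour the paper optimizes for at much larger scale.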
Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay
Recommender systems are a ubiquitous feature of online platforms.
Increasingly, they are explicitly tasked with increasing users' long-term
satisfaction. In this context, we study a content exploration task, which we
formalize as a multi-armed bandit problem with delayed rewards. We observe that
there is an apparent trade-off in choosing the learning signal: Waiting for the
full reward to become available might take several weeks, hurting the rate at
which learning happens, whereas measuring short-term proxy rewards reflects the
actual long-term goal only imperfectly. We address this challenge in two steps.
First, we develop a predictive model of delayed rewards that incorporates all
information obtained to date. Full observations as well as partial (short or
medium-term) outcomes are combined through a Bayesian filter to obtain a
probabilistic belief. Second, we devise a bandit algorithm that takes advantage
of this new predictive model. The algorithm quickly learns to identify content
aligned with long-term success by carefully balancing exploration and
exploitation. We apply our approach to a podcast recommendation problem, where
we seek to identify shows that users engage with repeatedly over two months. We
empirically validate that our approach results in substantially better
performance compared to approaches that either optimize for short-term proxies,
or wait for the long-term outcome to be fully realized.

Comment: Presented at the 29th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining (KDD '23)
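A heavily simplified version of this two-step recipe, a Bayesian belief over the long-term outcome that absorbs both full and partial observations, with Thompson sampling on top, might look like the sketch below. Treating a partial outcome as a fractional Beta pseudo-count, and the specific conditional probabilities, are assumptions standing in for the paper's Bayesian filter.

```python
import numpy as np

rng = np.random.default_rng(1)

class ImpatientArm:
    """Beta belief over one show's long-term (e.g., two-month) success rate,
    updated from matured outcomes and short-term proxy signals."""
    def __init__(self, p_long_given_short=0.6, p_long_given_no_short=0.05):
        self.a, self.b = 1.0, 1.0           # Beta(1, 1) prior
        self.p_ls = p_long_given_short      # assumed proxy-to-outcome rates
        self.p_lns = p_long_given_no_short

    def update_full(self, success):
        # fully observed long-term outcome: an ordinary Beta update
        self.a += success
        self.b += 1 - success

    def update_partial(self, short_term_hit):
        # partial outcome: fold in the *expected* long-term result as a
        # fractional pseudo-count (a simple stand-in for the paper's filter)
        p = self.p_ls if short_term_hit else self.p_lns
        self.a += p
        self.b += 1 - p

    def sample(self):
        return rng.beta(self.a, self.b)     # Thompson sampling draw

def choose(arms):
    """Pick the arm whose posterior sample is largest."""
    return max(range(len(arms)), key=lambda i: arms[i].sample())
```

Because partial observations tighten the belief weeks before the full reward matures, the bandit can commit to promising content without waiting out the delay.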
Cold-Start Collaborative Filtering
Collaborative Filtering (CF) is a technique to generate personalised recommendations for a user from a collection of correlated preferences in the past. In general, the effectiveness of CF greatly depends on the amount of available information about the target user and the target item. The cold-start problem, which describes the difficulty of making recommendations when the users or the items are new, remains a great challenge for CF. Traditionally, this problem is tackled by resorting to an additional interview process to establish the user (item) profile before making any recommendations. During this process the user’s information need is not addressed.

In this thesis, however, we argue that recommendations should preferably be provided right from the beginning, and that the goal of solving the cold-start problem should be to maximise the overall recommendation utility during all interactions with the recommender system. In other words, we should not distinguish between the information-gathering and recommendation-making phases, but seamlessly integrate them. This mechanism naturally addresses the cold-start problem, as any user (item) can immediately receive sequential recommendations without providing extra information beforehand.

This thesis solves the cold-start problem in an interactive setting by focusing on four interconnected aspects. First, we consider a continuous sequential recommendation process with CF and relate it to the exploitation-exploration (EE) trade-off. By employing probabilistic matrix factorization, we obtain a structured decision space and are thus able to leverage several EE algorithms, such as Thompson sampling and upper confidence bounds, to select items. Second, we extend the sequential recommendation process to a batch mode where multiple recommendations are made at each interaction stage.
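The first aspect, running EE algorithms over the structured decision space that probabilistic matrix factorization provides, can be sketched as follows. The Gaussian posterior form over user factors and all parameters here are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(2)

def thompson_select(user_mean, user_cov, item_factors, candidates):
    """Thompson sampling over PMF user factors: draw a plausible user
    vector from the (assumed Gaussian) posterior, then act greedily on it."""
    u = rng.multivariate_normal(user_mean, user_cov)
    scores = item_factors[candidates] @ u
    return candidates[int(np.argmax(scores))]

def ucb_select(user_mean, user_cov, item_factors, candidates, alpha=1.0):
    """Upper-confidence-bound variant: mean predicted rating plus an
    uncertainty bonus v_i^T Cov v_i per candidate item."""
    V = item_factors[candidates]
    means = V @ user_mean
    bonus = alpha * np.sqrt(np.einsum("ij,jk,ik->i", V, user_cov, V))
    return candidates[int(np.argmax(means + bonus))]
```

Both rules reduce to plain exploitation as the posterior covariance shrinks, which is exactly how the interview phase dissolves into ordinary recommendation.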
We specifically discuss the case of two consecutive interaction stages, and model it with the partially observable Markov decision process (POMDP) to obtain its exact theoretical solution. Through an in-depth analysis of the POMDP value iteration solution, we identify that an exact solution can be abstracted as selecting users (items) that are not only highly relevant to the target according to the initial-stage information, but also highly correlated with other potential users (items) for the next stage.

Third, we consider the intra-stage recommendation optimisation and focus on the problem of personalised item diversification. We reformulate the latent factor models using the mean-variance analysis from the portfolio theory in economics. The resulting portfolio ranking algorithm naturally captures the user’s interest range and the uncertainty of the user preference by employing the variance of the learned user latent factors, leading to a diversified item list adapted to the individual user.

Finally, we relate the diversification algorithm back to the interactive process by considering inter-stage joint portfolio diversification, where the recommendations are optimised jointly with the user’s past preference records.
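The mean-variance portfolio ranking in the third aspect might be sketched like this: expected relevance minus a risk term driven by the variance of the user's latent factors. The diagonal-covariance simplification and the `risk_aversion` parameter are illustrative assumptions.

```python
import numpy as np

def portfolio_rank(user_mean, user_var, item_factors, risk_aversion=0.5):
    """Mean-variance ranking: expected rating minus a risk penalty.

    user_mean:    (k,) posterior mean of user latent factors (e.g., from PMF)
    user_var:     (k,) posterior variance of user latent factors (diagonal
                  covariance assumed for simplicity)
    item_factors: (n, k) item latent factors
    """
    relevance = item_factors @ user_mean        # expected rating per item
    # variance of the predicted rating: v_i^T diag(user_var) v_i
    risk = (item_factors ** 2) @ user_var
    scores = relevance - risk_aversion * risk
    return np.argsort(-scores)                  # best-first item order
```

Items aligned with the uncertain directions of the user's taste get demoted, so the head of the list spreads across what the system is confident the user likes, which is the diversification effect the thesis derives from portfolio theory.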