Diversify and Conquer: Bandits and Diversity for an Enhanced E-commerce Homepage Experience
In the realm of e-commerce, popular platforms utilize widgets to recommend
advertisements and products to their users. However, the prevalence of mobile
device usage on these platforms introduces a unique challenge due to the
limited screen real estate available. Consequently, the positioning of relevant
widgets becomes pivotal in capturing and maintaining customer engagement. Given
the restricted screen size of mobile devices, widgets placed at the top of the
interface are more prominently displayed and thus attract greater user
attention. Conversely, widgets positioned further down the page require users
to scroll, resulting in reduced visibility and subsequent lower impression
rates. Therefore, it becomes imperative to place relevant widgets at the top.
However, selecting which widgets to display is a challenging task: the widgets
are heterogeneous, and widgets can be introduced to or removed from the
platform at any time. In this work, we model vertical widget reordering
as a contextual multi-armed bandit problem with delayed batch feedback. The
objective is to rank the vertical widgets in a personalized manner. We present
a two-stage ranking framework that combines contextual bandits with a diversity
layer to improve the overall ranking. We demonstrate its effectiveness through
offline and online A/B results, conducted on proprietary data from Myntra, a
major fashion e-commerce platform in India.Comment: Accepted in Proceedings of Fashionxrecys Workshop, 17th ACM
Conference on Recommender Systems, 202
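The two-stage design described in this abstract, bandit scoring followed by a diversity layer, can be sketched roughly as follows. This is a minimal illustration, not the paper's actual system: the LinUCB-style arm, the widget-type penalty, and all names and constants are assumptions.

```python
import numpy as np

class LinUCBArm:
    """One arm (widget) of a contextual linear UCB bandit (illustrative)."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)        # regularized covariance of contexts
        self.b = np.zeros(dim)      # reward-weighted context sum
        self.alpha = alpha          # exploration strength

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        # mean estimate plus an upper-confidence exploration bonus
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        # delayed batch feedback: call once per logged (context, reward) pair
        self.A += np.outer(x, x)
        self.b += reward * x

def diversity_rerank(scores, widget_types, penalty=0.3):
    """Greedy second-stage re-rank: demote widgets whose type already
    appears higher on the page (a simple stand-in for the diversity layer)."""
    order, seen = [], set()
    remaining = set(range(len(scores)))
    while remaining:
        best = max(remaining,
                   key=lambda i: scores[i] - penalty * (widget_types[i] in seen))
        order.append(best)
        seen.add(widget_types[best])
        remaining.remove(best)
    return order
```

With a high penalty the second "ads" widget is pushed below the "prod" widget even though its bandit score is higher, which is the effect the diversity layer is after.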
Neural Interactive Collaborative Filtering
In this paper, we study collaborative filtering in an interactive setting, in
which the recommender agents iterate between making recommendations and
updating the user profile based on the interactive feedback. The most
challenging problem in this scenario is how to suggest items when the user
profile has not been well established, i.e., recommend for cold-start users or
warm-start users with drifting tastes. Existing approaches either rely on an
overly pessimistic linear exploration strategy or adopt meta-learning-based
algorithms in a fully exploitative way. In this work, to quickly catch up with
the user's
interests, we propose to represent the exploration policy with a neural network
and directly learn it from the feedback data. Specifically, the exploration
policy is encoded in the weights of multi-channel stacked self-attention neural
networks and trained with efficient Q-learning by maximizing users' overall
satisfaction in the recommender system. The key insight is that satisfied
recommendations triggered by an exploratory recommendation can be viewed as
an exploration bonus (delayed reward) for their contribution to improving the
quality of the user profile. Therefore, the proposed exploration policy, to
balance between learning the user profile and making accurate recommendations,
can be directly optimized by maximizing users' long-term satisfaction with
reinforcement learning. Extensive experiments and analysis conducted on three
benchmark collaborative filtering datasets have demonstrated the advantage of
our method over state-of-the-art methods.
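As a rough illustration of the core idea, a recommendation policy whose exploration behaviour lives in learned weights and is trained with Q-learning on delayed satisfaction, here is a toy sketch. It substitutes a linear Q-function and a simulated user for the paper's multi-channel stacked self-attention networks and real feedback data, and it uses a small epsilon-greedy rate during training; every number and name below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, state_dim = 5, 5

# Linear Q-function Q(s, a) = s @ W[:, a]: the recommendation behaviour is
# encoded in the learned weights W (a crude stand-in for the paper's
# self-attention network).
W = np.zeros((state_dim, n_items))

def q_values(state):
    return state @ W

def simulate_user(state, item, true_prefs):
    """Toy user model (assumption): Bernoulli click; satisfied clicks
    enrich the user-profile state."""
    reward = float(rng.random() < true_prefs[item])
    next_state = 0.9 * state
    next_state[item] += reward
    return reward, next_state

gamma, lr, eps = 0.9, 0.1, 0.1
true_prefs = np.array([0.9, 0.1, 0.1, 0.1, 0.1])  # item 0 is the good one
for _ in range(300):
    state = np.full(state_dim, 0.1)               # fresh (cold-start) user
    for _ in range(10):
        if rng.random() < eps:
            item = int(rng.integers(n_items))
        else:
            item = int(np.argmax(q_values(state)))
        reward, next_state = simulate_user(state, item, true_prefs)
        # Q-learning update: the delayed part of the reward arrives through
        # the bootstrapped value of the improved user profile.
        target = reward + gamma * q_values(next_state).max()
        W[:, item] += lr * (target - q_values(state)[item]) * state
        state = next_state
```

After training, the learned Q-values on a cold-start state prefer the item with high long-run satisfaction, which is the behaviour the paper optimizes for at much larger scale.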
Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay
Recommender systems are a ubiquitous feature of online platforms.
Increasingly, they are explicitly tasked with increasing users' long-term
satisfaction. In this context, we study a content exploration task, which we
formalize as a multi-armed bandit problem with delayed rewards. We observe that
there is an apparent trade-off in choosing the learning signal: Waiting for the
full reward to become available might take several weeks, hurting the rate at
which learning happens, whereas measuring short-term proxy rewards reflects the
actual long-term goal only imperfectly. We address this challenge in two steps.
First, we develop a predictive model of delayed rewards that incorporates all
information obtained to date. Full observations as well as partial (short or
medium-term) outcomes are combined through a Bayesian filter to obtain a
probabilistic belief. Second, we devise a bandit algorithm that takes advantage
of this new predictive model. The algorithm quickly learns to identify content
aligned with long-term success by carefully balancing exploration and
exploitation. We apply our approach to a podcast recommendation problem, where
we seek to identify shows that users engage with repeatedly over two months. We
empirically validate that our approach results in substantially better
performance compared to approaches that either optimize for short-term proxies,
or wait for the long-term outcome to be fully realized.

Comment: Presented at the 29th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining (KDD '23)
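A heavily simplified version of this two-step recipe, a Bayesian belief over the long-term outcome that absorbs both full and partial observations, with Thompson sampling on top, might look like the sketch below. Treating a partial outcome as a fractional Beta pseudo-count, and the specific conditional probabilities, are assumptions standing in for the paper's Bayesian filter.

```python
import numpy as np

rng = np.random.default_rng(1)

class ImpatientArm:
    """Beta belief over one show's long-term (e.g., two-month) success rate,
    updated from matured outcomes and short-term proxy signals."""
    def __init__(self, p_long_given_short=0.6, p_long_given_no_short=0.05):
        self.a, self.b = 1.0, 1.0           # Beta(1, 1) prior
        self.p_ls = p_long_given_short      # assumed proxy-to-outcome rates
        self.p_lns = p_long_given_no_short

    def update_full(self, success):
        # fully observed long-term outcome: an ordinary Beta update
        self.a += success
        self.b += 1 - success

    def update_partial(self, short_term_hit):
        # partial outcome: fold in the *expected* long-term result as a
        # fractional pseudo-count (a simple stand-in for the paper's filter)
        p = self.p_ls if short_term_hit else self.p_lns
        self.a += p
        self.b += 1 - p

    def sample(self):
        return rng.beta(self.a, self.b)     # Thompson sampling draw

def choose(arms):
    """Pick the arm whose posterior sample is largest."""
    return max(range(len(arms)), key=lambda i: arms[i].sample())
```

Because partial observations tighten the belief weeks before the full reward matures, the bandit can commit to promising content without waiting out the delay.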
Cold-Start Collaborative Filtering
Collaborative Filtering (CF) is a technique to generate personalised recommendations for a user from a collection of correlated preferences in the past. In general, the effectiveness of CF greatly depends on the amount of available information about the target user and the target item. The cold-start problem, which describes the difficulty of making recommendations when the users or the items are new, remains a great challenge for CF. Traditionally, this problem is tackled by resorting to an additional interview process to establish the user (item) profile before making any recommendations. During this process the user’s information need is not addressed.

In this thesis, however, we argue that recommendations should preferably be provided right from the beginning, and that the goal of solving the cold-start problem should be to maximise the overall recommendation utility during all interactions with the recommender system. In other words, we should not distinguish between the information-gathering and recommendation-making phases, but seamlessly integrate them. This mechanism naturally addresses the cold-start problem, as any user (item) can immediately receive sequential recommendations without providing extra information beforehand.

This thesis solves the cold-start problem in an interactive setting by focusing on four interconnected aspects. First, we consider a continuous sequential recommendation process with CF and relate it to the exploitation-exploration (EE) trade-off. By employing probabilistic matrix factorization, we obtain a structured decision space and are thus able to leverage several EE algorithms, such as Thompson sampling and upper confidence bounds, to select items. Second, we extend the sequential recommendation process to a batch mode where multiple recommendations are made at each interaction stage.
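The first aspect, running EE algorithms over the structured decision space that probabilistic matrix factorization provides, can be sketched as follows. The Gaussian posterior form over user factors and all parameters here are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(2)

def thompson_select(user_mean, user_cov, item_factors, candidates):
    """Thompson sampling over PMF user factors: draw a plausible user
    vector from the (assumed Gaussian) posterior, then act greedily on it."""
    u = rng.multivariate_normal(user_mean, user_cov)
    scores = item_factors[candidates] @ u
    return candidates[int(np.argmax(scores))]

def ucb_select(user_mean, user_cov, item_factors, candidates, alpha=1.0):
    """Upper-confidence-bound variant: mean predicted rating plus an
    uncertainty bonus v_i^T Cov v_i per candidate item."""
    V = item_factors[candidates]
    means = V @ user_mean
    bonus = alpha * np.sqrt(np.einsum("ij,jk,ik->i", V, user_cov, V))
    return candidates[int(np.argmax(means + bonus))]
```

Both rules reduce to plain exploitation as the posterior covariance shrinks, which is exactly how the interview phase dissolves into ordinary recommendation.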
We specifically discuss the case of two consecutive interaction stages, and model it with the partially observable Markov decision process (POMDP) to obtain its exact theoretical solution. Through an in-depth analysis of the POMDP value iteration solution, we identify that an exact solution can be abstracted as selecting users (items) that are not only highly relevant to the target according to the initial-stage information, but also highly correlated with other potential users (items) for the next stage.

Third, we consider the intra-stage recommendation optimisation and focus on the problem of personalised item diversification. We reformulate the latent factor models using the mean-variance analysis from the portfolio theory in economics. The resulting portfolio ranking algorithm naturally captures the user’s interest range and the uncertainty of the user preference by employing the variance of the learned user latent factors, leading to a diversified item list adapted to the individual user.

Finally, we relate the diversification algorithm back to the interactive process by considering inter-stage joint portfolio diversification, where the recommendations are optimised jointly with the user’s past preference records.
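The mean-variance portfolio ranking in the third aspect might be sketched like this: expected relevance minus a risk term driven by the variance of the user's latent factors. The diagonal-covariance simplification and the `risk_aversion` parameter are illustrative assumptions.

```python
import numpy as np

def portfolio_rank(user_mean, user_var, item_factors, risk_aversion=0.5):
    """Mean-variance ranking: expected rating minus a risk penalty.

    user_mean:    (k,) posterior mean of user latent factors (e.g., from PMF)
    user_var:     (k,) posterior variance of user latent factors (diagonal
                  covariance assumed for simplicity)
    item_factors: (n, k) item latent factors
    """
    relevance = item_factors @ user_mean        # expected rating per item
    # variance of the predicted rating: v_i^T diag(user_var) v_i
    risk = (item_factors ** 2) @ user_var
    scores = relevance - risk_aversion * risk
    return np.argsort(-scores)                  # best-first item order
```

Items aligned with the uncertain directions of the user's taste get demoted, so the head of the list spreads across what the system is confident the user likes, which is the diversification effect the thesis derives from portfolio theory.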