Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning
Auction-based recommender systems are prevalent in online advertising
platforms, but they are typically optimized to allocate recommendation slots
based on immediate expected return metrics, neglecting the downstream effects
of recommendations on user behavior. In this study, we employ reinforcement
learning to optimize for long-term return metrics in an auction-based
recommender system. Utilizing temporal difference learning, a fundamental
reinforcement learning algorithm, we implement a one-step policy improvement
approach that biases the system towards recommendations with higher long-term
user engagement metrics. This optimizes value over long horizons while
maintaining compatibility with the auction framework. Our approach is grounded
in dynamic programming ideas which show that our method provably improves upon
the existing auction-based base policy. Through an online A/B test conducted on
an auction-based recommender system which handles billions of impressions and
users daily, we empirically establish that our proposed method outperforms the
current production system in terms of long-term user engagement metrics.
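A minimal sketch of the kind of one-step improvement described above, assuming a learned state-action value model and a rerank-time bias term; the network layout, the on-policy TD target, the linear blending of immediate and long-term value, and all names (QNet, long_term_weight) are illustrative assumptions rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Estimates the long-term engagement value of showing item a in user state s."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def td_update(q, opt, batch, gamma=0.99):
    """One-step TD(0) update; the next action comes from the logged base policy (on-policy)."""
    s, a, r, s_next, a_next, done = batch
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q(s_next, a_next)
    loss = nn.functional.mse_loss(q(s, a), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def rerank_score(immediate_value, q_value, long_term_weight=0.1):
    """Auction score biased toward candidates with higher estimated long-term value."""
    return immediate_value + long_term_weight * q_value
```

At serving time the candidate with the highest blended score is allocated the slot, so the auction mechanics stay untouched while the ranking is nudged toward higher estimated long-term engagement.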
PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement
Current advances in recommender systems have been remarkably successful in
optimizing immediate engagement. However, long-term user engagement, a more
desirable performance metric, remains difficult to improve. Meanwhile, recent
reinforcement learning (RL) algorithms have shown their effectiveness in a
variety of long-term goal optimization tasks. For this reason, RL is widely
considered as a promising framework for optimizing long-term user engagement in
recommendation. Though promising, the application of RL heavily relies on
well-designed rewards, but designing rewards related to long-term user
engagement is quite difficult. To mitigate the problem, we propose a novel
paradigm, recommender systems with human preferences (or Preference-based
Recommender systems, PrefRec), which allows RL recommender systems to learn from
preferences about users' historical behaviors rather than explicitly defined
rewards. Such preferences are easily accessible through techniques such as
crowdsourcing, as they do not require any expert knowledge. With PrefRec, we
can fully exploit the advantages of RL in optimizing long-term goals, while
avoiding complex reward engineering. PrefRec uses the preferences to
automatically train a reward function in an end-to-end manner. The reward
function is then used to generate learning signals to train the recommendation
policy. Furthermore, we design an effective optimization method for PrefRec,
which uses an additional value function, expectile regression and reward model
pre-training to improve the performance. We conduct experiments on a variety of
long-term user engagement optimization tasks. The results show that PrefRec
significantly outperforms previous state-of-the-art methods in all the tasks.
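A minimal sketch of preference-based reward learning of the sort PrefRec describes, assuming pairwise trajectory preferences and a Bradley-Terry-style objective; the architecture, the summed per-step reward, and the expectile weighting are assumptions for illustration, not the paper's code:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps each step of a user-behavior trajectory to a scalar reward, summed over time."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, traj):                       # traj: (batch, T, obs_dim)
        return self.net(traj).squeeze(-1).sum(dim=-1)

def preference_loss(reward_model, traj_a, traj_b, pref_b):
    """pref_b = 1.0 if annotators preferred trajectory b, else 0.0 (Bradley-Terry)."""
    logits = reward_model(traj_b) - reward_model(traj_a)
    return nn.functional.binary_cross_entropy_with_logits(logits, pref_b)

def expectile_loss(pred, target, tau=0.7):
    """Asymmetric (expectile) regression, usable for the additional value function."""
    diff = target - pred
    weight = torch.abs(tau - (diff < 0).float())   # tau if diff > 0, else 1 - tau
    return (weight * diff.pow(2)).mean()
```

The learned reward function then supplies the scalar learning signal for training the recommendation policy, so no hand-designed long-term engagement reward is needed.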
Reinforcement recommendation with user multi-aspect preference
Formulating recommender systems with reinforcement learning (RL) frameworks has attracted increasing attention from both academic and industry communities. While many promising results have been achieved, existing models mostly simulate the environment reward with a unified value, which may hinder the understanding of users' complex preferences and limit model performance. In this paper, we consider how to model user multi-aspect preferences in the context of an RL-based recommender system. More specifically, we base our model on the framework of deterministic policy gradient (DPG), which is effective in dealing with large action spaces. A major challenge in modeling user multi-aspect preferences lies in the fact that they may contradict each other. To solve this problem, we introduce Pareto optimization into the DPG framework. We assign each aspect a tailored critic, and all the critics share the same actor. The Pareto optimization is realized by a gradient-based method, which can be easily integrated into the actor and critic learning process. Based on the designed model, we theoretically analyze its gradient bias in the optimization process, and we design a weight-reuse mechanism to lower the upper bound of this bias, which is shown to be effective for improving model performance. We conduct extensive experiments based on three real-world datasets to demonstrate our model's superiority.
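A minimal sketch of the shared-actor, per-aspect-critic idea under a gradient-based Pareto combination, assuming two aspects and a closed-form min-norm weighting of the two actor gradients; the weighting rule and all function names are illustrative assumptions, and the paper's weight-reuse mechanism is omitted:

```python
import torch

def actor_gradients(actor, critic, states):
    """Flattened DPG actor gradient for one aspect-specific critic."""
    actions = actor(states)
    loss = -critic(states, actions).mean()
    grads = torch.autograd.grad(loss, list(actor.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def pareto_combine(grad_a, grad_b):
    """Min-norm convex combination of two gradient vectors (two-aspect special case)."""
    diff = grad_a - grad_b
    alpha = torch.dot(grad_b, -diff) / (diff.pow(2).sum() + 1e-12)
    alpha = alpha.clamp(0.0, 1.0)
    return alpha * grad_a + (1.0 - alpha) * grad_b

def apply_pareto_step(actor, critics, states, lr=1e-3):
    """Combine the per-aspect gradients and take a plain gradient step on the shared actor."""
    g = pareto_combine(actor_gradients(actor, critics[0], states),
                       actor_gradients(actor, critics[1], states))
    offset = 0
    with torch.no_grad():
        for p in actor.parameters():
            n = p.numel()
            p -= lr * g[offset:offset + n].view_as(p)
            offset += n
```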
AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement
Growing attention has been paid to Reinforcement Learning (RL) algorithms
when optimizing long-term user engagement in sequential recommendation tasks.
One challenge in large-scale online recommendation systems is the constant and
complicated changes in users' behavior patterns, such as interaction rates and
retention tendencies. When formulated as a Markov Decision Process (MDP), the
dynamics and reward functions of the recommendation system are continuously
affected by these changes. Existing RL algorithms for recommendation systems
will suffer from distribution shift and struggle to adapt in such an MDP. In
this paper, we introduce a novel paradigm called Adaptive Sequential
Recommendation (AdaRec) to address this issue. AdaRec proposes a new
distance-based representation loss to extract latent information from users'
interaction trajectories. Such information reflects how well the RL policy fits
current user behavior patterns, and helps the policy identify subtle changes
in the recommendation system. To make rapid adaptation to these changes, AdaRec
encourages exploration with the idea of optimism under uncertainty. The
exploration is further guarded by zero-order action optimization to ensure
stable recommendation quality in complicated environments. We conduct extensive
empirical analyses in both simulator-based and live sequential recommendation
tasks, where AdaRec exhibits superior long-term performance compared to all
baseline algorithms.
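A minimal sketch of optimism under uncertainty paired with zero-order action refinement, assuming a critic ensemble whose disagreement stands in for uncertainty; the ensemble bonus, the random-perturbation search, and all names are assumptions for illustration, not AdaRec's exact design:

```python
import torch

def optimistic_value(q_ensemble, state, action, beta=1.0):
    """Mean + beta * std over an ensemble of critics (optimism under uncertainty).

    Written for a single (unbatched) state-action pair for simplicity.
    """
    values = torch.stack([q(state, action) for q in q_ensemble], dim=0)
    return values.mean(dim=0) + beta * values.std(dim=0)

def zero_order_refine(q_ensemble, state, base_action, n_samples=16, sigma=0.05):
    """Perturb the policy's action and keep the candidate with the best optimistic value."""
    best_action = base_action
    best_value = optimistic_value(q_ensemble, state, base_action)
    for _ in range(n_samples):
        candidate = base_action + sigma * torch.randn_like(base_action)
        value = optimistic_value(q_ensemble, state, candidate)
        if value.item() > best_value.item():
            best_action, best_value = candidate, value
    return best_action
```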
Deep Reinforcement Learning in Recommender Systems
Recommender Systems aim to help customers find content of their interest by presenting them suggestions they are most likely to prefer. Reinforcement Learning, a Machine Learning paradigm where agents learn by interaction which actions to perform in an environment so as to maximize a reward, can be trained to give good recommendations. One of the problems when working with Reinforcement Learning algorithms is the dimensionality explosion, especially in the observation space. On the other hand, industrial recommender systems deal with extremely large observation spaces. New Deep Reinforcement Learning algorithms can deal with this problem, but they are mainly focused on images. A new technique has been developed that converts raw data into images, enabling DRL algorithms to be applied properly. This project addresses this line of investigation. The contributions of the project are: (1) defining a generalization of the Markov Decision Process formulation for Recommender Systems, (2) defining a way to express the observation as an image, and (3) demonstrating the use of both concepts by addressing a particular Recommender System case through Reinforcement Learning. Results show that the trained agents offer better recommendations than an arbitrary choice. However, the system does not achieve great performance, mainly due to the lack of interactions in the dataset.
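A minimal sketch of turning a raw observation vector into an image-like tensor so a convolutional DRL agent can consume it; the min-max scaling and square zero-padding are illustrative assumptions, not the specific encoding developed in the project:

```python
import numpy as np

def observation_to_image(obs, side=None):
    """Normalize a 1-D observation to [0, 1] and pad/reshape it into a square 'grayscale image'."""
    obs = np.asarray(obs, dtype=np.float32)
    lo, hi = obs.min(), obs.max()
    obs = (obs - lo) / (hi - lo + 1e-8)                 # min-max scale to [0, 1]
    side = side or int(np.ceil(np.sqrt(obs.size)))      # smallest square that fits the vector
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: obs.size] = obs
    return padded.reshape(1, side, side)                # (channels, height, width)
```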
Neural Interactive Collaborative Filtering
In this paper, we study collaborative filtering in an interactive setting, in
which the recommender agents iterate between making recommendations and
updating the user profile based on the interactive feedback. The most
challenging problem in this scenario is how to suggest items when the user
profile has not been well established, i.e., recommend for cold-start users or
warm-start users with taste drifting. Existing approaches either rely on an overly
pessimistic linear exploration strategy or adopt meta-learning-based algorithms
in a purely exploitative way. In this work, to quickly catch up with the user's
interests, we propose to represent the exploration policy with a neural network
and directly learn it from the feedback data. Specifically, the exploration
policy is encoded in the weights of multi-channel stacked self-attention neural
networks and trained with efficient Q-learning by maximizing users' overall
satisfaction in the recommender system. The key insight is that satisfying
recommendations triggered by an exploratory recommendation can be viewed as
an exploration bonus (a delayed reward) for their contribution to improving the
quality of the user profile. Therefore, the proposed exploration policy, to
balance between learning the user profile and making accurate recommendations,
can be directly optimized by maximizing users' long-term satisfaction with
reinforcement learning. Extensive experiments and analysis conducted on three
benchmark collaborative filtering datasets have demonstrated the advantage of
our method over state-of-the-art methods.
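A minimal sketch of an exploration policy encoded as a self-attention network over the interaction history and trained with a one-step Q-learning target; layer sizes, the single attention block, the mean-pooled profile, and the batch layout are illustrative assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class AttentiveQ(nn.Module):
    """Scores candidate items as Q-values given the user's interaction history."""
    def __init__(self, item_dim, hidden=64, heads=4):
        super().__init__()
        self.proj = nn.Linear(item_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, history, candidates):
        """history: (B, T, item_dim); candidates: (B, N, item_dim) -> Q-values: (B, N)."""
        h = self.proj(history)
        h, _ = self.attn(h, h, h)                 # self-attention over the history so far
        profile = h.mean(dim=1, keepdim=True)     # pooled user profile, (B, 1, hidden)
        q = self.score(self.proj(candidates) + profile)
        return q.squeeze(-1)

def q_learning_loss(model, batch, gamma=0.95):
    """Standard one-step target; delayed satisfaction enters through the reward r."""
    hist, cands, a_idx, r, next_hist, next_cands, done = batch
    q = model(hist, cands).gather(1, a_idx.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * model(next_hist, next_cands).max(dim=1).values
    return nn.functional.mse_loss(q, target)
```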
Job Recommendation System Using Deep Reinforcement Learning (DRL)
The rapid growth of online job portals and the increasing volume of job listings have made it challenging for job seekers to navigate efficiently through the vast number of available opportunities. Job recommendation systems play a crucial role in assisting users in finding relevant job opportunities based on their skills, preferences, and past experiences. This research paper proposes a job recommendation system that leverages deep learning techniques to enhance the accuracy and effectiveness of job recommendations. The system utilizes advanced algorithms to analyze user profiles, job descriptions, and historical data to generate personalized job recommendations. Experimental evaluations demonstrate the superiority of the proposed system compared to traditional recommendation methods, thereby improving the job search process for both job seekers and employers. This paper presents a job recommendation system using Deep Reinforcement Learning (DRL).