Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning
Auction-based recommender systems are prevalent in online advertising
platforms, but they are typically optimized to allocate recommendation slots
based on immediate expected return metrics, neglecting the downstream effects
of recommendations on user behavior. In this study, we employ reinforcement
learning to optimize for long-term return metrics in an auction-based
recommender system. Utilizing temporal difference learning, a fundamental
reinforcement learning algorithm, we implement a one-step policy improvement
approach that biases the system towards recommendations with higher long-term
user engagement metrics. This optimizes value over long horizons while
maintaining compatibility with the auction framework. Our approach is grounded
in dynamic programming ideas which show that our method provably improves upon
the existing auction-based base policy. Through an online A/B test conducted on
an auction-based recommender system which handles billions of impressions and
users daily, we empirically establish that our proposed method outperforms the
current production system in terms of long-term user engagement metrics.
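A minimal sketch of the kind of one-step improvement described above, assuming a learned state-action value model and a rerank-time bias term; the network layout, the on-policy TD target, the linear blending of immediate and long-term value, and all names (QNet, long_term_weight) are illustrative assumptions rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Estimates the long-term engagement value of showing item a in user state s."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def td_update(q, opt, batch, gamma=0.99):
    """One-step TD(0) update; the next action comes from the logged base policy (on-policy)."""
    s, a, r, s_next, a_next, done = batch
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q(s_next, a_next)
    loss = nn.functional.mse_loss(q(s, a), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def rerank_score(immediate_value, q_value, long_term_weight=0.1):
    """Auction score biased toward candidates with higher estimated long-term value."""
    return immediate_value + long_term_weight * q_value
```

At serving time the candidate with the highest blended score is allocated the slot, so the auction mechanics stay untouched while the ranking is nudged toward higher estimated long-term engagement.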
PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement
Current advances in recommender systems have been remarkably successful in
optimizing immediate engagement. However, long-term user engagement, a more
desirable performance metric, remains difficult to improve. Meanwhile, recent
reinforcement learning (RL) algorithms have shown their effectiveness in a
variety of long-term goal optimization tasks. For this reason, RL is widely
considered as a promising framework for optimizing long-term user engagement in
recommendation. Though promising, the application of RL heavily relies on
well-designed rewards, but designing rewards related to long-term user
engagement is quite difficult. To mitigate the problem, we propose a novel
paradigm, recommender systems with human preferences (or Preference-based
Recommender systems, PrefRec), which allows RL recommender systems to learn from
preferences about users' historical behaviors rather than explicitly defined
rewards. Such preferences are easily accessible through techniques such as
crowdsourcing, as they do not require any expert knowledge. With PrefRec, we
can fully exploit the advantages of RL in optimizing long-term goals, while
avoiding complex reward engineering. PrefRec uses the preferences to
automatically train a reward function in an end-to-end manner. The reward
function is then used to generate learning signals to train the recommendation
policy. Furthermore, we design an effective optimization method for PrefRec,
which uses an additional value function, expectile regression and reward model
pre-training to improve the performance. We conduct experiments on a variety of
long-term user engagement optimization tasks. The results show that PrefRec
significantly outperforms previous state-of-the-art methods in all the tasks.
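A minimal sketch of preference-based reward learning of the sort PrefRec describes, assuming pairwise trajectory preferences and a Bradley-Terry-style objective; the architecture, the summed per-step reward, and the expectile weighting are assumptions for illustration, not the paper's code:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps each step of a user-behavior trajectory to a scalar reward, summed over time."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, traj):                       # traj: (batch, T, obs_dim)
        return self.net(traj).squeeze(-1).sum(dim=-1)

def preference_loss(reward_model, traj_a, traj_b, pref_b):
    """pref_b = 1.0 if annotators preferred trajectory b, else 0.0 (Bradley-Terry)."""
    logits = reward_model(traj_b) - reward_model(traj_a)
    return nn.functional.binary_cross_entropy_with_logits(logits, pref_b)

def expectile_loss(pred, target, tau=0.7):
    """Asymmetric (expectile) regression, usable for the additional value function."""
    diff = target - pred
    weight = torch.abs(tau - (diff < 0).float())   # tau if diff > 0, else 1 - tau
    return (weight * diff.pow(2)).mean()
```

The learned reward function then supplies the scalar learning signal for training the recommendation policy, so no hand-designed long-term engagement reward is needed.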
Reinforcement recommendation with user multi-aspect preference
Formulating recommender systems with reinforcement learning (RL) frameworks has attracted increasing attention from both academic and industry communities. While many promising results have been achieved, existing models mostly simulate the environment reward with a unified value, which may hinder the understanding of users' complex preferences and limit model performance. In this paper, we consider how to model user multi-aspect preferences in the context of an RL-based recommender system. More specifically, we base our model on the framework of deterministic policy gradient (DPG), which is effective in dealing with large action spaces. A major challenge in modeling user multi-aspect preferences lies in the fact that they may contradict each other. To solve this problem, we introduce Pareto optimization into the DPG framework. We assign each aspect a tailored critic, and all the critics share the same actor. The Pareto optimization is realized by a gradient-based method, which can be easily integrated into the actor and critic learning process. Based on the designed model, we theoretically analyze its gradient bias in the optimization process, and we design a weight-reuse mechanism to lower the upper bound of this bias, which is shown to be effective for improving model performance. We conduct extensive experiments based on three real-world datasets to demonstrate our model's superiority.
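A minimal sketch of the shared-actor, per-aspect-critic idea under a gradient-based Pareto combination, assuming two aspects and a closed-form min-norm weighting of the two actor gradients; the weighting rule and all function names are illustrative assumptions, and the paper's weight-reuse mechanism is omitted:

```python
import torch

def actor_gradients(actor, critic, states):
    """Flattened DPG actor gradient for one aspect-specific critic."""
    actions = actor(states)
    loss = -critic(states, actions).mean()
    grads = torch.autograd.grad(loss, list(actor.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def pareto_combine(grad_a, grad_b):
    """Min-norm convex combination of two gradient vectors (two-aspect special case)."""
    diff = grad_a - grad_b
    alpha = torch.dot(grad_b, -diff) / (diff.pow(2).sum() + 1e-12)
    alpha = alpha.clamp(0.0, 1.0)
    return alpha * grad_a + (1.0 - alpha) * grad_b

def apply_pareto_step(actor, critics, states, lr=1e-3):
    """Combine the per-aspect gradients and take a plain gradient step on the shared actor."""
    g = pareto_combine(actor_gradients(actor, critics[0], states),
                       actor_gradients(actor, critics[1], states))
    offset = 0
    with torch.no_grad():
        for p in actor.parameters():
            n = p.numel()
            p -= lr * g[offset:offset + n].view_as(p)
            offset += n
```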
AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement
Growing attention has been paid to Reinforcement Learning (RL) algorithms
when optimizing long-term user engagement in sequential recommendation tasks.
One challenge in large-scale online recommendation systems is the constant and
complicated changes in users' behavior patterns, such as interaction rates and
retention tendencies. When formulated as a Markov Decision Process (MDP), the
dynamics and reward functions of the recommendation system are continuously
affected by these changes. Existing RL algorithms for recommendation systems
will suffer from distribution shift and struggle to adapt in such an MDP. In
this paper, we introduce a novel paradigm called Adaptive Sequential
Recommendation (AdaRec) to address this issue. AdaRec proposes a new
distance-based representation loss to extract latent information from users'
interaction trajectories. Such information reflects how well the RL policy fits
current user behavior patterns, and helps the policy identify subtle changes
in the recommendation system. To make rapid adaptation to these changes, AdaRec
encourages exploration with the idea of optimism under uncertainty. The
exploration is further guarded by zero-order action optimization to ensure
stable recommendation quality in complicated environments. We conduct extensive
empirical analyses in both simulator-based and live sequential recommendation
tasks, where AdaRec exhibits superior long-term performance compared to all
baseline algorithms.
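A minimal sketch of optimism under uncertainty paired with zero-order action refinement, assuming a critic ensemble whose disagreement stands in for uncertainty; the ensemble bonus, the random-perturbation search, and all names are assumptions for illustration, not AdaRec's exact design:

```python
import torch

def optimistic_value(q_ensemble, state, action, beta=1.0):
    """Mean + beta * std over an ensemble of critics (optimism under uncertainty).

    Written for a single (unbatched) state-action pair for simplicity.
    """
    values = torch.stack([q(state, action) for q in q_ensemble], dim=0)
    return values.mean(dim=0) + beta * values.std(dim=0)

def zero_order_refine(q_ensemble, state, base_action, n_samples=16, sigma=0.05):
    """Perturb the policy's action and keep the candidate with the best optimistic value."""
    best_action = base_action
    best_value = optimistic_value(q_ensemble, state, base_action)
    for _ in range(n_samples):
        candidate = base_action + sigma * torch.randn_like(base_action)
        value = optimistic_value(q_ensemble, state, candidate)
        if value.item() > best_value.item():
            best_action, best_value = candidate, value
    return best_action
```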
Deep Reinforcement Learning in Recommender Systems
Recommender Systems aim to help customers find content of their interest by presenting them suggestions they are most likely to prefer. Reinforcement Learning, a Machine Learning paradigm where agents learn by interaction which actions to perform in an environment so as to maximize a reward, can be trained to give good recommendations. One of the problems when working with Reinforcement Learning algorithms is the dimensionality explosion, especially in the observation space. On the other hand, industrial recommender systems deal with extremely large observation spaces. New Deep Reinforcement Learning algorithms can deal with this problem, but they are mainly focused on images. A new technique has been developed that converts raw data into images, enabling DRL algorithms to be applied properly. This project addresses this line of investigation. The contributions of the project are: (1) defining a generalization of the Markov Decision Process formulation for Recommender Systems, (2) defining a way to express the observation as an image, and (3) demonstrating the use of both concepts by addressing a particular Recommender System case through Reinforcement Learning. Results show that the trained agents offer better recommendations than an arbitrary choice. However, the system does not achieve great performance, mainly due to the lack of interactions in the dataset.
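A minimal sketch of turning a raw observation vector into an image-like tensor so a convolutional DRL agent can consume it; the min-max scaling and square zero-padding are illustrative assumptions, not the specific encoding developed in the project:

```python
import numpy as np

def observation_to_image(obs, side=None):
    """Normalize a 1-D observation to [0, 1] and pad/reshape it into a square 'grayscale image'."""
    obs = np.asarray(obs, dtype=np.float32)
    lo, hi = obs.min(), obs.max()
    obs = (obs - lo) / (hi - lo + 1e-8)                 # min-max scale to [0, 1]
    side = side or int(np.ceil(np.sqrt(obs.size)))      # smallest square that fits the vector
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: obs.size] = obs
    return padded.reshape(1, side, side)                # (channels, height, width)
```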
Neural Interactive Collaborative Filtering
In this paper, we study collaborative filtering in an interactive setting, in
which the recommender agents iterate between making recommendations and
updating the user profile based on the interactive feedback. The most
challenging problem in this scenario is how to suggest items when the user
profile has not been well established, i.e., recommend for cold-start users or
warm-start users with taste drifting. Existing approaches either rely on an overly
pessimistic linear exploration strategy or adopt meta-learning-based algorithms
in a purely exploitative way. In this work, to quickly catch up with the user's
interests, we propose to represent the exploration policy with a neural network
and directly learn it from the feedback data. Specifically, the exploration
policy is encoded in the weights of multi-channel stacked self-attention neural
networks and trained with efficient Q-learning by maximizing users' overall
satisfaction in the recommender system. The key insight is that satisfying
recommendations triggered by an exploratory recommendation can be viewed as
an exploration bonus (a delayed reward) for their contribution to improving the
quality of the user profile. Therefore, the proposed exploration policy, to
balance between learning the user profile and making accurate recommendations,
can be directly optimized by maximizing users' long-term satisfaction with
reinforcement learning. Extensive experiments and analysis conducted on three
benchmark collaborative filtering datasets have demonstrated the advantage of
our method over state-of-the-art methods.
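A minimal sketch of an exploration policy encoded as a self-attention network over the interaction history and trained with a one-step Q-learning target; layer sizes, the single attention block, the mean-pooled profile, and the batch layout are illustrative assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class AttentiveQ(nn.Module):
    """Scores candidate items as Q-values given the user's interaction history."""
    def __init__(self, item_dim, hidden=64, heads=4):
        super().__init__()
        self.proj = nn.Linear(item_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, history, candidates):
        """history: (B, T, item_dim); candidates: (B, N, item_dim) -> Q-values: (B, N)."""
        h = self.proj(history)
        h, _ = self.attn(h, h, h)                 # self-attention over the history so far
        profile = h.mean(dim=1, keepdim=True)     # pooled user profile, (B, 1, hidden)
        q = self.score(self.proj(candidates) + profile)
        return q.squeeze(-1)

def q_learning_loss(model, batch, gamma=0.95):
    """Standard one-step target; delayed satisfaction enters through the reward r."""
    hist, cands, a_idx, r, next_hist, next_cands, done = batch
    q = model(hist, cands).gather(1, a_idx.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * model(next_hist, next_cands).max(dim=1).values
    return nn.functional.mse_loss(q, target)
```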
Job Recommendation System Using Deep Reinforcement Learning (DRL)
The rapid growth of online job portals and the increasing volume of job listings have made it challenging for job seekers to navigate efficiently through the vast number of available opportunities. Job recommendation systems play a crucial role in assisting users in finding relevant job opportunities based on their skills, preferences, and past experiences. This research paper proposes a job recommendation system that leverages deep learning techniques to enhance the accuracy and effectiveness of job recommendations. The system utilizes advanced algorithms to analyze user profiles, job descriptions, and historical data to generate personalized job recommendations. Experimental evaluations demonstrate the superiority of the proposed system compared to traditional recommendation methods, thereby improving the job search process for both job seekers and employers. This paper presents a job recommendation system using Deep Reinforcement Learning (DRL).