3 research outputs found
Scalable Neural Contextual Bandit for Recommender Systems
High-quality recommender systems ought to deliver both innovative and
relevant content through effective and exploratory interactions with users.
Yet, supervised learning-based neural networks, which form the backbone of many
existing recommender systems, only leverage recognized user interests, falling
short when it comes to efficiently uncovering unknown user preferences. While
there has been some progress with neural contextual bandit algorithms towards
enabling online exploration through neural networks, their onerous
computational demands hinder widespread adoption in real-world recommender
systems. In this work, we propose a scalable sample-efficient neural contextual
bandit algorithm for recommender systems. To do this, we design an epistemic
neural network architecture, Epistemic Neural Recommendation (ENR), that
enables Thompson sampling at a large scale. In two distinct large-scale
experiments with real-world tasks, ENR significantly boosts click-through rates
and user ratings by at least 9% and 6% respectively compared to
state-of-the-art neural contextual bandit algorithms. Furthermore, it achieves
equivalent performance with at least 29% fewer user interactions compared to
the best-performing baseline algorithm. Remarkably, while accomplishing these
improvements, ENR demands orders of magnitude fewer computational resources
than neural contextual bandit baseline algorithms.
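The core mechanism, Thompson sampling through an epistemic model, can be sketched in toy form. The ensemble of linear reward heads, the feature dimensions, and the bootstrap-masked update below are illustrative assumptions, not the paper's ENR architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an epistemic model: an ensemble of linear reward
# heads over item feature vectors. Thompson sampling draws one head per request.
n_heads, dim, n_items = 8, 5, 12
heads = rng.normal(size=(n_heads, dim))       # epistemic ensemble of reward models
item_feats = rng.normal(size=(n_items, dim))  # candidate item feature vectors

def recommend():
    """Sample one ensemble member (an epistemic draw), then act greedily."""
    k = int(rng.integers(n_heads))
    scores = item_feats @ heads[k]
    return int(np.argmax(scores))

def update(item, reward, lr=0.1):
    """SGD step pulling each head's prediction toward the observed reward."""
    x = item_feats[item]
    for k in range(n_heads):
        # independent bootstrap mask keeps the ensemble diverse
        if rng.random() < 0.8:
            err = reward - heads[k] @ x
            heads[k] += lr * err * x
```

Sampling a single member per decision, rather than averaging the ensemble, is what turns epistemic uncertainty into exploration: items the ensemble disagrees about still get recommended occasionally.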
Deep Exploration for Recommendation Systems
Modern recommendation systems stand to benefit from probing for and learning
from delayed feedback. Research has tended to focus on learning from a user's
response to a single recommendation. Such work, which leverages methods of
supervised and bandit learning, forgoes learning from the user's subsequent
behavior. Where past work has aimed to learn from subsequent behavior, there
has been a lack of effective methods for probing to elicit informative delayed
feedback. Effective exploration through probing for delayed feedback becomes
particularly challenging when rewards are sparse. To address this, we develop
deep exploration methods for recommendation systems. In particular, we
formulate recommendation as a sequential decision problem and demonstrate
benefits of deep exploration over single-step exploration. Our experiments are
carried out with high-fidelity industrial-grade simulators and establish large
improvements over existing algorithms.
Collaborative Filtering as a Multi-Armed Bandit
Recommender Systems (RS) aim at suggesting to users one or several items in which they might have an interest. Based on the feedback they receive from the user, these systems must adapt their model in order to improve future recommendations. The repetition of these steps defines the RS as a sequential process. This sequential aspect raises an exploration-exploitation dilemma, which is surprisingly rarely taken into account for RS without contextual information. In this paper we present an explore-exploit collaborative filtering RS, based on Matrix Factorization and bandit algorithms. Using experiments on artificial and real datasets, we show the importance and practicability of using sequential approaches to perform recommendation. We also study the impact of the model update on both the quality and the computation time of the recommendation procedure.
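One way to make the explore-exploit idea concrete: treat a user's latent factor as the unknown parameter of a linear bandit whose arms are items described by matrix-factorization factors. The Bayesian ridge posterior and linear Thompson sampling below are illustrative assumptions, not necessarily the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: item latent factors V come from a pre-trained matrix
# factorization; the user's latent factor theta is unknown and maintained
# as a Bayesian ridge posterior N(A^-1 b, A^-1).
d, n_items = 4, 15
V = rng.normal(size=(n_items, d))   # item latent factors (pre-trained, fixed)
A = np.eye(d)                       # posterior precision (identity ridge prior)
b = np.zeros(d)                     # precision-weighted mean accumulator

def choose():
    """Linear Thompson sampling: draw a user factor, recommend its best item."""
    cov = np.linalg.inv(A)
    theta = rng.multivariate_normal(cov @ b, cov)  # posterior sample
    return int(np.argmax(V @ theta))

def observe(item, reward):
    """Rank-one Bayesian update of the user-factor posterior."""
    global A, b
    x = V[item]
    A += np.outer(x, x)
    b += reward * x
```

As feedback accumulates, the posterior covariance shrinks and the sampled factors concentrate, so the procedure shifts naturally from exploration to exploitation; the cost of each model update here is a single d-by-d inverse, echoing the paper's question of how update cost trades off against recommendation quality.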