Scalable Neural Contextual Bandit for Recommender Systems
High-quality recommender systems ought to deliver both innovative and
relevant content through effective and exploratory interactions with users.
Yet, supervised learning-based neural networks, which form the backbone of many
existing recommender systems, exploit only known user interests and fall
short at efficiently uncovering unknown user preferences. While
there has been some progress with neural contextual bandit algorithms towards
enabling online exploration through neural networks, their onerous
computational demands hinder widespread adoption in real-world recommender
systems. In this work, we propose a scalable, sample-efficient neural contextual
bandit algorithm for recommender systems. To do this, we design an epistemic
neural network architecture, Epistemic Neural Recommendation (ENR), that
enables Thompson sampling at a large scale. In two distinct large-scale
experiments with real-world tasks, ENR significantly boosts click-through rates
and user ratings by at least 9% and 6%, respectively, compared to
state-of-the-art neural contextual bandit algorithms. Furthermore, it achieves
equivalent performance with at least 29% fewer user interactions compared to
the best-performing baseline algorithm. Remarkably, ENR achieves these
improvements while requiring orders of magnitude fewer computational resources
than the neural contextual bandit baselines.
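The abstract does not spell out the ENR architecture, so the sketch below only illustrates the general idea it builds on: Thompson sampling driven by an epistemic index. It uses ensemble sampling, a simpler and different technique in which each member of an ensemble of linear reward models plays the role of one index value. Picking a member at random and acting greedily approximates drawing a reward model from the posterior. The class name, the linear models, and all parameters (`n_members`, `lr`, `prior_scale`, `noise_scale`) are hypothetical choices for illustration, not ENR's.

```python
import random


class EnsembleSamplingBandit:
    """Approximate Thompson sampling with an ensemble of linear reward models.

    Each member stands in for one value of an epistemic index: choosing a
    member at random and acting greedily approximates sampling a reward model
    from the posterior. Hypothetical stand-in, NOT the ENR architecture.
    """

    def __init__(self, dim, n_members=10, lr=0.1, prior_scale=1.0,
                 noise_scale=0.1, seed=0):
        self.rng = random.Random(seed)
        self.lr = lr
        self.noise_scale = noise_scale
        # Independent random initial weights act as diverse prior samples.
        self.members = [[self.rng.gauss(0.0, prior_scale) for _ in range(dim)]
                        for _ in range(n_members)]

    @staticmethod
    def predict(w, x):
        """Linear reward estimate w . x for one member."""
        return sum(wi * xi for wi, xi in zip(w, x))

    def select(self, contexts):
        """Return the index of the context (arm) chosen this round."""
        w = self.rng.choice(self.members)  # sample an epistemic index
        scores = [self.predict(w, x) for x in contexts]
        return max(range(len(contexts)), key=scores.__getitem__)

    def update(self, x, reward):
        """One SGD step on squared error per member; each member sees its own
        noise-perturbed copy of the reward so the ensemble stays diverse."""
        for w in self.members:
            target = reward + self.rng.gauss(0.0, self.noise_scale)
            err = self.predict(w, x) - target
            for i in range(len(w)):
                w[i] -= self.lr * err * x[i]
```

In a recommender loop, one would call `select` on the candidate items' feature vectors, show the chosen item, and then call `update` with that item's features and the observed feedback (for example 1.0 for a click, 0.0 otherwise). Members disagree most on items with little data, which is what drives exploration.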
Empirical analysis of representation learning and exploration in neural kernel bandits
Neural bandits have been shown to provide an efficient solution to practical
sequential decision tasks that have nonlinear reward functions. The main
contributor to that success is approximate Bayesian inference, which enables
neural network (NN) training with uncertainty estimates. However, Bayesian NNs
often suffer from a prohibitive computational overhead or operate on a subset
of parameters. Alternatively, certain classes of infinitely wide neural networks
have been shown to correspond directly to Gaussian processes (GP) with neural kernels
(NK). NK-GPs provide accurate uncertainty estimates and can be trained faster
than most Bayesian NNs. We propose to guide common bandit policies with NK
distributions and show that NK bandits achieve state-of-the-art performance on
nonlinear structured data. Moreover, we propose a framework for independently
measuring a bandit algorithm's ability to learn representations and its ability
to explore, and use it to analyze the impact of NK distributions on those two
aspects. We consider policies based on a GP and a Student's t-process (TP).
Furthermore, we study practical considerations, such as training frequency and
model partitioning. We believe our work will help clarify the impact of
utilizing NKs in applied settings.
Comment: Extended version. Added a major experiment comparing NK distributions
with respect to exploration and exploitation. Submitted to ICLR 202
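To make "guiding a bandit policy with an NK-GP distribution" concrete, here is a minimal sketch under stated assumptions. It pairs a UCB policy with a GP whose kernel is the degree-1 arc-cosine kernel (Cho and Saul), which matches, up to scale, the NNGP kernel of an infinitely wide one-hidden-layer ReLU network without biases. The kernel choice, the exploration weight `beta`, the noise level, and every function name are hypothetical. The paper's kernels, its policies (including the Student's t-process variant), and its training setup may differ.

```python
import math


def arccos_kernel(x, y):
    """Degree-1 arc-cosine kernel: |x||y|(sin t + (pi - t) cos t) / pi."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    t = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp rounding error
    return nx * ny * (math.sin(t) + (math.pi - t) * math.cos(t)) / math.pi


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L, z):
    """Back substitution: solve L^T a = z."""
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a


def gp_posterior(X, y, x_star, noise=0.1):
    """Posterior mean and latent variance of the GP at x_star."""
    if not X:
        return 0.0, arccos_kernel(x_star, x_star)
    K = [[arccos_kernel(a, b) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # K^{-1} y
    k_star = [arccos_kernel(a, x_star) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = arccos_kernel(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


def ucb_select(X, y, candidates, beta=2.0, noise=0.1):
    """Pick the candidate maximizing mean + beta * posterior std."""
    scores = []
    for x in candidates:
        m, v = gp_posterior(X, y, x, noise)
        scores.append(m + beta * math.sqrt(v))
    return max(range(len(candidates)), key=scores.__getitem__)
```

Each round, one would append the chosen candidate's features to `X` and its observed reward to `y`. Unobserved regions keep a high posterior variance and so attract exploration. This sketch refactorizes the kernel matrix on every call, which costs cubic time in the number of observations; a practical version would cache or incrementally update the factorization.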