Carousel Personalization in Music Streaming Apps with Contextual Bandits
Media services providers, such as music streaming platforms, frequently
leverage swipeable carousels to recommend personalized content to their users.
However, selecting the most relevant items (albums, artists, playlists...) to
display in these carousels is a challenging task, as items are numerous and as
users have different preferences. In this paper, we model carousel
personalization as a contextual multi-armed bandit problem with multiple plays,
cascade-based updates and delayed batch feedback. We empirically show the
effectiveness of our framework at capturing characteristics of real-world
carousels by addressing a large-scale playlist recommendation task on a global
music streaming mobile app. Along with this paper, we publicly release
industrial data from our experiments, as well as an open-source environment to
simulate comparable carousel personalization learning problems.
Comment: 14th ACM Conference on Recommender Systems (RecSys 2020), Best Short Paper Candidate.
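The three ingredients named in the abstract (multiple plays, cascade-based updates, delayed batch feedback) can be illustrated with a minimal sketch. This is not the paper's released environment or code; the Beta-posterior Thompson sampler, the `L_init` visible-slot convention, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, K, L_init = 20, 6, 3   # candidates, carousel slots, slots visible without swiping

# Beta posteriors on each item's display-to-click probability (assumed model)
alpha = np.ones(n_items)
beta_ = np.ones(n_items)

def select_carousel():
    # Multiple plays via Thompson sampling: sample each item's rate,
    # fill the K carousel slots with the K highest samples
    samples = rng.beta(alpha, beta_)
    return np.argsort(samples)[::-1][:K]

def cascade_feedback(shown, clicked_positions, batch):
    # Cascade assumption: slots up to the last click (or only the
    # initially visible L_init slots if nothing was clicked) were seen
    seen_until = max(clicked_positions) + 1 if clicked_positions else L_init
    for pos in range(seen_until):
        batch.append((shown[pos], 1 if pos in clicked_positions else 0))

def apply_batch(batch):
    # Delayed batch feedback: posteriors refresh once per batch, not per event
    for item, reward in batch:
        alpha[item] += reward
        beta_[item] += 1 - reward
    batch.clear()
```

In use, many carousels would be displayed between calls to `apply_batch`, mimicking the delay with which streaming logs reach the learner.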
Cascading Bandits for Large-Scale Recommendation Problems
Most recommender systems recommend a list of items. The user examines the list, from the first item to the last, and often chooses the first attractive item and does not examine the rest. This type of user behavior can be modeled by the cascade model. In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend the K most attractive items from a large set of L candidate items. We propose two algorithms for solving this problem, which are based on the idea of linear generalization. The key idea in our solutions is that we learn a predictor of the attraction probabilities of items from their features, as opposed to learning the attraction probability of each item independently as in the existing work. This results in practical learning algorithms whose regret does not depend on the number of items L. We bound the regret of one algorithm and comprehensively evaluate the other on a range of recommendation problems. The algorithm performs well and outperforms all baselines.
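The linear-generalization idea above can be sketched as a LinUCB-style learner with cascade feedback. This is a toy sketch, not the paper's algorithms: the exploration constant `c`, the random item features, and the update rule are all assumptions made for illustration.

```python
import numpy as np

d, L, K = 5, 100, 4       # feature dim, candidate items, list length
A = np.eye(d)             # ridge-regularised Gram matrix (shared across items)
b = np.zeros(d)
c = 0.5                   # exploration width (a tunable constant here)
rng = np.random.default_rng(1)
X = rng.normal(size=(L, d)) / np.sqrt(d)   # toy item features

def recommend():
    # Linear generalization: one shared weight vector predicts every
    # item's attraction probability, so regret need not grow with L
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b
    ucb = X @ theta + c * np.sqrt(np.einsum('ld,dk,lk->l', X, A_inv, X))
    return np.argsort(ucb)[::-1][:K]

def update(ranked, click_pos):
    # Cascade model: the user examines items top-down and stops at the
    # first attractive one; items after the click are unobserved
    global A, b
    observed = ranked if click_pos is None else ranked[:click_pos + 1]
    for pos, item in enumerate(observed):
        r = 1.0 if click_pos is not None and pos == click_pos else 0.0
        A += np.outer(X[item], X[item])
        b += r * X[item]
```

Because only the shared d-dimensional statistics `(A, b)` are updated, the per-round cost and the learner's uncertainty shrink with observations rather than with the catalogue size L.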
Optimising Human-AI Collaboration by Learning Convincing Explanations
Machine learning models are being increasingly deployed to take, or assist in
taking, complicated and high-impact decisions, from quasi-autonomous vehicles
to clinical decision support systems. This poses challenges, particularly when
models have hard-to-detect failure modes and are able to take actions without
oversight. In order to handle this challenge, we propose a method for a
collaborative system that remains safe by having a human ultimately making
decisions, while giving the model the best opportunity to convince and debate
them with interpretable explanations. However, the most helpful explanation
varies among individuals and may be inconsistent across stated preferences. To
this end, we develop an algorithm, Ardent, that efficiently learns a ranking
through interaction and best assists humans in completing a task. By utilising a
collaborative approach, we can ensure safety and improve performance while
addressing transparency and accountability concerns. Ardent enables efficient
and effective decision-making by adapting to individual preferences for
explanations, which we validate through extensive simulations alongside a user
study involving a challenging image classification task, demonstrating
consistent improvement over competing systems.
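The general idea of adapting explanations to an individual through interaction can be sketched as a simple bandit over explanation styles. To be clear, this is not the Ardent algorithm (whose details are not given in the abstract); the style names, the Beta-Bernoulli model of "the user was convinced", and the Thompson-sampling choice are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
styles = ["saliency_map", "counterfactual", "prototype", "feature_attribution"]

# Beta posterior per style on "this explanation convinced the user" (assumed model)
succ = np.ones(len(styles))
fail = np.ones(len(styles))

def pick_style():
    # Thompson sampling: explore styles early, then converge to the one
    # this particular user finds most convincing
    return int(np.argmax(rng.beta(succ, fail)))

def record(style_idx, convinced):
    # Feedback from the interaction: did the explanation sway the human?
    succ[style_idx] += convinced
    fail[style_idx] += 1 - convinced
```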
Online Clustering of Bandits with Misspecified User Models
The contextual linear bandit is an important online learning problem where
given arm features, a learning agent selects an arm at each round to maximize
the cumulative rewards in the long run. A line of works, called the clustering
of bandits (CB), utilize the collaborative effect over user preferences and
have shown significant improvements over classic linear bandit algorithms.
However, existing CB algorithms require well-specified linear user models and
can fail when this critical assumption does not hold. Whether robust CB
algorithms can be designed for more practical scenarios with misspecified user
models remains an open problem. In this paper, we are the first to present the
important problem of clustering of bandits with misspecified user models
(CBMUM), where the expected rewards in user models can be perturbed away from
perfect linear models. We devise two robust CB algorithms, RCLUMB and RSCLUMB
(representing the learned clustering structure with dynamic graph and sets,
respectively), that can accommodate the inaccurate user preference estimations
and erroneous clustering caused by model misspecifications. We prove regret
upper bounds for our
algorithms under milder assumptions than previous CB works (notably, we move
past a restrictive technical assumption on the distribution of the arms), which
match the lower bound asymptotically up to logarithmic factors, and also
match the state-of-the-art results in several degenerate cases. The techniques
in proving the regret caused by misclustering users are quite general and may
be of independent interest. Experiments on both synthetic and real-world data
show that our algorithms outperform previous ones.
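The core robustness idea, keeping two users in the same cluster only when their estimates agree up to both statistical uncertainty and an allowed misspecification level, can be sketched as follows. This is not RCLUMB or RSCLUMB themselves; the confidence-width formula, the slack parameter `eps`, and the aggregation rule are illustrative assumptions.

```python
import numpy as np

d = 4
users = range(6)
A = {u: np.eye(d) for u in users}   # per-user ridge Gram matrices
b = {u: np.zeros(d) for u in users}
T = {u: 1 for u in users}           # per-user interaction counts
eps = 0.1                           # assumed misspecification level

def theta(u):
    # Per-user ridge estimate of the preference vector
    return np.linalg.solve(A[u], b[u])

def same_cluster(u, v):
    # Keep an edge only if the users' estimates are close once both a
    # confidence radius and a misspecification slack are allowed for
    width = lambda w: np.sqrt((1 + np.log(1 + T[w])) / (1 + T[w]))
    return np.linalg.norm(theta(u) - theta(v)) <= width(u) + width(v) + 2 * eps

def cluster_stats(u):
    # Aggregate statistics over the users still connected to u, so the
    # cluster estimate pools data without trusting misclustered users
    members = [v for v in users if v == u or same_cluster(u, v)]
    A_c = sum(A[v] for v in members) - (len(members) - 1) * np.eye(d)
    b_c = sum(b[v] for v in members)
    return A_c, b_c
```

The extra `2 * eps` slack is what distinguishes this from clean-model clustering of bandits: without it, bounded model error could permanently split users who in fact share preferences.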
Interactive social recommendation
National Research Foundation (NRF) Singapore under its International Research Centres in Singapore Funding Initiative.