Bandits Warm-up Cold Recommender Systems
We address the cold start problem in recommendation systems, assuming that no
contextual information is available about either users or items. We consider
the case in which we only have access to a set of ratings of items by users.
Most existing works consider a batch setting and use cross-validation to tune
parameters. The classical method consists in minimizing the root mean square
error over a training subset of the ratings, which provides a factorization of
the matrix of ratings, interpreted as a latent representation of items and
users. Our contribution in this paper is five-fold. First, we make explicit
the issues raised by this kind of batch setting for users or items with very
few ratings. Then, we propose an online setting closer to the actual use of
recommender systems; this setting is inspired by the bandit framework. The
proposed methodology can be used to turn any recommender system dataset (such
as Netflix, MovieLens, ...) into a sequential dataset. Then, we make explicit
a strong and insightful link between contextual bandit algorithms and matrix
factorization; this leads us to a new algorithm that tackles the
exploration/exploitation dilemma associated with the cold start problem from a
strikingly new perspective. Finally, experimental evidence confirms that our
algorithm is effective in dealing with the cold start problem on publicly
available datasets. Overall, the goal of this paper is to bridge the gap
between recommender systems based on matrix factorization and those based on
contextual bandits.
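The sequentialisation step this abstract describes can be sketched as follows.
This is a minimal illustration under our own assumptions, not the authors'
code; the function name `to_sequential` and the policy signature are ours. At
each step a user "arrives", the bandit policy picks one of the items that user
actually rated in the static dataset, and the recorded rating is revealed as
the reward:

```python
import random

def to_sequential(ratings, policy, steps=1000, seed=0):
    """Replay a static (user, item, rating) dataset as a sequential
    decision problem: at each step a user arrives, the policy picks one
    of the items that user rated, and the rating becomes the reward."""
    rng = random.Random(seed)
    by_user = {}
    for user, item, rating in ratings:
        by_user.setdefault(user, {})[item] = rating
    users = sorted(by_user)
    history = []
    for _ in range(steps):
        user = rng.choice(users)                  # a user arrives
        candidates = sorted(by_user[user])        # arms with known feedback
        item = policy(user, candidates, history)  # the bandit chooses an arm
        history.append((user, item, by_user[user][item]))
    return history
```

Any bandit strategy that maps (user, candidate arms, past history) to an arm
can be plugged in as `policy`, which is what lets a batch dataset such as
MovieLens be replayed sequentially.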
Warm-up strategies in multi-armed bandits for recommendation
Master's thesis (Trabajo Fin de Máster) in Research and Innovation in
Computational Intelligence and Interactive Systems.
Recommender systems have become an essential part of many online platforms,
such as streaming services and e-commerce, in recent years, as they provide
users with items they may find interesting, thus granting them a personalised
experience. The recommendation problem has many open lines of investigation.
One of them is the topic we tackle in this work: the cold-start problem.
In the context of recommender systems, the cold-start problem refers to the
situation in which a system does not have enough information to give proper
suggestions to the user. The cold-start problem often occurs for one of three
main reasons: the user to be recommended is new to the system, so there is no
information about their tastes; some of the recommended items have been
recently added to the system and have no user reviews; or the system is
completely new and there is no information about either the users or the items.
Classical recommendation techniques come from machine learning and treat
recommendation as a static process in which the system provides suggestions
to the user and the latter rates them. It is more convenient to understand
recommendation as a cycle of constant interaction between the user and the
system: every time a user rates an item, the system uses that rating to learn
about the user. In that sense, we can sacrifice immediate reward in order to
gain information about the user and improve long-term reward. This scheme
establishes a balance between exploration (non-optimal recommendations to
learn about the user) and exploitation (optimal recommendations to maximise
the reward). Techniques known as multi-armed bandits are used to strike that
balance between exploration and exploitation, and we propose them to tackle
the cold-start problem.
Our hypothesis is that exploration in the first epochs of the recommendation
cycle can lead to an improvement in the reward during later epochs. To test
this hypothesis we divide the recommendation loop into two phases: the
warm-up, in which we follow a more exploratory approach to gather as much
information as possible; and exploitation, in which the system uses the
knowledge acquired during the warm-up to maximise the reward. For these two
phases we combine different recommendation strategies, among which we consider
both multi-armed bandits and classic algorithms. We evaluate them offline on
three datasets: CM100K (music), MovieLens1M (films) and Twitter. We also study
how the warm-up duration affects the exploitation phase. Results show that in
two datasets (MovieLens and Twitter) classical algorithms perform better
during the exploitation phase in terms of recall after a mainly exploratory
warm-up phase.
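The two-phase schedule described above can be sketched in a few lines. This is
our own simplified illustration, not the thesis code: the warm-up here is
plain round-robin exploration and the exploitation phase is greedy on the
empirical mean reward, standing in for the bandit and classic strategies the
thesis actually combines:

```python
from collections import defaultdict

def warmup_then_exploit(arms, pull, horizon, warmup):
    """Two-phase schedule: uniform (round-robin) exploration during the
    warm-up, then greedy exploitation of the empirical mean reward."""
    totals, counts = defaultdict(float), defaultdict(int)
    rewards = []
    for t in range(horizon):
        if t < warmup:
            arm = arms[t % len(arms)]  # warm-up: explore every arm equally
        else:                          # exploitation: best observed mean
            arm = max(arms, key=lambda a: totals[a] / counts[a] if counts[a] else 0.0)
        r = pull(arm)
        totals[arm] += r
        counts[arm] += 1
        rewards.append(r)
    return rewards
```

Varying the `warmup` argument is exactly the knob the thesis studies: a longer
exploratory phase sacrifices early reward for a better-informed exploitation
phase.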
A Contextual-Bandit Approach to Personalized News Article Recommendation
Personalized web services strive to adapt their services (advertisements,
news articles, etc) to individual users by making use of both content and user
information. Despite a few recent advances, this problem remains challenging
for at least two reasons. First, web services feature dynamically changing
pools of content, rendering traditional collaborative filtering methods
inapplicable. Second, the scale of most web services of practical interest
calls for solutions that are both fast in learning and computation.
In this work, we model personalized recommendation of news articles as a
contextual bandit problem, a principled approach in which a learning algorithm
sequentially selects articles to serve users based on contextual information
about the users and articles, while simultaneously adapting its
article-selection strategy based on user-click feedback to maximize total user
clicks.
The contributions of this work are three-fold. First, we propose a new,
general contextual bandit algorithm that is computationally efficient and well
motivated from learning theory. Second, we argue that any bandit algorithm can
be reliably evaluated offline using previously recorded random traffic.
Finally, using this offline evaluation method, we successfully applied our new
algorithm to a Yahoo! Front Page Today Module dataset containing over 33
million events. Results showed a 12.5% click lift compared to a standard
context-free bandit algorithm, and the advantage becomes even greater as data
become scarcer.
Comment: 10 pages, 5 figures
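The offline evaluation idea in this abstract, evaluating a bandit policy on
previously recorded random traffic, can be sketched as a replay loop. This is
a hedged simplification of the replay estimator (function and method names are
ours): an event counts only when the policy's choice matches the action the
uniformly random logging policy actually took, and the policy learns only from
those matched events:

```python
def replay_evaluate(policy, logged_events):
    """Offline replay evaluation on logged random traffic: an event counts
    only when the policy's choice matches the logged action; the average
    reward over matched events estimates the policy's online performance."""
    total, matched = 0.0, 0
    for context, arms, logged_arm, reward in logged_events:
        if policy.choose(context, arms) == logged_arm:
            matched += 1
            total += reward
            policy.update(context, logged_arm, reward)  # learn only on matches
    return total / matched if matched else 0.0          # average matched reward
```

Because the logging policy chose arms uniformly at random, the matched subset
is an unbiased sample of what the evaluated policy would have faced online.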
Seamlessly Unifying Attributes and Items: Conversational Recommendation for Cold-Start Users
Static recommendation methods like collaborative filtering suffer from the
inherent limitation of performing real-time personalization for cold-start
users. Online recommendation, e.g., multi-armed bandit approach, addresses this
limitation by interactively exploring user preference online and pursuing the
exploration-exploitation (EE) trade-off. However, existing bandit-based methods
model recommendation actions homogeneously. Specifically, they only consider
the items as the arms and are incapable of handling item attributes, which
naturally provide interpretable information about a user's current demands and
can effectively filter out undesired items. In this work, we consider
conversational recommendation for cold-start users, where a system can both
ask a user about attributes and recommend items interactively. This important
scenario was studied in a recent work, which, however, employs a hand-crafted
function to decide when to ask about attributes or make recommendations. Such
separate modeling of attributes and items makes the effectiveness of the
system rely heavily on the choice of the hand-crafted function, thus
introducing fragility into the system. To address this limitation, we
seamlessly unify
attributes and items in the same arm space and achieve their EE trade-offs
automatically using the framework of Thompson Sampling. Our Conversational
Thompson Sampling (ConTS) model holistically solves all questions in
conversational recommendation by choosing the arm with the maximal reward to
play. Extensive experiments on three benchmark datasets show that ConTS
outperforms the state-of-the-art methods Conversational UCB (ConUCB) and
Estimation-Action-Reflection model in both metrics of success rate and average
number of conversation turns.
Comment: TOIS 202
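The unified-arm-space idea can be illustrated with a plain Bernoulli Thompson
Sampling sketch. This is our own toy stand-in, not the ConTS model (which uses
contextual rewards): the point it shows is only that heterogeneous actions,
attribute questions and item recommendations alike, can live in one arm space
and have their exploration-exploitation trade-off handled by one sampler:

```python
import random

class BernoulliTS:
    """Thompson Sampling over one shared arm space that may mix
    heterogeneous arms (here plain strings labelling attribute queries
    and item recommendations), each with a Beta posterior."""
    def __init__(self, arms, seed=0):
        self.rng = random.Random(seed)
        self.stats = {arm: [1, 1] for arm in arms}  # Beta(alpha, beta) priors

    def select(self):
        # Draw one plausible reward per arm from its posterior; play the argmax.
        draws = {a: self.rng.betavariate(s[0], s[1])
                 for a, s in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        # Binary feedback: a success bumps alpha, a failure bumps beta.
        self.stats[arm][0] += reward
        self.stats[arm][1] += 1 - reward
```

As evidence accumulates, the sampler concentrates on whichever action type is
currently rewarding, which is the automatic trade-off the abstract contrasts
with a hand-crafted ask-or-recommend function.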
Neural Interactive Collaborative Filtering
In this paper, we study collaborative filtering in an interactive setting, in
which the recommender agents iterate between making recommendations and
updating the user profile based on the interactive feedback. The most
challenging problem in this scenario is how to suggest items when the user
profile has not been well established, i.e., recommend for cold-start users or
warm-start users with drifting tastes. Existing approaches either rely on an
overly pessimistic linear exploration strategy or adopt meta-learning based
algorithms in a fully exploitative way. In this work, to quickly catch up
with the user's
interests, we propose to represent the exploration policy with a neural network
and directly learn it from the feedback data. Specifically, the exploration
policy is encoded in the weights of multi-channel stacked self-attention neural
networks and trained with efficient Q-learning by maximizing users' overall
satisfaction in the recommender system. The key insight is that satisfied
recommendations triggered by an exploratory recommendation can be viewed as an
exploration bonus (delayed reward) for its contribution to improving the
quality of the user profile. Therefore, the proposed exploration policy, which
balances learning the user profile against making accurate recommendations,
can be directly optimized by maximizing users' long-term satisfaction with
reinforcement learning. Extensive experiments and analysis conducted on three
benchmark collaborative filtering datasets have demonstrated the advantage of
our method over state-of-the-art methods.
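The delayed-reward insight can be illustrated with a toy tabular stand-in for
the paper's neural policy, under heavy simplifications of our own: the state
is the set of items already shown, the policy is epsilon-greedy rather than a
learned self-attention network, and a tiny simulator replaces real feedback.
The sketch shows only how the discounted Q-learning backup credits an early,
zero-reward exploratory recommendation for the satisfaction it unlocks later:

```python
import random
from collections import defaultdict

def q_learn_policy(simulate_user, items, episodes=300, horizon=4,
                   alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Toy tabular Q-learning: the state is the frozen set of items shown
    so far; the discounted backup propagates later satisfaction back to
    early exploratory picks (the 'delayed exploration bonus' idea)."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        shown = frozenset()
        for _ in range(horizon):
            avail = [i for i in items if i not in shown]
            if not avail:
                break
            if rng.random() < eps:
                action = rng.choice(avail)                      # explore
            else:
                action = max(avail, key=lambda i: Q[(shown, i)])  # exploit
            reward = simulate_user(shown, action)
            nxt = shown | {action}
            future = max((Q[(nxt, i)] for i in items if i not in nxt),
                         default=0.0)
            Q[(shown, action)] += alpha * (reward + gamma * future
                                           - Q[(shown, action)])
            shown = nxt
    return Q
```

An item that earns nothing immediately but improves the profile (here, makes a
later item satisfying) ends up with positive value, which is the exploration
bonus the abstract describes.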
Diversify and Conquer: Bandits and Diversity for an Enhanced E-commerce Homepage Experience
In the realm of e-commerce, popular platforms utilize widgets to recommend
advertisements and products to their users. However, the prevalence of mobile
device usage on these platforms introduces a unique challenge due to the
limited screen real estate available. Consequently, the positioning of relevant
widgets becomes pivotal in capturing and maintaining customer engagement. Given
the restricted screen size of mobile devices, widgets placed at the top of the
interface are more prominently displayed and thus attract greater user
attention. Conversely, widgets positioned further down the page require users
to scroll, resulting in reduced visibility and subsequent lower impression
rates. It therefore becomes imperative to place relevant widgets at the top.
However, selecting which widgets to display is challenging, as widgets can be
heterogeneous and can be introduced to or removed from the platform at any
time. In this work, we model vertical widget reordering as a contextual
multi-armed bandit problem with delayed batch feedback.
objective is to rank the vertical widgets in a personalized manner. We present
a two-stage ranking framework that combines contextual bandits with a diversity
layer to improve the overall ranking. We demonstrate its effectiveness through
offline and online A/B results, conducted on proprietary data from Myntra, a
major fashion e-commerce platform in India.
Comment: Accepted in Proceedings of the Fashionxrecys Workshop, 17th ACM
Conference on Recommender Systems, 202
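The two-stage framework's diversity layer can be sketched as a greedy
re-ranking of the first-stage bandit scores. This is our own MMR-style
simplification, not the paper's method: candidates whose widget category has
already been placed are discounted, so similar widgets are pushed down the
page even when their raw scores are high:

```python
def diversity_rerank(widgets, score, category, penalty=0.5):
    """Second-stage diversity layer over first-stage bandit scores:
    greedily pick the next widget, discounting candidates whose category
    has already been placed (a simple MMR-style heuristic)."""
    ranked, placed_categories = [], set()
    remaining = list(widgets)
    while remaining:
        best = max(remaining,
                   key=lambda w: score[w] * (penalty
                                             if category[w] in placed_categories
                                             else 1.0))
        ranked.append(best)
        placed_categories.add(category[best])
        remaining.remove(best)
    return ranked
```

With `penalty=1.0` the layer is a no-op and the ranking is purely
score-ordered; lowering it trades raw relevance for a more varied top of the
page, where screen real estate is scarcest.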