Fine-Grained Session Recommendations in E-commerce using Deep Reinforcement Learning
Sustaining users' interest and keeping them engaged on the platform is very
important for the success of an e-commerce business. A session encompasses
different activities of a user between logging into the platform and logging
out or making a purchase. User activities in a session can be classified into
two groups: Known Intent and Unknown intent. Known intent activity pertains to
the session where the intent of a user to browse/purchase a specific product
can be easily captured. Whereas in unknown intent activity, the intent of the
user is not known. For example, consider the scenario where a user enters the
session to casually browse the products over the platform, similar to the
window-shopping experience in the offline setting. While recommending similar
products is essential in the former setting, the latter requires accurately
understanding the user's intent and recommending interesting products in order
to retain them. In this work, we focus primarily on the unknown-intent
setting where our objective is to recommend a sequence of products to a user in
a session to sustain their interest, keep them engaged and possibly drive them
towards purchase. We formulate this problem in the framework of the Markov
Decision Process (MDP), a popular mathematical framework for sequential
decision making and solve it using Deep Reinforcement Learning (DRL)
techniques. However, training a next-product recommendation model is difficult
in the RL paradigm due to the large variance in users' browse/purchase
behavior. Therefore, we break the problem down into predicting various product
attributes, where a pattern/trend can be identified and exploited to build
accurate models. We show that the DRL agent provides better performance
compared to a greedy strategy.
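The attribute-level decomposition above can be illustrated with a deliberately toy sketch (not the paper's model): states are collapsed to a single latent user interest, actions are product *attributes* to recommend next, and a tabular Q-learning agent learns which attribute sustains the session. The environment, attribute names, and reward structure are all illustrative assumptions.

```python
import random

# Hypothetical toy environment: actions are product attributes (e.g. category)
# to recommend next. Recommending the attribute that matches the user's latent
# interest keeps them engaged (reward 1); otherwise they may leave the session.
ATTRIBUTES = ["electronics", "fashion", "home"]

def step(latent_interest, action):
    """Return (reward, session_ended). Matching interest sustains the session."""
    if action == latent_interest:
        return 1.0, False                   # user stays engaged
    return 0.0, random.random() < 0.5       # user may drop off

def train_q_agent(episodes=500, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    random.seed(seed)
    q = {a: 0.0 for a in ATTRIBUTES}        # single-state Q-table for brevity
    for _ in range(episodes):
        latent = "fashion"                  # fixed latent interest in this sketch
        done, steps = False, 0
        while not done and steps < 10:      # cap session length
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(ATTRIBUTES)
            else:
                a = max(q, key=q.get)
            r, done = step(latent, a)
            # standard Q-learning update
            q[a] += alpha * (r + gamma * max(q.values()) - q[a])
            steps += 1
    return q

q = train_q_agent()
best = max(q, key=q.get)   # the attribute the agent learned to recommend
```

A greedy baseline in this setup would lock onto whichever attribute paid off first; the epsilon-greedy RL agent keeps exploring and converges on the attribute with the identifiable engagement pattern.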
Non-invasive Self-attention for Side Information Fusion in Sequential Recommendation
Sequential recommender systems aim to model users' evolving interests from
their historical behaviors, and hence make customized time-relevant
recommendations. Compared with traditional models, deep learning approaches
such as CNN and RNN have achieved remarkable advancements in recommendation
tasks. Recently, the BERT framework has also emerged as a promising method,
benefiting from its self-attention mechanism in processing sequential data.
However, one limitation of the original BERT framework is that it considers
only a single input source, the natural language tokens. How to leverage
various types of information under the BERT framework is still an open question.
Nonetheless, it is intuitively appealing to utilize other side information,
such as item category or tag, for more comprehensive depictions and better
recommendations. In our pilot experiments, we found that naive approaches,
which directly fuse side information into the item embeddings, usually bring
very little benefit or even negative effects. Therefore, in this paper, we propose the
NOninVasive self-attention mechanism (NOVA) to leverage side information
effectively under the BERT framework. NOVA makes use of side information to
generate a better attention distribution, rather than directly altering the
item embeddings, which may overwhelm the item information. We validate the NOVA-BERT
model on both public and commercial datasets, and our method can stably
outperform the state-of-the-art models with negligible computational overhead. Comment: Accepted at AAAI 202
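The non-invasive idea can be sketched in a few lines, assuming a simplified single-head attention with identity projections (the real NOVA-BERT uses learned projections and a full Transformer stack): side information is fused only into the queries and keys, while the values remain pure item embeddings, so side information reshapes the attention distribution without contaminating the item representations.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))   # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def nova_attention(item_emb, side_emb):
    """Non-invasive self-attention (sketch): side info enters Q/K only."""
    fused = item_emb + side_emb                      # one simple Q/K fusion choice
    scores = fused @ fused.T / np.sqrt(item_emb.shape[-1])
    # Values are the *pure* item embeddings: side info only steers attention.
    return softmax(scores) @ item_emb

# Hypothetical toy inputs: a sequence of 4 items with 8-dim embeddings.
rng = np.random.default_rng(0)
items = rng.normal(size=(4, 8))   # item embeddings
side = rng.normal(size=(4, 8))    # side-information embeddings (category/tag)
out = nova_attention(items, side)
```

An "invasive" variant would instead compute `softmax(scores) @ fused`, mixing side information into the output representations themselves, which is the behavior the paper found can hurt performance.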
Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay
Recommender systems are a ubiquitous feature of online platforms.
Increasingly, they are explicitly tasked with increasing users' long-term
satisfaction. In this context, we study a content exploration task, which we
formalize as a multi-armed bandit problem with delayed rewards. We observe that
there is an apparent trade-off in choosing the learning signal: Waiting for the
full reward to become available might take several weeks, hurting the rate at
which learning happens, whereas measuring short-term proxy rewards reflects the
actual long-term goal only imperfectly. We address this challenge in two steps.
First, we develop a predictive model of delayed rewards that incorporates all
information obtained to date. Full observations as well as partial (short or
medium-term) outcomes are combined through a Bayesian filter to obtain a
probabilistic belief. Second, we devise a bandit algorithm that takes advantage
of this new predictive model. The algorithm quickly learns to identify content
aligned with long-term success by carefully balancing exploration and
exploitation. We apply our approach to a podcast recommendation problem, where
we seek to identify shows that users engage with repeatedly over two months. We
empirically validate that our approach results in substantially better
performance compared to approaches that either optimize for short-term proxies,
or wait for the long-term outcome to be fully realized. Comment: Presented at the 29th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining (KDD '23)
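The two-step recipe can be caricatured with a Beta-Bernoulli Thompson sampler; this is a hedged sketch under strong simplifying assumptions, not the paper's algorithm: the Bayesian filter is reduced to fractionally weighted conjugate updates, where a partial short-term observation nudges the belief before the full long-term reward is credited.

```python
import random

class ImpatientThompson:
    """Sketch: Thompson sampling whose Beta belief absorbs partial
    (short-term) observations with a fractional weight, then the
    remaining weight once the full long-term outcome is observed."""

    def __init__(self, n_arms, seed=0):
        self.rng = random.Random(seed)
        self.a = [1.0] * n_arms   # Beta alpha (prior successes + 1)
        self.b = [1.0] * n_arms   # Beta beta  (prior failures + 1)

    def select(self):
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.a, self.b)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, outcome, weight=1.0):
        # weight < 1 for a partial short-term proxy; weights sum to 1
        # across the partial and full observations of one pull
        self.a[arm] += weight * outcome
        self.b[arm] += weight * (1 - outcome)

# Hypothetical simulation: arm 0 drives better long-term engagement.
p_long = [0.8, 0.2]
agent = ImpatientThompson(n_arms=2, seed=1)
env = random.Random(2)
pulls = [0, 0]
for t in range(400):
    arm = agent.select()
    pulls[arm] += 1
    outcome = 1.0 if env.random() < p_long[arm] else 0.0
    agent.update(arm, outcome, weight=0.3)  # early partial signal
    agent.update(arm, outcome, weight=0.7)  # full reward (applied in the
                                            # same step here for brevity)
```

The point of the sketch is the interface, not the model: because the belief is updated as soon as partial outcomes arrive, the sampler does not stall for weeks waiting on the full two-month engagement signal.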