Whole-Chain Recommendations
With the recent prevalence of Reinforcement Learning (RL), there has been
tremendous interest in developing RL-based recommender systems. In practical
recommendation sessions, users will sequentially access multiple scenarios,
such as the entrance pages and the item detail pages, and each scenario has its
specific characteristics. However, most existing RL-based recommender systems
either optimize one strategy for all scenarios or optimize each scenario's
strategy separately, which can lead to sub-optimal overall performance. In this
paper, we study the recommendation problem with multiple
(consecutive) scenarios, i.e., whole-chain recommendations. We propose a
multi-agent RL-based approach (DeepChain), which can capture the sequential
correlation among different scenarios and jointly optimize multiple
recommendation strategies. To be specific, all recommender agents (RAs) share
the same memory of users' historical behaviors, and they work collaboratively
to maximize the overall reward of a session. Note that jointly optimizing
multiple recommendation strategies faces two challenges under existing
model-free RL methods: (i) it requires huge amounts of user behavior data, and
(ii) the distribution of rewards (users' feedback) is extremely imbalanced. In
this paper, we introduce model-based RL techniques to reduce the training data
requirement and execute more accurate strategy updates. The experimental
results based on a real e-commerce platform demonstrate the effectiveness of
the proposed framework.
Comment: 29th ACM International Conference on Information and Knowledge
Management
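The DeepChain implementation is not included here; as a rough illustrative sketch only (all class and function names below are invented, and tabular Q-learning stands in for the paper's unspecified learners), a toy multi-agent setup in which per-scenario agents share a memory of user behavior and are trained on the joint session-level reward might look like:

```python
import random

class SharedMemory:
    """Shared log of the user's behaviors in the session, visible to all agents."""
    def __init__(self):
        self.history = []

    def record(self, event):
        self.history.append(event)

class RecommenderAgent:
    """One agent per scenario (e.g. entrance page, item detail page).
    Toy tabular Q-learning; the state is just the shared history length."""
    def __init__(self, n_actions, lr=0.1, gamma=0.9):
        self.q = {}
        self.n_actions = n_actions
        self.lr, self.gamma = lr, gamma

    def act(self, state, eps=0.1):
        if random.random() < eps:
            return random.randrange(self.n_actions)
        qs = [self.q.get((state, a), 0.0) for a in range(self.n_actions)]
        return qs.index(max(qs))

    def update(self, s, a, r, s_next):
        best_next = max(self.q.get((s_next, b), 0.0)
                        for b in range(self.n_actions))
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.lr * (r + self.gamma * best_next - old)

def run_session(agents, memory, user_pref):
    """One session: agents act in sequence (the whole chain), write to the
    shared memory, and every agent is credited with the overall session
    reward rather than only its own scenario's reward."""
    total = 0.0
    transitions = []
    for agent in agents:
        state = min(len(memory.history), 5)
        action = agent.act(state)
        total += 1.0 if action == user_pref else 0.0
        memory.record(action)
        transitions.append((agent, state, action, min(len(memory.history), 5)))
    for agent, s, a, s_next in transitions:
        agent.update(s, a, total, s_next)  # joint, session-level reward
    return total
```

Crediting each agent with the session total (rather than a per-scenario reward) is what makes the agents collaborate: an entrance-page agent is rewarded for hand-offs that help the detail-page agent succeed later in the chain.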
TempLe: Learning Template of Transitions for Sample Efficient Multi-task RL
Transferring knowledge among various environments is important to efficiently
learn multiple tasks online. Most existing methods directly use the previously
learned models or previously learned optimal policies to learn new tasks.
However, these methods may be inefficient when the underlying models or optimal
policies are substantially different across tasks. In this paper, we propose
Template Learning (TempLe), the first PAC-MDP method for multi-task
reinforcement learning that can be applied to tasks with varying state/action
spaces. TempLe generates transition dynamics templates, abstractions of the
transition dynamics across tasks, to gain sample efficiency by extracting
similarities between tasks even when their underlying models or optimal
policies have limited commonalities. We present two algorithms for an "online"
and a "finite-model" setting, respectively. We prove that our proposed TempLe
algorithms achieve much lower sample complexity than single-task learners or
state-of-the-art multi-task methods. We show via systematically designed
experiments that our TempLe method universally outperforms the state-of-the-art
multi-task methods (PAC-MDP or not) in various settings and regimes.
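TempLe's full PAC-MDP machinery is beyond an abstract, but the core idea of a "template" — pooling the transition counts of state-action pairs whose empirical dynamics look alike, so every pair in a template benefits from the others' samples — can be sketched in a few lines (a simplified illustration, not the paper's algorithm; `make_templates` and its greedy L1 matching are invented for this sketch):

```python
def l1(p, q):
    """L1 distance between two discrete distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def make_templates(entries, tol=0.3):
    """entries: dict mapping (task, state, action) -> next-state count vector.
    Greedily groups entries whose empirical transition distributions are
    within `tol` (L1) of a template prototype, pooling their counts to get
    lower-variance dynamics estimates across tasks."""
    assignment, pooled, protos = {}, [], []
    for key, counts in entries.items():
        total = sum(counts)
        dist = [c / total for c in counts]
        for tid, proto in enumerate(protos):
            if l1(dist, proto) < tol:
                assignment[key] = tid
                pooled[tid] = [a + b for a, b in zip(pooled[tid], counts)]
                break
        else:  # no template matched: start a new one
            assignment[key] = len(protos)
            protos.append(dist)
            pooled.append(list(counts))
    return assignment, pooled
```

Even when two tasks share no optimal policy, their dynamics may still overlap locally, which is why pooling at the transition level can save samples where policy or model transfer cannot.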
Continuous Input Embedding Size Search For Recommender Systems
Latent factor models are the most popular backbones for today's recommender
systems owing to their prominent performance. Latent factor models represent
users and items as real-valued embedding vectors for pairwise similarity
computation, and all embeddings are traditionally restricted to a uniform size
that is relatively large (e.g., 256-dimensional). With the exponentially
expanding user base and item catalog in contemporary e-commerce, this design is
admittedly becoming memory-inefficient. To facilitate lightweight
recommendation, reinforcement learning (RL) has recently opened up
opportunities for identifying varying embedding sizes for different
users/items. However, due to the challenges of search efficiency and of
learning an optimal RL policy, existing RL-based methods are restricted to
highly discrete, predefined embedding size choices. This leads to a largely
overlooked potential
of introducing finer granularity into embedding sizes to obtain better
recommendation effectiveness under a given memory budget. In this paper, we
propose continuous input embedding size search (CIESS), a novel RL-based method
that operates on a continuous search space with arbitrary embedding sizes to
choose from. In CIESS, we further present an innovative random walk-based
exploration strategy to allow the RL policy to efficiently explore more
candidate embedding sizes and converge to a better decision. CIESS is also
model-agnostic and hence generalizable to a variety of latent factor RSs,
whilst experiments on two real-world datasets have shown state-of-the-art
performance of CIESS under different memory budgets when paired with three
popular recommendation models.
Comment: To appear in SIGIR'2
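CIESS couples the continuous search space with an actor-critic policy, which is not reproduced here; as a much-simplified sketch of the underlying idea — random-walk exploration over a *continuous* range of embedding sizes, trading recommendation quality against a memory budget — one might write (the `utility` objective and all names are invented stand-ins, and a greedy accept rule replaces the learned RL policy):

```python
import math
import random

def utility(size, budget=64.0):
    """Toy stand-in objective: diminishing-returns quality from a larger
    embedding, minus a penalty once the size exceeds the memory budget."""
    quality = math.log(1.0 + size)
    penalty = 0.5 * max(0.0, size - budget)
    return quality - penalty

def random_walk_search(steps=2000, lo=1.0, hi=256.0, step_scale=8.0, seed=0):
    """Random-walk exploration over a continuous size range [lo, hi]:
    perturb the current size with Gaussian noise and keep the move
    whenever the utility improves (greedy hill climb)."""
    rng = random.Random(seed)
    size = rng.uniform(lo, hi)
    best = utility(size)
    for _ in range(steps):
        cand = min(hi, max(lo, size + rng.gauss(0.0, step_scale)))
        u = utility(cand)
        if u > best:
            size, best = cand, u
    return size
```

Because candidates are drawn from a continuous neighborhood rather than a fixed grid, the search can settle on fractional sizes (e.g. just under the budget) that a discrete, predefined menu of sizes would miss.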