Balancing Reinforcement Learning Training Experiences in Interactive Information Retrieval
Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share
many commonalities, including an agent that learns while interacting, a
long-term and complex goal, and an algorithm that explores and adapts. One
challenge in applying RL methods to IIR is obtaining enough relevance labels to
train the RL agents, which are notoriously sample-inefficient. In a text corpus
annotated for a given query, however, it is the irrelevant documents, not the
relevant ones, that predominate. This yields highly unbalanced training
experiences for the agent and prevents it from learning any effective policy.
Our paper addresses this issue by using domain randomization to synthesize
additional relevant documents for training. Our
experimental results on the Text REtrieval Conference (TREC) Dynamic Domain
(DD) 2017 Track show that the proposed method is able to boost an RL agent's
learning effectiveness by 22% in dealing with unseen situations.
Comment: Accepted by SIGIR 202
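The balancing idea above can be sketched in a few lines. This is an illustrative oversampling loop, not the paper's exact recipe: `synthesize_relevant` is a hypothetical perturbation step that randomly swaps tokens of a known-relevant document, standing in for whatever randomization the authors apply.

```python
import random

def synthesize_relevant(doc_tokens, vocab, swap_prob=0.2, seed=None):
    """Create a synthetic relevant document by randomly perturbing tokens
    of a known-relevant one (hypothetical stand-in for the paper's
    domain-randomization step)."""
    rng = random.Random(seed)
    return [rng.choice(vocab) if rng.random() < swap_prob else tok
            for tok in doc_tokens]

def balance_experiences(relevant_docs, irrelevant_docs, vocab, seed=0):
    """Oversample the relevant class with synthetic documents until the
    two classes are roughly balanced, so the RL agent no longer sees
    mostly irrelevant training experiences."""
    rng = random.Random(seed)
    synthetic = []
    while len(relevant_docs) + len(synthetic) < len(irrelevant_docs):
        base = rng.choice(relevant_docs)
        synthetic.append(synthesize_relevant(base, vocab, seed=rng.random()))
    return relevant_docs + synthetic, irrelevant_docs
```

The key design point is that only the minority (relevant) class is augmented; the majority class is left untouched.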
Whole-Chain Recommendations
With the recent prevalence of Reinforcement Learning (RL), there has been
tremendous interest in developing RL-based recommender systems. In practical
recommendation sessions, users will sequentially access multiple scenarios,
such as the entrance pages and the item detail pages, and each scenario has its
specific characteristics. However, the majority of existing RL-based
recommender systems focus on optimizing one strategy for all scenarios or
separately optimizing each strategy, which could lead to sub-optimal overall
performance. In this paper, we study the recommendation problem with multiple
(consecutive) scenarios, i.e., whole-chain recommendations. We propose a
multi-agent RL-based approach (DeepChain), which can capture the sequential
correlation among different scenarios and jointly optimize multiple
recommendation strategies. To be specific, all recommender agents (RAs) share
the same memory of users' historical behaviors, and they work collaboratively
to maximize the overall reward of a session. Note that jointly optimizing
multiple recommendation strategies faces two challenges with existing
model-free RL models: (i) it requires huge amounts of user behavior data, and
(ii) the distribution of rewards (users' feedback) is extremely unbalanced. In
this paper, we introduce model-based RL techniques to reduce the training data
requirement and execute more accurate strategy updates. The experimental
results based on a real e-commerce platform demonstrate the effectiveness of
the proposed framework.
Comment: 29th ACM International Conference on Information and Knowledge
Management
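The shared-memory, multi-agent setup described above can be sketched as follows. This is a minimal illustration, assuming an epsilon-greedy per-scenario policy in place of DeepChain's learned actors; the class names and reward shapes are our own, not the paper's.

```python
import random

class SharedMemory:
    """User behavior history shared by all recommender agents (RAs)."""
    def __init__(self):
        self.history = []
    def record(self, scenario, item, feedback):
        self.history.append((scenario, item, feedback))

class ScenarioAgent:
    """One agent per scenario (e.g. entrance page, item detail page).
    A hypothetical epsilon-greedy policy stands in for a learned actor."""
    def __init__(self, scenario, items, epsilon=0.1, seed=0):
        self.scenario = scenario
        self.items = items
        self.q = {i: 0.0 for i in items}  # per-item value estimates
        self.epsilon = epsilon
        self.rng = random.Random(seed)
    def act(self, memory):
        # A real agent would condition on the shared history in `memory`;
        # here we simply pick greedily with occasional exploration.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        return max(self.items, key=self.q.get)
    def update(self, item, reward, lr=0.5):
        self.q[item] += lr * (reward - self.q[item])

def run_session(agents, memory, reward_fn):
    """Agents act in sequence over consecutive scenarios, write to one
    shared memory, and the session return is the sum of all rewards."""
    total = 0.0
    for agent in agents:
        item = agent.act(memory)
        r = reward_fn(agent.scenario, item)
        agent.update(item, r)
        memory.record(agent.scenario, item, r)
        total += r
    return total
```

The point of the sketch is the structure: scenarios are visited in order, every agent reads and writes the same memory, and the optimization target is the whole-session reward rather than any single scenario's.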
AutoAssign+: Automatic Shared Embedding Assignment in Streaming Recommendation
In the domain of streaming recommender systems, conventional methods for
addressing new user IDs or item IDs typically involve assigning initial ID
embeddings randomly. However, this practice results in two practical
challenges: (i) Items or users with limited interactive data may yield
suboptimal prediction performance. (ii) Embedding new IDs or low-frequency IDs
necessitates consistently expanding the embedding table, leading to unnecessary
memory consumption. In light of these concerns, we introduce a reinforcement
learning-driven framework, namely AutoAssign+, that facilitates Automatic
Shared Embedding Assignment Plus. To be specific, AutoAssign+ utilizes an
Identity Agent as an actor network, which plays a dual role: (i) Representing
low-frequency IDs field-wise with a small set of shared embeddings to enhance
the embedding initialization, and (ii) Dynamically determining which ID
features should be retained or eliminated in the embedding table. The policy of
the agent is optimized with the guidance of a critic network. To evaluate the
effectiveness of our approach, we perform extensive experiments on three
commonly used benchmark datasets. Our experimental results demonstrate that
AutoAssign+ is capable of significantly enhancing recommendation performance by
mitigating the cold-start problem. Furthermore, our framework yields a
reduction in memory usage of approximately 20-30%, verifying its practical
effectiveness and efficiency for streaming recommender systems.
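The shared-embedding idea can be sketched as below. Note the hedge: in AutoAssign+ the retain/eliminate decision is made by the learned Identity Agent; here a fixed frequency threshold stands in for that policy, and all names are our own illustration.

```python
import random

class SharedEmbeddingTable:
    """Low-frequency IDs map into a small shared pool of embeddings
    instead of each getting its own row; an ID is only promoted to a
    dedicated row once it becomes frequent. This keeps the table from
    growing with every new or rare ID."""
    def __init__(self, dim=8, n_shared=4, freq_threshold=5, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.freq_threshold = freq_threshold
        # Small shared pool replaces one-row-per-ID for rare IDs.
        self.shared = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_shared)]
        self.dedicated = {}  # id -> its own embedding row
        self.freq = {}

    def _slot(self, id_):
        # Deterministic (within a run) assignment of an ID to a shared row.
        return self.shared[hash(id_) % len(self.shared)]

    def lookup(self, id_):
        self.freq[id_] = self.freq.get(id_, 0) + 1
        if self.freq[id_] >= self.freq_threshold and id_ not in self.dedicated:
            # "Retain": give a now-frequent ID its own embedding,
            # warm-started from its shared slot (a stand-in for the
            # Identity Agent's learned decision).
            self.dedicated[id_] = list(self._slot(id_))
        return self.dedicated.get(id_, self._slot(id_))
```

Memory now grows only with the number of retained (frequent) IDs plus a constant shared pool, which is where the reported 20-30% savings would come from.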