Chain-of-Choice Hierarchical Policy Learning for Conversational Recommendation
Conversational Recommender Systems (CRS) illuminate user preferences via
multi-round interactive dialogues, ultimately navigating towards precise and
satisfactory recommendations. However, contemporary CRS are limited to asking
binary or multi-choice questions based on a single attribute type
(e.g., color) per round, which causes excessive rounds of interaction and
diminishes the user's experience. To address this, we propose a more realistic
and efficient conversational recommendation problem setting, called
Multi-Type-Attribute Multi-round Conversational Recommendation (MTAMCR), which
enables CRS to ask multi-choice questions covering multiple types of
attributes in each round, thereby improving interactive efficiency. Moreover,
by formulating MTAMCR as a hierarchical reinforcement learning task, we propose
a Chain-of-Choice Hierarchical Policy Learning (CoCHPL) framework to enhance
both the questioning efficiency and recommendation effectiveness in MTAMCR.
Specifically, a long-term policy over options (i.e., ask or recommend)
determines the action type, while two short-term intra-option policies
sequentially generate the chain of attributes or items through multi-step
reasoning and selection, optimizing the diversity and interdependence of
questioning attributes. Finally, extensive experiments on four benchmarks
demonstrate the superior performance of CoCHPL over prevailing state-of-the-art
methods.
Comment: Released with source code.
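As a rough illustration of the two-level control described above (not the paper's actual implementation), a long-term policy can pick an option while a short-term intra-option policy greedily builds a chain of choices, each pick conditioning the next; all names, scores, and thresholds here are hypothetical:

```python
# Hypothetical sketch of a hierarchical choice loop: a long-term policy
# selects an option (ask or recommend), then a short-term intra-option
# policy builds a chain of attributes/items via multi-step selection.

def long_term_option(confidence: float) -> str:
    """Choose 'recommend' once estimated preference confidence is high enough."""
    return "recommend" if confidence >= 0.8 else "ask"

def intra_option_chain(candidates: dict, chain_len: int) -> list:
    """Greedily select a chain of attributes/items by score, removing each
    pick from the pool so later steps depend on earlier ones."""
    pool = dict(candidates)
    chain = []
    for _ in range(min(chain_len, len(pool))):
        best = max(pool, key=pool.get)
        chain.append(best)
        del pool[best]
    return chain

attrs = {"color:red": 0.9, "size:L": 0.7, "brand:acme": 0.5}
option = long_term_option(0.4)            # low confidence -> ask a question
question = intra_option_chain(attrs, 2)   # chain covering multiple attribute types
```

A single question built this way can span several attribute types per round, which is the efficiency gain the MTAMCR setting targets.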
A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems
Conversational recommender systems (CRS) generate recommendations through an
interactive process. However, not all CRS approaches use human conversations as
their source of interaction data; the majority of prior CRS work simulates
interactions by exchanging entity-level information. As a result, claims of
prior CRS work do not generalise to real-world settings where conversations
take unexpected turns, or where conversational and intent understanding is not
perfect. To tackle this challenge, the research community has started to
examine holistic CRS, which are trained using conversational data collected
from real-world scenarios. Despite their emergence, such holistic approaches
are under-explored.
We present a comprehensive survey of holistic CRS methods by summarizing the
literature in a structured manner. Our survey recognises holistic CRS
approaches as having three components: 1) a backbone language model, the
optional use of 2) external knowledge, and/or 3) external guidance. We also
give a detailed analysis of CRS datasets and evaluation methods in real
application scenarios. We offer our insight as to the current challenges of
holistic CRS and possible future trends.
Comment: Accepted by 5th KaRS Workshop @ ACM RecSys 2023, 8 pages.
Adaptive Vague Preference Policy Learning for Multi-round Conversational Recommendation
Conversational recommendation systems (CRS) effectively address information
asymmetry by dynamically eliciting user preferences through multi-turn
interactions. Existing CRS widely assume that users have clear preferences.
Under this assumption, the agent completely trusts the user feedback and
treats accepted or rejected signals as strong indicators to filter items and
reduce the candidate space, which may lead to the problem of over-filtering.
However, in reality, users' preferences are often vague and volatile, with
uncertainty about their desires and changing decisions during interactions.
To address this issue, we introduce a novel scenario called Vague Preference
Multi-round Conversational Recommendation (VPMCR), which considers users' vague
and volatile preferences in CRS. VPMCR employs a soft estimation mechanism to
assign a non-zero confidence score to all candidate items to be displayed,
naturally avoiding the over-filtering problem. In the VPMCR setting, we
introduce a solution called Adaptive Vague Preference Policy Learning (AVPPL),
which consists of two main components: Uncertainty-aware Soft Estimation (USE)
and Uncertainty-aware Policy Learning (UPL). USE estimates the uncertainty of
users' vague feedback and captures their dynamic preferences using a
choice-based preferences extraction module and a time-aware decaying strategy.
UPL leverages the preference distribution estimated by USE to guide the
conversation and adapt to changes in users' preferences to make recommendations
or ask for attributes.
Our extensive experiments demonstrate the effectiveness of our method in the
VPMCR scenario, highlighting its potential for practical applications and
improving the overall performance and applicability of CRS in real-world
settings, particularly for users with vague or dynamic preferences.
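A minimal sketch of the kind of time-aware decaying soft estimation USE describes (the exponential decay form, the floor value, and the signal encoding are assumptions for illustration, not the paper's actual formulation):

```python
import math

def soft_scores(feedback, decay=1.0, floor=0.05):
    """Aggregate (turn, item, signal) feedback into confidence scores.
    Older turns are down-weighted exponentially (time-aware decay), and a
    small positive floor keeps every candidate displayable, so a rejection
    never hard-filters an item out of the candidate space."""
    now = max(turn for turn, _, _ in feedback)
    scores = {}
    for turn, item, signal in feedback:
        weight = math.exp(-decay * (now - turn))
        scores[item] = scores.get(item, 1.0) + weight * signal
    return {item: max(score, floor) for item, score in scores.items()}
```

Because every score stays positive, a vague or changed-mind rejection only demotes an item rather than eliminating it, which is the over-filtering fix the VPMCR setting is built around.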
Personalized Memory Transfer for Conversational Recommendation Systems
Dialogue systems are becoming an increasingly common part of many users'
daily routines. Natural language serves as a convenient interface to express
our preferences to the underlying systems. In this work, we implement a
full-fledged Conversational Recommendation System, mainly focusing on
learning user preferences through online conversations. Compared to the
traditional collaborative filtering setting where feedback is provided
quantitatively, conversational users may only indicate their preferences at a
high level, with inexact item mentions in the form of natural language
chit-chat. This makes it harder for the system to correctly interpret user
intent and, in turn, provide useful recommendations. To tackle the
ambiguities in natural language conversations, we propose Personalized Memory
Transfer (PMT), which learns a personalized model in an online manner by
leveraging a key-value memory structure to distill user feedback directly
from conversations. This memory structure enables the integration of prior
knowledge to transfer existing item representations/preferences and natural
language representations. We also implement a retrieval-based response
generation module, where the system, in addition to recommending items, also
responds to the user, either to elicit more information regarding the user
intent or just for casual chit-chat. The experiments were conducted on two
public datasets and the results demonstrate the effectiveness of the proposed
approach.
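The key-value memory structure PMT builds on can be illustrated, very schematically, as an attention-weighted read: similarity between a query and the stored keys produces weights that blend the stored values (the toy vectors and dimensions below are assumptions, not the paper's actual embeddings):

```python
import math

def memory_read(query, keys, values):
    """Soft key-value memory read: a softmax over query-key dot products
    weights the stored value vectors into a single blended output."""
    sims = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]   # stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

Reading softly over all keys, rather than picking one, is what lets inexact item mentions in chit-chat still pull in relevant prior preferences.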
Aligning Recommendation and Conversation via Dual Imitation
Human recommendation conversations naturally involve shifts of interest,
which can align recommendation actions with the conversation process
to make accurate recommendations with rich explanations. However, existing
conversational recommendation systems (CRS) ignore the advantage of user
interest shift in connecting recommendation and conversation, which leads to an
ineffective loose coupling structure of CRS. To address this issue, by modeling
the recommendation actions as recommendation paths in a knowledge graph (KG),
we propose DICR (Dual Imitation for Conversational Recommendation), which
designs a dual imitation to explicitly align the recommendation paths and user
interest shift paths in a recommendation module and a conversation module,
respectively. By exchanging alignment signals, DICR achieves bidirectional
promotion between recommendation and conversation modules and generates
high-quality responses with accurate recommendations and coherent explanations.
Experiments demonstrate that DICR outperforms the state-of-the-art models on
recommendation and conversation performance with automatic, human, and novel
explainability metrics.
Comment: EMNLP 202
TREA: Tree-Structure Reasoning Schema for Conversational Recommendation
Conversational recommender systems (CRS) aim to trace users' dynamic
interests in a timely manner through dialogues and generate relevant responses for item
recommendations. Recently, various external knowledge bases (especially
knowledge graphs) are incorporated into CRS to enhance the understanding of
conversation contexts. However, recent reasoning-based models heavily rely on
simplified structures, such as linear or fixed-hierarchy structures, for
causal reasoning, and hence cannot fully capture the sophisticated
relationships among utterances with external knowledge. To
address this, we propose a novel Tree structure Reasoning schEmA named TREA.
TREA constructs a multi-hierarchical scalable tree as the reasoning structure
to clarify the causal relationships between mentioned entities, and fully
utilizes historical conversations to generate more reasonable and suitable
responses for recommended results. Extensive experiments on two public CRS
datasets have demonstrated the effectiveness of our approach.
Comment: Accepted by ACL 2023 main conference.
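Purely as a toy sketch of tree-structured reasoning over mentioned entities (TREA's actual construction is learned, not a greedy heuristic, and the relatedness scores below are made up), each newly mentioned entity can be attached to its most related predecessor, growing a multi-hierarchical tree rather than a flat chain:

```python
def build_reasoning_tree(entities, relatedness):
    """Attach each newly mentioned entity to the most related earlier entity,
    yielding a tree of causal links; the first entity is the root (parent None).
    `relatedness` maps (earlier, later) pairs to a score."""
    parent = {entities[0]: None}
    for i, entity in enumerate(entities[1:], start=1):
        earlier = entities[:i]
        parent[entity] = max(earlier, key=lambda e: relatedness.get((e, entity), 0.0))
    return parent
```

Unlike a linear structure, a tree lets two later mentions hang off different branches, which is the kind of non-sequential causal relationship the abstract says fixed structures miss.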
Multi-Objective Intrinsic Reward Learning for Conversational Recommender Systems
Conversational Recommender Systems (CRS) actively elicit user preferences to
generate adaptive recommendations. Mainstream reinforcement learning-based CRS
solutions heavily rely on handcrafted reward functions, which may not be
aligned with user intent in CRS tasks. Therefore, the design of task-specific
rewards is critical to facilitate CRS policy learning, which remains largely
under-explored in the literature. In this work, we propose a novel approach to
address this challenge by learning intrinsic rewards from interactions with
users. Specifically, we formulate intrinsic reward learning as a
multi-objective bi-level optimization problem. The inner level optimizes the
CRS policy augmented by the learned intrinsic rewards, while the outer level
drives the intrinsic rewards to optimize two CRS-specific objectives:
maximizing the success rate and minimizing the number of turns to reach a
successful recommendation in conversations. To evaluate the effectiveness of
our approach, we conduct extensive experiments on three public CRS benchmarks.
The results show that our algorithm significantly improves CRS performance by
exploiting informative learned intrinsic rewards.
Comment: 11 pages.
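The two outer-level objectives can be scalarized, in a hand-wavy sketch (the weighting `alpha`, the turn budget, and the additive reward-shaping form are illustrative assumptions, not the paper's formulation):

```python
def outer_objective(success_rate, avg_turns, max_turns=15, alpha=0.5):
    """Outer level: combine the two CRS objectives — maximize success rate
    and minimize the (normalized) number of turns to a successful
    recommendation. Higher is better for both terms."""
    return alpha * success_rate + (1 - alpha) * (1 - avg_turns / max_turns)

def shaped_reward(extrinsic, intrinsic, beta=0.5):
    """Inner level: the CRS policy trains on the environment reward augmented
    by the learned intrinsic term that the outer level tunes."""
    return extrinsic + beta * intrinsic
```

In the bi-level scheme, gradients of the outer objective flow into the intrinsic-reward parameters, while the policy itself only ever sees the shaped inner reward.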