21 research outputs found
Evaluating Mixed-initiative Conversational Search Systems via User Simulation
Clarifying the underlying user information need by asking clarifying
questions is an important feature of modern conversational search systems.
However, evaluating such systems by having humans answer the prompted
clarifying questions requires significant effort, which is time-consuming and
expensive. In this paper, we propose a conversational User Simulator, called
USi, for automatic evaluation of such conversational search systems. Given a
description of an information need, USi is capable of automatically answering
clarifying questions about the topic throughout the search session. Through a
set of experiments, including automated natural language generation metrics and
crowdsourcing studies, we show that responses generated by USi are both in line
with the underlying information need and comparable to human-generated answers.
Moreover, we take the first steps towards multi-turn interactions, where
the conversational search system asks multiple questions to the (simulated) user
with the goal of clarifying the user's need. To this end, we expand on currently
available datasets for studying clarifying questions, i.e., Qulac and ClariQ,
by performing a crowdsourcing-based multi-turn data acquisition. We show that
our generative, GPT-2-based model is capable of providing accurate and natural
answers to unseen clarifying questions in the single-turn setting and discuss
capabilities of our model in the multi-turn setting. We provide the code, data,
and the pre-trained model to be used for further research on the topic.
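The core idea of conditioning a generative simulated user on the information need and the conversation so far can be illustrated with a minimal prompt-construction sketch. This is not the authors' released code; the function and field names are illustrative assumptions about how such a conditioning string might be assembled before being fed to a GPT-2-style generator.

```python
# Illustrative sketch (not the USi implementation): serialize the information
# need, prior clarifying-question turns, and the new question into a single
# conditioning string for a GPT-2-style simulated user to complete.

def build_simulator_prompt(information_need, history, clarifying_question):
    """history is a list of (system_question, user_answer) pairs."""
    lines = ["Information need: " + information_need]
    for question, answer in history:
        lines.append("System: " + question)
        lines.append("User: " + answer)
    lines.append("System: " + clarifying_question)
    lines.append("User:")  # the language model completes this final turn
    return "\n".join(lines)

prompt = build_simulator_prompt(
    "Find hotels in Vienna with free parking",
    [("Are you looking for a specific price range?", "Mid-range, please.")],
    "Do you need the hotel to be near the city centre?",
)
print(prompt)
```

A generative model fine-tuned on such serialized sessions can then produce the simulated user's answer as the continuation of the final "User:" line, which is what enables multi-turn simulation.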
Towards Filling the Gap in Conversational Search: From Passage Retrieval to Conversational Response Generation
Research on conversational search has so far mostly focused on query
rewriting and multi-stage passage retrieval. However, synthesizing the top
retrieved passages into a complete, relevant, and concise response is still an
open challenge. Having snippet-level annotations of relevant passages would
enable both (1) the training of response generation models that are able to
ground answers in actual statements and (2) the automatic evaluation of the
generated responses in terms of completeness. In this paper, we address the
problem of collecting high-quality snippet-level answer annotations for two of
the TREC Conversational Assistance track datasets. To ensure quality, we first
perform a preliminary annotation study, employing different task designs,
crowdsourcing platforms, and workers with different qualifications. Based on
the outcomes of this study, we refine our annotation protocol before proceeding
with the full-scale data collection. Overall, we gather annotations for 1.8k
question-paragraph pairs, each annotated by three independent crowd workers.
The process of collecting data at this magnitude also led to multiple insights
about the problem that can inform the design of future response-generation
methods. This is an extended version of the article published with the same
title in the Proceedings of the 32nd ACM International Conference on
Information and Knowledge Management (CIKM '23).
Large Language Model Augmented Narrative Driven Recommendations
Narrative-driven recommendation (NDR) presents an information access problem
where users solicit recommendations with verbose descriptions of their
preferences and context, for example, travelers soliciting recommendations for
points of interest while describing their likes/dislikes and travel
circumstances. These requests are increasingly important with the rise of
natural language-based conversational interfaces for search and recommendation
systems. However, NDR lacks abundant training data for models, and current
platforms commonly do not support these requests. Fortunately, classical
user-item interaction datasets contain rich textual data, e.g., reviews, which
often describe user preferences and context; this may be used to bootstrap
training for NDR models. In this work, we explore using large language models
(LLMs) for data augmentation to train NDR models. We use LLMs for authoring
synthetic narrative queries from user-item interactions with few-shot prompting
and train retrieval models for NDR on synthetic queries and user-item
interaction data. Our experiments demonstrate that this is an effective
strategy for training small-parameter retrieval models that outperform other
retrieval and LLM baselines for narrative-driven recommendation.
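The few-shot data-augmentation setup described above can be sketched as a simple prompt builder: hand-written (reviews, narrative query) example pairs are concatenated with a new user's reviews, and the resulting prompt is sent to an LLM to author a synthetic query. The example pair and field labels below are assumptions for illustration, not the paper's actual prompt.

```python
# Minimal sketch of few-shot prompting for synthetic narrative queries.
# Each example pairs a user's review text with a hand-written narrative
# recommendation request; the LLM is asked to author one for a new user.

FEW_SHOT_EXAMPLES = [
    {
        "reviews": "Loved the quiet patio and the vegetarian options.",
        "query": "I'm looking for a calm restaurant with good vegetarian food.",
    },
]

def build_augmentation_prompt(user_reviews):
    parts = ["Write a narrative recommendation request based on the user's reviews.\n"]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append("Reviews: " + ex["reviews"] + "\nRequest: " + ex["query"] + "\n")
    parts.append("Reviews: " + user_reviews + "\nRequest:")
    return "\n".join(parts)

aug_prompt = build_augmentation_prompt("Great hiking trails nearby; hated the crowds.")
print(aug_prompt)
```

The LLM's completion of the final "Request:" line yields a synthetic narrative query, which can then be paired with the user's interacted items to form training data for a retrieval model.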
Conversational AI from an Information Retrieval Perspective: Remaining Challenges and a Case for User Simulation
Conversational AI is an emerging field of computer science that engages multiple research communities, from information retrieval to natural language processing to dialogue systems. Within this vast space, we focus on conversational information access, a problem that is uniquely suited to be addressed by the information retrieval community. We argue that despite the significant research activity in this area, progress is mostly limited to component-level improvements. There remains a disconnect between current efforts and truly conversational information access systems. Apart from the inherently challenging nature of the problem, the lack of progress can, in large part, be attributed to the shortage of appropriate evaluation methodology and resources. This paper highlights challenges that render both offline and online evaluation methodologies unsuitable for this problem, and discusses the use of user simulation as a viable solution.