
    System Initiative Prediction for Multi-turn Conversational Information Seeking

    Identifying the right moment for a system to take the initiative is essential to conversational information seeking (CIS). Existing studies have extensively examined the clarification need prediction task, i.e., predicting when to ask a clarifying question; however, this covers only one specific system-initiative action. We define the system initiative prediction (SIP) task as predicting whether a CIS system should take the initiative at the next turn. Our analysis reveals that, for effective modeling of SIP, it is crucial to capture dependencies between adjacent user-system initiative-taking decisions. We propose to model SIP with conditional random fields (CRFs). Due to their graphical nature, CRFs are effective at capturing such dependencies and offer greater transparency than more complex methods, e.g., LLMs. Applying CRFs to SIP comes with two challenges: (i) CRFs need to be given the unobservable system utterance at the next turn, and (ii) they do not explicitly model multi-turn features. We cast SIP as an input-incomplete sequence labeling problem and propose a multi-turn system initiative predictor (MuSIc) that has (i) prior-posterior inter-utterance encoders to eliminate the need for the unobservable system utterance, and (ii) a multi-turn feature-aware CRF layer to incorporate multi-turn features into the dependencies between adjacent initiative-taking decisions. Experiments show that MuSIc outperforms LLM-based baselines, including LLaMA, achieving state-of-the-art results on SIP. We also show the benefits of SIP on clarification need prediction and action prediction.
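
    The key modeling idea above is a CRF layer that captures dependencies between adjacent initiative-taking decisions. The snippet below is a minimal illustrative sketch, not the authors' MuSIc implementation: a plain Viterbi decoder over per-turn emission scores (assumed to come from some utterance encoder) and a transition matrix, which is the mechanism that lets a linear-chain CRF exploit those adjacent-decision dependencies. All numbers are made up for the example.

```python
# Minimal sketch (not the authors' MuSIc implementation): Viterbi decoding over
# a linear chain of binary initiative labels (0 = user keeps initiative,
# 1 = system takes initiative). The transition matrix is what lets a CRF-style
# model capture dependencies between adjacent initiative-taking decisions.
import numpy as np

def viterbi(emissions: np.ndarray, transitions: np.ndarray) -> list[int]:
    """emissions: (T, 2) per-turn scores; transitions: (2, 2) pairwise scores."""
    T, L = emissions.shape
    score = emissions[0].copy()            # best score ending in each label at t=0
    backptr = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        # score of extending each previous label to each current label
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow back-pointers to recover the best label sequence
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        labels.append(int(backptr[t][labels[-1]]))
    return labels[::-1]

# Toy example: 4 turns, emission scores from a hypothetical utterance encoder,
# and transitions that discourage the system taking the initiative twice in a row.
emissions = np.array([[1.0, 0.2], [0.3, 0.9], [0.4, 0.8], [1.1, 0.1]])
transitions = np.array([[0.2, 0.0], [0.3, -0.5]])
print(viterbi(emissions, transitions))  # -> [0, 1, 0, 0]
```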

    Asking Clarifying Questions: To benefit or to disturb users in Web search?

    Modern information-seeking systems are becoming more interactive, mainly by asking Clarifying Questions (CQs) to refine users' information needs. System-generated CQs may be of different qualities. However, the impact of asking multiple CQs of different qualities in a search session remains underexplored. Given the multi-turn nature of conversational information-seeking sessions, it is critical to understand and measure the impact of CQs of different qualities when they are posed in various orders. In this paper, we conduct a user study on CQ quality trajectories, i.e., asking CQs of different qualities in chronological order. We aim to investigate to what extent the trajectory of CQs of different qualities affects user search behavior and satisfaction, at both the query level and the session level. Our user study is conducted with 89 participants as search engine users. Participants are asked to complete a set of Web search tasks. We find that the trajectory of CQs does affect the way users interact with Search Engine Result Pages (SERPs): e.g., a preceding high-quality CQ prompts users to interact with SERPs in greater depth, while a preceding low-quality CQ discourages such interaction. Our study also demonstrates that asking follow-up high-quality CQs recovers the search performance and user satisfaction degraded by earlier low-quality CQs. In addition, showing only high-quality CQs while hiding the others yields better gains with less effort; that is, always showing all CQs may be risky, and low-quality CQs do disturb users. Based on observations from our user study, we further propose a transformer-based model that predicts which CQs to ask, to avoid disturbing users. In short, our study provides insights into the effects of CQ trajectories, and our results will be helpful in designing more effective and enjoyable search clarification systems.
    This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from Singapore Telecommunications Limited (Singtel), through the Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU). This study is also supported by the NWO Smart Culture - Big Data/Digital Humanities (314-99-301), the NWO Innovational Research Incentives Scheme Vidi (016.Vidi.189.039), and the H2020-EU.3.4. - SOCIETAL CHALLENGES - Smart, Green, And Integrated Transport (814961).
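
    The abstract mentions a transformer-based model for predicting which CQs to ask but gives no architectural details. Below is a minimal sketch of one plausible setup, not the authors' model: a pretrained BERT encoder with a binary classification head that scores a (query, candidate CQ) pair, so that low-quality CQs can be hidden. The model name, the pairing scheme, and the threshold are assumptions, and the classification head is untrained here, so the output is only illustrative.

```python
# Minimal sketch, not the authors' model: a BERT-based binary classifier that
# scores whether a candidate clarifying question (CQ) should be shown for a
# query, so low-quality CQs can be hidden. Model name and threshold are
# illustrative assumptions; the classification head is untrained in this sketch.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def should_ask(query: str, candidate_cq: str, threshold: float = 0.5) -> bool:
    # Encode the query and candidate CQ as a sentence pair.
    inputs = tokenizer(query, candidate_cq, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)
    # Class 1 = "asking this CQ is likely to help rather than disturb".
    return probs[0, 1].item() >= threshold

print(should_ask("jaguar speed", "Do you mean the animal or the car?"))
```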

    Optimizing Interactive Systems via Data-Driven Objectives

    Effective optimization is essential for real-world interactive systems to provide a satisfactory user experience in response to changing user behavior. However, it is often challenging to find an objective to optimize for interactive systems (e.g., policy learning in task-oriented dialog systems). Generally, such objectives are manually crafted and rarely capture complex user needs accurately. We propose an approach that infers the objective directly from observed user interactions. These inferences can be made regardless of prior knowledge and across different types of user behavior. We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization. Our main contribution is a new, general, principled approach to optimizing interactive systems using data-driven objectives. We demonstrate the high effectiveness of ISO in several simulations.
    Comment: 30 pages, 12 figures. arXiv admin note: text overlap with arXiv:1802.0630
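
    ISO's details are not given in the abstract, so the snippet below is only a conceptual sketch under stated assumptions, not the ISO algorithm: it (i) infers a linear objective over interaction features from logged user behaviour and (ii) picks the candidate system configuration that maximizes the inferred objective. Feature names, data, and the linear form are illustrative assumptions.

```python
# Conceptual sketch (not the ISO algorithm): infer an objective as a linear
# function of interaction features from logged user behaviour, then optimize a
# system over candidate configurations against that inferred objective.
import numpy as np

rng = np.random.default_rng(0)

# Logged sessions: interaction features (e.g. clicks, dwell time, reformulations)
# and a noisy downstream outcome treated as evidence of user satisfaction.
features = rng.normal(size=(500, 3))
true_weights = np.array([0.8, 1.5, -0.6])          # unknown to the system
outcomes = features @ true_weights + rng.normal(scale=0.1, size=500)

# Step 1: infer the objective (least-squares fit of feature weights).
inferred_weights, *_ = np.linalg.lstsq(features, outcomes, rcond=None)

# Step 2: optimize -- pick the candidate system configuration whose expected
# interaction profile scores highest under the inferred objective.
candidate_profiles = rng.normal(size=(10, 3))       # expected features per config
best = int(np.argmax(candidate_profiles @ inferred_weights))
print("inferred weights:", np.round(inferred_weights, 2), "best config:", best)
```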

    Active learning in recommender systems: an unbiased and beyond-accuracy perspective

    The items that a Recommender System (RS) suggests to its users are typically ones that it thinks the user will like and want to consume. An RS that is good at its job is of interest not only to its customers but also to service providers, who can thereby secure long-term customers and increase revenue. Thus, there is a challenge in building better recommender systems. One way to build a better RS is to improve the quality of the data on which the RS model is trained. An RS can use Active Learning (AL) to proactively acquire such data, with the goal of improving its model. The idea of AL for RS is to explicitly query the users, asking them to rate items which have not been rated yet. The items that a user will be asked to rate are known as the query items. Query items are different from recommendations: for example, the former may be items that the AL strategy predicts the user has already consumed, whereas the latter are ones that the RS predicts the user will like. In AL, query items are selected 'intelligently' by an Active Learning strategy. Different AL strategies take different approaches to identifying the query items.
    As with the evaluation of RSs, preliminary evaluation of AL strategies must be done offline. An offline evaluation can help to narrow the number of promising strategies that need to be evaluated in subsequent costly user trials and online experiments. Where the literature describes the offline evaluation of AL, the evaluation is typically quite narrow and incomplete: mostly, the focus is on cold-start users; the impact of newly-acquired ratings on recommendation quality is usually measured only for those users who supplied those ratings; and impact is measured in terms of prediction accuracy or recommendation relevance. Furthermore, the traditional AL evaluation does not take into account the bias problem. As brought to light by recent RS literature, this is a problem that affects the offline evaluation of RSs; it arises when a biased dataset is used to perform the evaluation. We argue that it is a problem that affects the offline evaluation of AL strategies too.
    The main focus of this dissertation is the design and evaluation of AL strategies for RSs. We first design novel methods (designated WTD and WTD_H) that 'intervene' on a biased dataset to generate a new dataset with unbiased-like properties. Compared to the most similar approach proposed in the literature, we give empirical evidence, using two publicly-available datasets, that WTD and WTD_H are more effective at debiasing the evaluation of different recommender system models. We then propose a new framework for the offline evaluation of AL for RSs, which we believe gives a more authentic picture of the performance of the AL strategies under evaluation. In particular, our framework uses WTD or WTD_H to mitigate the bias, but it also assesses the impact of AL in a more comprehensive way than the traditional evaluation used in the literature. Our framework is more comprehensive in at least two ways. First, it segments users in more ways than is conventional and analyses the impact of AL on the different segments. Second, in the same way that RS evaluation has shifted from a narrow focus on prediction accuracy and recommendation relevance to a wider consideration of so-called 'beyond-accuracy' criteria (such as diversity, serendipity and novelty), our framework extends the evaluation of AL strategies to also cover 'beyond-accuracy' criteria. Experimental results on two datasets show the effectiveness of our new framework. Finally, we propose some new AL strategies of our own. In particular, our new AL strategies, instead of focusing exclusively on prediction accuracy and recommendation relevance, are designed to also enhance 'beyond-accuracy' criteria. We evaluate the new strategies using our more comprehensive evaluation framework.
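
    As a concrete illustration of the query-item selection idea discussed above, here is a minimal sketch of a generic uncertainty-based AL strategy, not the dissertation's WTD/WTD_H methods or its proposed strategies: it queries the unrated items on which an ensemble of hypothetical latent-factor models disagrees the most.

```python
# Minimal sketch of a generic active-learning query-selection strategy for a
# recommender (not WTD/WTD_H or the dissertation's strategies): query the items
# whose predicted rating is most uncertain across an ensemble of latent-factor
# models, restricted to items the user has not rated yet.
import numpy as np

rng = np.random.default_rng(1)
n_items, n_models, k = 50, 5, 8

# Hypothetical ensemble of latent-factor models (one user vector + item matrix each).
user_vecs = rng.normal(size=(n_models, k))
item_mats = rng.normal(size=(n_models, n_items, k))
rated = {3, 7, 19}                                        # items this user already rated

# Predicted rating of every item under every ensemble member.
preds = np.einsum("mk,mik->mi", user_vecs, item_mats)     # (n_models, n_items)
uncertainty = preds.std(axis=0)                           # disagreement per item
uncertainty[list(rated)] = -np.inf                        # never re-query rated items

query_items = np.argsort(-uncertainty)[:5]                # top-5 query items
print("ask the user to rate items:", query_items.tolist())
```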