6,057 research outputs found

    CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning

    Compared to standard retrieval tasks, passage retrieval for conversational question answering (CQA) poses new challenges in understanding the current user question, as each question needs to be interpreted within the dialogue context. Moreover, it can be expensive to re-train well-established retrievers such as search engines that were originally developed for non-conversational queries. To facilitate their use, we develop a query rewriting model, CONQRR, that rewrites a conversational question in context into a standalone question. It is trained with a novel reward function to directly optimize towards retrieval using reinforcement learning, and can be adapted to any off-the-shelf retriever. We show that CONQRR achieves state-of-the-art results on a recent open-domain CQA dataset containing conversations from three different sources, and is effective for two different off-the-shelf retrievers. Our extensive analysis also shows the robustness of CONQRR to out-of-domain dialogues as well as to zero query rewriting supervision.
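    As a rough illustration of the retrieval-oriented reward idea described above, the sketch below computes recall@k of the gold passages under an off-the-shelf retriever for a candidate rewrite; a policy-gradient learner could maximize this signal. The function names and the toy retriever are assumptions made for the demo, not CONQRR's implementation.

```python
# Minimal sketch (not CONQRR's code): a retrieval-based reward for a rewritten
# query. The reward is recall@k of the gold passages under an off-the-shelf
# retriever, which a policy-gradient method (e.g. REINFORCE) could maximize.

from typing import Callable, List


def retrieval_reward(
    rewrite: str,
    gold_passage_ids: List[str],
    retrieve: Callable[[str, int], List[str]],  # retriever: (query, k) -> ranked passage ids
    k: int = 10,
) -> float:
    """Recall@k of the gold passages when the rewrite is used as the query."""
    top_k = retrieve(rewrite, k)
    hits = sum(1 for pid in gold_passage_ids if pid in top_k)
    return hits / max(len(gold_passage_ids), 1)


# Toy retriever standing in for a real search engine (assumption for the demo).
def toy_retrieve(query: str, k: int) -> List[str]:
    corpus = {"p1": "capital of france paris", "p2": "python programming language"}
    query_terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda pid: -len(query_terms & set(corpus[pid].split())))
    return scored[:k]


if __name__ == "__main__":
    # A standalone rewrite of a context-dependent question ("What is its capital?")
    print(retrieval_reward("What is the capital of France", ["p1"], toy_retrieve))
```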

    On the Robustness of Question Rewriting Systems to Questions of Varying Hardness

    In conversational question answering (CQA), the task of question rewriting (QR) in context aims to rewrite a context-dependent question into an equivalent self-contained question that yields the same answer. In this paper, we are interested in the robustness of a QR system to questions varying in rewriting hardness or difficulty. Since there is a lack of questions classified by rewriting hardness, we first propose a heuristic method to automatically classify questions into subsets of varying hardness by measuring the discrepancy between a question and its rewrite. To find out what makes questions hard or easy to rewrite, we then conduct a human evaluation to annotate the rewriting hardness of questions. Finally, to enhance the robustness of QR systems to questions of varying hardness, we propose a novel learning framework for QR that first trains a QR model independently on each subset of questions of a certain level of hardness, then combines these QR models into one joint model for inference. Experimental results on two datasets show that our framework improves the overall performance compared to the baselines. Comment: ACL'22, main, long paper.
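    One plausible way to instantiate the discrepancy heuristic mentioned above (the abstract does not specify the exact measure) is a token-overlap distance between a question and its gold rewrite, with hardness buckets defined by assumed thresholds:

```python
# Illustrative sketch (the discrepancy measure and thresholds are assumptions,
# not the paper's): bucket questions into rewriting-hardness subsets by how
# much a question differs from its gold rewrite.

def discrepancy(question: str, rewrite: str) -> float:
    """1 minus the token-level Jaccard overlap between question and rewrite."""
    q, r = set(question.lower().split()), set(rewrite.lower().split())
    return 1.0 - len(q & r) / len(q | r) if (q | r) else 0.0


def hardness_bucket(question: str, rewrite: str) -> str:
    d = discrepancy(question, rewrite)
    if d < 0.2:
        return "easy"    # the rewrite barely changes the question
    if d < 0.5:
        return "medium"
    return "hard"        # the rewrite adds substantial context (coreference, ellipsis)


if __name__ == "__main__":
    print(hardness_bucket("What about its population?",
                          "What is the population of Toronto?"))
```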

    Learning to Select the Relevant History Turns in Conversational Question Answering

    The increasing demand for web-based digital assistants has rapidly raised the Information Retrieval (IR) community's interest in conversational question answering (ConvQA). One of the critical aspects of ConvQA is the effective selection of conversational history turns to answer the question at hand. The dependency between relevant history selection and correct answer prediction is an intriguing but under-explored area. Relevant selected context can better guide the system toward where exactly in the passage to look for an answer, whereas irrelevant context introduces noise and degrades the model's performance. In this paper, we propose a framework, DHS-ConvQA (Dynamic History Selection in Conversational Question Answering), that first generates the context and question entities for all history turns, which are then pruned based on their similarity to the question at hand. We also propose an attention-based mechanism to re-rank the pruned terms based on calculated weights of how useful they are in answering the question. Finally, we further aid the model by highlighting terms in the re-ranked conversational history with a binary classification task, keeping the useful terms (predicted as 1) and ignoring the irrelevant ones (predicted as 0). We demonstrate the efficacy of the proposed framework with extensive experiments on CANARD and QuAC, two widely used ConvQA datasets. We show that selecting relevant turns works better than rewriting the original question, investigate how adding irrelevant history turns negatively impacts the model's performance, and discuss research challenges that demand more attention from the IR community.
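    A simplified sketch of the turn-pruning step described above, assuming plain token overlap in place of the paper's entity-based similarity and attention re-ranking components:

```python
# Simplified sketch (not the authors' implementation): prune conversation
# history turns by how much their terms overlap with the current question,
# keeping only the most relevant turns as context.

from typing import List, Tuple


def term_overlap(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def select_relevant_turns(
    history: List[str], question: str, top_n: int = 2, min_sim: float = 0.05
) -> List[str]:
    """Rank history turns by overlap with the question and keep the top ones."""
    scored: List[Tuple[float, int]] = sorted(
        ((term_overlap(turn, question), i) for i, turn in enumerate(history)),
        reverse=True,
    )
    keep = sorted(i for sim, i in scored[:top_n] if sim >= min_sim)
    return [history[i] for i in keep]  # preserve original turn order


if __name__ == "__main__":
    history = [
        "Who wrote Pride and Prejudice?",
        "Jane Austen wrote it.",
        "What is the weather today?",
    ]
    print(select_relevant_turns(history, "Did Jane Austen write other novels?"))
```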

    Follow-up question handling in the IMIX and Ritel systems: A comparative study

    A basic issue for question answering (QA) dialogue systems is how follow-up questions should be interpreted. In this paper, we shall discuss our experience with the IMIX and Ritel systems, for both of which a follow-up question handling scheme has been developed and corpora have been collected. These two systems are each other's opposites in many respects: IMIX is multimodal, non-factoid, black-box QA, while Ritel is speech-based, factoid, keyword-based QA. Nevertheless, we will show that they are quite comparable, and that it is fruitful to examine the similarities and differences. We shall look at how the systems are composed, and how real, non-expert users interact with them. We shall also provide comparisons with systems from the literature where possible, and indicate where open issues lie and in what areas existing systems may be improved. We conclude that most systems have a common architecture with a set of common subtasks, in particular detecting follow-up questions and finding referents for them. We characterise these tasks using the typical techniques used for performing them, and data from our corpora. We also identify a special type of follow-up question, the discourse question, which is asked when the user is trying to understand an answer, and propose some basic methods for handling it.
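    As a toy illustration of the follow-up detection subtask mentioned above (not the method used by IMIX or Ritel), a simple heuristic flags short or anaphora-bearing questions as likely follow-ups; the cue list is an assumption:

```python
# Toy illustration of the follow-up detection subtask (not IMIX's or Ritel's
# actual method): flag a user question as a likely follow-up when it is very
# short or contains anaphoric cues that need a referent from earlier turns.

ANAPHORIC_CUES = {"it", "its", "they", "them", "he", "she", "his", "her",
                  "this", "that", "there", "one", "ones"}  # illustrative list


def looks_like_followup(question: str) -> bool:
    tokens = question.lower().rstrip("?!. ").split()
    return len(tokens) <= 3 or any(t in ANAPHORIC_CUES for t in tokens)


if __name__ == "__main__":
    print(looks_like_followup("Where is the Eiffel Tower?"))  # False
    print(looks_like_followup("How tall is it?"))             # True
```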

    Human-in-the-Loop Question Answering with Natural Language Interaction

    Generalizing beyond the training examples is the primary goal of machine learning. In natural language processing (NLP), impressive models struggle to generalize when faced with test examples that differ from the training examples, e.g., in genre, domain, or language. I study interactive methods that overcome such limitations by seeking feedback from human users to successfully complete the task at hand and improve over time while on the job. Unlike previous work that adopts simple forms of feedback (e.g., labeling predictions as correct/wrong or answering yes/no clarification questions), I focus on using free-form natural language as the communication interface for providing feedback, which can convey richer information and offer a more flexible interaction.

    An essential skill that language-based interactive systems should have is to understand user utterances in conversational contexts. I study conversational question answering (CQA), in which humans interact with a question answering (QA) system by asking a sequence of related questions. CQA requires models to link questions together to resolve the conversational dependencies between them, such as coreference and ellipsis. I introduce question-in-context rewriting to reduce context-dependent conversational questions to independent stand-alone questions that can be answered with existing QA models. I collect a large dataset of human rewrites and use it to evaluate a set of models for the question rewriting task.

    Next, I study semantic parsing in interactive settings in which users correct parsing errors using natural language feedback. Most existing work frames semantic parsing as a one-shot mapping task. I establish that the majority of parsing mistakes made by recent neural text-to-SQL parsers are minor; hence, it is often feasible for humans to detect and suggest corrections for such mistakes if they have the opportunity to provide precise feedback. I describe an interactive text-to-SQL parsing system that enables users to inspect the inferred parses and correct any errors they find by providing feedback in free-form natural language. I construct SPLASH, a large dataset of SQL correction instances paired with a diverse set of human-authored natural language feedback utterances. Using SPLASH, I pose a new task: given a question paired with an initial erroneous SQL parse, to what extent can we correct the parse based on the provided natural language feedback? I then present NL-EDIT, a neural model for the correction task. NL-EDIT combines two key ideas: 1) interpreting the feedback in the context of the other elements of the interaction, and 2) explicitly generating edit operations to correct the initial query instead of re-generating the full query from scratch. I create a simple SQL editing language whose basic units are add/delete operations applied to different SQL clauses. I discuss evaluation methods that help understand the usefulness and limitations of semantic parse correction models.

    I conclude this thesis by identifying three broad research directions for further advancing collaborative human-computer NLP: (1) developing user-centered explanations, (2) designing and evaluating interaction mechanisms, and (3) learning from interactions.
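    To make the edit-based correction idea concrete, here is a minimal sketch that represents corrections as add/delete operations over named SQL clauses and applies them to an initial parse; this simplified edit language is assumed for illustration and is not the thesis's actual grammar:

```python
# Minimal sketch of the edit-based correction idea (an assumed simplification,
# not the thesis's SQL editing language): represent corrections as add/delete
# operations over named SQL clauses and apply them to an erroneous parse.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Edit:
    op: str      # "add" or "delete"
    clause: str  # e.g. "SELECT", "WHERE"
    item: str    # the expression to add or remove


def apply_edits(parse: Dict[str, List[str]], edits: List[Edit]) -> Dict[str, List[str]]:
    """Return a corrected copy of the parse after applying the edit operations."""
    fixed = {clause: items[:] for clause, items in parse.items()}
    for e in edits:
        items = fixed.setdefault(e.clause, [])
        if e.op == "add" and e.item not in items:
            items.append(e.item)
        elif e.op == "delete" and e.item in items:
            items.remove(e.item)
    return fixed


if __name__ == "__main__":
    # Initial (wrong) parse: filters on the wrong column.
    parse = {"SELECT": ["name"], "FROM": ["students"], "WHERE": ["age > 20"]}
    feedback_edits = [Edit("delete", "WHERE", "age > 20"),
                      Edit("add", "WHERE", "gpa > 3.5")]
    print(apply_edits(parse, feedback_edits))
```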

    Conversational Question Answering on Heterogeneous Sources

    Conversational question answering (ConvQA) tackles sequential information needs where contexts in follow-up questions are left implicit. Current ConvQA systems operate over homogeneous sources of information: either a knowledge base (KB), or a text corpus, or a collection of tables. This paper addresses the novel issue of jointly tapping into all of these together, this way boosting answer coverage and confidence. We present CONVINSE, an end-to-end pipeline for ConvQA over heterogeneous sources, operating in three stages: i) learning an explicit structured representation of an incoming question and its conversational context, ii) harnessing this frame-like representation to uniformly capture relevant evidences from KB, text, and tables, and iii) running a fusion-in-decoder model to generate the answer. We construct and release the first benchmark, ConvMix, for ConvQA over heterogeneous sources, comprising 3000 real-user conversations with 16000 questions, along with entity annotations, completed question utterances, and question paraphrases. Experiments demonstrate the viability and advantages of our method, compared to state-of-the-art baselines.
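    The three-stage flow reads roughly as follows; every function in this sketch is a placeholder standing in for a learned component, and none of it is the released CONVINSE code:

```python
# Schematic sketch of the three-stage ConvQA flow described above. Each stage
# is a stub; the real system uses learned models for all three.

from typing import Dict, List


def structure_question(question: str, history: List[str]) -> Dict[str, str]:
    """Stage i: build a frame-like structured representation of the intent."""
    return {"question": question, "context": " ".join(history)}  # stub


def gather_evidence(frame: Dict[str, str]) -> List[str]:
    """Stage ii: retrieve evidence uniformly from KB, text, and tables."""
    kb, text, tables = ["<kb fact>"], ["<text passage>"], ["<table row>"]  # stubs
    return kb + text + tables


def generate_answer(frame: Dict[str, str], evidence: List[str]) -> str:
    """Stage iii: fuse the evidence and decode an answer (fusion-in-decoder)."""
    return f"answer derived from {len(evidence)} evidence snippets"  # stub


def convqa_pipeline(question: str, history: List[str]) -> str:
    frame = structure_question(question, history)
    return generate_answer(frame, gather_evidence(frame))


if __name__ == "__main__":
    print(convqa_pipeline("Where was she born?",
                          ["Who directed Lady Bird?", "Greta Gerwig."]))
```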
