Teaching Machines to Ask Useful Clarification Questions
Inquiry is fundamental to communication, and machines cannot effectively collaborate with humans unless they can ask questions. Asking questions is also a natural way for machines to express uncertainty, a task of increasing importance in an automated society. In the field of natural language processing, despite decades of work on question answering, there is relatively little work on question asking. Moreover, most of the previous work has focused on generating reading-comprehension-style questions that are answerable from the provided text. The goal of my dissertation work, on the other hand, is to understand how we can teach machines to ask clarification questions that point at the missing information in a text. Primarily, we focus on two scenarios where we find such question asking useful: (1) clarification questions on posts in community-driven technical support forums such as StackExchange, and (2) clarification questions on product descriptions on e-retail platforms such as Amazon.
In this dissertation we claim that, given large amounts of previously asked questions in various contexts (within a particular scenario), we can build machine learning models that ask useful questions in a new, unseen context (within the same scenario). To validate this hypothesis, we first create two large datasets of contexts paired with clarification questions (and answers) for the two scenarios of technical support and e-retail by automatically extracting this information from available data dumps of StackExchange and Amazon. Given these datasets, in our first line of research we build a machine learning model that first extracts a set of candidate clarification questions and then ranks them such that more useful questions appear higher in the ranking. Our model is inspired by the idea of expected value of perfect information: a good question is one whose expected answer will be useful. We hypothesize that by explicitly modeling the value added by an answer to a given context, our model can learn to identify more useful questions. We evaluate our model against expert human judgments on the StackExchange dataset and demonstrate significant improvements over controlled baselines.
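To make the ranking criterion concrete, here is a minimal sketch of the EVPI-style scoring idea: a question is scored by the expected utility of its possible answers. The `answer_prob` and `utility` functions here are hypothetical stand-ins for the learned neural components in the dissertation.

```python
# Minimal sketch of EVPI-style question ranking. The probability and
# utility functions are assumed to be supplied (in the dissertation they
# are learned models); this only illustrates the expected-value idea.

def rank_questions(context, candidate_questions, candidate_answers,
                   answer_prob, utility):
    """Rank questions by the expected utility of their answers.

    answer_prob(context, q, a) -> estimate of P(a | context, q)
    utility(context, a)        -> value of updating context with answer a
    """
    def expected_utility(q):
        return sum(answer_prob(context, q, a) * utility(context, a)
                   for a in candidate_answers)

    # Most useful question first.
    return sorted(candidate_questions, key=expected_utility, reverse=True)
```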
In our second line of research, we build a machine learning model that learns to generate a new clarification question from scratch, instead of ranking previously seen questions.
We hypothesize that we can train our model to generate good clarification questions by incorporating the usefulness of an answer to the clarification question into recent sequence-to-sequence neural network approaches.
We develop a Generative Adversarial Network (GAN) where the generator is a sequence-to-sequence model and the discriminator is a utility function that models the value of updating the context with the answer to the clarification question.
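As an illustrative sketch only (toy dimensions, hypothetical class names, not the dissertation's exact architecture), the setup pairs a sequence-to-sequence generator with a utility discriminator that scores the context once it is updated with an answer:

```python
# Illustrative PyTorch sketch of the GAN setup described above; all sizes
# are toy values and the real models are more elaborate.
import torch
import torch.nn as nn

class Seq2SeqGenerator(nn.Module):
    """Encodes a context and decodes a clarification question."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, context_ids, question_ids):
        _, h = self.encoder(self.embed(context_ids))        # encode context
        dec, _ = self.decoder(self.embed(question_ids), h)  # teacher forcing
        return self.out(dec)                                # token logits

class UtilityDiscriminator(nn.Module):
    """Scores the value of updating the context with an answer."""
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, context_plus_answer_ids):
        _, h = self.rnn(self.embed(context_plus_answer_ids))
        return torch.sigmoid(self.score(h[-1]))  # utility in [0, 1]
```

In adversarial training, the generator is rewarded when the discriminator assigns high utility to the context updated with the answer its question elicits.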
We evaluate our model on our two datasets of StackExchange and Amazon, using both automatic metrics and human judgments of usefulness, specificity and relevance, showing that our approach outperforms both a retrieval-based model and ablations that exclude the utility model and the adversarial training.
We observe that our question generation model generates questions that span a wide spectrum of specificity with respect to the given context.
We argue that generating questions at a desired level of specificity (to a given context) can be useful in many scenarios.
In our last line of research, we therefore build a question generation model which, given a context and a level of specificity (generic or specific), generates a question at that level of specificity.
We hypothesize that by providing the question's level of specificity to our model at training time, it can learn patterns that indicate specificity and use them to generate questions at a desired level.
To automatically label the large number of questions in our training data with the level of specificity, we train a binary classifier which, given a context and a question, predicts whether the question is specific (to the context) or generic.
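Once the classifier has labeled the training questions, a common way to condition a sequence-to-sequence model on a discrete attribute is to prepend a control token to the input. The token names and example below are hypothetical, a sketch rather than the dissertation's exact implementation:

```python
# Hypothetical sketch: prepend a specificity control token to the context
# so the generator can learn to associate it with question style.
SPECIFICITY_TOKENS = {"specific": "<SPECIFIC>", "generic": "<GENERIC>"}

def build_training_pair(context, question, specificity_label):
    """Prefix the context with a control token for the desired specificity."""
    token = SPECIFICITY_TOKENS[specificity_label]
    return f"{token} {context}", question

# Example (made-up data):
# build_training_pair("Lightweight laptop bag, fits most laptops.",
#                     "What are the exact dimensions?", "specific")
```

At test time, the same token is supplied with a new context to request a question at the desired level of specificity.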
We demonstrate the effectiveness of our specificity-controlled question generation model by evaluating it on the Amazon dataset using human judgments.
Evaluating Mixed-initiative Conversational Search Systems via User Simulation
Clarifying the underlying user information need by asking clarifying questions is an important feature of modern conversational search systems. However, evaluating such systems by answering prompted clarifying questions requires significant human effort, which can be time-consuming and expensive. In this paper, we propose a conversational User Simulator, called USi, for automatic evaluation of such conversational search systems. Given a description of an information need, USi is capable of automatically answering clarifying questions about the topic throughout the search session. Through a set of experiments, including automated natural language generation metrics and crowdsourcing studies, we show that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers. Moreover, we make the first steps towards multi-turn interactions, where conversational search systems ask multiple questions to the (simulated) user with the goal of clarifying the user's need. To this end, we expand on currently available datasets for studying clarifying questions, i.e., Qulac and ClariQ, by performing crowdsourcing-based multi-turn data acquisition. We show that our generative, GPT-2-based model is capable of providing accurate and natural answers to unseen clarifying questions in the single-turn setting, and we discuss the capabilities of our model in the multi-turn setting. We provide the code, data, and the pre-trained model for further research on the topic.
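As a rough sketch of the single-turn simulation loop, one could condition an off-the-shelf GPT-2 on the information need and the clarifying question. Note that USi itself is fine-tuned on clarification data, and the prompt format below is an assumption, not the paper's:

```python
# Hypothetical sketch of simulating a user's answer with an off-the-shelf
# GPT-2 via Hugging Face transformers; the actual USi model is fine-tuned
# and its prompt format may differ.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def simulate_answer(information_need, clarifying_question, max_new_tokens=40):
    """Generate a simulated user answer conditioned on the information need."""
    prompt = (f"Information need: {information_need}\n"
              f"Question: {clarifying_question}\n"
              f"Answer:")
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs,
                            max_new_tokens=max_new_tokens,
                            do_sample=True,
                            top_p=0.9,
                            pad_token_id=tokenizer.eos_token_id)
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    return text[len(prompt):].strip()  # keep only the generated answer
```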
Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering
Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promise as a more robust measure of a model's general reasoning abilities. In this paper, we propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks. Guided by a set of hypotheses, the framework studies how to transform various pre-existing knowledge resources into a form that is most effective for pre-training models. We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks. Extending prior work, we devise and compare four constrained distractor-sampling strategies. We provide empirical results across five commonsense question-answering tasks with data generated from five external knowledge resources. We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks. In addition, both preserving the structure of the task and generating fair and informative questions help language models learn more effectively.
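To illustrate the data-construction idea, here is a hypothetical minimal variant of constrained distractor sampling from knowledge-graph triples (the paper's four strategies are more elaborate; the function names and example triples below are assumptions):

```python
# Hypothetical sketch: build a multiple-choice question from a
# knowledge-graph triple, constraining distractors to share the relation
# with the correct answer so the question is fair but not trivial.
import random

def build_question(triple, all_triples, num_distractors=3):
    head, relation, tail = triple
    # Candidate distractors: tails of other triples with the same relation.
    pool = {t for h, r, t in all_triples if r == relation and t != tail}
    distractors = random.sample(sorted(pool), min(num_distractors, len(pool)))
    question = f"{head} {relation} ___ ?"
    choices = distractors + [tail]
    random.shuffle(choices)
    return question, choices, tail

triples = [("a dog", "is capable of", "barking"),
           ("a bird", "is capable of", "flying"),
           ("a fish", "is capable of", "swimming"),
           ("a cat", "is capable of", "purring")]
print(build_question(triples[0], triples))
```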
Forensic science expertise for international criminal proceedings: an old problem, a new context and a pragmatic resolution
Expert witness testimony provides an important source of information for international criminal proceedings, and forensic science expertise from mass graves is no exception: findings from exhumations and examinations have featured in the ad hoc tribunals' trials and judgments. Whilst the issues surrounding the law-science relationship have been explored within the realm of national legal systems, the mixed system adopted by these tribunals presents an established discussion with a new context. Using forensic archaeology as an example, this article explores some theoretical underpinnings and practical realities surrounding the use of forensic science during international criminal investigations into mass graves, before looking at how Trial Chambers aim to establish the relevance and credibility of forensic science evidence. As little guidance regarding the admissibility of expert evidence is provided, it is through the case-specific legal process of cross-examination and the presentation of counter-expertise that methodological issues are resolved. This, together with reliance on normative principles, is the pragmatic approach adopted to discern the reliability of expert opinion.