Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?
Though state-of-the-art (SOTA) NLP systems have achieved remarkable
performance on a variety of language understanding tasks, they primarily focus
on questions that have a correct, definitive answer. However, in
real-world applications, users often ask questions that don't have a definitive
answer. Incorrectly answering such questions certainly hampers a system's
reliability and trustworthiness. Can SOTA models accurately identify such
questions and provide a reasonable response?
To investigate the above question, we introduce QnotA, a dataset consisting
of five different categories of questions that don't have definitive answers.
Furthermore, for each QnotA instance, we also provide a corresponding QA
instance, i.e., an alternate question that "can be" answered. With this data,
we formulate three evaluation tasks that test a system's ability to 'identify',
'distinguish', and 'justify' QnotA questions. Through comprehensive
experiments, we show that even SOTA models including GPT-3 and Flan T5 do not
fare well on these tasks and lag considerably behind the human performance
baseline. We conduct a thorough analysis which further leads to several
interesting findings. Overall, we believe our work and findings will encourage
and facilitate further research in this important area and help develop more
robust models.
Comment: TrustNLP Workshop at ACL 202