Linguistic calibration through metacognition: aligning dialogue agent responses with expected correctness
Open-domain dialogue agents have vastly improved, but they still confidently
hallucinate knowledge or express doubt when asked straightforward questions. In
this work, we analyze whether state-of-the-art chit-chat models can demonstrate
metacognitive ability through their responses: does a verbalized
expression of doubt (or confidence) match the likelihood that the model's
answer is incorrect (or correct)? We find that these models are poorly
calibrated in this sense, yet we show that the models' internal
representations can be used to accurately predict the likelihood of correctness. By
incorporating these correctness predictions into the training of a controllable
generation model, we obtain a dialogue agent with greatly improved linguistic
calibration.
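To make "linguistic calibration" concrete, the sketch below checks whether accuracy rises with the confidence a response expresses. It then maps a predicted probability of correctness to a control token that a controllable generator could be conditioned on. This is a minimal illustration under stated assumptions, not the authors' method. The three confidence labels, the token strings `<DOUBT>`, `<NEUTRAL>`, `<CONFIDENT>`, the 0.3/0.7 thresholds, and all function names are hypothetical. The sketch also treats "calibrated" loosely as "accuracy never decreases with expressed confidence".

```python
from collections import defaultdict

# Hypothetical confidence labels, ordered from least to most confident.
LEVELS = ["doubt", "neutral", "confident"]


def accuracy_by_expressed_confidence(samples):
    """samples: iterable of (expressed_level, is_correct) pairs.
    Returns {level: fraction of answers that were correct} for each level seen."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for level, ok in samples:
        totals[level] += 1
        correct[level] += int(ok)
    return {lvl: correct[lvl] / totals[lvl] for lvl in LEVELS if totals[lvl]}


def is_monotone_calibrated(acc):
    """Loose check (an assumption): accuracy never drops as expressed confidence rises."""
    vals = [acc[lvl] for lvl in LEVELS if lvl in acc]
    return all(a <= b for a, b in zip(vals, vals[1:]))


def control_token(p_correct, low=0.3, high=0.7):
    """Map a predicted probability of correctness to a hypothetical control token.
    The thresholds are illustrative, not values taken from the paper."""
    if p_correct < low:
        return "<DOUBT>"
    if p_correct < high:
        return "<NEUTRAL>"
    return "<CONFIDENT>"


if __name__ == "__main__":
    samples = [("doubt", False), ("doubt", True),
               ("neutral", True), ("neutral", False),
               ("confident", True), ("confident", True)]
    acc = accuracy_by_expressed_confidence(samples)
    print(acc, is_monotone_calibrated(acc))
    print(control_token(0.9))
```

In this toy data, confident answers are right more often than doubtful ones, so the check passes. A poorly calibrated agent would show the reverse pattern, for example high accuracy on hedged answers and low accuracy on confident ones.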