Referential ambiguities arise in dialogue when a referring expression does
not uniquely identify the intended referent for the addressee. Addressees
usually detect such ambiguities immediately and work with the speaker to repair
them through meta-communicative Clarificational Exchanges (CEs): a Clarification
Request (CR) and a response. Here, we argue that the ability to generate and
respond to CRs imposes specific constraints on the architecture and objective
functions of multi-modal, visually grounded dialogue models. We use the SIMMC
2.0 dataset to evaluate the ability of different state-of-the-art model
architectures to process CEs, with a metric that probes the contextual updates
that arise from them in the model. We find that language-based models are able
to encode simple multi-modal semantic information and process some CEs,
excelling with those related to the dialogue history, whilst multi-modal models
can use additional learning objectives to obtain disentangled object
representations, which prove crucial for handling complex referential
ambiguities across modalities.

Comment: Accepted at SIGDIAL'23 (upcoming). Repository with code and
experiments available at https://github.com/JChiyah/what-are-you-referring-t