SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation
SimpleMTOD is a simple language model which recasts several sub-tasks in
multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is
built on a large-scale transformer-based auto-regressive architecture, which
has already proven to be successful in uni-modal task-oriented dialogues, and
effectively leverages transfer learning from pre-trained GPT-2. In order to
capture the semantics of visual scenes, we introduce both local and
de-localized tokens for objects within a scene. De-localized tokens represent
the type of an object rather than the specific object itself and so possess a
consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art
BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0
test-std dataset while performing on par in other multimodal sub-tasks:
Disambiguation, Coreference Resolution, and Dialog State Tracking. This is
despite taking a minimalist approach for extracting visual (and non-visual)
information. In addition, the model does not rely on task-specific architectural
changes such as classification heads.
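
The distinction between local and de-localized tokens can be illustrated with a minimal sketch. The abstract does not give the token format, so the `<OBJ_n>` and `<TYPE_n>` strings, the `SceneObject` class, and the `scene_to_tokens` helper below are hypothetical names chosen for illustration, not the authors' representation.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    local_id: int  # index of the object within one scene (hypothetical field)
    type_id: int   # object type shared across the dataset (hypothetical field)


def scene_to_tokens(objects):
    """Serialize a scene as alternating local and de-localized tokens."""
    tokens = []
    for obj in objects:
        tokens.append(f"<OBJ_{obj.local_id}>")   # local: meaningful only inside this scene
        tokens.append(f"<TYPE_{obj.type_id}>")   # de-localized: same meaning in every scene
    return tokens


# Two toy scenes that share one object type (42).
scene_a = [SceneObject(0, 17), SceneObject(1, 42)]
scene_b = [SceneObject(0, 42)]

tokens_a = scene_to_tokens(scene_a)
tokens_b = scene_to_tokens(scene_b)
```

In this sketch `<OBJ_0>` refers to different objects in the two scenes, whereas `<TYPE_42>` denotes the same object type in both, which is the property the abstract attributes to de-localized tokens.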
Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset
Virtual assistants such as Google Assistant, Alexa and Siri provide a
conversational interface to a large number of services and APIs spanning
multiple domains. Such systems need to support an ever-increasing number of
services with possibly overlapping functionality. Furthermore, some of these
services have little to no training data available. Existing public datasets
for task-oriented dialogue do not sufficiently capture these challenges since
they cover few domains and assume a single static ontology per domain. In this
work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing
over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds
the existing task-oriented dialogue corpora in scale, while also highlighting
the challenges associated with building large-scale virtual assistants. It
provides a challenging testbed for a number of tasks including language
understanding, slot filling, dialogue state tracking and response generation.
Along the same lines, we present a schema-guided paradigm for task-oriented
dialogue, in which predictions are made over a dynamic set of intents and
slots, provided as input, using their natural language descriptions. This
allows a single dialogue system to easily support a large number of services
and facilitates simple integration of new services without requiring additional
training data. Building upon the proposed paradigm, we release a model for
dialogue state tracking capable of zero-shot generalization to new APIs, while
remaining competitive in the regular setting.
Comment: To appear at AAAI 202
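
The schema-guided idea, in which slots arrive as input together with natural language descriptions, can be sketched roughly as follows. This is a toy under stated assumptions: the `flight_schema` contents and the `score_slots` helper are hypothetical, and plain word overlap stands in for whatever learned scoring the released model uses.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().replace("?", " ").replace(",", " ").split())


def score_slots(utterance, schema):
    """Score each slot by word overlap between the utterance and its description.

    The schema is passed in at prediction time, so a new service only needs
    new descriptions, not new training data (toy stand-in for a learned scorer).
    """
    words = tokenize(utterance)
    return {slot: len(words & tokenize(desc)) for slot, desc in schema.items()}


# Hypothetical schema for an unseen service: slot name -> description.
flight_schema = {
    "origin": "city the flight departs from",
    "destination": "city the flight arrives in",
    "date": "day of travel",
}

scores = score_slots("Which day do you want to travel?", flight_schema)
best = max(scores, key=scores.get)
```

Because the slot set is an argument rather than a fixed output layer, the same function works unchanged for any service whose schema is supplied.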
Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation
Automated metrics such as BLEU are widely used in the machine translation
literature. They have also been used recently in the dialogue community for
evaluating dialogue response generation. However, previous work in dialogue
response generation has shown that these metrics do not correlate strongly with
human judgment in the non task-oriented dialogue setting. Task-oriented
dialogue responses are expressed on narrower domains and exhibit lower
diversity. It is thus reasonable to think that these automated metrics would
correlate well with human judgment in the task-oriented setting where the
generation task consists of translating dialogue acts into a sentence. We
conduct an empirical study to confirm whether this is the case. Our findings
indicate that these automated metrics have stronger correlation with human
judgments in the task-oriented setting compared to what has been observed in
the non task-oriented setting. We also observe that these metrics correlate
even better for datasets which provide multiple ground truth reference
sentences. In addition, we show that some of the currently available corpora
for task-oriented language generation can be solved with simple models and
advocate for more challenging datasets.
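
Measuring how well an automated metric tracks human judgment comes down to a correlation between two score lists. The sketch below uses Pearson correlation as one possible choice, since the abstract does not say which coefficient the study reports. The numbers are made-up toy values, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy values for illustration only: one metric score and one human rating per response.
metric_scores = [0.10, 0.35, 0.50, 0.80]
human_ratings = [1.0, 2.5, 3.0, 4.5]

r = pearson(metric_scores, human_ratings)
```

A value of `r` close to 1 would mean the metric ranks responses much as human raters do; a value near 0 would mean it carries little information about human judgment.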