Improving Context Modelling in Multimodal Dialogue Generation

Author: Agarwal Shubham
Dusek Ondrej
Konstas Ioannis
Rieser Verena
Publication date: 01/01/2018

In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system. Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain. We introduce a multimodal extension to the Hierarchical Recurrent Encoder-Decoder (HRED) model and show that this extension outperforms strong baselines in terms of text-based similarity metrics. We also showcase the shortcomings of current vision and language models by performing an error analysis on our system's output.

arXiv.org e-Print Archive

Heriot Watt Pure

Crossref

A Knowledge-Grounded Multimodal Search-Based Conversational Agent

Author: Agarwal Shubham
Dusek Ondrej
Konstas Ioannis
Rieser Verena
Publication date: 01/01/2018

Multimodal search-based dialogue is a challenging new task: it extends visually grounded question answering systems into multi-turn conversations with access to an external database. We address this new challenge by learning a neural response generation system from the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded multimodal conversational model where an encoded knowledge base (KB) representation is appended to the decoder input. Our model substantially outperforms strong baselines in terms of text-based similarity measures (over 9 BLEU points, 3 of which are solely due to the use of additional information from the KB).

arXiv.org e-Print Archive

Heriot Watt Pure

Crossref

A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System

Author: Ekbal Asif
Firdaus Mauajama
Madasu Avinash
Publication date: 27/05/2023

Natural Language Understanding (NLU) and Natural Language Generation (NLG) are the two critical components of every conversational system: they handle the tasks of understanding the user by capturing the necessary information in the form of slots and generating an appropriate response in accordance with the extracted information. Recently, dialogue systems integrated with complementary information such as images, audio, or video have gained immense popularity. In this work, we propose an end-to-end framework with the capability to extract necessary slot values from the utterance and generate a coherent response, thereby assisting the user to achieve their desired goals in a multimodal dialogue system having both textual and visual information. The task of extracting the necessary information is dependent not only on the text but also on the visual cues present in the dialogue. Similarly, for the generation, the previous dialogue context comprising multimodal information is significant for providing coherent and informative responses. We employ a multimodal hierarchical encoder using pre-trained DialoGPT and also exploit the knowledge base (KB) to provide a stronger context for both tasks. We then design a slot attention mechanism to focus on the necessary information in a given utterance. Finally, a decoder generates the corresponding response for the given dialogue context and the extracted slot values. Experimental results on the Multimodal Dialogue Dataset (MMD) show that the proposed framework outperforms the baseline approaches in both tasks. The code is available at https://github.com/avinashsai/slot-gpt.
Comment: Published in the journal Multimedia Tools and Applications

arXiv.org e-Print Archive

Affect and believability in game characters: a review of the use of affective computing in games

Author: ElSayed Salma
King David J.
Publication date: 24/08/2017

Virtual agents are important in many digital environments. Designing a character that highly engages users in terms of interaction is an intricate task constrained by many requirements. One aspect that has gained more attention recently is the affective dimension of the agent. Several studies have addressed the possibility of developing an affect-aware system for a better user experience. Particularly in games, including emotional and social features in NPCs adds depth to the characters, enriches interaction possibilities, and, combined with a basic level of competence, creates a more appealing game. Design requirements for emotionally intelligent NPCs differ from those for general autonomous agents, with the main goal being a stronger player-agent relationship as opposed to problem solving and goal assessment. Nevertheless, deploying an affective module into NPCs adds to the complexity of the architecture and constraints. In addition, using such composite NPCs in games seems beyond current technology, despite some brave attempts. However, a MARPO-type modular architecture would seem a useful starting point for adding emotions.

Abertay Research Portal

Digital literacies: Research briefing for the TLRP-TEL (Teaching and Learning Research Programme - Technology Enhanced Learning)

Author: Barton David
Gillen Julia
Publication venue: ESRC Teaching and Learning Research Programme
Publication date: 01/01/2010

Lancaster E-Prints