219 research outputs found

Learning to Speak and Act in a Fantasy Text Adventure Game

Authors: Dinan Emily; Fan Angela; Humeau Samuel; Jain Saachi; Karamcheti Siddharth; Kiela Douwe; Rocktäschel Tim; Szlam Arthur; Urbanek Jack; Weston Jason
Publication date: 01/01/2019

We introduce a large-scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to using past dialogue, these models are able to effectively use the state of the underlying world to condition their predictions. In particular, we show that grounding on the details of the local environment, including location descriptions, the objects (and their affordances), and the characters (and their previous actions) present within it, allows better predictions of agent behavior and dialogue. We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relates to agents that can talk and act successfully.

arXiv.org e-Print Archive

Crossref

UCL Discovery

From Knowledge Augmentation to Multi-tasking: Towards Human-like Dialogue Systems

Author: Young Tom
Publication date: 11/12/2022

The goal of building dialogue agents that can converse with humans naturally has been a long-standing dream of researchers since the early days of artificial intelligence. The well-known Turing Test proposed to judge the ultimate validity of an artificial intelligence agent on the indistinguishability of its dialogues from humans'. It should come as no surprise that human-level dialogue systems are very challenging to build. But, while early efforts on rule-based systems found limited success, the emergence of deep learning enabled great advances on this topic. In this thesis, we focus on methods that address the numerous issues that maintain the gap between artificial conversational agents and human-level interlocutors. These methods were proposed and experimented with in ways that were inspired by general state-of-the-art AI methodologies. But they also targeted the characteristics that dialogue systems possess.
Comment: PhD thesi

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues

Authors: Chen Nancy F.; Hoi Steven C. H.; Le Hung
Publication date: 01/03/2021

Compared to traditional visual question answering, video-grounded dialogues require additional reasoning over dialogue context to answer questions in a multi-turn setting. Previous approaches to video-grounded dialogues mostly use dialogue context as a simple text input without modelling the inherent information flows at the turn level. In this paper, we propose a novel framework of Reasoning Paths in Dialogue Context (PDC). The PDC model discovers information flows among dialogue turns through a semantic graph constructed from the lexical components of each question and answer. It then learns to predict reasoning paths over this semantic graph. Our path prediction model predicts a path from the current turn through past dialogue turns that contain additional visual cues to answer the current question. Our reasoning model sequentially processes both visual and textual information along this reasoning path, and the propagated features are used to generate the answer. Our experimental results demonstrate the effectiveness of our method and provide additional insights on how models use semantic dependencies in a dialogue context to retrieve visual cues.
Comment: Accepted at ICLR (International Conference on Learning Representations) 202

arXiv.org e-Print Archive

VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions

Authors: Li Jinpeng; Wang Yueqian; Wang Yuxuan; Zhao Dongyan; Zhao Xueliang; Zheng Zilong
Publication date: 30/05/2023

Video-grounded dialogue understanding is a challenging problem that requires machines to perceive, parse, and reason over situated semantics extracted from weakly aligned video and dialogues. Most existing benchmarks treat both modalities the same as a frame-independent visual understanding task, while neglecting the intrinsic attributes of multimodal dialogues, such as scene and topic transitions. In this paper, we present the Video-grounded Scene&Topic AwaRe dialogue (VSTAR) dataset, a large-scale video-grounded dialogue understanding dataset based on 395 TV series. Based on VSTAR, we propose two benchmarks for video-grounded dialogue understanding, scene segmentation and topic segmentation, and one benchmark for video-grounded dialogue generation. Comprehensive experiments are performed on these benchmarks to demonstrate the importance of multimodal information and segments in video-grounded dialogue understanding and generation.
Comment: To appear at ACL 202

arXiv.org e-Print Archive