A Review of Evaluation Techniques for Social Dialogue Systems
In contrast with goal-oriented dialogue, social dialogue has no clear measure
of task success. Consequently, evaluation of these systems is notoriously hard.
In this paper, we review current evaluation methods, focusing on automatic
metrics. We conclude that turn-based metrics often ignore the context and do
not account for the fact that several replies are valid, while end-of-dialogue
rewards are mainly hand-crafted. Both lack grounding in human perceptions.
Comment: 2 pages
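The weakness of turn-based metrics described above can be seen with a toy word-overlap score (a deliberately simple stand-in for metrics such as BLEU; the replies and the `overlap_f1` function are illustrative assumptions, not from the paper). A perfectly valid reply that happens to differ from the single reference scores zero:

```python
def overlap_f1(reply, reference):
    """Token-level F1 between a candidate reply and a single reference.

    A simplified stand-in for turn-based overlap metrics; it shares their
    key weakness: only one reference reply is consulted.
    """
    r, g = reply.lower().split(), reference.lower().split()
    common = sum(min(r.count(w), g.count(w)) for w in set(r))
    if common == 0:
        return 0.0
    prec, rec = common / len(r), common / len(g)
    return 2 * prec * rec / (prec + rec)

# Both candidates are sensible answers to "How was your weekend?"
reference = "it was great I went hiking"
print(overlap_f1("it was great I went hiking", reference))        # 1.0
print(overlap_f1("pretty quiet mostly stayed home", reference))   # 0.0
```

The second reply is as valid as the first, yet the metric cannot credit it, which is exactly the "several replies are valid" problem the review identifies.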
Improving Context Modelling in Multimodal Dialogue Generation
In this work, we investigate the task of textual response generation in a
multimodal task-oriented dialogue system. Our work is based on the recently
released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion
domain. We introduce a multimodal extension to the Hierarchical Recurrent
Encoder-Decoder (HRED) model and show that this extension outperforms strong
baselines in terms of text-based similarity metrics. We also showcase the
shortcomings of current vision and language models by performing an error
analysis on our system's output.
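The multimodal extension sketched in the abstract amounts to widening the input of HRED's context-level encoder with visual features. The following is a minimal, framework-free sketch of that idea; the encoders, dimensions, and feature vectors are invented placeholders, not the authors' implementation:

```python
def encode_utterance(tokens, dim=4):
    # Stand-in for the utterance-level RNN encoder: a bag-of-words hash embedding.
    vec = [0.0] * dim
    for tok in tokens:
        vec[hash(tok) % dim] += 1.0
    return vec

def context_step(state, inp):
    # Stand-in for one step of the context-level RNN: a leaky running average.
    return [0.5 * s + 0.5 * x for s, x in zip(state, inp)]

def encode_dialogue(turns, image_feats, text_dim=4):
    """turns: one token list per turn; image_feats: one visual vector per turn."""
    img_dim = len(image_feats[0])
    state = [0.0] * (text_dim + img_dim)
    for tokens, img in zip(turns, image_feats):
        # The multimodal extension: concatenate text and image features
        # before the context-level update.
        multimodal = encode_utterance(tokens, text_dim) + list(img)
        state = context_step(state, multimodal)
    return state

turns = [["show", "me", "red", "shoes"], ["something", "cheaper"]]
image_feats = [[0.2, 0.9], [0.1, 0.4]]   # e.g. pooled CNN features per turn
ctx = encode_dialogue(turns, image_feats)
print(len(ctx))  # text_dim + img_dim = 6
```

In the actual model the two stand-in functions would be learned RNNs over real CNN image features; the structural point is only where the modalities are fused.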
A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Multimodal search-based dialogue is a challenging new task: It extends
visually grounded question answering systems into multi-turn conversations with
access to an external database. We address this new challenge by learning a
neural response generation system from the recently released Multimodal
Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded
multimodal conversational model where an encoded knowledge base (KB)
representation is appended to the decoder input. Our model substantially
outperforms strong baselines in terms of text-based similarity measures (over 9
BLEU points, 3 of which are solely due to the use of additional information
from the KB).
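The grounding mechanism named in the abstract, appending an encoded KB representation to the decoder input, can be sketched as follows. All names, the toy embedding, and the recurrent update are illustrative assumptions standing in for the learned components:

```python
def embed(tok, dim=3):
    # Stand-in for a learned token embedding: a one-hot hash embedding.
    vec = [0.0] * dim
    vec[hash(tok) % dim] = 1.0
    return vec

def rnn_step(state, inp):
    # Stand-in for one learned decoder RNN step.
    return [0.5 * s + 0.5 * x for s, x in zip(state, inp)]

def decode_with_kb(context_state, kb_vec, target_tokens):
    """At every step, the token embedding is concatenated with the encoded
    KB representation, so the decoder always sees the grounding vector."""
    state = context_state
    for tok in target_tokens:
        inp = embed(tok) + list(kb_vec)   # token embedding ++ KB encoding
        state = rnn_step(state, inp)
    return state

kb_vec = [0.3, 0.7]                   # encoded knowledge-base representation
state = [0.0] * (3 + len(kb_vec))     # decoder state sized for the widened input
out = decode_with_kb(state, kb_vec, ["the", "blue", "jacket"])
print(len(out))  # 5
```

The design choice worth noting is that the KB vector is repeated at every decoder step rather than injected once, so the grounding signal cannot be forgotten over a long generation.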
A proposal for the development of adaptive spoken interfaces to access the Web
Spoken dialog systems have been proposed as a solution to facilitate a more natural human–machine interaction. In this paper, we propose a framework to model the user's intention during the dialog and adapt the dialog model dynamically to the user's needs and preferences, thus developing more efficient, adapted, and usable spoken dialog systems. Our framework employs statistical models based on neural networks that take into account the history of the dialog up to the current dialog state in order to predict the user's intention and the next system response. We describe our proposal and detail its application in the Let's Go spoken dialog system.
Work partially supported by Projects MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485).
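The core idea of conditioning the next-system-response prediction on the dialog history so far can be sketched with a count-based model standing in for the paper's neural networks. The dialog acts, histories, and the back-off scheme below are invented for illustration:

```python
from collections import Counter, defaultdict

class NextActPredictor:
    """Toy history-conditioned predictor of the next system act.

    A count-based stand-in for the statistical neural models in the paper:
    it conditions on the sequence of user acts seen so far.
    """
    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, dialogues):
        # dialogues: lists of (user_act, system_act) turns
        for turns in dialogues:
            history = ()
            for user_act, system_act in turns:
                history = history + (user_act,)
                self.counts[history][system_act] += 1

    def predict(self, history):
        hist = tuple(history)
        # Back off to shorter histories when the full one is unseen.
        while hist and hist not in self.counts:
            hist = hist[1:]
        if hist not in self.counts:
            return None
        return self.counts[hist].most_common(1)[0][0]

data = [
    [("request_bus", "ask_origin"), ("give_origin", "ask_destination")],
    [("request_bus", "ask_origin"), ("give_origin", "ask_destination")],
]
model = NextActPredictor()
model.train(data)
print(model.predict(["request_bus"]))                 # ask_origin
print(model.predict(["request_bus", "give_origin"]))  # ask_destination
```

In the framework itself a neural network replaces the count table, which lets the model generalise to histories it has never observed verbatim.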
Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems
In this paper we present a novel framework for the integration of visual sensor networks and speech-based interfaces. Our proposal follows the standard reference architecture in fusion systems (JDL), and combines different techniques related to Artificial Intelligence, Natural Language Processing and User Modeling to provide an enhanced interaction with their users. Firstly, the framework integrates a Cooperative Surveillance Multi-Agent System (CS-MAS), which includes several types of autonomous agents working in a coalition to track and make inferences on the positions of the targets. Secondly, enhanced conversational agents facilitate human-computer interaction by means of speech interaction. Thirdly, a statistical methodology allows modeling the user's conversational behavior, which is learned from an initial corpus and improved with the knowledge acquired from the successive interactions. A technique is proposed to facilitate the multimodal fusion of these information sources and to use the result when deciding the next system action.
This work was supported in part by Projects MEyC TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS S2009/TIC-1485.
How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds
We seek to create agents that both act and communicate with other agents in
pursuit of a goal. Towards this end, we extend LIGHT (Urbanek et al. 2019)---a
large-scale crowd-sourced fantasy text-game---with a dataset of quests. These
contain natural language motivations paired with in-game goals and human
demonstrations; completing a quest might require dialogue or actions (or both).
We introduce a reinforcement learning system that (1) incorporates large-scale
language modeling-based and commonsense reasoning-based pre-training to imbue
the agent with relevant priors; and (2) leverages a factorized action space of
action commands and dialogue, balancing between the two. We conduct zero-shot
evaluations using held-out human expert demonstrations, showing that our agents
are able to act consistently and talk naturally with respect to their
motivations.
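The factorized action space described above, action commands on one side and dialogue on the other, can be sketched as a two-stage choice: first pick a mode, then pick within that mode. The command and utterance inventories below are invented placeholders, and random sampling stands in for the learned policies of the actual agent:

```python
import random

random.seed(0)

ACTIONS = ["go north", "get sword", "give coin to merchant"]
UTTERANCES = ["Hello, traveller!", "Will you trade with me?"]

def sample_step(p_act=0.5):
    """Factorized sampling: a binary act/say mode choice, then a choice
    within the selected sub-space. p_act is the balance between the two."""
    if random.random() < p_act:
        return ("act", random.choice(ACTIONS))
    return ("say", random.choice(UTTERANCES))

for _ in range(3):
    mode, choice = sample_step()
    print(mode, "->", choice)
```

The benefit of the factorization is that the balance between acting and talking is an explicit, tunable quantity rather than an emergent property of one flat action distribution.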
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures within which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 table