A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Multimodal search-based dialogue is a challenging new task: It extends
visually grounded question answering systems into multi-turn conversations with
access to an external database. We address this new challenge by learning a
neural response generation system from the recently released Multimodal
Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded
multimodal conversational model where an encoded knowledge base (KB)
representation is appended to the decoder input. Our model substantially
outperforms strong baselines in terms of text-based similarity measures (over 9
BLEU points, 3 of which are solely due to the use of additional information
from the KB).
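As a rough illustration of the architecture described above, the following is a minimal sketch of a decoder whose input is augmented with an encoded knowledge base vector at every step. It assumes a PyTorch-style setup; the class and parameter names (KBGroundedDecoder, kb_dim, etc.) are illustrative and not taken from the paper.

# Minimal sketch, assuming PyTorch; not the authors' implementation.
import torch
import torch.nn as nn

class KBGroundedDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, kb_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The KB is first encoded into a single fixed-size vector,
        # which is then appended to the decoder input at every time step.
        self.kb_encoder = nn.Sequential(nn.Linear(kb_dim, kb_dim), nn.Tanh())
        self.rnn = nn.GRU(emb_dim + kb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_tokens, kb_features, hidden=None):
        # prev_tokens: (batch, seq_len); kb_features: (batch, kb_dim)
        emb = self.embed(prev_tokens)                      # (B, T, emb_dim)
        kb = self.kb_encoder(kb_features)                  # (B, kb_dim)
        kb = kb.unsqueeze(1).expand(-1, emb.size(1), -1)   # (B, T, kb_dim)
        rnn_in = torch.cat([emb, kb], dim=-1)              # append KB encoding
        out, hidden = self.rnn(rnn_in, hidden)
        return self.out(out), hidden                       # vocabulary logits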
Multimodal Speech Emotion Recognition Using Audio and Text
Speech emotion recognition is a challenging task, and extensive reliance has
been placed on models that use audio features in building well-performing
classifiers. In this paper, we propose a novel deep dual recurrent encoder
model that utilizes text data and audio signals simultaneously to obtain a
better understanding of speech data. As emotional dialogue is composed of sound
and spoken content, our model encodes the information from audio and text
sequences using dual recurrent neural networks (RNNs) and then combines the
information from these sources to predict the emotion class. This architecture
analyzes speech data from the signal level to the language level, and it thus
utilizes the information within the data more comprehensively than models that
focus on audio features. Extensive experiments are conducted to investigate the
efficacy and properties of the proposed model. Our proposed model outperforms
previous state-of-the-art methods in assigning data to one of four emotion
categories (i.e., angry, happy, sad and neutral) when the model is applied to
the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.
Comment: 7 pages, accepted as a conference paper at IEEE SLT 2018
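A minimal sketch of the dual-encoder idea described above, assuming a PyTorch-style setup: separate recurrent encoders for audio frames and text tokens, whose final states are concatenated and fed to a four-way emotion classifier. All names and dimensions (DualRecurrentEncoder, audio_dim, etc.) are assumptions for illustration, not the paper's code.

# Minimal sketch, assuming PyTorch; dimensions are placeholders.
import torch
import torch.nn as nn

class DualRecurrentEncoder(nn.Module):
    def __init__(self, vocab_size, audio_dim=40, emb_dim=128,
                 hid_dim=128, num_classes=4):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, emb_dim)
        self.text_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.audio_rnn = nn.GRU(audio_dim, hid_dim, batch_first=True)
        self.classifier = nn.Linear(2 * hid_dim, num_classes)

    def forward(self, tokens, audio):
        # tokens: (B, T_text); audio: (B, T_audio, audio_dim), e.g. MFCC frames
        _, h_text = self.text_rnn(self.text_embed(tokens))
        _, h_audio = self.audio_rnn(audio)
        # Combine the final hidden states of both encoders.
        fused = torch.cat([h_text[-1], h_audio[-1]], dim=-1)
        # Logits over the four classes (angry, happy, sad, neutral).
        return self.classifier(fused)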
Learning Multimodal Word Representation via Dynamic Fusion Methods
Multimodal models have been proven to outperform text-based models on
learning semantic word representations. Almost all previous multimodal models
treat the representations from different modalities equally. However,
it is obvious that information from different modalities contributes
differently to the meaning of words. This motivates us to build a multimodal
model that can dynamically fuse the semantic representations from different
modalities according to different types of words. To that end, we propose three
novel dynamic fusion methods to assign importance weights to each modality, in
which weights are learned under the weak supervision of word association pairs.
Extensive experiments demonstrate that the proposed methods outperform strong
unimodal baselines and state-of-the-art multimodal models.
Comment: To appear in AAAI-18
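One simple way to realize the dynamic fusion idea described above is a small gating network that produces word-dependent importance weights for each modality before combining the projected modality vectors. The sketch below assumes a PyTorch-style setup with textual and visual word vectors; the names (DynamicFusion, text_dim, visual_dim) are illustrative and not the paper's three specific methods.

# Minimal sketch, assuming PyTorch; a generic gated fusion, not the
# paper's exact formulation.
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, text_dim=300, visual_dim=128, out_dim=300):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, out_dim)
        self.visual_proj = nn.Linear(visual_dim, out_dim)
        # The gate sees both modalities and outputs one weight per modality.
        self.gate = nn.Sequential(
            nn.Linear(text_dim + visual_dim, 2), nn.Softmax(dim=-1))

    def forward(self, text_vec, visual_vec):
        # text_vec: (B, text_dim); visual_vec: (B, visual_dim)
        w = self.gate(torch.cat([text_vec, visual_vec], dim=-1))  # (B, 2)
        # Word-dependent weighted sum of the projected modality vectors;
        # in the paper the weights are learned under weak supervision
        # from word-association pairs.
        return (w[:, :1] * self.text_proj(text_vec)
                + w[:, 1:] * self.visual_proj(visual_vec))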
The role of avatars in e-government interfaces
This paper investigates the use of avatars to communicate live messages in e-government interfaces. A comparative study is presented that evaluates the contribution of multimodal metaphors (including avatars) to the usability of e-government interfaces and to user trust. The communication metaphors evaluated included text, earcons, recorded speech and avatars. The experimental platform comprised two interface versions tested with a sample of 30 users. The results demonstrated that the use of multimodal metaphors in an e-government interface can significantly enhance usability and increase users' trust in the interface. A set of design guidelines for the use of multimodal metaphors in e-government interfaces was also produced.
