4,532 research outputs found
Recommended from our members
A Quantum-like Multimodal Network Framework for Modeling Interaction Dynamics in Multiparty Conversational Sentiment Analysis
Sentiment analysis in conversations is an emerging yet challenging artificial intelligence (AI) task. It aims to discover the affective states and emotional changes of speakers involved in a conversation on the basis of their opinions, which are carried by different modalities of information (e.g., a video associated with a transcript). There exists a wealth of intra- and inter-utterance interaction information that affects the emotions of speakers in a complex and dynamic way. How to accurately and comprehensively model complicated interactions is the key problem of the field. To fill this gap, in this paper, we propose a novel and comprehensive framework for multimodal sentiment analysis in conversations, called a quantum-like multimodal network (QMN), which leverages the mathematical formalism of quantum theory (QT) and a long short-term memory (LSTM) network. Specifically, the QMN framework consists of a multimodal decision fusion approach inspired by quantum interference theory to capture the interactions within each utterance (i.e., the correlations between different modalities) and a strong-weak influence model inspired by quantum measurement theory to model the interactions between adjacent utterances (i.e., how one speaker influences another). Extensive experiments are conducted on two widely used conversational sentiment datasets: the MELD and IEMOCAP datasets. The experimental results show that our approach significantly outperforms a wide range of baselines and state-of-the-art models
DialogXL: All-in-One XLNet for Multi-Party Conversation Emotion Recognition
This paper presents our pioneering effort for emotion recognition in
conversation (ERC) with pre-trained language models. Unlike regular documents,
conversational utterances appear alternately from different parties and are
usually organized as hierarchical structures in previous work. Such structures
are not conducive to the application of pre-trained language models such as
XLNet. To address this issue, we propose an all-in-one XLNet model, namely
DialogXL, with enhanced memory to store longer historical context and
dialog-aware self-attention to deal with the multi-party structures.
Specifically, we first modify the recurrence mechanism of XLNet from
segment-level to utterance-level in order to better model the conversational
data. Second, we introduce dialog-aware self-attention in replacement of the
vanilla self-attention in XLNet to capture useful intra- and inter-speaker
dependencies. Extensive experiments are conducted on four ERC benchmarks with
mainstream models presented for comparison. The experimental results show that
the proposed model outperforms the baselines on all the datasets. Several other
experiments such as ablation study and error analysis are also conducted and
the results confirm the role of the critical modules of DialogXL.Comment: Accepted by AAAI 2021 main conferenc
Suspense is the Key. Narratology, Cognitive Neurosciences and Computer Technology
L'articolo indaga i meccanismi neurocognitivi alla base della suspense sia in ambito letterario che filmico
Non-acted multi-view audio-visual dyadic Interactions. Project master thesis: multi-modal local and recurrent non-verbal emotion recognition in dyadic scenarios
Treballs finals del Mà ster de Fonaments de Ciència de Dades, Facultat de matemà tiques, Universitat de Barcelona, Any: 2019, Tutor: Sergio Escalera Guerrero i Cristina Palmero[en] In particular, this master thesis is focused on the development of baseline emotion recognition system in a dyadic environment using raw and handcraft audio features and cropped faces from the videos. This system is analyzed at frame and utterance level with and without temporal information. For this reason, an exhaustive study of the state-of-the-art on emotion recognition techniques has been conducted, paying particular attention on Deep Learning techniques for emotion recognition.
While studying the state-of-the-art from the theoretical point of view, a dataset consisting of videos of sessions of dyadic interactions between individuals in different scenarios has been recorded. Different attributes were captured and labelled from these videos: body pose, hand pose, emotion, age, gender, etc. Once the architectures for emotion recognition have been trained with other dataset, a proof of concept is done with this new database in order to extract conclusions. In addition, this database can help future systems to achieve better results.
A large number of experiments with audio and video are performed to create the emotion recognition system. The IEMOCAP database is used to perform the training and evaluation experiments of the emotion recognition system. Once the audio and video are trained separately with two different architectures, a fusion of both methods is done. In this work, the importance of preprocessing data (i.e. face detection, windows analysis length, handcrafted features, etc.) and choosing the correct parameters for the architectures (i.e. network depth, fusion, etc.) has been demonstrated and studied, while some experiments to study the influence of the
temporal information are performed using some recurrent models for the spatiotemporal utterance level recognition of emotion.
Finally, the conclusions drawn throughout this work are exposed, as well as the possible lines of future work including new systems for emotion recognition and the experiments with the database recorded in this work
Agents for educational games and simulations
This book consists mainly of revised papers that were presented at the Agents for Educational Games and Simulation (AEGS) workshop held on May 2, 2011, as part of the Autonomous Agents and MultiAgent Systems (AAMAS) conference in Taipei, Taiwan. The 12 full papers presented were carefully reviewed and selected from various submissions. The papers are organized topical sections on middleware applications, dialogues and learning, adaption and convergence, and agent applications
From Knowledge Augmentation to Multi-tasking: Towards Human-like Dialogue Systems
The goal of building dialogue agents that can converse with humans naturally
has been a long-standing dream of researchers since the early days of
artificial intelligence. The well-known Turing Test proposed to judge the
ultimate validity of an artificial intelligence agent on the
indistinguishability of its dialogues from humans'. It should come as no
surprise that human-level dialogue systems are very challenging to build. But,
while early effort on rule-based systems found limited success, the emergence
of deep learning enabled great advance on this topic.
In this thesis, we focus on methods that address the numerous issues that
have been imposing the gap between artificial conversational agents and
human-level interlocutors. These methods were proposed and experimented with in
ways that were inspired by general state-of-the-art AI methodologies. But they
also targeted the characteristics that dialogue systems possess.Comment: PhD thesi
Non-acted multi-view audio-visual dyadic interactions. Project non-verbal emotion recognition in dyadic scenarios and speaker segmentation
Treballs finals del Mà ster de Fonaments de Ciència de Dades, Facultat de matemà tiques, Universitat de Barcelona, Any: 2019, Tutor: Sergio Escalera Guerrero i Cristina Palmero[en] In particular, this Master Thesis is focused on the development of baseline Emotion Recognition System in a dyadic environment using raw and handcraft audio features and cropped faces from the videos. This system is analyzed at frame and utterance level without temporal information. As well, a baseline Speaker Segmenta-
tion System has been developed to facilitate the annotation task. For this reason, an exhaustive study of the state-of-the-art on emotion recognition and speaker segmentation techniques has been conducted, paying particular attention on Deep Learning techniques for emotion recognition and clustering for speaker aegmentation.
While studying the state-of-the-art from the theoretical point of view, a dataset consisting of videos of sessions of dyadic interactions between individuals in different scenarios has been recorded. Different attributes were captured and labelled from these videos: body pose, hand pose, emotion, age, gender, etc. Once the ar-
chitectures for emotion recognition have been trained with other dataset, a proof of concept is done with this new database in order to extract conclusions. In addition, this database can help future systems to achieve better results.
A large number of experiments with audio and video are performed to create the emotion recognition system. The IEMOCAP database is used to perform the training and evaluation experiments of the emotion recognition system. Once the audio and video are trained separately with two different architectures, a fusion of both
methods is done. In this work, the importance of preprocessing data (face detection, windows analysis length, handcrafted features, etc.) and choosing the correct parameters for the architectures (network depth, fusion, etc.) has been demonstrated and studied.
On the other hand, the experiments for the speaker segmentation system are performed with a piece of audio from IEMOCAP database. In this work, the prerprocessing steps, the problems of an unsupervised system such as clustering and the feature representation are studied and discussed.
Finally, the conclusions drawn throughout this work are exposed, as well as the possible lines of future work including new systems for emotion recognition and the experiments with the database recorded in this work
- …