Multi-attention Recurrent Network for Human Communication Comprehension
Human face-to-face communication is a complex multimodal signal. We use words
(language modality), gestures (vision modality) and changes in tone (acoustic
modality) to convey our intentions. Humans easily process and understand face-to-face communication; however, comprehending this form of communication remains a significant challenge for Artificial Intelligence (AI). AI must
understand each modality and the interactions between them that shape human
communication. In this paper, we present a novel neural architecture for
understanding human communication called the Multi-attention Recurrent Network
(MARN). The main strength of our model comes from discovering interactions
between modalities through time using a neural component called the
Multi-attention Block (MAB) and storing them in the hybrid memory of a
recurrent component called the Long-short Term Hybrid Memory (LSTHM). We
perform extensive comparisons on six publicly available datasets for multimodal
sentiment analysis, speaker trait recognition and emotion recognition. MARN
shows state-of-the-art performance on all six datasets. Comment: AAAI 2018 Oral Presentation
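For intuition, a minimal PyTorch sketch of the multi-attention idea follows; the head count, the dimensions, and how the attended outputs would feed the LSTHM's hybrid memory are illustrative assumptions, not the published MARN architecture.

```python
import torch
import torch.nn as nn

class MultiAttentionBlock(nn.Module):
    """Toy multi-attention block: K softmax attentions over the
    concatenated per-modality states, each meant to surface one
    cross-modal interaction."""
    def __init__(self, dim_total: int, num_attentions: int):
        super().__init__()
        self.num_attentions = num_attentions
        self.dim_total = dim_total
        self.score = nn.Linear(dim_total, num_attentions * dim_total)

    def forward(self, h_cat: torch.Tensor) -> torch.Tensor:
        # h_cat: (batch, dim_total), the concatenated language/vision/
        # acoustic hidden states at one time step.
        scores = self.score(h_cat).view(-1, self.num_attentions, self.dim_total)
        attn = torch.softmax(scores, dim=-1)   # one distribution per head
        return attn * h_cat.unsqueeze(1)       # (batch, K, dim_total)

mab = MultiAttentionBlock(dim_total=96, num_attentions=4)
out = mab(torch.randn(8, 96))                  # -> torch.Size([8, 4, 96])
```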
Multimodal sentiment analysis in real-life videos
This thesis extends the emerging field of multimodal sentiment analysis of real-life videos, taking two components into consideration: the emotion and the emotion's target.
The emotion component of media is traditionally represented as a segment-based intensity model of emotion classes. This representation is replaced here by a value- and time-continuous view. Adjacent research fields, such as affective computing, have largely neglected the linguistic information available from automatic transcripts of audio-video material. As is demonstrated here, this text modality is well-suited for time- and value-continuous prediction. Moreover, source-specific problems, such as trustworthiness, have been largely unexplored so far.
This work examines perceived trustworthiness of the source, and its quantification, in user-generated video data and presents a possible modelling path. Furthermore, the transfer between the continuous and discrete emotion representations is explored in order to summarise the emotional context at a segment level.
The other component deals with the target of the emotion, for example, the topic the speaker is addressing. Emotion targets in a video dataset can, as is shown here, be coherently extracted based on automatic transcripts without limiting a priori parameters, such as the expected number of targets. Furthermore, alternatives to purely linguistic investigation in predicting targets, such as knowledge-bases and multimodal systems, are investigated.
A new dataset is designed for this investigation, and, in conjunction with proposed novel deep neural networks, extensive experiments are conducted to explore the components described above.
The developed systems show robust prediction results and demonstrate the strengths of the respective modalities, feature sets, and modelling techniques. Finally, foundations are laid for cross-modal information prediction systems with applications to the correction of corrupted in-the-wild signals from real-life videos.
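The transfer from continuous to discrete emotion representations lends itself to a simple illustration. The Python sketch below averages a value-continuous valence trace over fixed-length segments and thresholds the means into discrete classes; the segment length, thresholds, and class names are all illustrative assumptions, not the thesis's actual method.

```python
import numpy as np

def segment_labels(trace, seg_len, thresholds=(-0.33, 0.33)):
    """Summarise a time- and value-continuous emotion trace (e.g. valence
    in [-1, 1], one value per frame) into one discrete class per segment.
    Segment length and thresholds are illustrative assumptions."""
    lo, hi = thresholds
    labels = []
    for start in range(0, len(trace), seg_len):
        mean = float(np.mean(trace[start:start + seg_len]))
        labels.append("negative" if mean < lo
                      else "positive" if mean > hi
                      else "neutral")
    return labels

# A synthetic valence trace: one slow oscillation over 100 frames.
print(segment_labels(np.sin(np.linspace(0, 2 * np.pi, 100)), seg_len=25))
# -> ['positive', 'positive', 'negative', 'negative']
```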
Multimodal Language Analysis with Recurrent Multistage Fusion
Computational modeling of human multimodal language is an emerging research
area in natural language processing spanning the language, visual and acoustic
modalities. Comprehending multimodal language requires modeling not only the
interactions within each modality (intra-modal interactions) but more
importantly the interactions between modalities (cross-modal interactions). In
this paper, we propose the Recurrent Multistage Fusion Network (RMFN) which
decomposes the fusion problem into multiple stages, each of them focused on a
subset of multimodal signals for specialized, effective fusion. Cross-modal
interactions are modeled using this multistage fusion approach which builds
upon intermediate representations of previous stages. Temporal and intra-modal
interactions are modeled by integrating our proposed fusion approach with a
system of recurrent neural networks. The RMFN displays state-of-the-art
performance in modeling human multimodal language across three public datasets
relating to multimodal sentiment analysis, emotion recognition, and speaker
traits recognition. We provide visualizations to show that each stage of fusion
focuses on a different subset of multimodal signals, learning increasingly
discriminative multimodal representations. Comment: EMNLP 2018
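A rough Python sketch of the multistage idea follows: each stage gates (highlights) part of the concatenated multimodal signal and refines a fused summary built by the previous stage. The sigmoid gate and GRU-cell refinement here are illustrative stand-ins, not the published RMFN modules.

```python
import torch
import torch.nn as nn

class MultistageFusion(nn.Module):
    """Toy multistage fusion: each stage gates a subset of the
    concatenated multimodal signal and refines the fused summary
    carried over from the previous stage."""
    def __init__(self, dim_total: int, dim_fused: int, num_stages: int = 3):
        super().__init__()
        self.num_stages = num_stages
        self.highlight = nn.Linear(dim_total + dim_fused, dim_total)
        self.fuse = nn.GRUCell(dim_total, dim_fused)

    def forward(self, h_cat: torch.Tensor) -> torch.Tensor:
        # h_cat: (batch, dim_total), concatenated modality states.
        z = h_cat.new_zeros(h_cat.size(0), self.fuse.hidden_size)
        for _ in range(self.num_stages):
            gate = torch.sigmoid(self.highlight(torch.cat([h_cat, z], dim=-1)))
            z = self.fuse(gate * h_cat, z)  # refine the fused representation
        return z

rmf = MultistageFusion(dim_total=96, dim_fused=32)
print(rmf(torch.randn(8, 96)).shape)        # torch.Size([8, 32])
```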
Predicting epileptic seizures with a stacked long short-term memory network
Despite advancements, seizure detection algorithms are trained using only the data recorded from past epileptic seizures. This one-dimensional approach has led to an excessive false detection rate, where common movements are incorrectly classified. Therefore, a new method of detection is required that can distinguish between the movements observed during a generalized tonic-clonic (GTC) seizure and common everyday activities. For this study, eight healthy participants and two diagnosed with epilepsy simulated a series of activities that share a similar set of spatial coordinates with an epileptic seizure. We then trained a stacked long short-term memory (LSTM) network to classify the different activities. Results show that our network successfully differentiated the types of movement, with an accuracy score of 94.45%. These findings present a more sophisticated method of detection that correlates a wearer's movement against 12 seizure-related activities prior to formulating a prediction.
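For illustration, a stacked LSTM classifier over windows of wearable-sensor data might look like the PyTorch sketch below. The window shape, channel count, and layer sizes are assumptions; the 12 output classes mirror the 12 seizure-related activities mentioned above. This is a sketch, not the study's implementation.

```python
import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    """Two stacked LSTM layers over sensor windows, classified from the
    final hidden state. Shapes and sizes are illustrative assumptions."""
    def __init__(self, channels: int = 3, hidden: int = 64,
                 num_classes: int = 12):
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, timesteps, channels), e.g. accelerometer windows
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # logits from the last time step

model = StackedLSTM()
logits = model(torch.randn(8, 50, 3))     # -> torch.Size([8, 12])
```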
Text Classification: A Review, Empirical, and Experimental Evaluation
The explosive and widespread growth of data necessitates the use of text
classification to extract crucial information from vast amounts of data.
Consequently, there has been a surge of research in both classical and deep
learning text classification methods. Despite the numerous methods proposed in
the literature, there is still a pressing need for a comprehensive and
up-to-date survey. Existing survey papers categorize algorithms for text
classification into broad classes, which can lead to the misclassification of
unrelated algorithms and incorrect assessments of their qualities and behaviors
using the same metrics. To address these limitations, our paper introduces a
novel methodological taxonomy that classifies algorithms hierarchically into
fine-grained classes and specific techniques. The taxonomy includes methodology
categories, methodology techniques, and methodology sub-techniques. Our study
is the first survey to utilize this methodological taxonomy for classifying
algorithms for text classification. Furthermore, our study also conducts
empirical evaluation and experimental comparisons and rankings of different
algorithms that employ the same specific sub-technique, different
sub-techniques within the same technique, different techniques within the same
category, and different categories.
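As a concrete example of the granularity involved (illustrative only, not taken from the survey): a TF-IDF representation feeding a linear SVM is one specific classical sub-technique that a taxonomy of this kind would rank against its neighbours. A minimal scikit-learn sketch on toy data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus; a real evaluation would use a standard benchmark.
texts = ["great acting and a moving plot",
         "dull, predictable, and far too long",
         "a moving story with great performances",
         "predictable plot, long and dull"]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF weighting + linear SVM: one classical sub-technique.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["a great, moving film"]))   # e.g. ['positive']
```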