Speech Emotion Recognition Using Multi-hop Attention Mechanism
In this paper, we are interested in exploiting textual and acoustic data of
an utterance for the speech emotion classification task. The baseline approach
models the information from audio and text independently using two deep neural
networks (DNNs). The outputs from both the DNNs are then fused for
classification. As opposed to using knowledge from both the modalities
separately, we propose a framework to exploit acoustic information in tandem
with lexical data. The proposed framework uses two bi-directional long
short-term memory (BLSTM) networks to obtain hidden representations of the
utterance. Furthermore, we propose an attention mechanism, referred to as the
multi-hop, which is trained to automatically infer the correlation between the
modalities. The multi-hop attention first computes the relevant segments of the
textual data corresponding to the audio signal. The relevant textual data is
then used to attend to parts of the audio signal. To evaluate the performance
of the proposed system, experiments are performed on the IEMOCAP dataset.
Experimental results show that the proposed technique outperforms the
state-of-the-art system by a 6.5% relative improvement in weighted accuracy.
Comment: 5 pages, accepted as a conference paper at ICASSP 2019 (oral
presentation)
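The two attention hops described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain dot-product scoring, uses the last audio hidden state as the first query, and fuses the two context vectors by concatenation. The function names (`attend`, `multi_hop`) and the toy hidden states are hypothetical stand-ins for BLSTM outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, states):
    """Dot-product attention: return (weights, weighted sum of states)."""
    weights = softmax([dot(query, h) for h in states])
    dim = len(states[0])
    context = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]
    return weights, context


def multi_hop(audio_states, text_states):
    # Hop 1: an audio summary (assumed here: the last audio hidden state)
    # selects the relevant segments of the text.
    text_weights, text_context = attend(audio_states[-1], text_states)
    # Hop 2: the attended text representation selects parts of the audio.
    audio_weights, audio_context = attend(text_context, audio_states)
    # Fusion by concatenation is an assumption made for this sketch.
    return text_weights, audio_weights, text_context + audio_context


# Toy hidden states standing in for BLSTM outputs (hypothetical values).
audio_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_states = [[2.0, 0.0], [0.0, 2.0]]
text_w, audio_w, fused = multi_hop(audio_states, text_states)
```

In a trained system the fused vector would feed a classifier over the emotion categories. Here it only shows how the output of one hop becomes the query of the next.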
Multimodal Speech Emotion Recognition Using Audio and Text
Speech emotion recognition is a challenging task, and extensive reliance has
been placed on models that use audio features in building well-performing
classifiers. In this paper, we propose a novel deep dual recurrent encoder
model that utilizes text data and audio signals simultaneously to obtain a
better understanding of speech data. As emotional dialogue is composed of sound
and spoken content, our model encodes the information from audio and text
sequences using dual recurrent neural networks (RNNs) and then combines the
information from these sources to predict the emotion class. This architecture
analyzes speech data from the signal level to the language level, and it thus
utilizes the information within the data more comprehensively than models that
focus on audio features. Extensive experiments are conducted to investigate the
efficacy and properties of the proposed model. Our proposed model outperforms
previous state-of-the-art methods in assigning data to one of four emotion
categories (i.e., angry, happy, sad and neutral) when the model is applied to
the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.
Comment: 7 pages, accepted as a conference paper at IEEE SLT 201
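Leksikaalsed emotsiooniteadmised: eesti keele emotsioonisõnavara struktuur, varieeruvus ja semantika [Lexical knowledge of emotions: the structure, variability, and semantics of the Estonian emotion vocabulary]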
The emotion potential of words and passages in reading Harry Potter: an fMRI study
Previous studies suggested that the emotional connotation of single words automatically recruits attention. We investigated the potential of words to induce emotional engagement when reading texts. In an fMRI experiment, we presented 120 text passages from the Harry Potter book series. Results showed significant correlations between affective word (lexical) ratings and passage ratings. Furthermore, affective lexical ratings correlated with activity in regions associated with emotion, situation model building, multi-modal semantic integration, and Theory of Mind. We distinguished differential influences of affective lexical, inter-lexical, and supra-lexical variables: differential effects of lexical valence were significant in the left amygdala, while effects of arousal-span (the dynamic range of arousal across a passage) were significant in the left amygdala and insula. However, we found no differential effect of passage ratings in emotion-associated regions. Our results support the hypothesis that the emotion potential of short texts can be predicted by lexical and inter-lexical affective variables
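The abstract above defines arousal-span as the dynamic range of arousal across a passage. A minimal sketch of that quantity, assuming one arousal rating per word and taking the range as maximum minus minimum, could look as follows. The function name and the ratings are hypothetical and are not data from the study.

```python
def arousal_span(word_arousal):
    """Dynamic range of per-word arousal ratings within one passage."""
    if not word_arousal:
        raise ValueError("passage has no rated words")
    return max(word_arousal) - min(word_arousal)


# Hypothetical per-word arousal ratings for a single passage.
passage_ratings = [2.0, 4.5, 3.0]
span = arousal_span(passage_ratings)
```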
What does semantic tiling of the cortex tell us about semantics?
Recent use of voxel-wise modeling in cognitive neuroscience suggests that semantic maps tile the cortex. Although this impressive research establishes distributed cortical areas active during the conceptual processing that underlies semantics, it tells us little about the nature of this processing. While mapping concepts between Marr's computational and implementation levels to support neural encoding and decoding, this approach ignores Marr's algorithmic level, central for understanding the mechanisms that implement cognition, in general, and conceptual processing, in particular. Following decades of research in cognitive science and neuroscience, what do we know so far about the representation and processing mechanisms that implement conceptual abilities? Most basically, much is known about the mechanisms associated with: (1) features and frame representations, (2) grounded, abstract, and linguistic representations, (3) knowledge-based inference, (4) concept composition, and (5) conceptual flexibility. Rather than explaining these fundamental representation and processing mechanisms, semantic tiles simply provide a trace of their activity over a relatively short time period within a specific learning context. Establishing the mechanisms that implement conceptual processing in the brain will require more than mapping it to cortical (and sub-cortical) activity, with process models from cognitive science likely to play central roles in specifying the intervening mechanisms. More generally, neuroscience will not achieve its basic goals until it establishes algorithmic-level mechanisms that contribute essential explanations to how the brain works, going beyond simply establishing the brain areas that respond to various task conditions
Investigating Emotion-label and Emotion-laden Words in a Semantic Satiation Paradigm
Current literature suggests emotion-label words (e.g., sad) and emotion-laden words (e.g., funeral) are processed differently. The central focus of the present study was to investigate how valence and emotion word type influence how words are processed. A satiation paradigm was used to characterize the relationship between the processing of emotion-label and emotion-laden words of positive and negative valence. It was hypothesized that, in addition to the standard slowed response times to satiated words, emotion-label words would exhibit greater satiation and priming effects than emotion-laden words. Analyses indicated expected priming and satiation effects across a range of other stimulus characteristics. Neutral words, which were included as a comparison stimulus type for both valence and word type variables, were shown to elicit much slower reaction times than either emotion word type. The results of the present study indicate the importance of valence in word processing, even when other word characteristics and experimental variables are at play. Current models of word processing do not sufficiently account for emotional characteristics of words, and implications for word processing models are discussed
The Glasgow Norms: Ratings of 5,500 words on nine scales
The Glasgow Norms are a set of normative ratings for 5,553 English words on nine psycholinguistic dimensions: arousal, valence, dominance, concreteness, imageability, familiarity, age of acquisition, semantic size, and gender association. The Glasgow Norms are unique in several respects. First, the corpus itself is relatively large, while simultaneously providing norms across a substantial number of lexical dimensions. Second, for any given subset of words, the same participants provided ratings across all nine dimensions (33 participants/word, on average). Third, two novel dimensions—semantic size and gender association—are included. Finally, the corpus contains a set of 379 ambiguous words that are presented either alone (e.g., toast) or with information that selects an alternative sense (e.g., toast (bread), toast (speech)). The relationships between the dimensions of the Glasgow Norms were initially investigated by assessing their correlations. In addition, a principal component analysis revealed four main factors, accounting for 82% of the variance (Visualization, Emotion, Salience, and Exposure). The validity of the Glasgow Norms was established via comparisons of our ratings to 18 different sets of current psycholinguistic norms. The dimension of size was tested with megastudy data, confirming findings from past studies that have explicitly examined this variable. Alternative senses of ambiguous words (i.e., disambiguated forms), when discordant on a given dimension, seemingly led to appropriately distinct ratings. Informal comparisons between the ratings of ambiguous words and of their alternative senses showed different patterns that likely depended on several factors (the number of senses, their relative strengths, and the rating scales themselves). Overall, the Glasgow Norms provide a valuable resource—in particular, for researchers investigating the role of word recognition in language comprehension
Combining quantitative narrative analysis and predictive modeling - an eye tracking study
As part of a larger interdisciplinary project on the reception of Shakespeare’s sonnets (Jacobs et al., 2017; Xue et al., 2017), the present study analyzed the eye movement behavior of participants reading three of the 154 sonnets as a function of seven lexical features extracted via Quantitative Narrative Analysis (QNA). Using a machine learning-based predictive modeling approach, five ‘surface’ features (word length, orthographic neighborhood density, word frequency, orthographic dissimilarity, and sonority score) were detected as important predictors of total reading time and fixation probability in poetry reading. The fact that one phonological feature, i.e., sonority score, also played a role is in line with current theorizing on poetry reading. Our approach opens new ways for future eye movement research on reading poetic texts and other complex literary materials (cf. Jacobs, 2015c)
Emotional Speech Perception Unfolding in Time: The Role of the Basal Ganglia
The basal ganglia (BG) have repeatedly been linked to emotional speech processing in studies involving patients with neurodegenerative and structural changes of the BG. However, the majority of previous studies did not consider (i) that emotional speech processing entails multiple processing steps and (ii) that the BG may engage in one of these steps rather than another. In the present study we investigate three different stages of emotional speech processing (emotional salience detection, meaning-related processing, and identification) in the same patient group to verify whether lesions to the BG affect these stages in a qualitatively different manner. Specifically, we explore early implicit emotional speech processing (probe verification) in an ERP experiment followed by an explicit behavioral emotional recognition task. In both experiments, participants listened to emotional sentences expressing one of four emotions (anger, fear, disgust, happiness) or neutral sentences. In line with previous evidence, patients and healthy controls show differentiation of emotional and neutral sentences in the P200 component (emotional salience detection) and a following negative-going brain wave (meaning-related processing). However, the behavioral recognition (identification stage) of emotional sentences was impaired in BG patients, but not in healthy controls. The current data provide further support that the BG are involved in late, explicit rather than early emotional speech processing stages