
    Speech Emotion Recognition Using Multi-hop Attention Mechanism

    In this paper, we are interested in exploiting textual and acoustic data of an utterance for the speech emotion classification task. The baseline approach models the information from audio and text independently using two deep neural networks (DNNs). The outputs from both DNNs are then fused for classification. As opposed to using knowledge from the two modalities separately, we propose a framework that exploits acoustic information in tandem with lexical data. The proposed framework uses two bi-directional long short-term memory (BLSTM) networks to obtain hidden representations of the utterance. Furthermore, we propose an attention mechanism, referred to as multi-hop attention, which is trained to automatically infer the correlation between the modalities. The multi-hop attention first computes the relevant segments of the textual data corresponding to the audio signal. The relevant textual data is then used to attend to parts of the audio signal. To evaluate the performance of the proposed system, experiments are performed on the IEMOCAP dataset. Experimental results show that the proposed technique outperforms the state-of-the-art system by a 6.5% relative improvement in weighted accuracy. Comment: 5 pages, accepted as a conference paper at ICASSP 2019 (oral presentation).
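
    To make the two-hop idea concrete, here is a minimal PyTorch sketch of cross-modal attention in the spirit of the abstract: a pooled audio summary first attends over the text BLSTM states, and the resulting text context then attends over the audio BLSTM states. All layer sizes, the mean-pooled query, and the dot-product scoring are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MultiHopAttentionSketch(nn.Module):
    """Illustrative two-hop cross-modal attention (not the authors' code).

    Hop 1: a pooled audio vector attends over text BLSTM states.
    Hop 2: the resulting text context attends over audio BLSTM states.
    """

    def __init__(self, audio_dim=40, text_dim=300, hidden=128, n_classes=4):
        super().__init__()
        self.audio_rnn = nn.LSTM(audio_dim, hidden, batch_first=True,
                                 bidirectional=True)
        self.text_rnn = nn.LSTM(text_dim, hidden, batch_first=True,
                                bidirectional=True)
        self.classifier = nn.Linear(4 * hidden, n_classes)

    @staticmethod
    def attend(query, keys):
        # query: (B, D), keys: (B, T, D) -> attention-weighted context (B, D)
        scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)  # (B, T)
        weights = torch.softmax(scores, dim=1)
        return torch.bmm(weights.unsqueeze(1), keys).squeeze(1)

    def forward(self, audio, text):
        h_a, _ = self.audio_rnn(audio)   # (B, Ta, 2*hidden)
        h_t, _ = self.text_rnn(text)     # (B, Tt, 2*hidden)
        a_pool = h_a.mean(dim=1)              # initial audio summary
        c_text = self.attend(a_pool, h_t)     # hop 1: attend over text
        c_audio = self.attend(c_text, h_a)    # hop 2: attend over audio
        return self.classifier(torch.cat([c_text, c_audio], dim=1))
```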

    Multimodal Speech Emotion Recognition Using Audio and Text

    Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features to build well-performing classifiers. In this paper, we propose a novel deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data. As emotional dialogue is composed of sound and spoken content, our model encodes the information from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the information from these sources to predict the emotion class. This architecture analyzes speech data from the signal level to the language level, and it thus utilizes the information within the data more comprehensively than models that focus on audio features alone. Extensive experiments are conducted to investigate the efficacy and properties of the proposed model. Our proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories (i.e., angry, happy, sad, and neutral) when applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%. Comment: 7 pages, accepted as a conference paper at IEEE SLT 2018.
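
    A minimal sketch of the dual-encoder idea, assuming GRU encoders, simple final-state concatenation, and placeholder feature dimensions; the paper's exact architecture and fusion scheme may differ.

```python
import torch
import torch.nn as nn


class DualRecurrentEncoderSketch(nn.Module):
    """Illustrative dual recurrent encoder: one RNN per modality, with
    final hidden states fused for 4-way emotion classification. Feature
    dimensions are placeholders, not the paper's exact setup."""

    def __init__(self, audio_dim=39, text_dim=300, hidden=128, n_classes=4):
        super().__init__()
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.text_enc = nn.GRU(text_dim, hidden, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio, text):
        # audio: (B, Ta, audio_dim); text: (B, Tt, text_dim) word embeddings
        _, h_audio = self.audio_enc(audio)   # h_audio: (1, B, hidden)
        _, h_text = self.text_enc(text)
        fused = torch.cat([h_audio[-1], h_text[-1]], dim=1)  # (B, 2*hidden)
        return self.fuse(fused)              # logits over emotion classes
```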

    The emotion potential of words and passages in reading Harry Potter: an fMRI study

    Previous studies suggested that the emotional connotation of single words automatically recruits attention. We investigated the potential of words to induce emotional engagement when reading texts. In an fMRI experiment, we presented 120 text passages from the Harry Potter book series. Results showed significant correlations between affective word (lexical) ratings and passage ratings. Furthermore, affective lexical ratings correlated with activity in regions associated with emotion, situation model building, multi-modal semantic integration, and Theory of Mind. We distinguished differential influences of affective lexical, inter-lexical, and supra-lexical variables: differential effects of lexical valence were significant in the left amygdala, while effects of arousal-span (the dynamic range of arousal across a passage) were significant in the left amygdala and insula. However, we found no differential effect of passage ratings in emotion-associated regions. Our results support the hypothesis that the emotion potential of short texts can be predicted by lexical and inter-lexical affective variables.
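
    As a concrete reading of the arousal-span variable, here is a small Python sketch that takes the maximum minus the minimum of per-word arousal ratings within a passage. This max-minus-min operationalization of "dynamic range" is an assumption; the study's exact computation may differ.

```python
import numpy as np


def arousal_span(word_arousal):
    """Dynamic range of lexical arousal across a passage: max minus min.
    One plausible operationalization of the 'arousal-span' variable
    described above; the study's exact formula may differ."""
    ratings = np.asarray(word_arousal, dtype=float)
    ratings = ratings[~np.isnan(ratings)]  # skip words without norms
    return ratings.max() - ratings.min()


# Hypothetical arousal ratings (1-5 scale) for one passage's words.
print(arousal_span([2.0, 4.5, 3.0, np.nan, 2.25]))  # -> 2.5
```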

    What does semantic tiling of the cortex tell us about semantics?

    Recent use of voxel-wise modeling in cognitive neuroscience suggests that semantic maps tile the cortex. Although this impressive research establishes distributed cortical areas active during the conceptual processing that underlies semantics, it tells us little about the nature of this processing. While mapping concepts between Marr's computational and implementation levels to support neural encoding and decoding, this approach ignores Marr's algorithmic level, central for understanding the mechanisms that implement cognition, in general, and conceptual processing, in particular. Following decades of research in cognitive science and neuroscience, what do we know so far about the representation and processing mechanisms that implement conceptual abilities? Most basically, much is known about the mechanisms associated with: (1) feature and frame representations, (2) grounded, abstract, and linguistic representations, (3) knowledge-based inference, (4) concept composition, and (5) conceptual flexibility. Rather than explaining these fundamental representation and processing mechanisms, semantic tiles simply provide a trace of their activity over a relatively short time period within a specific learning context. Establishing the mechanisms that implement conceptual processing in the brain will require more than mapping it to cortical (and sub-cortical) activity, with process models from cognitive science likely to play central roles in specifying the intervening mechanisms. More generally, neuroscience will not achieve its basic goals until it establishes algorithmic-level mechanisms that contribute essential explanations to how the brain works, going beyond simply establishing the brain areas that respond to various task conditions.

    Investigating Emotion-label and Emotion-laden Words in a Semantic Satiation Paradigm

    Current literature suggests emotion-label words (e.g., sad) and emotion-laden words (e.g., funeral) are processed differently. The central focus of the present study was to investigate how valence and emotion word type influence word processing. A satiation paradigm was used to characterize the relationship between the processing of emotion-label and emotion-laden words of positive and negative valence. It was hypothesized that, in addition to the standard slowed response times to satiated words, emotion-label words would exhibit greater satiation and priming effects than emotion-laden words. Analyses indicated the expected priming and satiation effects across a range of other stimulus characteristics. Neutral words, which were included as a comparison stimulus type for both the valence and word type variables, elicited much slower reaction times than either emotion word type. The results of the present study indicate the importance of valence in word processing, even when other word characteristics and experimental variables are at play. Current models of word processing do not sufficiently account for the emotional characteristics of words, and implications for word processing models are discussed.

    The Glasgow Norms: Ratings of 5,500 words on nine scales

    The Glasgow Norms are a set of normative ratings for 5,553 English words on nine psycholinguistic dimensions: arousal, valence, dominance, concreteness, imageability, familiarity, age of acquisition, semantic size, and gender association. The Glasgow Norms are unique in several respects. First, the corpus itself is relatively large, while simultaneously providing norms across a substantial number of lexical dimensions. Second, for any given subset of words, the same participants provided ratings across all nine dimensions (33 participants/word, on average). Third, two novel dimensions, semantic size and gender association, are included. Finally, the corpus contains a set of 379 ambiguous words that are presented either alone (e.g., toast) or with information that selects an alternative sense (e.g., toast (bread), toast (speech)). The relationships between the dimensions of the Glasgow Norms were initially investigated by assessing their correlations. In addition, a principal component analysis revealed four main factors, accounting for 82% of the variance (Visualization, Emotion, Salience, and Exposure). The validity of the Glasgow Norms was established via comparisons of our ratings to 18 different sets of current psycholinguistic norms. The dimension of size was tested with megastudy data, confirming findings from past studies that have explicitly examined this variable. Alternative senses of ambiguous words (i.e., disambiguated forms), when discordant on a given dimension, seemingly led to appropriately distinct ratings. Informal comparisons between the ratings of ambiguous words and of their alternative senses showed different patterns that likely depended on several factors (the number of senses, their relative strengths, and the rating scales themselves). Overall, the Glasgow Norms provide a valuable resource, in particular for researchers investigating the role of word recognition in language comprehension.
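
    To illustrate the kind of analysis reported (pairwise correlations followed by a four-component PCA over the nine rating dimensions), here is a hedged scikit-learn sketch. The random matrix is a stand-in for the real norms, so only the procedure, not the reported 82% figure, is reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in ratings matrix: one row per word, one column per dimension.
# Replace with the published norms to reproduce the reported structure.
dims = ["arousal", "valence", "dominance", "concreteness", "imageability",
        "familiarity", "age_of_acquisition", "semantic_size", "gender_assoc"]
rng = np.random.default_rng(0)
ratings = rng.normal(size=(5553, len(dims)))

# Pairwise correlations between the nine dimensions.
corr = np.corrcoef(ratings, rowvar=False)

# Four-component PCA on standardized ratings; on the real norms these
# components (Visualization, Emotion, Salience, Exposure) explain ~82%.
pca = PCA(n_components=4)
pca.fit(StandardScaler().fit_transform(ratings))
print(pca.explained_variance_ratio_.sum())
```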

    Combining quantitative narrative analysis and predictive modeling - an eye tracking study

    As part of a larger interdisciplinary project on the reception of Shakespeare's sonnets (Jacobs et al., 2017; Xue et al., 2017), the present study analyzed the eye movement behavior of participants reading three of the 154 sonnets as a function of seven lexical features extracted via Quantitative Narrative Analysis (QNA). Using a machine learning-based predictive modeling approach, five 'surface' features (word length, orthographic neighborhood density, word frequency, orthographic dissimilarity, and sonority score) were detected as important predictors of total reading time and fixation probability in poetry reading. The fact that one phonological feature, i.e., sonority score, also played a role is in line with current theorizing on poetry reading. Our approach opens new ways for future eye movement research on reading poetic texts and other complex literary materials (cf. Jacobs, 2015c).
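
    The predictive-modeling step might look roughly like the following scikit-learn sketch. The feature names mirror the QNA predictors listed above, but the random-forest choice, the random stand-in data, and all settings are assumptions rather than the study's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical per-word feature table; real values would come from the
# sonnet corpus and eye-tracking record, not from random draws.
features = ["word_length", "orth_neighborhood_density", "word_frequency",
            "orth_dissimilarity", "sonority_score"]
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, len(features)))
y = rng.normal(loc=250.0, scale=50.0, size=1000)  # total reading time (ms)

model = RandomForestRegressor(n_estimators=200, random_state=0)
print(cross_val_score(model, X, y, cv=5, scoring="r2").mean())

# Feature importances rank which lexical variables drive reading time.
model.fit(X, y)
print(dict(zip(features, model.feature_importances_.round(3))))
```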

    Emotional Speech Perception Unfolding in Time: The Role of the Basal Ganglia

    The basal ganglia (BG) have repeatedly been linked to emotional speech processing in studies involving patients with neurodegenerative and structural changes of the BG. However, the majority of previous studies did not consider that (i) emotional speech processing entails multiple processing steps, and (ii) the BG may engage in one rather than another of these processing steps. In the present study, we investigate three different stages of emotional speech processing (emotional salience detection, meaning-related processing, and identification) in the same patient group to verify whether lesions to the BG affect these stages in qualitatively different ways. Specifically, we explore early implicit emotional speech processing (probe verification) in an ERP experiment, followed by an explicit behavioral emotion recognition task. In both experiments, participants listened to emotional sentences expressing one of four emotions (anger, fear, disgust, happiness) or to neutral sentences. In line with previous evidence, patients and healthy controls show differentiation of emotional and neutral sentences in the P200 component (emotional salience detection) and in a subsequent negative-going brain wave (meaning-related processing). However, behavioral recognition (the identification stage) of emotional sentences was impaired in BG patients but not in healthy controls. The current data provide further support that the BG are involved in late, explicit rather than early emotional speech processing stages.