    Attention-enhanced connectionist temporal classification for discrete speech emotion recognition

    Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. This approach, however, is limited in that it can yield models that do not capture temporal changes in the speech signal, including those indicative of a particular emotion. One potential solution is to model SER as a sequence-to-sequence task instead. To this end, we developed an attention-based bidirectional long short-term memory (BLSTM) neural network combined with a connectionist temporal classification (CTC) objective function (Attention-BLSTM-CTC) for SER. We also assessed the benefits of incorporating two contemporary attention mechanisms, namely component attention and quantum attention, into the CTC framework. To the best of the authors’ knowledge, this is the first time such a hybrid architecture has been employed for SER. We demonstrated the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and FAU-Aibo Emotion corpora, where the experimental results show that our proposed model outperforms current state-of-the-art approaches.

    The work presented in this paper was substantially supported by the National Natural Science Foundation of China (Grant No. 61702370), the Key Program of the Natural Science Foundation of Tianjin (Grant No. 18JCZDJC36300), the Open Projects Program of the National Laboratory of Pattern Recognition, and the Senior Visiting Scholar Program of Tianjin Normal University.

    Interspeech 2019 ISSN: 1990-977
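    The abstract frames SER as a sequence-to-sequence problem trained with a CTC objective. As a rough illustration of what that objective computes, the following plain-Python sketch (not the authors' code) evaluates the CTC negative log-likelihood of a label sequence with the standard forward (alpha) recursion. It assumes the network has already produced per-frame softmax probabilities `probs[t][k]`, that index 0 is the CTC blank, and that emotion classes are mapped to hypothetical indices 1, 2, and so on; the BLSTM and attention layers are not modelled.

```python
import math
from typing import List, Sequence


def ctc_neg_log_likelihood(probs: Sequence[Sequence[float]],
                           target: Sequence[int],
                           blank: int = 0) -> float:
    """Return -log p(target | probs) under CTC.

    probs[t][k]: softmax probability of symbol k at frame t (assumed given).
    target: label sequence without blanks, e.g. hypothetical emotion indices.
    """
    # Extended target with blanks around every label: _ l1 _ l2 ... _
    ext: List[int] = [blank]
    for label in target:
        ext.extend([label, blank])
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all partial paths ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # repeat the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance one position
            # Skip a blank only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank
    p = alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)
    return math.inf if p == 0.0 else -math.log(p)


# Two frames, uniform over {blank, label 1}: paths (1,1), (_,1), (1,_) give p = 0.75
print(ctc_neg_log_likelihood([[0.5, 0.5], [0.5, 0.5]], [1]))
```

    For long utterances, practical implementations run this recursion in log space (or rescale alpha at each frame) to avoid numerical underflow; the direct-probability form is kept here only for readability.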

    Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

    Automatic emotion recognition from speech, an important and challenging task in affective computing, relies heavily on the effectiveness of the speech features used for classification. Previous approaches have mostly focused on extracting carefully hand-crafted features, and how to model spatio-temporal dynamics effectively for speech emotion recognition is still under active investigation. In this paper, we propose a method for extracting emotion-relevant features from speech that combines Attention-based Bidirectional Long Short-Term Memory Recurrent Neural Networks with fully convolutional networks (FCNs) to automatically learn the best spatio-temporal representations of speech signals. The learned high-level features are then fed into a deep neural network (DNN) to predict the final emotion. Experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpora show that our method provides more accurate predictions than other existing emotion recognition algorithms.
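    Attention over BLSTM outputs is often implemented as attention pooling: each frame-level output h_t receives a relevance score, the scores are normalised with a softmax over time, and the utterance representation is the weighted sum of the h_t. The sketch below assumes the simplest scoring function, e_t = w · h_t, with a hypothetical learned vector `w`; it does not reproduce the paper's exact attention formulation, the FCN branch, or the final DNN.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(hidden: Sequence[Sequence[float]],
                   w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Pool frame-level vectors into one utterance-level vector.

    hidden[t]: output vector at frame t (e.g. from a BLSTM).
    w: hypothetical learned scoring vector, same length as hidden[t].
    Returns (pooled_vector, attention_weights).
    """
    # Unnormalised relevance score per frame: e_t = w . h_t
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden]
    # Numerically stable softmax over the time axis
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of frame vectors: c = sum_t alpha_t * h_t
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


frames = [[1.0, 0.0], [0.0, 1.0]]
print(attention_pool(frames, [0.0, 0.0]))   # equal scores: uniform weights
print(attention_pool(frames, [10.0, 0.0]))  # first frame dominates
```

    Inspecting `weights` shows which frames dominate the pooled vector; a common motivation for this design in emotion recognition is that emotional cues may be concentrated in short stretches of an utterance rather than spread evenly across it.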

    Precommitted Investment Strategy versus Time-Consistent Investment Strategy for a Dual Risk Model

    We are concerned with the optimal investment strategy for a dual risk model. We assume that the company can invest in a risk-free asset and a risky asset, and that short-selling and borrowing are allowed. Because the objective lacks the iterated-expectation property, Bellman's principle of optimality does not hold, so we investigate the precommitted strategy and the time-consistent strategy separately. We derive the precommitted investment strategy in three steps, and obtain the time-consistent investment strategy by solving the extended Hamilton-Jacobi-Bellman equations. Comparing the two, we find that each has its own advantage: the precommitted strategy maximizes the value function at the initial time t = 0, whereas the time-consistent strategy remains time-consistent over the whole time horizon. Finally, we present a numerical analysis of our results.
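    The abstract does not state the surplus dynamics. A formulation commonly used for dual risk models with investment, assumed here, has expenses paid at a constant rate c, random gains arriving as a compound Poisson process S(t), an amount π held in a risky asset with drift μ and volatility σ, and the remainder held in a risk-free asset with rate r, giving dX(t) = (r X(t) + (μ - r) π - c) dt + σ π dW(t) + dS(t). The simulation below uses a hypothetical constant π and exponentially distributed gains purely to illustrate this model; it is neither the precommitted nor the time-consistent strategy derived in the paper, and all parameter names are illustrative.

```python
import math
import random


def simulate_dual_risk_surplus(x0: float, c: float, lam: float, mean_gain: float,
                               r: float, mu: float, sigma: float, pi: float,
                               horizon: float, steps: int, seed: int = 0) -> float:
    """Terminal surplus of one Euler-Maruyama path of the ASSUMED dynamics
    dX = (r*X + (mu - r)*pi - c) dt + sigma*pi dW + dS,
    where S is compound Poisson (rate lam, exponential gains with mean
    mean_gain). pi, the amount held in the risky asset, is a hypothetical
    constant; ruin (X <= 0) is not modelled.
    """
    rng = random.Random(seed)
    dt = horizon / steps
    x = x0
    for _ in range(steps):
        # Interest on wealth plus excess return on the risky position,
        # minus the constant expense rate c
        drift = (r * x + (mu - r) * pi - c) * dt
        # Brownian shock on the risky position
        shock = sigma * pi * rng.gauss(0.0, math.sqrt(dt))
        # Gains in this step: count exponential inter-arrival times that
        # fit inside dt, which is an exact Poisson(lam * dt) draw
        n = 0
        if lam > 0:
            elapsed = rng.expovariate(lam)
            while elapsed <= dt:
                n += 1
                elapsed += rng.expovariate(lam)
        gains = sum(rng.expovariate(1.0 / mean_gain) for _ in range(n))
        x += drift + shock + gains
    return x


# One path with illustrative parameters
print(simulate_dual_risk_surplus(x0=10.0, c=1.0, lam=1.5, mean_gain=1.0,
                                 r=0.03, mu=0.08, sigma=0.2, pi=5.0,
                                 horizon=5.0, steps=500, seed=1))
```

    Averaging the terminal surplus over many seeds gives Monte Carlo estimates of its mean and variance for a given π; objectives built from such moments (for example, mean-variance criteria) are a typical source of the missing iterated-expectation property mentioned above.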

    Hierarchical attention transfer networks for depression assessment from speech
