4 research outputs found

    Speech emotion recognition using 2D-convolutional neural network

    This research proposes a speech emotion recognition model that predicts human emotions with a convolutional neural network (CNN) by learning from segmented audio of specific emotions. Speech emotion recognition relies on features extracted from the audio waveform to learn emotion characteristics; one such feature is the mel-frequency cepstral coefficient (MFCC). The dataset plays a vital role in obtaining good results, so this research leverages a combination of datasets. The model learns the combined dataset with audio segmentation and zero padding using a 2D-CNN; segmentation and zero padding equalize the lengths of the extracted audio features so their characteristics can be learned. The model reaches 83.69% accuracy in predicting seven emotions (neutral, happy, sad, angry, fear, disgust, and surprise) from the combined dataset with segmented audio files.
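    The pipeline described above (MFCC extraction, length equalization by segmentation/zero padding, and a 2D-CNN classifier over the resulting feature "image") can be sketched as follows. This is a minimal illustration under assumed settings (40 MFCC coefficients, 128 frames per segment, a small arbitrary network), not the authors' exact architecture.

```python
# Minimal sketch, assuming librosa/TensorFlow and illustrative sizes;
# not the published pipeline.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers

N_MFCC = 40       # MFCC coefficients per frame (assumed)
MAX_FRAMES = 128  # fixed number of frames per segment (assumed)

def mfcc_segment(path, sr=16000):
    """Load one audio file, extract MFCCs, and equalize length by
    truncating or zero-padding to MAX_FRAMES frames."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)  # (N_MFCC, frames)
    if mfcc.shape[1] >= MAX_FRAMES:
        mfcc = mfcc[:, :MAX_FRAMES]                   # truncate long clips
    else:
        pad = MAX_FRAMES - mfcc.shape[1]
        mfcc = np.pad(mfcc, ((0, 0), (0, pad)))       # zero-pad short clips
    return mfcc[..., np.newaxis]                      # add channel axis for Conv2D

def build_2d_cnn(num_classes=7):
    """Small 2D-CNN over the (coefficients x frames) MFCC matrix."""
    return tf.keras.Sequential([
        layers.Input(shape=(N_MFCC, MAX_FRAMES, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),  # seven emotion classes
    ])

model = build_2d_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```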

    Towards a Classifier to Recognize Emotions Using Voice to Improve Recommendations

    Recognizing emotions from tone of voice is currently a tool with high potential for making recommendations, since it allows recommendations to be personalized using the user's mood as an additional source of information. However, it is a complex task: the signal must be pre-processed before the emotion can be recognized. Most current proposals use recurrent networks that model temporally related sequences; their disadvantage is a long runtime, which makes them difficult to use in real-time applications. In addition, culture and language must be taken into account when defining this type of classifier, since the tone of voice expressing the same emotion can vary with these cultural factors. This work proposes a culturally adapted model for recognizing emotions from voice tone using convolutional neural networks. This type of network has a relatively short execution time, allowing its use in real-time applications. The results improve on the current state of the art, reaching 93.6% accuracy on the validation set.
    This work is partially supported by the Spanish Government project TIN2017-89156-R, GVA-CEICE project PROMETEO/2018/002, Generalitat Valenciana and European Social Fund FPI grant ACIF/2017/085, Universitat Politecnica de Valencia research grant (PAID-10-19), and by the Spanish Government (RTI2018-095390-B-C31).
    Fuentes-López, JM.; Taverner-Aparicio, JJ.; Rincón Arango, JA.; Botti Navarro, VJ. (2020). Towards a Classifier to Recognize Emotions Using Voice to Improve Recommendations. Springer. 218-225. https://doi.org/10.1007/978-3-030-51999-5_18
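    A minimal sketch of the latency argument above: a convolutional classifier needs only a single feed-forward pass per utterance, with no recurrence over time steps. The architecture, input shape (a 40 x 128 feature patch), and seven output classes below are illustrative assumptions, not the published model.

```python
# Illustrative sketch only: time one forward pass of a small CNN classifier
# over a single utterance's features. Shapes and layers are assumptions.
import time
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Assumed input: a 40 x 128 MFCC (or mel-spectrogram) patch with one channel.
cnn = tf.keras.Sequential([
    layers.Input(shape=(40, 128, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(7, activation="softmax"),  # e.g. seven emotion classes
])

features = np.random.rand(1, 40, 128, 1).astype("float32")  # stand-in utterance

start = time.perf_counter()
probs = cnn.predict(features, verbose=0)  # one inference pass, no recurrence
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"predicted class: {probs.argmax()}  ({elapsed_ms:.1f} ms)")
```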