10 research outputs found

    Big Data with Emotions in Context to Pakistan

    Get PDF
    In real-life communication, people naturally use gestures, tone of speech, and text to express emotions and to convey their sentiments. The recent growth of social platforms such as Snapchat, Google+, and LinkedIn provides abundant sources of individuals' opinions for sentiment analysis techniques. In the proposed work, our aim is to define a formal sentiment analysis system that detects sentiment in short informal messages such as SMS and Google+ posts, sentiment towards words or phrases within messages, and emotions recognized from audio-visual cues. This survey presents the frameworks and open problems that enable opinion mining and information retrieval methods. Our goal is to examine the tools used to mine opinions from people and the challenges raised by sentiment-aware applications.

    Using EEG-validated Music Emotion Recognition Techniques to Classify Multi-Genre Popular Music for Therapeutic Purposes

    Get PDF
    Music is observed to have significant beneficial effects on human mental health, especially for patients undergoing therapy and for older adults. Prior research on machine recognition of the emotion music induces, based on classifying low-level music features, has relied on subjective annotation to label data for classification. We validate this approach by using electroencephalography (EEG) to cross-check music emotion predictions made from low-level music feature data as well as from collected subjective annotation data. Collecting 8-channel EEG data from 10 participants listening to segments of 40 songs from 5 different genres, we obtain a subject-independent classification accuracy of 98.2298% on EEG test data using an ensemble classifier. We also classify low-level music features to cross-check the music emotion predictions from music features against the predictions from EEG data, obtaining a classification accuracy of 94.9774% using an ensemble classifier. We establish links between specific genre preference and perceived valence, supporting individualized approaches towards music therapy. We then combine the classification predictions from the EEG data with the predictions from music feature data and subjective annotations, showing the similarity of the predictions made by these approaches and validating an integrated approach that uses music features and subjective annotation to classify music emotion. Finally, we use the music feature-based approach to classify 250 popular songs from 5 genres and build a playlist application that creates playlists based on existing psychological theory to provide emotional benefit to individuals, validating our playlist methodology as an effective way to induce a positive emotional response.
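    A minimal sketch of the kind of ensemble classification described above, assuming emotion-labelled feature vectors have already been extracted from the EEG segments; the synthetic data, feature dimensionality, and the particular base classifiers are illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: ensemble classification of EEG-derived feature
# vectors into emotion classes, loosely following the setup above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder data: one feature vector per EEG segment (e.g. 8 channels
# x 5 assumed frequency bands = 40 features), with 4 emotion labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 40))
y = rng.integers(0, 4, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Soft-voting ensemble of three heterogeneous base classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(kernel="rbf", probability=True)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, ensemble.predict(X_te)))
```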

    Evaluating raw waveforms with deep learning frameworks for speech emotion recognition

    Full text link
    Speech emotion recognition is a challenging task in the field of speech processing. For this reason, the feature extraction process is of crucial importance for representing and processing speech signals. In this work, we present a model that feeds raw audio files directly into deep neural networks, without any feature extraction stage, for emotion recognition on six different data sets: EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS. To demonstrate the contribution of the proposed model, traditional feature extraction techniques, namely the mel-scale spectrogram and mel-frequency cepstral coefficients, are combined with machine learning algorithms, ensemble learning methods, and deep and hybrid deep learning techniques. Support vector machine, decision tree, naive Bayes, and random forest models are evaluated as machine learning algorithms, while majority voting and stacking are assessed as ensemble learning techniques. Moreover, convolutional neural networks, long short-term memory networks, and a hybrid CNN-LSTM model are evaluated as deep learning techniques and compared with the machine learning and ensemble learning methods. To demonstrate the effectiveness of the proposed model, a comparison with state-of-the-art studies is carried out. Based on the experimental results, the CNN model surpasses existing approaches with 95.86% accuracy on the TESS+RAVDESS data set using raw audio files, thus establishing a new state of the art. In speaker-independent audio classification, the proposed model achieves 90.34% accuracy on EMO-DB with the CNN model, 90.42% on RAVDESS with the CNN model, 99.48% on TESS with the LSTM model, 69.72% on CREMA with the CNN model, and 85.76% on SAVEE with the CNN model.
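    The following is a minimal, hypothetical sketch of a network that consumes raw waveforms directly, in the spirit of the model described above; the layer sizes, sampling rate, clip length, and six-class output are assumptions rather than the paper's architecture.

```python
# Hypothetical sketch of a 1D CNN that consumes raw waveforms directly,
# with no hand-crafted feature extraction stage (cf. the model above).
import torch
import torch.nn as nn

class RawWaveformCNN(nn.Module):
    def __init__(self, n_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # global average pooling over time
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, samples) raw audio, e.g. 3 s at 16 kHz = 48000 samples
        return self.classifier(self.features(x).squeeze(-1))

model = RawWaveformCNN()
dummy = torch.randn(8, 1, 48000)  # batch of 8 assumed 3-second clips
logits = model(dummy)             # shape (8, 6): one score per emotion class
print(logits.shape)
```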

    Improved Emotion Recognition Using Gaussian Mixture Model and Extreme Learning Machine in Speech and Glottal Signals

    Get PDF
    Recently, researchers have paid increasing attention to studying the emotional state of an individual from his or her speech signals, as speech is the fastest and most natural method of communication between individuals. In this work, a new feature enhancement method based on a Gaussian mixture model (GMM) was proposed to enhance the discriminatory power of the features extracted from speech and glottal signals. Three different emotional speech databases were used to evaluate the proposed methods. Extreme learning machine (ELM) and k-nearest neighbor (kNN) classifiers were employed to classify the different types of emotions. Several experiments were conducted, and the results show that the proposed methods significantly improve speech emotion recognition performance compared with works published in the literature.
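    One plausible reading of GMM-based feature enhancement is to append GMM component posteriors to the original feature vectors before classification; the sketch below illustrates that idea with a kNN classifier on synthetic placeholder features, and is not the authors' exact formulation.

```python
# Hypothetical sketch: fit a GMM on training features and append the
# component posteriors to the original vectors as an "enhanced"
# representation, then classify with kNN.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 24))    # assumed speech/glottal feature vectors
y = rng.integers(0, 5, size=600)  # assumed 5 emotion classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=1)
gmm.fit(X_tr)

def enhance(features: np.ndarray) -> np.ndarray:
    """Append GMM component posteriors to the raw feature vectors."""
    return np.hstack([features, gmm.predict_proba(features)])

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(enhance(X_tr), y_tr)
print("kNN accuracy on enhanced features:",
      accuracy_score(y_te, knn.predict(enhance(X_te))))
```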

    A framework for emotion and sentiment predicting supported in ensembles

    Get PDF
    Humans are prepared to comprehend each other's emotions through subtle body movements or facial expressions; using those expressions, individuals change how they deliver messages when communicating with each other. Machines, user interfaces, and robots need to gain this ability so as to shift the interaction from the traditional "human-computer interaction" to a "human-machine cooperation", where the machine provides the "right" information and functionality, at the "right" time, and in the "right" way. This dissertation presents a framework for emotion classification based on facial, speech, and text emotion prediction sources, supported by an ensemble of open-source code retrieved from off-the-shelf available methods. The main contribution is integrating the outputs from different sources and methods into a single prediction consistent with the emotions presented by the system's user. For each source, an initial aggregation of primary classifiers was implemented: for facial emotion classification, the aggregation achieved an accuracy above 73% on both the FER2013 and RAF-DB datasets; for speech emotion classification, four datasets were used, namely RAVDESS, TESS, CREMA-D, and SAVEE, and the aggregation of primary classifiers achieved an accuracy above 86% for a combination of three of these datasets; the text emotion aggregation of primary classifiers was tested on the EMOTIONLINES dataset, achieving an emotion classification accuracy above 53%. The integration of all the methods in a single framework allowed us to develop an emotion multi-source aggregator (EMsA), which aggregates the results of the primary emotion classifiers from the different sources (facial, speech, text, etc.). We describe the EMsA and report results on the RAVDESS dataset, where it achieved 81.99% accuracy when using a combination of faces and speech. Finally, we present an initial approach for sentiment classification.

    A review of the state of the art on the different ways of classifying human emotions also identified body emotion recognition (not considering the face); however, no published open-source code for such primary classifiers that could be used within the scope of this dissertation was found. For speech and text emotion recognition, some difficulty was also encountered in finding primary classifiers that met the necessary requirements, particularly for text: many models exist, but they cover numerous emotion sets different from the 6 basic emotions considered here (sadness, fear, surprise, disgust, anger, and joy), and there are more text models for sentiment prediction than for emotion prediction.

    For each source in isolation, i.e., for each analyzed component (face, speech, and text), a Python framework was developed that implements a primary aggregator with n primary classifiers (in this dissertation, n was set to 3). To run the tests and obtain the results of each primary aggregator, a specific dataset is used and its contents are fed to the aggregator: the facial aggregator receives an image, the speech aggregator an audio clip, and the text aggregator a sentence, each through the corresponding framework. Each dataset was split into training, validation, and test files. When the framework finishes processing an input, it outputs the file name/input identification, the results of the first, second, and third primary classifiers, and the dataset ground truth. The primary classifiers' results are then sent to the final classifier of that primary aggregator, for which four classifiers were tested: (a) voting, which, for n equal to 3, compares the emotions predicted by the primary classifiers, i.e., if 2 primary classifiers output the same emotion that emotion is the result, and if all classifiers disagree no result is chosen (a minimal sketch of this rule appears after this abstract); in addition, (b) Random Forest, (c) AdaBoost, and (d) MLP (multilayer perceptron) were used. Once the framework of each primary aggregator was completed, a super-aggregator was developed following the same principle, but instead of aggregating the results of only 3 primary classifiers it aggregates n × 3 primary classifier results (n from the face, n from speech, and n from text). It was not possible to test the EMsA on a dataset recognized in the literature that simultaneously contains text, face, and speech information.
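    A minimal sketch of the voting rule described above, where a primary aggregator with three primary classifiers returns the emotion on which at least two of them agree, and no decision otherwise:

```python
# Majority-voting rule for a primary aggregator with three primary
# classifiers: an emotion wins with at least two votes, else no result.
from collections import Counter
from typing import Optional

def vote(predictions: list[str]) -> Optional[str]:
    """Return the emotion at least two primary classifiers agree on, else None."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count >= 2 else None

print(vote(["happy", "happy", "sad"]))   # -> "happy"
print(vote(["happy", "sad", "angry"]))   # -> None (all three disagree)
```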

    Audio-visual feature selection and reduction for emotion classification

    No full text
    Recognition of expressed emotion from speech and facial gestures was investigated in experiments on an audio-visual emotional database. A total of 106 audio and 240 visual features were extracted, and features were then selected with the Plus l-Take Away r algorithm based on the Bhattacharyya distance criterion. In the second step, linear transformation methods, principal component analysis (PCA) and linear discriminant analysis (LDA), were applied to the selected features, and Gaussian classifiers were used for classification of emotions. Performance was higher for LDA features than for PCA features, and the visual features performed better than the audio features for both PCA and LDA. Across a range of fusion schemes, the audio-visual feature results were close to those of the visual features. The highest recognition rates achieved were 53% with audio features, 98% with visual features, and 98% with audio-visual features selected by Bhattacharyya distance and transformed by LDA.
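    As an illustration of the overall pipeline shape (feature selection, a linear projection, then a Gaussian classifier), the sketch below uses scikit-learn's sequential feature selector as a stand-in for the Plus l-Take Away r / Bhattacharyya-distance selection, with synthetic placeholder features; it is not the authors' implementation.

```python
# Hypothetical sketch of the pipeline shape only: feature selection,
# an LDA projection, and a Gaussian classifier.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 40))    # assumed audio-visual feature vectors
y = rng.integers(0, 6, size=300)  # assumed 6 emotion classes

pipeline = make_pipeline(
    SequentialFeatureSelector(GaussianNB(), n_features_to_select=15),
    LinearDiscriminantAnalysis(n_components=5),  # at most n_classes - 1
    GaussianNB(),
)
print("cross-validated accuracy:", cross_val_score(pipeline, X, y, cv=3).mean())
```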
