Emotional Expression Detection in Spoken Language Employing Machine Learning Algorithms
The human voice has a variety of features, such as pitch, timbre, loudness, and
vocal tone. It is widely observed that humans express their feelings through
different vocal qualities when they speak. The primary objective of this
research is to recognize different
emotions of human beings such as anger, sadness, fear, neutrality, disgust,
pleasant surprise, and happiness using several MATLAB functions, namely
spectral descriptors, periodicity, and harmonicity. To accomplish the work, we
analyze the CREMA-D (Crowd-sourced Emotional Multimodal Actors Data) & TESS
(Toronto Emotional Speech Set) datasets of human speech. The audio files have
varied characteristics (e.g., noisy, fast, or slow speech), thereby
significantly increasing the efficiency of the ML (Machine Learning) models.
EMD (Empirical Mode Decomposition) is utilized for the
process of signal decomposition. Then, the features are extracted through the
use of several techniques such as the MFCC, GTCC, spectral centroid, roll-off
point, entropy, spread, flux, harmonic ratio, energy, skewness, flatness, and
audio delta. The models are trained on these data using several well-known ML
algorithms, namely Support Vector Machine, Neural Network, Ensemble, and KNN.
The algorithms show an accuracy of 67.7%, 63.3%, 61.6%, and 59.0%
respectively on the test data and
77.7%, 76.1%, 99.1%, and 61.2% on the training data. We conducted the
experiments in MATLAB, and the results show that our model is more effective
and flexible than existing similar works.

Comment: Journal Pre-print (15 pages, 9 figures, 3 tables)
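The feature-extraction step described in the abstract (per-frame spectral descriptors fed to a classifier) can be sketched outside MATLAB as well. The snippet below is a minimal NumPy-only illustration of four of the listed features (spectral centroid, spread, roll-off point, and flatness) on synthetic tones; the CREMA-D/TESS data, the EMD step, the MFCC/GTCC features, and the trained models from the paper are not reproduced here, and all signal parameters are made up for the example.

```python
# Minimal sketch, assuming standard textbook definitions of the
# spectral descriptors; not the paper's MATLAB implementation.
import numpy as np

def spectral_descriptors(signal, sr=16000, rolloff_pct=0.85):
    """Return (centroid, spread, rolloff, flatness) for one frame."""
    mag = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    power = mag ** 2
    total = power.sum() + 1e-12
    centroid = (freqs * power).sum() / total          # spectral center of mass
    spread = np.sqrt(((freqs - centroid) ** 2 * power).sum() / total)
    cum = np.cumsum(power)                            # energy below roll-off
    rolloff = freqs[np.searchsorted(cum, rolloff_pct * cum[-1])]
    # flatness: geometric mean / arithmetic mean of the magnitude spectrum
    flatness = np.exp(np.mean(np.log(mag + 1e-12))) / (mag.mean() + 1e-12)
    return np.array([centroid, spread, rolloff, flatness])

# Two synthetic stand-ins for emotional speech samples: a low-pitched
# and a high-pitched tone (real inputs would be speech frames).
sr = 16000
t = np.arange(1024) / sr
low = np.sin(2 * np.pi * 150 * t)
high = np.sin(2 * np.pi * 900 * t)
f_low, f_high = spectral_descriptors(low), spectral_descriptors(high)
```

A higher-pitched signal yields a higher spectral centroid, which is the kind of contrast such descriptors expose to a downstream classifier.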
End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Speech activity detection (SAD) plays an important role in current speech
processing systems, including automatic speech recognition (ASR). SAD is
particularly difficult in environments with acoustic noise. A practical
solution is to incorporate visual information, increasing the robustness of the
SAD approach. An audiovisual system has the advantage of being robust to
different speech modes (e.g., whisper speech) or background noise. Recent
advances in audiovisual speech processing using deep learning have opened
opportunities to capture in a principled way the temporal relationships between
acoustic and visual features. This study explores this idea by proposing a
\emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach
models the temporal dynamics of the sequential audiovisual data, improving the
accuracy and robustness of the proposed SAD system. Instead of estimating
hand-crafted features, the study investigates an end-to-end training approach,
where acoustic and visual features are directly learned from the raw data
during training. The experimental evaluation considers a large audiovisual
corpus with over 60.8 hours of recordings, collected from 105 speakers. The
results demonstrate that the proposed framework leads to absolute improvements
of up to 1.2% under practical scenarios over an audio-only VAD baseline
implemented with a deep neural network (DNN). The proposed approach achieves a
92.7% F1-score when evaluated using the sensors of a portable tablet in a
noisy acoustic environment, which is only 1.0% lower than the performance
obtained under ideal conditions (e.g., clean speech captured with a
high-definition camera and a close-talking microphone).

Comment: Submitted to Speech Communication
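As a rough illustration of the bimodal fusion idea (not the authors' end-to-end architecture, which learns features directly from raw data), the sketch below concatenates per-frame audio and visual feature vectors and runs them through a single untrained Elman-style recurrent layer that emits a per-frame speech probability. All dimensions and weights are invented for the example.

```python
# Hedged sketch: a toy recurrent fusion layer for audiovisual SAD.
# Random weights stand in for trained parameters.
import numpy as np

rng = np.random.default_rng(0)

def rnn_sad(audio_feats, visual_feats, hidden=16):
    """Fuse audio + visual frames by concatenation, then run one
    recurrent layer and output a speech probability per frame."""
    x = np.concatenate([audio_feats, visual_feats], axis=1)  # (T, Da+Dv)
    T, d = x.shape
    Wxh = rng.standard_normal((d, hidden)) * 0.1      # input -> hidden
    Whh = rng.standard_normal((hidden, hidden)) * 0.1 # recurrence
    Who = rng.standard_normal((hidden, 1)) * 0.1      # hidden -> output
    h = np.zeros(hidden)
    probs = []
    for t in range(T):
        h = np.tanh(x[t] @ Wxh + h @ Whh)             # carry temporal state
        probs.append(1.0 / (1.0 + np.exp(-(h @ Who)[0])))  # sigmoid
    return np.array(probs)

# 50 frames of 13-dim audio and 8-dim visual features (random stand-ins
# for, e.g., acoustic frames and lip-region descriptors).
audio = rng.standard_normal((50, 13))
visual = rng.standard_normal((50, 8))
p = rnn_sad(audio, visual)
```

The recurrence is what lets the model exploit temporal relationships between the two streams, which is the core motivation the abstract gives for the BRNN.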
The effects of projected films on singers' expressivity in choral performance
Title from PDF of title page, viewed on July 23, 2013. Dissertation advisor: Charles Robinson. Vita. Includes bibliographic references (pages 224-259). Thesis (Ph.D.)--Conservatory of Music and Dance and School of Education, University of Missouri--Kansas City, 2013.

The purpose of this study was to investigate the effects of projected film visuals on
singers' expressivity in choral performance. The study was divided into three phases. In
Phase One, university choir singers (N = 21) viewed eight audiovisual pairings (two film
excerpts and four choral etudes) and rated these pairings according to perceived music to film
congruency. Based on these ratings, two choral etudes were identified that elicited the
broadest congruency contrasts when paired with the film segments. In Phase Two, a different group of university choir singers (N = 116) rehearsed and
prepared both of the selected choral etudes referred to as “Doh” and “Noo.” Subsequently,
these singers were organized into smaller chamber ensembles (n = 11), and performed each
choral etude three times under the following conditions: (1) while viewing congruent film,
(2) while viewing incongruent film, and (3) with no film projected. After each performance,
singers reported their level of self-expression. At the completion of all three performances,
singers reported their preferred performance condition. Finally, participants listened to their
audio-recorded performances and rated these for performance expressivity and personal
preference. During Phase Three, choral experts (N = 8) rated performance expressivity and
reported personal preference for each audio-recorded performance. A two-way ANOVA with repeated measures found significant main effects of both
etude and film visual performance condition on participants' expressivity ratings (p < .001).
Additionally, a significant etude x film visual performance condition interaction was
discovered (p = .001). Participants rated self-expression significantly higher when singing
with a congruent film compared with other conditions for both etudes (p < .001). Chi-square
tests found most preferred experiences during congruent performances, and least preferred
experiences during incongruent performances for both etudes (p < .001). Expressivity ratings
for audio-recorded performances indicated significantly higher expressivity ratings for the
performances influenced by the congruent film visual of etude “Doh” (p < .05), while no
significant differences were found for etude “Noo” (p > .05). Implications of these findings
are discussed in relation to filmmaking techniques, music education curriculum, choral
rehearsal pedagogy, and composition/performance practice, with recommendations for future
research.

Introduction -- Review of literature -- Methodology -- Results -- Discussion -- Appendix A. Phase one - Recruitment script -- Appendix B. Film segments one and two - snapshot images -- Appendix C. Four choral etudes -- Appendix D. Phase one - script -- Appendix E. Phase one - consent form -- Appendix F. Phase one - survey tool -- Appendix G. Phase two - singers recruitment script -- Appendix H. Rehearsal lesson plan -- Appendix I. Room and material dimensions -- Appendix J. Phase two - singer consent form -- Appendix K. Phase two - script -- Appendix L. Phase two - self-report survey tool -- Appendix M. Phase two - listening perception survey tool -- Appendix N. Phase three - choral expert recruitment script -- Appendix O. Phase three - choral expert consent form -- Appendix P. Phase three - script -- Appendix Q. Phase three - listening perception survey tool
The Affective Potential of Formal Play: Camp Sensibility and Dark Humor in AIDS Activist Video
This project explores the activist employment of camp sensibility and dark humor in alternative AIDS video from the late 1980s and early 1990s, looking specifically at the provocation of complex affect as a result of such techniques.
A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots
We present a review of personality in neural conversational agents (CAs),
also called chatbots. First, we define Personality, Persona, and Profile. We
explain all personality schemes which have been used in CAs, and list models
under the scheme(s) which they use. Second, we describe 21 datasets which have
been developed in recent CA personality research. Third, we define the methods
used to embody personality in a CA, and review recent models using them.
Fourth, we survey some relevant reviews on CAs, personality, and related
topics. Finally, we draw conclusions and identify some research challenges for
this important emerging field.

Comment: 25 pages, 6 tables, 207 references