6,212 research outputs found

    Emotional Expression Detection in Spoken Language Employing Machine Learning Algorithms

    Full text link
    There are a variety of features of the human voice that can be classified as pitch, timbre, loudness, and vocal tone. It has been observed in numerous instances that humans express their feelings using different vocal qualities when they speak. The primary objective of this research is to recognize different human emotions such as anger, sadness, fear, neutrality, disgust, pleasant surprise, and happiness by using several MATLAB functions, namely spectral descriptors, periodicity, and harmonicity. To accomplish the work, we analyze the CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset) & TESS (Toronto Emotional Speech Set) datasets of human speech. The audio files contain data with various characteristics (e.g., noisy, fast, slow), which significantly increases the efficiency of the ML (Machine Learning) models. EMD (Empirical Mode Decomposition) is utilized for signal decomposition. Then, features are extracted using several techniques such as MFCC, GTCC, spectral centroid, roll-off point, entropy, spread, flux, harmonic ratio, energy, skewness, flatness, and audio delta. The data is trained using several renowned ML models, namely Support Vector Machine, Neural Network, Ensemble, and KNN. The algorithms show accuracies of 67.7%, 63.3%, 61.6%, and 59.0% respectively on the test data, and 77.7%, 76.1%, 99.1%, and 61.2% on the training data. We conducted the experiments in MATLAB, and the results show that our model is more effective and flexible than existing similar works. Comment: Journal pre-print (15 pages, 9 figures, 3 tables)
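    The pipeline described above (per-clip feature extraction followed by conventional classifiers) can be illustrated with a minimal sketch. The original work uses MATLAB's spectral-descriptor functions and EMD; the Python version below, using librosa and scikit-learn with placeholder file paths and labels, is an assumption-laden illustration rather than the authors' implementation.

```python
# Minimal sketch of a comparable pipeline in Python (the paper uses MATLAB and EMD);
# librosa / scikit-learn, the file list, and the labels are illustrative assumptions.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def extract_features(path: str) -> np.ndarray:
    """Fixed-length feature vector: MFCCs plus a few spectral descriptors."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # MFCCs
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # spectral centroid
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)    # roll-off point
    flatness = librosa.feature.spectral_flatness(y=y)         # spectral flatness
    # Summarise each time-varying descriptor by its mean and standard deviation.
    feats = [mfcc, centroid, rolloff, flatness]
    return np.concatenate([np.r_[f.mean(axis=1), f.std(axis=1)] for f in feats])

# Hypothetical clip list and emotion labels (e.g., drawn from CREMA-D / TESS metadata).
files = ["clip_0001.wav", "clip_0002.wav"]   # placeholder paths
labels = ["anger", "happiness"]              # placeholder labels

X = np.vstack([extract_features(f) for f in files])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)

for clf in (SVC(kernel="rbf"), KNeighborsClassifier(n_neighbors=5)):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```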

    End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

    Full text link
    Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea by proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are learned directly from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements of up to 1.2% under practical scenarios over an audio-only VAD baseline implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when evaluated using the sensors of a portable tablet in a noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech captured with a high-definition camera and a close-talking microphone). Comment: Submitted to Speech Communication
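    As a rough illustration of the bimodal recurrent idea, the PyTorch sketch below feeds per-frame acoustic and visual features through separate LSTMs, fuses the hidden states by concatenation, and predicts a per-frame speech probability. The layer sizes, feature dimensions, and fusion-by-concatenation choice are assumptions made for illustration, not the paper's exact architecture (which learns features end-to-end from raw data).

```python
# Rough PyTorch sketch of a bimodal recurrent SAD model: one LSTM per modality,
# hidden states concatenated and mapped to a per-frame speech/non-speech score.
# Layer sizes, feature dimensions, and concatenation fusion are assumptions.
import torch
import torch.nn as nn

class BimodalRNNSAD(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=64, hidden=128):
        super().__init__()
        self.audio_rnn = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.visual_rnn = nn.LSTM(visual_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, 1)   # fused hidden state -> logit

    def forward(self, audio_seq, visual_seq):
        # audio_seq: (batch, frames, audio_dim); visual_seq: (batch, frames, visual_dim)
        a_out, _ = self.audio_rnn(audio_seq)
        v_out, _ = self.visual_rnn(visual_seq)
        fused = torch.cat([a_out, v_out], dim=-1)      # frame-level fusion
        return torch.sigmoid(self.classifier(fused))   # per-frame speech probability

# Example forward pass on random tensors (batch of 2, 100 frames).
model = BimodalRNNSAD()
audio = torch.randn(2, 100, 40)
visual = torch.randn(2, 100, 64)
probs = model(audio, visual)   # shape: (2, 100, 1)
print(probs.shape)
```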

    The effects of projected films on singers' expressivity in choral performance

    Get PDF
    Title from PDF of title page, viewed on July 23, 2013. Dissertation advisor: Charles Robinson. Vita. Includes bibliographic references (pages 224-259). Thesis (Ph.D.)--Conservatory of Music and Dance and School of Education, University of Missouri--Kansas City, 2013. The purpose of this study was to investigate the effects of projected film visuals on singers' expressivity in choral performance. The study was divided into three phases. In Phase One, university choir singers (N = 21) viewed eight audiovisual pairings (two film excerpts and four choral etudes) and rated these pairings according to perceived music-to-film congruency. Based on these ratings, two choral etudes were identified that elicited the broadest congruency contrasts when paired with the film segments. In Phase Two, a different group of university choir singers (N = 116) rehearsed and prepared both of the selected choral etudes, referred to as “Doh” and “Noo.” Subsequently, these singers were organized into smaller chamber ensembles (n = 11) and performed each choral etude three times under the following conditions: (1) while viewing the congruent film, (2) while viewing the incongruent film, and (3) with no film projected. After each performance, singers reported their level of self-expression. At the completion of all three performances, singers reported their preferred performance condition. Finally, participants listened to their audio-recorded performances and rated these for performance expressivity and personal preference. During Phase Three, choral experts (N = 8) rated performance expressivity and reported personal preference for each audio-recorded performance. A two-way ANOVA with repeated measures found significant main effects of both etude and film visual performance condition on participants' expressivity ratings (p < .001). Additionally, a significant etude x film visual performance condition interaction was discovered (p = .001). Participants rated self-expression significantly higher when singing with a congruent film compared with the other conditions for both etudes (p < .001). Chi-square tests found that the most preferred experiences occurred during congruent performances and the least preferred during incongruent performances for both etudes (p < .001). Expressivity ratings for audio-recorded performances were significantly higher for performances influenced by the congruent film visual for etude “Doh” (p < .05), while no significant differences were found for etude “Noo” (p > .05). Implications of these findings are discussed in relation to filmmaking techniques, music education curriculum, choral rehearsal pedagogy, and composition/performance practice, with recommendations for future research. Contents: Introduction -- Review of literature -- Methodology -- Results -- Discussion -- Appendix A. Phase one - Recruitment script -- Appendix B. Film segments one and two - snapshot images -- Appendix C. Four choral etudes -- Appendix D. Phase one - Script -- Appendix E. Phase one - Consent form -- Appendix F. Phase one - Survey tool -- Appendix G. Phase two - Singers recruitment script -- Appendix H. Rehearsal lesson plan -- Appendix I. Room and material dimensions -- Appendix J. Phase two - Singer consent form -- Appendix K. Phase two - Script -- Appendix L. Phase two - Self-report survey tool -- Appendix M. Phase two - Listening perception survey tool -- Appendix N. Phase three - Choral expert recruitment script -- Appendix O. Phase three - Choral expert consent form -- Appendix P. Phase three - Script -- Appendix Q. Phase three - Listening perception survey tool
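    For readers unfamiliar with the analysis, the sketch below shows what a two-way repeated-measures ANOVA of this design (etude x film condition on self-reported expressivity) might look like in Python with statsmodels. The data frame, column names, and values are entirely hypothetical; this is not the study's data or analysis script.

```python
# Illustrative two-way repeated-measures ANOVA (etude x film condition) using
# statsmodels and pandas. All data below are hypothetical placeholders.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one expressivity rating per singer per
# (etude, film condition) cell.
ratings = pd.DataFrame({
    "singer":       [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "etude":        ["Doh", "Doh", "Doh", "Noo", "Noo", "Noo"] * 2,
    "condition":    ["congruent", "incongruent", "none"] * 4,
    "expressivity": [8, 5, 6, 7, 4, 5, 9, 6, 6, 8, 5, 6],
})

result = AnovaRM(
    data=ratings,
    depvar="expressivity",
    subject="singer",
    within=["etude", "condition"],
).fit()
print(result.anova_table)   # F statistics and p-values for main effects and interaction
```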

    The Affective Potential of Formal Play: Camp Sensibility and Dark Humor in AIDS Activist Video

    Get PDF
    This project explores the activist employment of camp sensibility and dark humor in alternative AIDS video from the late 1980s and early 1990s, looking specifically at the provocation of complex affect as a result of such techniques.

    2023-2024 Course Catalog

    Get PDF
    2023-2024 Course Catalog

    2022-2023 Course Catalog

    Get PDF
    2022-2023 Course Catalog

    2021-2022 Course Catalog

    Get PDF
    2021-2022 Course Catalog

    2020-2021 Course Catalog

    Get PDF
    2020-2021 Course Catalog

    A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots

    Full text link
    We present a review of personality in neural conversational agents (CAs), also called chatbots. First, we define Personality, Persona, and Profile. We explain all personality schemes that have been used in CAs and list models under the scheme(s) they use. Second, we describe 21 datasets that have been developed in recent CA personality research. Third, we define the methods used to embody personality in a CA and review recent models using them. Fourth, we survey some relevant reviews on CAs, personality, and related topics. Finally, we draw conclusions and identify some research challenges for this important emerging field. Comment: 25 pages, 6 tables, 207 references