Emotional Expression Detection in Spoken Language Employing Machine Learning Algorithms
The human voice has a variety of features, such as pitch, timbre, loudness, and
vocal tone. It is widely observed that humans express their feelings through
different vocal qualities when they speak. The primary objective of this
research is to recognize different
emotions of human beings such as anger, sadness, fear, neutrality, disgust,
pleasant surprise, and happiness using several MATLAB functions, namely
spectral descriptors, periodicity, and harmonicity. To accomplish the work, we
analyze the CREMA-D (Crowd-sourced Emotional Multimodal Actors Data) & TESS
(Toronto Emotional Speech Set) datasets of human speech. The audio files have
varied characteristics (e.g., noisy, fast, or slow speech), thereby
significantly increasing the efficiency of the ML (Machine Learning) models.
EMD (Empirical Mode Decomposition) is utilized for the
process of signal decomposition. Then, the features are extracted through the
use of several techniques such as the MFCC, GTCC, spectral centroid, roll-off
point, entropy, spread, flux, harmonic ratio, energy, skewness, flatness, and
audio delta. The models are trained on these data using several well-known ML
algorithms, namely Support Vector Machine, Neural Network, Ensemble, and KNN.
The algorithms show an accuracy of 67.7%, 63.3%, 61.6%, and 59.0%
respectively on the test data and
77.7%, 76.1%, 99.1%, and 61.2% on the training data. We conducted the
experiments in MATLAB, and the results show that our model is more effective
and flexible than existing similar works.

Comment: Journal Pre-print (15 pages, 9 figures, 3 tables)
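The feature-extraction step described in the abstract (per-frame spectral descriptors fed to a classifier) can be sketched outside MATLAB as well. The snippet below is a minimal NumPy-only illustration of four of the listed features (spectral centroid, spread, roll-off point, and flatness) on synthetic tones; the CREMA-D/TESS data, the EMD step, the MFCC/GTCC features, and the trained models from the paper are not reproduced here, and all signal parameters are made up for the example.

```python
# Minimal sketch, assuming standard textbook definitions of the
# spectral descriptors; not the paper's MATLAB implementation.
import numpy as np

def spectral_descriptors(signal, sr=16000, rolloff_pct=0.85):
    """Return (centroid, spread, rolloff, flatness) for one frame."""
    mag = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    power = mag ** 2
    total = power.sum() + 1e-12
    centroid = (freqs * power).sum() / total          # spectral center of mass
    spread = np.sqrt(((freqs - centroid) ** 2 * power).sum() / total)
    cum = np.cumsum(power)                            # energy below roll-off
    rolloff = freqs[np.searchsorted(cum, rolloff_pct * cum[-1])]
    # flatness: geometric mean / arithmetic mean of the magnitude spectrum
    flatness = np.exp(np.mean(np.log(mag + 1e-12))) / (mag.mean() + 1e-12)
    return np.array([centroid, spread, rolloff, flatness])

# Two synthetic stand-ins for emotional speech samples: a low-pitched
# and a high-pitched tone (real inputs would be speech frames).
sr = 16000
t = np.arange(1024) / sr
low = np.sin(2 * np.pi * 150 * t)
high = np.sin(2 * np.pi * 900 * t)
f_low, f_high = spectral_descriptors(low), spectral_descriptors(high)
```

A higher-pitched signal yields a higher spectral centroid, which is the kind of contrast such descriptors expose to a downstream classifier.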
End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Speech activity detection (SAD) plays an important role in current speech
processing systems, including automatic speech recognition (ASR). SAD is
particularly difficult in environments with acoustic noise. A practical
solution is to incorporate visual information, increasing the robustness of the
SAD approach. An audiovisual system has the advantage of being robust to
different speech modes (e.g., whisper speech) or background noise. Recent
advances in audiovisual speech processing using deep learning have opened
opportunities to capture in a principled way the temporal relationships between
acoustic and visual features. This study explores this idea by proposing a
\emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach
models the temporal dynamics of the sequential audiovisual data, improving the
accuracy and robustness of the proposed SAD system. Instead of estimating
hand-crafted features, the study investigates an end-to-end training approach,
where acoustic and visual features are directly learned from the raw data
during training. The experimental evaluation considers a large audiovisual
corpus with over 60.8 hours of recordings, collected from 105 speakers. The
results demonstrate that the proposed framework leads to absolute improvements
of up to 1.2% under practical scenarios over an audio-only VAD baseline
implemented with a deep neural network (DNN). The proposed approach achieves a
92.7% F1-score when evaluated using the sensors of a portable tablet in a
noisy acoustic environment, which is only 1.0% lower than the performance
obtained under ideal conditions (e.g., clean speech captured with a
high-definition camera and a close-talking microphone).

Comment: Submitted to Speech Communication
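As a rough illustration of the bimodal fusion idea (not the authors' end-to-end architecture, which learns features directly from raw data), the sketch below concatenates per-frame audio and visual feature vectors and runs them through a single untrained Elman-style recurrent layer that emits a per-frame speech probability. All dimensions and weights are invented for the example.

```python
# Hedged sketch: a toy recurrent fusion layer for audiovisual SAD.
# Random weights stand in for trained parameters.
import numpy as np

rng = np.random.default_rng(0)

def rnn_sad(audio_feats, visual_feats, hidden=16):
    """Fuse audio + visual frames by concatenation, then run one
    recurrent layer and output a speech probability per frame."""
    x = np.concatenate([audio_feats, visual_feats], axis=1)  # (T, Da+Dv)
    T, d = x.shape
    Wxh = rng.standard_normal((d, hidden)) * 0.1      # input -> hidden
    Whh = rng.standard_normal((hidden, hidden)) * 0.1 # recurrence
    Who = rng.standard_normal((hidden, 1)) * 0.1      # hidden -> output
    h = np.zeros(hidden)
    probs = []
    for t in range(T):
        h = np.tanh(x[t] @ Wxh + h @ Whh)             # carry temporal state
        probs.append(1.0 / (1.0 + np.exp(-(h @ Who)[0])))  # sigmoid
    return np.array(probs)

# 50 frames of 13-dim audio and 8-dim visual features (random stand-ins
# for, e.g., acoustic frames and lip-region descriptors).
audio = rng.standard_normal((50, 13))
visual = rng.standard_normal((50, 8))
p = rnn_sad(audio, visual)
```

The recurrence is what lets the model exploit temporal relationships between the two streams, which is the core motivation the abstract gives for the BRNN.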
The effects of projected films on singers' expressivity in choral performance
Title from PDF of title page, viewed on July 23, 2013. Dissertation advisor: Charles Robinson. Vita. Includes bibliographic references (pages 224-259). Thesis (Ph.D.)--Conservatory of Music and Dance and School of Education, University of Missouri--Kansas City, 2013.

The purpose of this study was to investigate the effects of projected film visuals on
singers' expressivity in choral performance. The study was divided into three phases. In
Phase One, university choir singers (N = 21) viewed eight audiovisual pairings (two film
excerpts and four choral etudes) and rated these pairings according to perceived music to film
congruency. Based on these ratings, two choral etudes were identified that elicited the
broadest congruency contrasts when paired with the film segments. In Phase Two, a different group of university choir singers (N = 116) rehearsed and
prepared both of the selected choral etudes referred to as “Doh” and “Noo.” Subsequently,
these singers were organized into smaller chamber ensembles (n = 11), and performed each
choral etude three times under the following conditions: (1) while viewing congruent film,
(2) while viewing incongruent film, and (3) with no film projected. After each performance,
singers reported their level of self-expression. At the completion of all three performances,
singers reported their preferred performance condition. Finally, participants listened to their
audio-recorded performances and rated these for performance expressivity and personal
preference. During Phase Three, choral experts (N = 8) rated performance expressivity and
reported personal preference for each audio-recorded performance. A two-way ANOVA with repeated measures found significant main effects of both
etude and film visual performance condition on participants' expressivity ratings (p < .001).
Additionally, a significant etude x film visual performance condition interaction was
discovered (p = .001). Participants rated self-expression significantly higher when singing
with a congruent film compared with other conditions for both etudes (p < .001). Chi-square
tests found most preferred experiences during congruent performances, and least preferred
experiences during incongruent performances for both etudes (p < .001). Expressivity ratings
for audio-recorded performances indicated significantly higher expressivity ratings for the
performances influenced by the congruent film visual of etude “Doh” (p < .05), while no
significant differences were found for etude “Noo” (p > .05). Implications of these findings
are discussed in relation to filmmaking techniques, music education curriculum, choral
rehearsal pedagogy, and composition/performance practice, with recommendations for future
research.

Introduction -- Review of literature -- Methodology -- Results -- Discussion -- Appendix A. Phase one - Recruitment script -- Appendix B. Film segments one and two - snapshot images -- Appendix C. Four choral etudes -- Appendix D. Phase one - script -- Appendix E. Phase one - consent form -- Appendix F. Phase one - survey tool -- Appendix G. Phase two - singers recruitment script -- Appendix H. Rehearsal lesson plan -- Appendix I. Room and material dimensions -- Appendix J. Phase two - singer consent form -- Appendix K. Phase two - script -- Appendix L. Phase two - self-report survey tool -- Appendix M. Phase two - listening perception survey tool -- Appendix N. Phase three - choral expert recruitment script -- Appendix O. Phase three - choral expert consent form -- Appendix P. Phase three - script -- Appendix Q. Phase three - listening perception survey tool
The Affective Potential of Formal Play: Camp Sensibility and Dark Humor in AIDS Activist Video
This project explores the activist employment of camp sensibility and dark humor in alternative AIDS video from the late 1980s and early 1990s, looking specifically at the provocation of complex affect as a result of such techniques.
A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots
We present a review of personality in neural conversational agents (CAs),
also called chatbots. First, we define Personality, Persona, and Profile. We
explain all personality schemes which have been used in CAs, and list models
under the scheme(s) which they use. Second, we describe 21 datasets which have
been developed in recent CA personality research. Third, we define the methods
used to embody personality in a CA, and review recent models using them.
Fourth, we survey some relevant reviews on CAs, personality, and related
topics. Finally, we draw conclusions and identify some research challenges for
this important emerging field.

Comment: 25 pages, 6 tables, 207 references