Capturing emotions in voice: A comparative analysis of methodologies in psychology and digital signal processing
People use their voices to communicate not only verbally but also emotionally. This article presents theories and methodologies that concern emotional vocalizations at the intersection of psychology and digital signal processing. Specifically, it demonstrates the encoding (production) and decoding (recognition) of emotional sounds, including the review and comparison of strategies in database design, parameterization, and classification. Whereas psychology predominantly focuses on the subjective recognition of emotional vocalizations, digital signal processing relies on automated and thus more objective vocal affect measures. The article aims to compare these two approaches and suggest methods of combining them to achieve a more comprehensive insight into the vocal communication of emotions.
Speech Emotion Recognition Based on Voice Fundamental Frequency
The human voice is one of the basic means of communication, and it also easily conveys the speaker's emotional state. This paper presents experiments on emotion recognition in human speech based on the fundamental frequency. The AGH Emotional Speech Corpus was used. This database consists of audio samples of seven emotions acted by 12 different speakers (6 female and 6 male). We explored phrases of all the emotions, both all together and in various combinations. The fast Fourier transform and magnitude spectrum analysis were applied to extract the fundamental tone from the speech audio samples. After extracting several statistical features of the fundamental frequency, we studied whether they carry information on the emotional state of the speaker by applying different AI methods. Analysis of the outcome data was conducted with the following classifiers: K-Nearest Neighbours with local induction, Random Forest, Bagging, JRip, and the Random Subspace Method from the WEKA data-mining toolkit. The results show that the fundamental frequency is a promising choice for further experiments.
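The spectral F0 extraction and statistical feature step described above can be sketched as follows. This is a minimal illustration assuming NumPy; the function names, the 60–400 Hz search range, and the choice of summary statistics are assumptions for the example, not the paper's actual pipeline.

```python
import numpy as np

def estimate_f0(frame, sr):
    """Estimate the fundamental frequency of a voiced frame by locating
    the peak of the FFT magnitude spectrum (a simplified version of the
    spectral approach described above)."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # Restrict the peak search to a typical range of the human voice.
    mask = (freqs >= 60) & (freqs <= 400)
    return freqs[mask][np.argmax(spectrum[mask])]

def f0_statistics(f0_track):
    """Summary statistics of an F0 contour, usable as classifier features."""
    f0 = np.asarray(f0_track, dtype=float)
    return {
        "mean": f0.mean(),
        "std": f0.std(),
        "min": f0.min(),
        "max": f0.max(),
        "range": f0.max() - f0.min(),
    }
```

A statistics dictionary like this could then be fed to any of the WEKA-style classifiers mentioned above; note that the frequency resolution of the estimate is limited to sr / frame_length Hz per bin.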
Reading rate in filmic audio description
The study discussed in this article was carried out as a pilot study to assess the process, resources and data management scheme (Thabane et al., 2010) to be used in a large-scale experiment on filmic audio description (AD) reading rate. As part of this study we defined the reading rate in the filmic AD context. We described the characteristic features of Polish filmic AD scripts and recordings and examined the reading rate of Polish AD for three Polish fiction films: a comedy, a drama, and an action film. We calculated the average length of breath pauses and the maximum, minimum and average reading rate measured in characters per second (CPS) and words per minute (WPM), two measures commonly used in audiovisual translation. The main finding of this study is the validation of the research procedure for testing the AD reading rate. We also computed the average reading rate for Polish filmic AD (179 WPM) and discovered that it varies with film genre (167 WPM for drama, 182 for comedy and 189 for action). As for breath pauses in Polish AD, we calculated their average length at 190 ms, a value much lower than expected for breath pauses in Polish. The results of our study are discussed in the context of research on speech tempo.
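The two reading-rate measures mentioned above can be computed directly from a script fragment and its narration duration. A hedged sketch follows; the study's exact counting conventions (e.g. whether spaces count as characters, how breath pauses are excluded from the duration) are assumptions here:

```python
def reading_rate(text, duration_s):
    """Reading rate of an AD script fragment in characters per second (CPS)
    and words per minute (WPM). Characters are counted including spaces,
    an assumption that may differ from the study's convention."""
    chars = len(text)
    words = len(text.split())
    return {
        "cps": chars / duration_s,
        "wpm": words / duration_s * 60.0,
    }
```

For example, a 150-character, 30-word fragment narrated in 10 seconds yields 15 CPS and 180 WPM, close to the average of 179 WPM reported above.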
Speech Analysis as a Tool for Detection and Monitoring of Medical Conditions: A Review
The goal of this article is to present and compare recent approaches which use speech and voice analysis as biomarkers for screening tests and monitoring of some diseases. The article takes into account metabolic, respiratory, cardiovascular, endocrine, and nervous system disorders. A selection of articles was performed to identify studies that assess voice features quantitatively in selected disorders by acoustic and linguistic voice analysis. Information was extracted from each paper in order to compare various aspects of datasets, speech parameters, methods of applied analysis and obtained results. 110 research papers were reviewed and 47 databases were summarized. Speech analysis is a promising method for early diagnosis of certain disorders. Advanced computer voice analysis with machine learning algorithms, combined with the widespread availability of smartphones, allows diagnostic analysis to be conducted during the patient's visit to the doctor or at the patient's home during a telephone conversation. Speech analysis is a simple, low-cost, non-invasive and easy-to-provide method of medical diagnosis. These are remarkable advantages, but there are also disadvantages. The effectiveness of disease diagnosis varies from 65% up to 99%. For that reason it should be treated as a medical screening test and as an indication of the need for classic medical tests.
How Behavioral, Photographic, and Interactional Realism Influence the Sense of Co-Presence in VR. An Investigation with Psychophysiological Measurement
The feeling of co-presence in VR depends on the realism of virtual agents. Our study explores how three dimensions of realism, visual appearance, behavior, and interactability, affect co-presence and the Orienting Response (OR), measured using heart rate (HR) and skin conductance response (SCR). Moreover, we test whether HR and SCR can be used as measures of psychological concepts that describe virtual interactions, such as co-presence. Forty-five participants passively viewed virtual characters while their HR and SCR were recorded. Afterwards, participants assessed the experience of interacting with the virtual agents. The interactability of the virtual characters increased co-presence, and so did heightened appearance realism, but only when the level of behavioral realism was high. High visual and behavioral realism led to an increase in SCR, while visual realism alone evoked deeper HR deceleration. Nonetheless, neither SCR nor HR correlated with any psychological concepts that describe virtual interactions. In conclusion, realism can increase both co-presence and the magnitude of the OR, yet physiological indices cannot reliably gauge the experience of interactions with virtual characters.
Towards Multimodal VR Trainer of Voice Emission and Public Speaking: Work-in-Progress
GlossoVR is a virtual reality (VR) application that combines training in public speaking in front of a virtual audience with voice emission training through relaxation exercises. It is accompanied by digital signal processing (DSP) and artificial intelligence (AI) modules which provide automatic feedback on the vocal performance as well as the behavior and psychophysiology of the user. In particular, we address parameters of speech emotions, prosody and timbre, and the user's hand gestures and eye movement. The prototype is in the proof-of-concept phase, and we are developing it in accordance with the user-centered design paradigm. This article reports the work in progress, focusing on the approaches, datasets and algorithms applied in the current state of the GlossoVR project. This work was supported by the Lider program, grant no 0230/L-11/2019, of the National Center for Research and Development, Poland.