ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning
Large bioacoustic archives of wild animals are an important source for identifying reappearing communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication in non-human animals. A main challenge remains that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis – particularly important for species with advanced social systems and complex vocalizations. In this study, deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit, ORCA-SPOT, was tested on a large-scale bioacoustic repository – the Orchive – comprising roughly 19,000 hours of killer whale underwater recordings. An automated segmentation of the entire Orchive recordings (about 2.2 years of audio) took approximately 8 days. It achieved a time-based precision, or positive predictive value (PPV), of 93.2% and an area under the curve (AUC) of 0.9523. This approach enables automated annotation of large bioacoustic databases to extract killer whale sounds, which are essential for the subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioacoustic research. ORCA-SPOT can also be adapted to other animal species.
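The two metrics reported for ORCA-SPOT, time-based precision (PPV) and AUC, can be illustrated with a minimal self-contained sketch. The labels, scores, and threshold below are made-up toy data, not Orchive results:

```python
# Hedged sketch: how PPV and AUC can be computed for a binary detector.

def precision(labels, preds):
    """PPV = TP / (TP + FP) over binarized predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney U) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]                   # 1 = killer whale call, 0 = noise
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]       # detector confidence per segment
preds  = [1 if s >= 0.5 else 0 for s in scores]
print(round(precision(labels, preds), 3))     # → 0.667
print(round(auc(labels, scores), 3))          # → 0.889
```

Note that PPV depends on the chosen decision threshold, while AUC summarizes detector quality over all thresholds, which is why the paper reports both.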
The Impact of Emotion Focused Features on SVM and MLR Models for Depression Detection
Major depressive disorder (MDD) is a common mental health diagnosis, with estimates that upwards of 25% of the United States population remains undiagnosed. Psychomotor symptoms of MDD impact the speed of control of the vocal tract, glottal source features, and the rhythm of speech. Speech enables people to perceive the emotion of the speaker, and MDD decreases the magnitude of the moods expressed by an individual. This study asks the question: if high-level features designed to combine acoustic features related to emotion detection are added to glottal source features and mean response time in support vector machine and multivariate logistic regression models, does that improve the recall of the MDD class? To answer this question, a literature review surveys common features in MDD detection, especially features related to emotion recognition. Using feature transformation, emotion recognition composite features are produced and added to glottal source features for model evaluation.
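The feature-combination step the abstract describes can be sketched as follows. The feature names, values, and the particular composite transform (mean and range) are all hypothetical stand-ins for the study's actual emotion-composite transformation:

```python
# Hedged sketch: composite emotion features derived from low-level acoustic
# descriptors, appended to glottal-source features and mean response time
# before fitting an SVM or multivariate logistic regression.

def emotion_composite(acoustic):
    """Collapse emotion-related acoustic features into composite scores
    (here simply mean and range; a real transform could use PCA etc.)."""
    lo, hi = min(acoustic), max(acoustic)
    return [round(sum(acoustic) / len(acoustic), 3), round(hi - lo, 3)]

def build_feature_vector(glottal, mean_response_time, acoustic):
    """One input row for the classifier."""
    return glottal + [mean_response_time] + emotion_composite(acoustic)

row = build_feature_vector(
    glottal=[0.42, 1.8],            # e.g. NAQ, H1-H2 (illustrative values)
    mean_response_time=0.61,        # seconds (illustrative)
    acoustic=[0.2, 0.5, 0.9, 0.4],  # emotion-related low-level descriptors
)
print(row)  # → [0.42, 1.8, 0.61, 0.5, 0.7]
```

Each subject would contribute one such row; the augmented rows are then fed to the SVM and logistic regression models, and recall on the MDD class is compared with and without the composite columns.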
USING DEEP LEARNING-BASED FRAMEWORK FOR CHILD SPEECH EMOTION RECOGNITION
Biological languages of the body through which human emotion can be detected abound, including heart rate, facial expressions, movement of the eyelids, dilation of the eyes, body posture, skin conductance, and even the speech we produce. Speech emotion recognition research started some three decades ago, and the popular Interspeech Emotion Challenge has helped to propagate this research area. However, most speech emotion recognition research focuses on adults, and there is very little research on child speech. This dissertation describes the development and evaluation of a child speech emotion recognition framework. The higher-level components of the framework are designed to sort and separate speech based on the speaker's age, ensuring that the focus is only on speech produced by children. The framework uses Baddeley's Theory of Working Memory to model a Working Memory Recurrent Network that can process and recognize emotions from speech. Baddeley's Theory of Working Memory offers one of the best explanations of how the human brain holds and manipulates temporary information, which is crucial in the development of neural networks that learn effectively. Experiments were designed and performed to answer the research questions, evaluate the proposed framework, and benchmark its performance against other methods. Satisfactory results were obtained from the experiments, and in many cases our framework was able to outperform other popular approaches. This study has implications for various applications of child speech emotion recognition, such as child abuse detection and child learning robots.
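As an illustration of the kind of recurrent processing such a framework relies on, the sketch below shows a generic Elman-style recurrent step, not the thesis's actual Working Memory Recurrent Network; the hidden state can be read as the "working memory" carried across speech frames. All weights and inputs are toy values:

```python
import math

def rnn_step(x, h, w_x, w_h, b):
    """One Elman-style recurrent step: the hidden state h acts as the
    working memory that persists from one speech frame to the next."""
    return math.tanh(w_x * x + w_h * h + b)

# Process a toy sequence of frame-level features (scalars for simplicity).
frames = [0.5, -0.2, 0.8]
h = 0.0  # empty working memory
for x in frames:
    h = rnn_step(x, h, w_x=1.0, w_h=0.5, b=0.0)
print(round(h, 3))  # final state summarizes the whole utterance
```

In a real system x and h would be vectors, the weights would be learned matrices, and the final state would feed a classifier over emotion categories.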
Multimodal Emotion Recognition among Couples from Lab Settings to Daily Life using Smartwatches
Couples generally manage chronic diseases together and the management takes
an emotional toll on both patients and their romantic partners. Consequently,
recognizing the emotions of each partner in daily life could provide an insight
into their emotional well-being in chronic disease management. The emotions of
partners are currently inferred in the lab and daily life using self-reports
which are not practical for continuous emotion assessment or observer reports
which are manual, time-intensive, and costly. Currently, there exists no
comprehensive overview of works on emotion recognition among couples.
Furthermore, approaches for emotion recognition among couples have (1) focused
on English-speaking couples in the U.S., (2) used data collected from the lab,
and (3) performed recognition using observer ratings rather than partners'
self-reported / subjective emotions. In the body of work contained in this
thesis (8 papers - 5 published and 3 currently under review in various
journals), we fill the current literature gap on couples' emotion recognition,
develop emotion recognition systems using 161 hours of data from a total of
1,051 individuals, and make contributions towards taking couples' emotion
recognition from the lab which is the status quo, to daily life. This thesis
contributes toward building automated emotion recognition systems that would
eventually enable partners to monitor their emotions in daily life and enable
the delivery of interventions to improve their emotional well-being.
Comment: PhD Thesis, 2022 - ETH Zurich
Big Data analytics to assess personality based on voice analysis
Bachelor's thesis (Trabajo Fin de Grado) in Telecommunication Technologies and Services Engineering.
When humans speak, the produced series of acoustic signals encodes not only the
linguistic message they wish to communicate, but also several other types of information
about the speakers and their states that offer glimpses of their personalities and can be
apprehended by judges. As there is nowadays a trend to film job candidates' interviews, the
aim of this Thesis is to explore possible correlations between speech features extracted from
interviews and personality characteristics established by experts, and to try to predict in a
candidate the Big Five personality traits: Conscientiousness, Agreeableness, Neuroticism,
Openness to Experience and Extraversion. The features were extracted from a genuine
database of 44 women video recordings acquired in 2020, and 78 in 2019 and before from a
previous study.
Even though many significant correlations were found for each year's dataset, many of
them proved inconsistent across the two studies. Only extraversion, and to a more limited
extent openness, showed a good number of clear correlations. Essentially, extraversion
has been found to be related to the variation in the slope of the pitch (usually at the end of
sentences), which indicates that a more "singing" voice could be associated with a higher
score. In addition, spectral entropy and roll-off measurements have also been found to
indicate that larger changes in the spectrum (which may also be related to more "singing"
voices) could be associated with greater extraversion too.
Regarding the predictive modelling algorithms, aimed at estimating personality traits from
the speech features obtained for the study, results were very limited in terms of
accuracy and RMSE, as also shown by scatter plots for the regression models and confusion
matrices for classification evaluation. Nevertheless, various results suggest that some
predictive capability exists, and extraversion and openness again proved to be the most
predictable personality traits. Better outcomes were achieved when predictions were based
on one specific feature instead of all of them or a reduced group, as was the case for
openness when estimated through linear and logistic regression based on the time spent over
90% of the variation range of the deltas of the entropy of the spectrum modulus. The same
held for extraversion, which correlates well with features capturing variation in the
decreasing slope of F0 and variations in the spectrum. For the predictions, several machine
learning algorithms were used, such as linear regression, logistic regression and random forests.
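The single-feature prediction setup that worked best here can be sketched as a one-variable least-squares fit. The feature values and trait scores below are invented toy data, not the thesis's measurements:

```python
# Hedged sketch: fit one personality trait (e.g. openness) against one
# speech feature with ordinary least squares, and report RMSE.

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def rmse(ys, preds):
    """Root-mean-square error between targets and predictions."""
    return (sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)) ** 0.5

feature = [0.1, 0.3, 0.5, 0.7]  # e.g. spectral-entropy delta (illustrative)
trait   = [2.0, 3.0, 4.0, 5.0]  # trait scores from expert raters (illustrative)
slope, intercept = fit_line(feature, trait)
preds = [slope * x + intercept for x in feature]
print(round(slope, 3), round(rmse(trait, preds), 3))
```

On real interview data the fit would of course be far from perfect; the thesis's point is that a single well-chosen feature can beat feeding the model the full (or a reduced) feature set.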