368 research outputs found

    Continuous Emotion Prediction from Speech: Modelling Ambiguity in Emotion

    There is growing interest in emotion research in modelling perceived emotion labelled as intensities along affect dimensions such as arousal and valence. These labels are typically obtained from multiple annotators, each of whom has an individual perception of emotional speech. Consequently, emotion prediction models that incorporate variation in individual perceptions as ambiguity in the emotional state would be more realistic. This thesis develops the modelling framework necessary to achieve continuous prediction of ambiguous emotional states from speech. Besides emotion labels, feature space distribution and encoding are integral parts of the prediction system. The first part of this thesis examines the limitations of current low-level feature distributions and their minimalistic statistical descriptions. Specifically, front-end paralinguistic acoustic features are reflective of speech production mechanisms. However, discriminatively learnt features have frequently outperformed acoustic features in emotion prediction tasks, but provide no insights into the physical significance of these features. One of the contributions of this thesis is the development of a framework that can modify the acoustic feature representation based on emotion label information. Another investigation in this thesis indicates that emotion perception is language-dependent, which in turn helped develop a framework for cross-language emotion prediction. Furthermore, this investigation supported the hypothesis that emotion perception is highly individualistic and is better modelled as a distribution rather than a point estimate, so as to encode information about the ambiguity in the perceived emotion. Following this observation, the thesis proposes measures to quantify the appropriateness of distribution types in modelling ambiguity in dimensional emotion labels, which are then employed to compare well-known bounded parametric distributions.
These analyses led to the conclusion that the beta distribution was the most appropriate parametric model of ambiguity in emotion labels. Finally, the thesis focuses on developing a deep learning framework for continuous emotion prediction as a temporal series of beta distributions, examining various parameterizations of the beta distributions as well as loss functions. Furthermore, the distribution over the parameter space is examined, and priors from kernel density estimation are employed to shape the posteriors over the parameter space, which significantly improved valence ambiguity predictions. The proposed frameworks and methods have been extensively evaluated on multiple state-of-the-art databases, and the results demonstrate both the viability of predicting ambiguous emotion states and the validity of the proposed systems.
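
As a rough illustration of treating annotator ratings as a beta distribution rather than a point estimate, the sketch below fits Beta(alpha, beta) to one frame's ratings by the method of moments. This is not the thesis's deep learning framework. The function names, the assumed [-1, 1] rating range, and the example ratings are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance


def rescale(rating, low=-1.0, high=1.0):
    """Map a rating from an assumed [low, high] scale onto [0, 1]."""
    return (rating - low) / (high - low)


def fit_beta_moments(ratings):
    """Method-of-moments estimate of (alpha, beta) for ratings in (0, 1)."""
    m = mean(ratings)
    v = pvariance(ratings)
    # A beta distribution requires 0 < mean < 1 and 0 < variance < mean * (1 - mean).
    if not 0.0 < m < 1.0 or v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("ratings are not compatible with a beta distribution")
    common = m * (1.0 - m) / v - 1.0  # equals alpha + beta
    return m * common, (1.0 - m) * common


# Hypothetical arousal ratings from six annotators for a single frame.
raw_ratings = [0.2, 0.4, 0.1, 0.5, 0.3, 0.35]
ratings = [rescale(r) for r in raw_ratings]
alpha, beta = fit_beta_moments(ratings)
print(f"alpha={alpha:.3f}, beta={beta:.3f}")
```

The fitted parameters reproduce the sample mean and variance of the rescaled ratings; a wider spread among annotators yields smaller alpha and beta, that is, a flatter distribution.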

    Estimating continuous affect with label uncertainty

    Continuous affect estimation is a problem where there is an inherent uncertainty and subjectivity in the labels that accompany data samples; typically, datasets use the average of multiple annotations or self-reporting to obtain ground truth labels. In this work, we propose a method for uncertainty-aware continuous affect estimation that explicitly models the uncertainty of the ground truth label as a univariate Gaussian with mean equal to the ground truth label and unknown variance. For each sample, the proposed neural network estimates not only the value of the target label (valence and arousal in our case), but also the variance. The network is trained with a loss that is defined as the KL-divergence between the estimation (valence/arousal) and the Gaussian around the ground truth. We show that, in two affect recognition problems with real data, the estimated variances are correlated with measures of uncertainty/error in the labels that are extracted by considering multiple annotations of the data.
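
For reference, the KL-divergence between two univariate Gaussians has a closed form, sketched below. The abstract does not state how the unknown label variance enters the loss, so this shows only the generic quantity; the function name and argument names are illustrative assumptions, not the paper's code.

```python
import math


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for univariate Gaussians."""
    if var_p <= 0.0 or var_q <= 0.0:
        raise ValueError("variances must be positive")
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mu_p - mu_q) ** 2) / var_q
        - 1.0
    )


# Hypothetical values: predicted valence 0.3 with variance 0.04,
# compared against a Gaussian centred on the label 0.5 with variance 0.09.
print(kl_gaussian(0.3, 0.04, 0.5, 0.09))
```

The divergence is zero only when both the means and the variances agree, and it grows as the predicted mean moves away from the label.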

    USING DEEP LEARNING-BASED FRAMEWORK FOR CHILD SPEECH EMOTION RECOGNITION

    Biological languages of the body through which human emotion can be detected abound, including heart rate, facial expressions, movement of the eyelids and dilation of the eyes, body postures, skin conductance, and even the speech we make. Speech emotion recognition research started some three decades ago, and the popular Interspeech Emotion Challenge has helped to propagate this research area. However, most speech recognition research is focused on adults and there is very little research on child speech. This dissertation describes the development and evaluation of a child speech emotion recognition framework. The higher-level components of the framework are designed to sort and separate speech based on the speaker’s age, ensuring that the focus is only on speech produced by children. The framework uses Baddeley’s Theory of Working Memory to model a Working Memory Recurrent Network that can process and recognize emotions from speech. Baddeley’s Theory of Working Memory offers one of the best explanations of how the human brain holds and manipulates temporary information, which is crucial in the development of neural networks that learn effectively. Experiments were designed and performed to provide answers to the research questions, evaluate the proposed framework, and benchmark the performance of the framework against other methods. Satisfactory results were obtained from the experiments and, in many cases, our framework was able to outperform other popular approaches. This study has implications for various applications of child speech emotion recognition, such as child abuse detection and child learning robots.

    An Ordinal Approach to Affective Computing

    Both depression prediction and emotion recognition systems are often based on ordinal ground truth due to subjectively annotated datasets. Yet, both have so far been posed as classification or regression problems. These naive approaches have fundamental issues because they are not focused on ordering, unlike ordinal regression, which is the most appropriate for truly ordinal ground truth. Ordinal regression to date offers fewer and more limited methods than other branches of machine learning, and its usage has been limited to specific research domains. Accordingly, this thesis presents investigations into ordinal approaches for affective computing by describing a consistent framework to understand all ordinal system designs, proposing ordinal systems for large datasets, and introducing tools and principles to select suitable system designs and evaluation methods. First, three learning approaches are compared using the support vector framework to establish the empirical advantages of ordinal regression, which is lacking from the current literature. Results on depression and emotion corpora indicate that ordinal regression with proper tuning can improve existing depression and emotion systems. Ordinal logistic regression (OLR), which is an extension of logistic regression for ordinal scales, contributes to a number of model structures, from which the best structure must be chosen. Exploiting the newly proposed computationally efficient greedy algorithm for model structure selection (GREP), OLR outperformed or was comparable with state-of-the-art depression systems on two benchmark depression speech datasets. Deep learning has dominated many affective computing fields, and hence ordinal deep learning is an attractive prospect. However, it is under-studied even in the machine learning literature, which motivates an in-depth analysis of appropriate network architectures and loss functions.
One of the significant outcomes of this analysis is the introduction of RankCNet, a novel ordinal network which utilises a surrogate loss function of rank correlation. Not only the modelling algorithm but also the choice of evaluation measure depends on the nature of the ground truth. Rank correlation measures, which are sensitive to ordering, are more apt for ordinal problems than common classification or regression measures that ignore ordering information. Although rank-based evaluation for ordinal problems is not new, so far in affective computing, ordinality of the ground truth has been widely ignored during evaluation. Hence, a systematic analysis in the affective computing context is presented, to provide clarity and encourage careful choice of evaluation measures. Another contribution is a neural network framework with a novel multi-term loss function to assess the ordinality of ordinally-annotated datasets, which can guide the selection of suitable learning and evaluation methods. Experiments on multiple synthetic and affective speech datasets reveal that the proposed system can offer reliable and meaningful predictions about the ordinality of a given dataset. Overall, the novel contributions and findings presented in this thesis not only improve prediction accuracy but also encourage future research towards ordinal affective computing: a different paradigm, but often the most appropriate.
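
To make the point about ordering-sensitive evaluation concrete, the sketch below computes Spearman's rank correlation in plain Python and contrasts it with mean squared error. It is a generic illustration, not RankCNet or the thesis's evaluation code; the helper names and the toy labels and predictions are assumptions.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mse(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Toy ordinal labels and two predictions that order the items identically
# but sit on very different numeric scales.
labels = [0, 1, 2, 3, 4]
preds_a = [0.1, 0.9, 2.2, 3.1, 3.8]
preds_b = [0.1, 0.2, 0.25, 0.9, 5.0]
print(spearman(labels, preds_a), mse(labels, preds_a))
print(spearman(labels, preds_b), mse(labels, preds_b))
```

Both predictions preserve the label ordering and so receive the same rank correlation, while their mean squared errors differ, which is the distinction the abstract draws between rank-based and regression measures.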

    Constrained Affective Computing

    Behavior prediction in-the-wild

    In this paper, the problem of audio-visual behavior prediction in-the-wild is addressed. In this context, both audio-visual descriptors of behavioral cues (features) and continuous-time real-valued characterizations of behavior (annotations) are (possibly) corrupted by non-Gaussian noise of large magnitude. The modeling assumption behind the proposed framework is that naturalistic affect and behavior captured in audio-visual episodes are smoothly-varying dynamic phenomena and thus the hidden temporal dynamics can be modeled as a generative auto-regressive process. Consequently, continuous-time real-valued characterizations of behavior (annotations) are postulated to be outputs of a low-complexity (i.e., low-order) time-invariant Linear Dynamical System (LDS) when descriptors of behavioral cues (features) act as inputs. To learn the parameters of the LDS, a recently proposed spectral method that relies on Hankel-rank minimization is adopted. Experimental evaluation on a challenging database recorded in the wild demonstrates the effectiveness of the proposed approach in behavior prediction.
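
The modelling assumption can be sketched as a forward simulation of a time-invariant LDS in which features drive a hidden state and annotations are read out from it. The sketch below covers only this forward model, not the Hankel-rank-minimization learning step; the matrices A, B, C, the state-space form x[t+1] = A x[t] + B u[t], y[t] = C x[t], and the toy inputs are illustrative assumptions rather than values from the paper.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def simulate_lds(A, B, C, inputs, x0=None):
    """Simulate x[t+1] = A x[t] + B u[t], y[t] = C x[t] over a sequence of inputs."""
    state = list(x0) if x0 is not None else [0.0] * len(A)
    outputs = []
    for u in inputs:
        outputs.append(matvec(C, state))  # annotation read out from the hidden state
        drive = matvec(B, u)              # effect of the current feature vector
        state = [a + b for a, b in zip(matvec(A, state), drive)]
    return outputs


# Hypothetical second-order system with two input features and one output.
A = [[0.9, 0.1], [0.0, 0.8]]
B = [[0.5, 0.0], [0.0, 0.3]]
C = [[1.0, 1.0]]

# A single feature impulse followed by silence: the output decays smoothly.
inputs = [[1.0, 0.0]] + [[0.0, 0.0]] * 4
outputs = simulate_lds(A, B, C, inputs)
print(outputs)
```

A low-order system of this kind can only produce smoothly varying outputs, which matches the assumption that affect and behavior change gradually over an episode.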