3 research outputs found

    Affective Speech Recognition

    Get PDF
    Speech, as a medium of interaction, carries two different streams of information. Whereas one stream carries explicit messages, the other one contains implicit information about speakers themselves. Affective speech recognition is a set of theories and tools that intend to automate unfolding the part of the implicit stream that has to do with humans emotion. Application of affective speech recognition is to human computer interaction; a machine that is able to recognize humans emotion could engage the user in a more effective interaction. This thesis proposes a set of analyses and methodologies that advance automatic recognition of affect from speech. The proposed solution spans two dimensions of the problem: speech signal processing, and statistical learning. At the speech signal processing dimension, extraction of speech low-level descriptors is dis- cussed, and a set of descriptors that exploit the spectrum of the signal are proposed, which have shown to be particularly practical for capturing affective qualities of speech. Moreover, consider- ing the non-stationary property of the speech signal, further proposed is a measure of dynamicity that captures that property of speech by quantifying changes of the signal over time. Furthermore, based on the proposed set of low-level descriptors, it is shown that individual human beings are different in conveying emotions, and that parts of the spectrum that hold the affective information are different from one person to another. Therefore, the concept of emotion profile is proposed that formalizes those differences by taking into account different factors such as cultural and gender-specific differences, as well as those distinctions that have to do with individual human beings. At the statistical learning dimension, variable selection is performed to identify speech features that are most imperative to extracting affective information. In doing so, low-level descriptors are distinguished from statistical functionals, therefore, effectiveness of each of the two are studied dependently and independently. The major importance of variable selection as a standalone component of a solution is to real-time application of affective speech recognition. Although thousands of speech features are commonly used to tackle this problem in theory, extracting that many features in a real-time manner is unrealistic, especially for mobile applications. Results of the conducted investigations show that the required number of speech features is far less than the number that is commonly used in the literature of the problem. At the core of an affective speech recognition solution is a statistical model that uses speech features to recognize emotions. Such a model comes with a set of parameters that are estimated through a learning process. Proposed in this thesis is a learning algorithm, developed based on the notion of Hilbert-Schmidt independence criterion and named max-dependence regression, that maximizes the dependence between predicted and actual values of affective qualities. Pearson鈥檚 correlation coefficient is commonly used as the measure of goodness of a fit in the literature of affective computing, therefore max-dependence regression is proposed to make the learning and hypothesis testing criteria consistent with one another. Results of this research show that doing so results in higher prediction accuracy. Lastly, sparse representation for affective speech datasets is considered in this thesis. For this purpose, the application of a dictionary learning algorithm based on Hilbert-Schmidt independence criterion is proposed. Dictionary learning is used to identify the most important bases of the data in order to improve the generalization capability of the proposed solution to affective speech recognition. Based on the dictionary learning approach of choice, fusion of feature vectors is proposed. It is shown that sparse representation leads to higher generalization capability for affective speech recognition

    An Ordinal Approach to Affective Computing

    Full text link
    Both depression prediction and emotion recognition systems are often based on ordinal ground truth due to subjectively annotated datasets. Yet, both have so far been posed as classification or regression problems. These naive approaches have fundamental issues because they are not focused on ordering, unlike ordinal regression, which is the most appropriate for truly ordinal ground truth. Ordinal regression to date offers comparatively fewer, more limited methods when compared with other branches in machine learning, and its usage has been limited to specific research domains. Accordingly, this thesis presents investigations into ordinal approaches for affective computing by describing a consistent framework to understand all ordinal system designs, proposing ordinal systems for large datasets, and introducing tools and principles to select suitable system designs and evaluation methods. First, three learning approaches are compared using the support vector framework to establish the empirical advantages of ordinal regression, which is lacking from the current literature. Results on depression and emotion corpora indicate that ordinal regression with proper tuning can improve existing depression and emotion systems. Ordinal logistic regression (OLR), which is an extension of logistic regression for ordinal scales, contributes to a number of model structures, from which the best structure must be chosen. Exploiting the newly proposed computationally efficient greedy algorithm for model structure selection (GREP), OLR outperformed or was comparable with state-of-the-art depression systems on two benchmark depression speech datasets. Deep learning has dominated many affective computing fields, and hence ordinal deep learning is an attractive prospect. However, it is under-studied even in the machine learning literature, which motivates an in-depth analysis of appropriate network architectures and loss functions. One of the significant outcomes of this analysis is the introduction of RankCNet, a novel ordinal network which utilises a surrogate loss function of rank correlation. Not only the modelling algorithm but the choice of evaluation measure depends on the nature of the ground truth. Rank correlation measures, which are sensitive to ordering, are more apt for ordinal problems than common classification or regression measures that ignore ordering information. Although rank-based evaluation for ordinal problems is not new, so far in affective computing, ordinality of the ground truth has been widely ignored during evaluation. Hence, a systematic analysis in the affective computing context is presented, to provide clarity and encourage careful choice of evaluation measures. Another contribution is a neural network framework with a novel multi-term loss function to assess the ordinality of ordinally-annotated datasets, which can guide the selection of suitable learning and evaluation methods. Experiments on multiple synthetic and affective speech datasets reveal that the proposed system can offer reliable and meaningful predictions about the ordinality of a given dataset. Overall, the novel contributions and findings presented in this thesis not only improve prediction accuracy but also encourage future research towards ordinal affective computing: a different paradigm, but often the most appropriate
    corecore