
    Continuous Analysis of Affect from Voice and Face

    Human affective behavior is multimodal, continuous and complex. Despite major advances within the affective computing research field, modeling, analyzing, interpreting and responding to human affective behavior remain a challenge for automated systems, as affect and emotions are complex constructs with fuzzy boundaries and with substantial individual differences in expression and experience [7]. Affective and behavioral computing researchers have therefore recently invested increased effort in exploring how best to model, analyze and interpret the subtlety, complexity and continuity of affective behavior in terms of latent dimensions (e.g., arousal, power and valence, each represented along a continuum such as −1 to +1) and appraisals, rather than in terms of a small number of discrete emotion categories (e.g., happiness and sadness).

    This chapter aims to (i) give a brief overview of the existing efforts and major accomplishments in the modeling and analysis of emotional expressions in dimensional and continuous space, focusing on open issues and new challenges in the field, and (ii) introduce a representative approach for multimodal continuous analysis of affect from voice and face, providing experimental results on the audiovisual Sensitive Artificial Listener (SAL) Database of natural interactions. The chapter concludes by posing a number of questions that highlight the significant issues in the field, and by extracting potential answers to these questions from the relevant literature.

    The chapter is organized as follows. Section 10.2 describes theories of emotion, and Section 10.3 provides details on the affect dimensions employed in the literature as well as how emotions are perceived from visual, audio and physiological modalities. Section 10.4 summarizes how current technology has been developed, in terms of data acquisition and annotation, and automatic analysis of affect in continuous space, bringing forth a number of issues that need to be taken into account when applying a dimensional approach to emotion recognition: determining the duration of emotions for automatic analysis, modeling the intensity of emotions, determining the baseline, dealing with high inter-subject expression variation, defining optimal strategies for the fusion of multiple cues and modalities, and identifying appropriate machine learning techniques and evaluation measures. Section 10.5 presents our representative system, which fuses vocal and facial expression cues for dimensional and continuous prediction of emotions in valence and arousal space by employing bidirectional Long Short-Term Memory neural networks (BLSTM-NNs), and introduces an output-associative fusion framework that incorporates correlations between the emotion dimensions to further improve continuous affect prediction. Section 10.6 concludes the chapter.
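    As a loose illustration of the kind of model named above, the following is a minimal Python/PyTorch sketch of a bidirectional LSTM regressor that maps per-frame audiovisual features to continuous valence and arousal values in [−1, +1]. The feature dimensionality, hidden size and class name are illustrative assumptions, not the chapter's actual configuration.

        # Hedged sketch: bidirectional LSTM regression of continuous
        # valence/arousal traces. Dimensions and names are illustrative.
        import torch
        import torch.nn as nn

        class BLSTMAffectRegressor(nn.Module):
            def __init__(self, feat_dim=120, hidden=64):
                super().__init__()
                self.blstm = nn.LSTM(feat_dim, hidden,
                                     batch_first=True, bidirectional=True)
                self.head = nn.Linear(2 * hidden, 2)   # valence, arousal

            def forward(self, x):                      # x: (batch, time, feat)
                h, _ = self.blstm(x)                   # (batch, time, 2*hidden)
                return torch.tanh(self.head(h))        # outputs in [-1, +1]

        model = BLSTMAffectRegressor()
        clips = torch.randn(4, 100, 120)               # 4 clips, 100 frames each
        preds = model(clips)                           # (4, 100, 2) V/A traces

    An output-associative variant along the lines described above would feed the first-stage predictions of both dimensions back in as additional inputs to a second-stage predictor, so that the correlation between valence and arousal is exploited.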

    Leveraging contextual-cognitive relationships into mobile commerce systems

    A thesis submitted to the University of Bedfordshire in partial fulfilment of the requirements for the degree of Doctor of Philosophy.

    Mobile smart devices are becoming increasingly important within the on-line purchasing cycle, so the requirement for mobile commerce systems to become truly context-aware remains paramount if they are to be effective within the varied situations that mobile users encounter. Where traditionally a recommender system will focus upon the user-item relationship, i.e. what to recommend, this thesis proposes that, due to the complexity of mobile users' situational profiles, the how and when must also be considered for recommendations to be effective. Though non-trivial, it should be possible, through an understanding of a user's ability to complete certain cognitive processes, to determine the likelihood of engagement and therefore the success of a recommendation.

    This research investigates physical and modal contexts and presents findings as to their relationships with cognitive processes. Through the introduction of the novel concept of disruptive contexts, situational contexts including noise, distractions and user activity are identified as having significant effects upon the relationship between user affective state and cognitive capability. Experimental results demonstrate that by understanding specific cognitive capabilities, e.g. a user's perception of advert content and user levels of purchase-decision involvement, a system can determine potential user engagement and therefore improve the effectiveness of recommender systems' performance.

    A quantitative approach is followed, with a reliance upon statistical measures to inform the development, and subsequent validation, of a contextual-cognitive model that was implemented as part of a context-aware system. The development of SiDISense (Situational Decision Involvement Sensing system) demonstrated, through the use of smart-phone sensors and machine learning, that it was viable to classify subjectively rated contexts and then infer levels of cognitive capability, and therefore the likelihood of positive user engagement. Through this success in furthering the understanding of contextual-cognitive relationships, novel and significant advances are now viable within the area of m-commerce.
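    To make the classification step concrete, here is a minimal Python/scikit-learn sketch of classifying a subjectively rated disruptive context from smartphone sensor features. The feature set, labels and model choice are hypothetical placeholders, not SiDISense's actual pipeline.

        # Hedged sketch: predicting a binary "disruptive context" label from
        # smartphone sensor features. The data here is random, for shape only.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        # Hypothetical features: accelerometer variance, ambient noise level,
        # screen-interaction rate (one row per sensing window).
        X = rng.random((200, 3))
        y = rng.integers(0, 2, 200)        # 0 = low disruption, 1 = high

        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        print(cross_val_score(clf, X, y, cv=5).mean())  # chance-level on random data

    With real, subjectively rated sensing windows in place of the random arrays, the predicted context class would then gate or re-rank recommendations.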

    A Human-Centric Approach to Data Fusion in Post-Disaster Management: The Development of a Fuzzy Set Theory Based Model

    It is critical to provide an efficient and accurate information system in the post-disaster phase so that individuals can access and obtain the necessary resources in a timely manner; however, current map-based post-disaster management systems provide unfiltered lists of all emergency resources, which usually leads to high computational cost. An effective post-disaster management system (PDMS) also distributes emergency resources such as hospitals, storage and transportation much more reasonably, to the greater benefit of individuals in the post-disaster period.

    In this dissertation, a semi-supervised learning (SSL) based graph system was first constructed for the PDMS. The graph-based PDMS resource map was converted to a directed graph represented by an adjacency matrix, and decision information was then derived from the PDMS in two ways: a clustering operation, and a graph-based semi-supervised optimization process. The PDMS was applied to emergency resource distribution in the post-disaster (response) phase, and a path optimization algorithm based on ant colony optimization (ACO) was used to minimize post-disaster cost; simulation results show the effectiveness of the proposed methodology. This analysis compared the approach with clustering-based algorithms under improved ACO variants, the tour improvement algorithm (TIA) and the Min-Max Ant System (MMAS), and the results also show that the SSL-based graph is more effective for calculating the optimal path in the PDMS.

    This research further improved the map by combining the disaster map with the initial GIS-based map, locating the target area while considering the influence of the disaster. First, the initial map and the disaster map underwent Gaussian transformation, from which histograms of all map images were acquired; all images were then processed with the discrete wavelet transform (DWT), and a Gaussian fusion algorithm was applied to the DWT images. Second, the inverse DWT (iDWT) was applied to generate a new map for the post-disaster management system. Finally, simulations were performed, and the results showed the effectiveness of the proposed method compared with other fusion algorithms, such as mean-mean fusion and max-UD fusion, using evaluation indices including entropy, spatial frequency (SF) and image quality index (IQI). A fuzzy set model was proposed to improve the representation capacity of nodes in this GIS-based PDMS.
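    As an illustration of the path optimisation step, the following is a minimal Python sketch of ant colony optimisation over a small weighted graph. The graph, node names and parameters are invented for the example and are not the dissertation's actual data.

        # Hedged sketch: ACO search for a low-cost route from a depot to a
        # shelter. Ants walk probabilistically, biased by pheromone and by
        # inverse edge cost; good paths receive extra pheromone.
        import random

        graph = {                      # node -> {neighbour: cost}
            'depot':   {'a': 2, 'b': 5},
            'a':       {'b': 1, 'shelter': 6},
            'b':       {'shelter': 2},
            'shelter': {},
        }
        pheromone = {(u, v): 1.0 for u in graph for v in graph[u]}

        def walk(start, goal, alpha=1.0, beta=2.0):
            path, node = [start], start
            while node != goal:
                edges = [(node, v) for v in graph[node] if v not in path]
                if not edges:                      # dead end
                    return None, float('inf')
                weights = [pheromone[e] ** alpha * (1 / graph[e[0]][e[1]]) ** beta
                           for e in edges]
                node = random.choices(edges, weights)[0][1]
                path.append(node)
            return path, sum(graph[u][v] for u, v in zip(path, path[1:]))

        best = (None, float('inf'))
        for _ in range(100):                       # one ant per iteration
            path, cost = walk('depot', 'shelter')
            if path:
                if cost < best[1]:
                    best = (path, cost)
                for e in zip(path, path[1:]):      # deposit pheromone
                    pheromone[e] += 1.0 / cost
            for e in pheromone:                    # evaporate
                pheromone[e] *= 0.95
        print(best)    # e.g. (['depot', 'a', 'b', 'shelter'], 5)

    TIA and MMAS, mentioned above, refine exactly this loop: TIA locally improves each completed tour, while MMAS bounds the pheromone values to avoid premature convergence.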

    Machine learning for automatic analysis of affective behaviour

    The automated analysis of affect has been gaining rapidly increasing attention from researchers over the past two decades, as it constitutes a fundamental step towards achieving next-generation computing technologies and integrating them into everyday life (e.g. via affect-aware, user-adaptive interfaces, medical imaging, health assessment, ambient intelligence etc.). The work presented in this thesis focuses on several fundamental problems that manifest on the course towards reliable, accurate and robust affect sensing systems. In more detail, the motivation behind this work lies in recent developments in the field, namely (i) the creation of large, audiovisual databases for affect analysis in the so-called "Big Data" era, along with (ii) the need to deploy systems under demanding, real-world conditions. These developments led to the requirement to analyse emotion expressions continuously in time, instead of merely processing static images, thus unveiling to researchers the wide range of temporal dynamics related to human behaviour. The latter entails another deviation from the traditional line of research in the field: instead of focusing on predicting posed, discrete basic emotions (happiness, surprise etc.), it became necessary to focus on spontaneous, naturalistic expressions captured under settings more proximal to real-world conditions, utilising more expressive emotion descriptions than a set of discrete labels. To this end, the main motivation of this thesis is to deal with challenges arising from the adoption of continuous dimensional emotion descriptions under naturalistic scenarios, which are considered to capture a much wider spectrum of expressive variability than basic emotions and, most importantly, to model emotional states that are commonly expressed by humans in their everyday life.

    In the first part of this thesis, we attempt to demystify the largely unexplored problem of predicting continuous emotional dimensions. This work is amongst the first to explore the problem of predicting emotion dimensions via multi-modal fusion, utilising facial expressions, auditory cues and shoulder gestures. A major contribution of the work presented in this thesis lies in proposing the utilisation of various relationships exhibited by emotion dimensions in order to improve the prediction accuracy of machine learning methods, an idea which has since been taken up by other researchers in the field. In order to evaluate this experimentally, we extend methods such as Long Short-Term Memory Neural Networks (LSTM), the Relevance Vector Machine (RVM) and Canonical Correlation Analysis (CCA) to exploit output relationships in learning. As shown, this increases the accuracy of machine learning models applied to this task.

    The annotation of continuous dimensional emotions is a tedious task, highly prone to the influence of various types of noise. Performed in real time by several annotators (usually experts), the annotation process can be heavily biased by factors such as subjective interpretations of the emotional states observed, the inherent ambiguity of labels related to human behaviour, the varying reaction lags exhibited by each annotator, and other factors such as input device noise and annotation errors. In effect, the annotations manifest a strong spatio-temporal annotator-specific bias. Failing to properly deal with annotation bias and noise leads to an inaccurate ground truth, and therefore to ill-generalisable machine learning models.
    This makes the proper fusion of multiple annotations, and the inference of a clean, corrected version of the "ground truth", one of the most significant challenges in the area. A highly important contribution of this thesis lies in the introduction of Dynamic Probabilistic Canonical Correlation Analysis (DPCCA), a method aimed at fusing noisy continuous annotations. By adopting a private-shared space model, we isolate the individual characteristics that are annotator-specific and not shared, while, most importantly, we model the common, underlying annotation which is shared by annotators (i.e., the derived ground truth). By further learning temporal dynamics and incorporating a time-warping process, we are able to derive a clean version of the ground truth given multiple annotations, eliminating temporal discrepancies and other nuisances.

    The integration of the temporal alignment process within the proposed private-shared space model makes DPCCA suitable for the problem of temporally aligning human behaviour; that is, given temporally unsynchronised sequences (e.g., videos of two persons smiling), the goal is to generate temporally synchronised sequences (e.g., the smile apex should co-occur in the videos). Temporal alignment is an important problem for many applications where multiple datasets need to be aligned in time. Furthermore, it is particularly suitable for the analysis of facial expressions, where the activation of facial muscles (Action Units) typically follows a set of predefined temporal phases. A highly challenging scenario arises when the observations are perturbed by gross, non-Gaussian noise (e.g., occlusions), as is often the case when analysing data acquired under real-world conditions. To account for non-Gaussian noise, a robust variant of Canonical Correlation Analysis (RCCA) for robust fusion and temporal alignment is proposed. The model captures the shared, low-rank subspace of the observations, isolating the gross noise in a sparse noise term. RCCA is amongst the first robust variants of CCA proposed in the literature and, as we show in related experiments, outperforms other state-of-the-art methods on related tasks such as the fusion of multiple modalities under gross noise.

    Beyond private-shared space models, Component Analysis (CA) is an integral component of most computer vision systems, particularly in terms of reducing the usually high-dimensional input spaces in a meaningful manner pertaining to the task at hand (e.g., prediction, clustering). A final, significant contribution of this thesis lies in proposing the first unifying framework for probabilistic component analysis. The proposed framework covers most well-known CA methods, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA), providing further theoretical insights into the workings of CA. Moreover, the proposed framework is highly flexible, enabling novel CA methods to be generated by simply manipulating the connectivity of latent variables (i.e. the latent neighbourhood). As shown experimentally, methods derived via the proposed framework outperform their equivalents in several problems related to affect sensing and facial expression analysis, while providing advantages such as reduced complexity and explicit variance modelling.
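    To give a flavour of the shared-space idea behind annotation fusion, here is a minimal Python/scikit-learn sketch using plain, static CCA on two synthetic annotation traces. DPCCA as described above additionally models temporal dynamics, private annotator-specific factors and time warping; this toy example omits all of that, and the data is synthetic.

        # Hedged sketch: plain CCA as a static stand-in for the shared-space
        # view of annotation fusion. Two noisy, lagged annotations of the
        # same latent trace are projected onto their shared component.
        import numpy as np
        from sklearn.cross_decomposition import CCA

        t = np.linspace(0, 10, 500)
        truth = np.sin(t)                              # latent "ground truth"
        ann1 = truth + 0.3 * np.random.randn(500)      # noisy annotation 1
        ann2 = np.roll(truth, 5) + 0.3 * np.random.randn(500)  # lagged, noisy

        cca = CCA(n_components=1)
        z1, z2 = cca.fit_transform(ann1.reshape(-1, 1), ann2.reshape(-1, 1))
        fused = (z1 + z2).ravel() / 2                  # shared component
        # The sign of CCA variates is arbitrary, hence the absolute value.
        print(abs(np.corrcoef(fused, truth)[0, 1]))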

    PAD-based Multimodal Affective Fusion

    The study of multimodality is comparatively less developed for affective interfaces than for their traditional counterparts. However, one condition for the successful development of affective interface technologies is the development of frameworks for real-time multimodal fusion. In this paper, we describe an approach to multimodal affective fusion which relies on a dimensional model, Pleasure-Arousal-Dominance (PAD), to support the fusion of affective modalities, each input modality being represented as a PAD vector. We describe how this model supports both affective content fusion and temporal fusion within a unified approach. We report results from early user studies which confirm the existence of a correlation between measured affective input and user temperament scores.
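    To illustrate the representation, the following is a minimal Python sketch in which each modality contributes a (pleasure, arousal, dominance) vector and the vectors are fused by a confidence-weighted average. The modalities, values and weighting scheme are illustrative assumptions, not the paper's actual fusion rules.

        # Hedged sketch: content fusion in PAD space via a
        # confidence-weighted average of per-modality PAD vectors.
        import numpy as np

        modalities = {                  # modality -> (PAD vector, confidence)
            'speech':  (np.array([ 0.4, 0.7, 0.1]), 0.8),
            'face':    (np.array([ 0.6, 0.5, 0.0]), 0.6),
            'posture': (np.array([-0.1, 0.2, 0.3]), 0.3),
        }

        def fuse_pad(inputs):
            vecs = np.array([v for v, _ in inputs.values()])
            conf = np.array([c for _, c in inputs.values()])
            return (conf[:, None] * vecs).sum(axis=0) / conf.sum()

        print(fuse_pad(modalities))     # fused PAD estimate, each axis in [-1, 1]

    Temporal fusion in the same spirit could additionally decay each modality's confidence with the age of its most recent observation.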