114 research outputs found

    Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives

    Over the past few years, adversarial training has become an extremely active research topic and has been successfully applied to various Artificial Intelligence (AI) domains. As a potentially crucial technique for the development of the next generation of emotional AI systems, we herein provide a comprehensive overview of the application of adversarial training to affective computing and sentiment analysis. Various representative adversarial training algorithms are explained and discussed accordingly, aimed at tackling diverse challenges associated with emotional AI systems. Further, we highlight a range of potential future research directions. We expect that this overview will help facilitate the development of adversarial training for affective computing and sentiment analysis in both the academic and industrial communities

    Emotion-aware cross-modal domain adaptation in video sequences

    DeepTMH: Multimodal Semi-supervised framework leveraging Affective and Cognitive engagement for Telemental Health

    To aid existing telemental health services, we propose DeepTMH, a novel framework that models telemental health session videos by extracting latent vectors corresponding to Affective and Cognitive features frequently used in psychology literature. Our approach leverages advances in semi-supervised learning to tackle the data scarcity in the telemental health session video domain and consists of a multimodal semi-supervised GAN to detect important mental health indicators during telemental health sessions. We demonstrate the usefulness of our framework and contrast it against existing works in two tasks: Engagement regression and Valence-Arousal regression, both of which are important to psychologists during a telemental health session. Our framework reports a 40% improvement in RMSE over the SOTA method in Engagement regression and a 50% improvement in RMSE over the SOTA method in Valence-Arousal regression. To tackle the scarcity of publicly available datasets in the telemental health space, we release a new dataset, MEDICA, for mental health patient engagement detection. MEDICA consists of 1299 videos, each 3 seconds long. To the best of our knowledge, our approach is the first method to model telemental health session data based on psychology-driven Affective and Cognitive features, which also accounts for data sparsity by leveraging a semi-supervised setup.

    Data Augmentation for Deep-Learning-Based Electroencephalography

    Background: Data augmentation (DA) has recently been demonstrated to achieve considerable performance gains for deep learning (DL): increased accuracy and stability and reduced overfitting. Some electroencephalography (EEG) tasks suffer from a low samples-to-features ratio, severely reducing DL effectiveness. DA with DL thus holds transformative promise for EEG processing, possibly much as DL revolutionized computer vision. New method: We review trends and approaches to DA for DL in EEG to address: Which DA approaches exist and are common for which EEG tasks? What input features are used? And what kind of accuracy gain can be expected? Results: DA for DL on EEG began 5 years ago and has been used steadily more since. We grouped DA techniques (noise addition, generative adversarial networks, sliding windows, sampling, Fourier transform, recombination of segmentation, and others) and EEG tasks (seizure detection, sleep stages, motor imagery, mental workload, emotion recognition, motor tasks, and visual tasks). DA efficacy across techniques varied considerably. Noise addition and sliding windows provided the highest accuracy boost; mental workload benefitted most from DA. Sliding window, noise addition, and sampling methods were most common for seizure detection, mental workload, and sleep stages, respectively. Comparison with existing methods: The percentage of decoding accuracy explained by DA beyond unaugmented accuracy varied between 8% for recombination of segmentation and 36% for noise addition, and from 14% for motor imagery to 56% for mental workload (29% on average). Conclusions: DA is increasingly used and considerably improved DL decoding accuracy on EEG. Additional publications, if adhering to our reporting guidelines, will facilitate more detailed analysis.
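
    Two of the DA techniques named in the abstract above, sliding windows and noise addition, can be illustrated with a minimal sketch. This is not the review's code: the function names, the Gaussian noise model, and the parameters (`window`, `step`, `sigma`, `copies`) are assumptions chosen for illustration, and a single-channel signal stored as a plain list stands in for real EEG data.

```python
import random


def sliding_windows(signal, window, step):
    """Cut a 1-D signal (list of floats) into overlapping fixed-length windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(signal) - window
    return [signal[i:i + window] for i in range(0, last_start + 1, step)]


def add_gaussian_noise(signal, sigma, rng):
    """Return a copy of the signal with zero-mean Gaussian noise added per sample."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def augment(signal, window, step, sigma, copies, seed=0):
    """Keep every window and append `copies` noisy variants of each one."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    augmented = []
    for w in sliding_windows(signal, window, step):
        augmented.append(w)
        for _ in range(copies):
            augmented.append(add_gaussian_noise(w, sigma, rng))
    return augmented


if __name__ == "__main__":
    toy_signal = [float(i) for i in range(10)]  # placeholder, not EEG data
    samples = augment(toy_signal, window=4, step=2, sigma=0.1, copies=2)
    print(len(samples), "training samples from one recording")
```

    A smaller `step` yields more, heavily overlapping windows, and `sigma` would normally be set relative to the signal's amplitude; both choices here are arbitrary.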

    Human-controllable and structured deep generative models

    Deep generative models are a class of probabilistic models that attempt to learn the underlying data distribution. These models are usually trained in an unsupervised way and thus do not require any labels. Generative models such as Variational Autoencoders and Generative Adversarial Networks have made astounding progress in recent years. These models have several benefits: eased sampling and evaluation, efficient learning of low-dimensional representations for downstream tasks, and better understanding through interpretable representations. However, even though the quality of these models has improved immensely, the ability to control their style and structure is limited. Structured and human-controllable representations of generative models are essential for human-machine interaction and other applications, including fairness, creativity, and entertainment. This thesis investigates learning human-controllable and structured representations with deep generative models. In particular, we focus on generative modelling of 2D images. For the first part, we focus on learning clustered representations. We propose semi-parametric hierarchical variational autoencoders to estimate the intensity of facial action units. The semi-parametric model forms a hybrid generative-discriminative model and leverages both a parametric Variational Autoencoder and a non-parametric Gaussian Process autoencoder. We show superior performance in comparison with existing facial action unit estimation approaches. Based on the results and analysis of the learned representation, we focus on learning Mixture-of-Gaussians representations in an autoencoding framework. We deviate from the conventional autoencoding framework and consider a regularized objective with the Cauchy-Schwarz divergence. The Cauchy-Schwarz divergence allows a closed-form solution for Mixture-of-Gaussians distributions and thus efficient optimization of the autoencoding objective. We show that our model outperforms existing Variational Autoencoders in density estimation, clustering, and semi-supervised facial action detection. For the second part, we focus on learning disentangled representations for conditional generation and fair facial attribute classification. Conditional image generation relies on access to large-scale annotated datasets. Nevertheless, the geometry of visual objects, such as faces, cannot be learned implicitly, which deteriorates image fidelity. We propose incorporating facial landmarks with a statistical shape model and a differentiable piecewise affine transformation to separate the representation for appearance and shape. The goal of incorporating facial landmarks is that generation is controlled and can separate different appearances and geometries. In our last work, we use weak supervision for disentangling groups of variations. Work on learning disentangled representations has been done in an unsupervised fashion. However, recent works have shown that learning disentangled representations is not identifiable without any inductive biases. Since then, there has been a shift towards weakly-supervised disentanglement learning. We investigate using regularization based on the Kullback-Leibler divergence to disentangle groups of variations. The goal is to have consistent and separated subspaces for different groups, e.g., for content-style learning. Our evaluation shows increased disentanglement abilities and competitive performance for image clustering and fair facial attribute classification with weak supervision compared to supervised and semi-supervised approaches.
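
    The closed-form claim for the Cauchy-Schwarz divergence in the abstract above can be made concrete with the standard definition and the Gaussian product identity. The notation below (mixture weights, means, covariances) is assumed for illustration and is not taken from the thesis, whose exact regularized objective is not given here.

```latex
% Cauchy-Schwarz divergence between densities p and q (standard definition)
D_{\mathrm{CS}}(p \,\|\, q)
  = -\log \frac{\int p(x)\, q(x)\, dx}
               {\sqrt{\int p(x)^2\, dx \int q(x)^2\, dx}}

% Gaussian product identity: the integral of a product of two Gaussians
% is itself a Gaussian density evaluated at one of the means
\int \mathcal{N}(x;\, \mu_1, \Sigma_1)\, \mathcal{N}(x;\, \mu_2, \Sigma_2)\, dx
  = \mathcal{N}(\mu_1;\, \mu_2,\, \Sigma_1 + \Sigma_2)

% Hence, for mixtures p = \sum_i \pi_i \mathcal{N}(\mu_i, \Sigma_i) and
% q = \sum_j \tau_j \mathcal{N}(m_j, S_j), every integral is a finite sum:
\int p(x)\, q(x)\, dx
  = \sum_i \sum_j \pi_i\, \tau_j\,
    \mathcal{N}(\mu_i;\, m_j,\, \Sigma_i + S_j)
```

    The same identity handles the two squared terms in the denominator, so no sampling or numerical integration is needed. By contrast, the Kullback-Leibler divergence between two Gaussian mixtures has no closed form in general.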