48 research outputs found

    CorrFeat: Correlation-based feature extraction algorithm using skin conductance and pupil diameter for emotion recognition

    Get PDF
    To recognize emotions using less obtrusive wearable sensors, we present a novel emotion recognition method that uses only pupil diameter (PD) and skin conductance (SC). Psychological studies show that these two signals are related to the attention level of humans exposed to visual stimuli. Based on this, we propose a feature extraction algorithm that extracts correlation-based features across participants watching the same video clip. To boost performance given limited data, we implement a learning system without a deep architecture to classify arousal and valence. Our method outperforms not only state-of-the-art approaches but also widely used traditional and deep learning methods.
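    Illustrative sketch: the fragment below shows one way correlation-based features of this kind could be computed for a participant, by correlating their PD and SC traces with the group-average response to the same clip and with each other. The function name, signal shapes and feature choices are assumptions made for illustration; this is not the paper's exact CorrFeat algorithm.

        # A minimal, hedged sketch of correlation-based features from pupil
        # diameter (PD) and skin conductance (SC); not the paper's exact
        # CorrFeat algorithm. Assumes all participants watched the same clip
        # and their signals are resampled to a common length.
        import numpy as np

        def corr_features(pd_signals, sc_signals, subject):
            """pd_signals, sc_signals: arrays of shape (n_subjects, n_samples).
            Returns a small feature vector for one subject."""
            others = [i for i in range(pd_signals.shape[0]) if i != subject]
            pd_ref = pd_signals[others].mean(axis=0)   # group-average PD response
            sc_ref = sc_signals[others].mean(axis=0)   # group-average SC response
            pd_i, sc_i = pd_signals[subject], sc_signals[subject]
            return np.array([
                np.corrcoef(pd_i, pd_ref)[0, 1],   # PD agreement with the group
                np.corrcoef(sc_i, sc_ref)[0, 1],   # SC agreement with the group
                np.corrcoef(pd_i, sc_i)[0, 1],     # within-subject PD-SC coupling
            ])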

    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Full text link
    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community, given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper provides an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles, modality heterogeneity and interconnections, that have driven subsequent innovations, and propose a taxonomy of six core technical challenges: representation, alignment, reasoning, generation, transference, and quantification, covering historical and recent trends. Recent technical achievements are presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.

    Modern Views of Machine Learning for Precision Psychiatry

    Full text link
    In light of the NIMH's Research Domain Criteria (RDoC), the advent of functional neuroimaging and of novel technologies and methods provides new opportunities to develop precise and personalized prognosis and diagnosis of mental disorders. Machine learning (ML) and artificial intelligence (AI) technologies are playing an increasingly critical role in the new era of precision psychiatry. Combining ML/AI with neuromodulation technologies can potentially provide explainable solutions in clinical practice and effective therapeutic treatment. Advanced wearable and mobile technologies also call for a new role for ML/AI in digital phenotyping for mobile mental health. Here, we provide a comprehensive review of ML methodologies and applications that combine neuroimaging, neuromodulation, and advanced mobile technologies in psychiatric practice. Additionally, we review the role of ML in molecular phenotyping and cross-species biomarker identification in precision psychiatry. We further discuss explainable AI (XAI) and causality testing in a closed, human-in-the-loop manner, and highlight the potential of ML in multimedia information extraction and multimodal data fusion. Finally, we discuss conceptual and practical challenges in precision psychiatry and highlight ML opportunities for future research.

    Brain Music: A generative system for creating symbolic music from affective neural responses

    Get PDF
    This master's thesis presents an innovative multimodal deep learning methodology that combines an emotion classification model with a music generator, aimed at creating music from electroencephalography (EEG) signals and thus delving into the interplay between emotions and music. The results achieve three specific objectives. First, since the performance of brain-computer interface systems varies significantly among subjects, an approach based on knowledge transfer among subjects is introduced to enhance the performance of individuals facing challenges in motor imagery-based brain-computer interface systems. This approach combines labeled EEG data with structured information, such as psychological questionnaires, through a "Kernel Matching CKA" method. We employ a deep neural network (Deep&Wide) for motor imagery classification. The results underscore its potential to enhance motor skills in brain-computer interfaces. Second, we propose an innovative technique called "Labeled Correlation Alignment" (LCA) to sonify neural responses to stimuli represented in unstructured data, such as affective music. This generates musical features based on emotion-induced brain activity. LCA addresses between-subject and within-subject variability through correlation analysis, enabling the creation of acoustic envelopes and the distinction of different sound information. This makes LCA a promising tool for interpreting neural activity and its response to auditory stimuli. Finally, we develop an end-to-end deep learning methodology for generating MIDI music content (symbolic data) from EEG signals induced by affectively labeled music. This methodology encompasses data preprocessing, feature extraction model training, and a feature matching process using Deep Centered Kernel Alignment, enabling music generation from EEG signals. Together, these achievements represent significant advances in understanding the relationship between emotions and music, as well as in the application of artificial intelligence to music generation from brain signals. They offer new perspectives and tools for musical creation and for research in emotional neuroscience. Our experiments use public databases such as GigaScience, Affective Music Listening, and the DEAP dataset.
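    Illustrative sketch: the matching steps above rely on Centered Kernel Alignment (CKA) as a similarity measure between feature sets. The fragment below implements plain linear CKA between, for example, EEG features and music or questionnaire features for the same trials; the thesis's deep, kernelized variants ("Kernel Matching CKA", Deep Centered Kernel Alignment) are not reproduced here, and the function name is an assumption for illustration.

        # Hedged sketch of linear Centered Kernel Alignment (CKA); the thesis's
        # deep/kernelized matching is not reproduced here.
        import numpy as np

        def linear_cka(X, Y):
            """X: (n_samples, d1), Y: (n_samples, d2) features for the same trials.
            Returns a similarity value in [0, 1]."""
            X = X - X.mean(axis=0)               # center each feature column
            Y = Y - Y.mean(axis=0)
            cross = np.linalg.norm(X.T @ Y, 'fro') ** 2
            norm_x = np.linalg.norm(X.T @ X, 'fro')
            norm_y = np.linalg.norm(Y.T @ Y, 'fro')
            return cross / (norm_x * norm_y)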

    Machine learning for automatic analysis of affective behaviour

    Get PDF
    The automated analysis of affect has been gaining rapidly increasing attention from researchers over the past two decades, as it constitutes a fundamental step towards achieving next-generation computing technologies and integrating them into everyday life (e.g. via affect-aware, user-adaptive interfaces, medical imaging, health assessment, ambient intelligence, etc.). The work presented in this thesis focuses on several fundamental problems arising on the path towards reliable, accurate and robust affect sensing systems. In more detail, the motivation behind this work lies in recent developments in the field, namely (i) the creation of large, audiovisual databases for affect analysis in the so-called "Big Data" era, along with (ii) the need to deploy systems under demanding, real-world conditions. These developments led to the requirement that emotion expressions be analysed continuously in time, instead of merely processing static images, thus unveiling to researchers the wide range of temporal dynamics related to human behaviour. The latter entails another deviation from the traditional line of research in the field: instead of focusing on predicting posed, discrete basic emotions (happiness, surprise, etc.), it became necessary to focus on spontaneous, naturalistic expressions captured under settings closer to real-world conditions, utilising more expressive emotion descriptions than a set of discrete labels. To this end, the main motivation of this thesis is to deal with challenges arising from the adoption of continuous dimensional emotion descriptions under naturalistic scenarios, which are considered to capture a much wider spectrum of expressive variability than basic emotions and, most importantly, to model emotional states that humans commonly express in their everyday life.

    In the first part of this thesis, we attempt to demystify the largely unexplored problem of predicting continuous emotional dimensions. This work is amongst the first to explore the problem of predicting emotion dimensions via multi-modal fusion, utilising facial expressions, auditory cues and shoulder gestures. A major contribution of the work presented in this thesis lies in proposing the utilisation of various relationships exhibited by emotion dimensions in order to improve the prediction accuracy of machine learning methods, an idea which has since been taken up by other researchers in the field. In order to evaluate this experimentally, we extend methods such as Long Short-Term Memory neural networks (LSTM), the Relevance Vector Machine (RVM) and Canonical Correlation Analysis (CCA) to exploit output relationships in learning. As shown, this increases the accuracy of machine learning models applied to this task.

    The annotation of continuous dimensional emotions is a tedious task, highly prone to the influence of various types of noise. Performed in real time by several annotators (usually experts), the annotation process can be heavily biased by factors such as subjective interpretations of the emotional states observed, the inherent ambiguity of labels related to human behaviour, the varying reaction lags exhibited by each annotator, as well as other factors such as input device noise and annotation errors. In effect, the annotations manifest a strong spatio-temporal, annotator-specific bias. Failing to properly deal with annotation bias and noise leads to an inaccurate ground truth, and therefore to ill-generalisable machine learning models. This makes the proper fusion of multiple annotations, and the inference of a clean, corrected version of the "ground truth", one of the most significant challenges in the area. A highly important contribution of this thesis is the introduction of Dynamic Probabilistic Canonical Correlation Analysis (DPCCA), a method aimed at fusing noisy continuous annotations. By adopting a private-shared space model, we isolate the individual characteristics that are annotator-specific and not shared, while, most importantly, we model the common, underlying annotation which is shared by annotators (i.e., the derived ground truth). By further learning temporal dynamics and incorporating a time-warping process, we are able to derive a clean version of the ground truth given multiple annotations, eliminating temporal discrepancies and other nuisances.

    The integration of the temporal alignment process within the proposed private-shared space model makes DPCCA suitable for the problem of temporally aligning human behaviour; that is, given temporally unsynchronised sequences (e.g., videos of two persons smiling), the goal is to generate temporally synchronised sequences (e.g., the smile apex should co-occur in the videos). Temporal alignment is an important problem for many applications where multiple datasets need to be aligned in time. Furthermore, it is particularly suitable for the analysis of facial expressions, where the activation of facial muscles (Action Units) typically follows a set of predefined temporal phases. A highly challenging scenario is when the observations are perturbed by gross, non-Gaussian noise (e.g., occlusions), as is often the case when analysing data acquired under real-world conditions. To account for non-Gaussian noise, a robust variant of Canonical Correlation Analysis (RCCA) for robust fusion and temporal alignment is proposed. The model captures the shared, low-rank subspace of the observations, isolating the gross noise in a sparse noise term. RCCA is amongst the first robust variants of CCA proposed in the literature and, as we show in related experiments, it outperforms other state-of-the-art methods on related tasks such as the fusion of multiple modalities under gross noise.

    Beyond private-shared space models, Component Analysis (CA) is an integral component of most computer vision systems, particularly in terms of reducing the usually high-dimensional input spaces in a manner meaningful to the task at hand (e.g., prediction, clustering). A final, significant contribution of this thesis lies in proposing the first unifying framework for probabilistic component analysis. The proposed framework covers most well-known CA methods, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA), providing further theoretical insights into the workings of CA. Moreover, the proposed framework is highly flexible, enabling novel CA methods to be generated by simply manipulating the connectivity of latent variables (i.e., the latent neighbourhood). As shown experimentally, methods derived via the proposed framework outperform their equivalents in several problems related to affect sensing and facial expression analysis, while providing advantages such as reduced complexity and explicit variance modelling.
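    Illustrative sketch: the annotation-fusion idea above builds on Canonical Correlation Analysis. The fragment below applies plain CCA (scikit-learn) to short sliding windows of two annotators' continuous traces and averages the maximally correlated projections as a crude fused signal; the sliding-window construction and function name are assumptions for illustration, and the thesis's DPCCA and RCCA models (temporal dynamics, time warping, sparse noise terms) go well beyond this sketch.

        # Hedged sketch: fusing two annotators' continuous traces with plain CCA;
        # not the thesis's DPCCA/RCCA models.
        import numpy as np
        from sklearn.cross_decomposition import CCA

        def fuse_annotations(ann_a, ann_b, window=10):
            """ann_a, ann_b: 1-D arrays of equal length (one rating per frame).
            Short sliding windows serve as multivariate views for CCA."""
            n = len(ann_a) - window + 1
            A = np.stack([ann_a[i:i + window] for i in range(n)])   # (n, window)
            B = np.stack([ann_b[i:i + window] for i in range(n)])
            cca = CCA(n_components=1).fit(A, B)
            u, v = cca.transform(A, B)        # maximally correlated projections
            # shared component as a fused trace (window - 1 samples shorter)
            return (u[:, 0] + v[:, 0]) / 2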

    The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)

    Get PDF

    Emotion Recognition with Asymmetry Features of EEG Signals

    Get PDF
    The study of affective computing (AC) currently includes a focus on emotion regulation and recognition. Recent studies in this field have utilized deep learning architectures to enhance emotion recognition from EEG signals. An alternative to deep learning is to use feature engineering to extract relevant features for training supervised machine learning models. Current theories in neuroscience can guide this feature engineering process. Neuroscientists have suggested various models to clarify how emotions are processed; one of these suggests that positive emotions are processed in the left hemisphere, while negative emotions are processed in the right hemisphere. This emotional processing model has inspired previous studies to propose asymmetry features to predict emotions. However, none of these studies have statistically evaluated whether the inclusion of asymmetry features yields benefits such as increased accuracy or reduced training time. To address this gap, this research presents statistical evaluations for emotion regulation and a comparable model for emotion recognition. The outcomes show that the brain hemispheres and frequency bands participate differently in processing emotions, and that both asymmetry-based models of emotion processing are observed, but in different frequency ranges. The results also imply that, by using EEG asymmetry, emotion recognition approaches can use fewer features without significantly compromising performance.
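    Illustrative sketch: asymmetry features of the kind discussed above are commonly computed as differences in band power between homologous left/right electrodes (differential asymmetry, DASM). The fragment below shows one such computation; the electrode pairs, frequency bands and sampling rate are common conventions assumed for illustration, not necessarily the ones used in this thesis.

        # Hedged sketch of differential asymmetry (DASM) features from EEG band
        # power; electrode pairs and bands are illustrative assumptions.
        import numpy as np
        from scipy.signal import welch

        BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
        PAIRS = [("F3", "F4"), ("F7", "F8"), ("C3", "C4")]   # left/right pairs

        def band_power(x, fs, lo, hi):
            f, pxx = welch(x, fs=fs, nperseg=min(len(x), int(2 * fs)))
            mask = (f >= lo) & (f < hi)
            return np.trapz(pxx[mask], f[mask])

        def dasm_features(eeg, channels, fs=128):
            """eeg: (n_channels, n_samples); channels: list of channel names."""
            idx = {ch: i for i, ch in enumerate(channels)}
            feats = []
            for left, right in PAIRS:
                for lo, hi in BANDS.values():
                    p_l = band_power(eeg[idx[left]], fs, lo, hi)
                    p_r = band_power(eeg[idx[right]], fs, lo, hi)
                    feats.append(np.log(p_l) - np.log(p_r))   # left minus right
            return np.array(feats)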

    Advanced Biometrics with Deep Learning

    Get PDF
    Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, have become commonplace as a means of identity management in a variety of applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic, handcrafted preprocessing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm that unifies preprocessing, feature extraction, and recognition based solely on the biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into four categories according to biometric modality: face biometrics, medical electronic signals (EEG and ECG), voice print, and others.