Cross-Language Speech Emotion Recognition Using Multimodal Dual Attention Transformers
Despite the recent progress in speech emotion recognition (SER),
state-of-the-art systems are unable to achieve improved performance in
cross-language settings. In this paper, we propose a Multimodal Dual Attention
Transformer (MDAT) model to improve cross-language SER. Our model utilises
pre-trained models for multimodal feature extraction and is equipped with a
dual attention mechanism including graph attention and co-attention to capture
complex dependencies across different modalities and achieve improved
cross-language SER results using minimal target language data. In addition, our
model also exploits a transformer encoder layer for high-level feature
representation to improve emotion classification accuracy. In this way, MDAT
performs refinement of feature representation at various stages and provides
emotionally salient features to the classification layer. This novel approach
also ensures the preservation of modality-specific emotional information while
enhancing cross-modality and cross-language interactions. We assess our model's
performance on four publicly available SER datasets and establish its superior
effectiveness compared to recent approaches and baseline models.
Comment: Under Review IEEE TM
EEG-Based Emotion Recognition Using Regularized Graph Neural Networks
Electroencephalography (EEG) measures the neuronal activities in different
brain regions via electrodes. Many existing studies on EEG-based emotion
recognition do not fully exploit the topology of EEG channels. In this paper,
we propose a regularized graph neural network (RGNN) for EEG-based emotion
recognition. RGNN considers the biological topology among different brain
regions to capture both local and global relations among different EEG
channels. Specifically, we model the inter-channel relations in EEG signals via
an adjacency matrix in a graph neural network where the connection and
sparseness of the adjacency matrix are inspired by neuroscience theories of
human brain organization. In addition, we propose two regularizers, namely
node-wise domain adversarial training (NodeDAT) and emotion-aware distribution
learning (EmotionDL), to better handle cross-subject EEG variations and noisy
labels, respectively. Extensive experiments on two public datasets, SEED and
SEED-IV, demonstrate that our model outperforms
state-of-the-art models in most experimental settings. Moreover, ablation
studies show that the proposed adjacency matrix and two regularizers contribute
consistent and significant gains to the performance of our RGNN model. Finally,
investigations on the neuronal activities reveal important brain regions and
inter-channel relations for EEG-based emotion recognition.
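To illustrate how an adjacency matrix over EEG channels can mix information between channels, here is a minimal sketch of one generic graph propagation step with symmetric normalization. It is not the authors' RGNN layer: the three-channel graph, its edge weights, and the function names `normalize_adjacency` and `propagate` are all hypothetical and chosen only for illustration.

```python
import math

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, where D holds row sums of |A|."""
    n = len(adj)
    deg = [sum(abs(w) for w in row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def propagate(adj_norm, features):
    """One propagation step: each channel becomes a weighted sum of its neighbours."""
    n = len(adj_norm)
    dim = len(features[0])
    return [[sum(adj_norm[i][j] * features[j][k] for j in range(n))
             for k in range(dim)]
            for i in range(n)]

# Hypothetical 3-channel graph with self-loops; the weights are illustrative only.
adj = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.5],
       [0.0, 0.5, 1.0]]
# One 2-dimensional feature vector per channel (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

adj_norm = normalize_adjacency(adj)
smoothed = propagate(adj_norm, features)
```

In a learned model the entries of `adj` would typically be trainable parameters, and a sparse initialization would encode which channel pairs are expected to interact.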
Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition
Despite the recent progress in speech emotion recognition (SER),
state-of-the-art systems lack generalisation across different conditions. A key
underlying reason for poor generalisation is the scarcity of emotion datasets,
which is a significant roadblock to designing robust machine learning (ML)
models. Recent works in SER focus on utilising multitask learning (MTL) methods
to improve generalisation by learning shared representations. However, most of
these studies propose MTL solutions with the requirement of meta labels for
auxiliary tasks, which limits the training of SER systems. This paper proposes
an MTL framework (MTL-AUG) that learns generalised representations from
augmented data. We utilise augmentation-type classification and unsupervised
reconstruction as auxiliary tasks, which allow training SER systems on
augmented data without requiring any meta labels for auxiliary tasks. The
semi-supervised nature of MTL-AUG allows for the exploitation of the abundant
unlabelled data to further boost the performance of SER. We comprehensively
evaluate the proposed framework in the following settings: (1) within corpus,
(2) cross-corpus and cross-language, (3) noisy speech, and (4) adversarial
attacks. Our evaluations using the widely used IEMOCAP, MSP-IMPROV, and EMODB
datasets show improved results compared to existing state-of-the-art methods.
Comment: Under review IEEE Transactions on Affective Computing
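The idea of training on augmented data without meta labels can be sketched as a combined loss: the augmentation-type label comes for free because the training pipeline knows which augmentation it applied, and the reconstruction target is the input itself. The sketch below is a generic multitask loss under stated assumptions, not the paper's exact objective; the weights `alpha` and `beta` and the function names are hypothetical.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under predicted probabilities."""
    return -math.log(max(probs[target], 1e-12))

def mse(x, x_hat):
    """Mean squared reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def mtl_loss(emotion_probs, emotion_label, aug_probs, aug_label, x, x_hat,
             alpha=0.5, beta=0.5):
    """Emotion loss plus weighted auxiliary losses (alpha, beta are assumed weights).

    When emotion_label is None the sample is unlabelled and only the
    auxiliary terms contribute, which is what makes the setup semi-supervised.
    """
    aux = alpha * cross_entropy(aug_probs, aug_label) + beta * mse(x, x_hat)
    if emotion_label is None:
        return aux
    return cross_entropy(emotion_probs, emotion_label) + aux

# Made-up predictions for one sample; aug_label 1 stands for a hypothetical
# augmentation type such as added noise.
labelled = mtl_loss([0.7, 0.3], 0, [0.2, 0.8], 1, [1.0, 2.0], [1.0, 1.5])
unlabelled = mtl_loss([0.7, 0.3], None, [0.2, 0.8], 1, [1.0, 2.0], [1.0, 1.5])
```

Here the unlabelled sample still produces a training signal through the two auxiliary terms, while the labelled sample adds the emotion classification term on top.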
Pathway to Future Symbiotic Creativity
This report presents a comprehensive view of our vision for the development
path of human-machine symbiotic art creation. We propose a classification
of creative systems with a hierarchy of five classes, showing the pathway of
creativity evolving from human-mimicking artists (Turing Artists) to a Machine
Artist in its own right. We begin with an overview of the limitations of
Turing Artists, then focus on the systems in the top two levels, Machine Artists,
emphasizing machine-human communication in art creation. In art creation, it is
necessary for machines to understand humans' mental states, including desires,
appreciation, and emotions; humans also need to understand machines' creative
capabilities and limitations. The rapid development of immersive environments
and their further evolution into the new concept of the metaverse enable symbiotic art
creation through unprecedented flexibility of bi-directional communication
between artists and art manifestation environments. By examining the latest
sensor and XR technologies, we illustrate a novel way for art data collection
to form the basis of a new form of human-machine bidirectional
communication and understanding in art creation. Based on such communication
and understanding mechanisms, we propose a novel framework for building future
Machine Artists, which rests on the philosophy that a human-compatible AI
system should be based on the "human-in-the-loop" principle rather than the
traditional "end-to-end" dogma. By proposing a new form of inverse
reinforcement learning model, we outline the platform design of machine
artists, demonstrate its functions and showcase some examples of technologies
we have developed. We also provide a systematic exposition of the ecosystem for
AI-based symbiotic art form and community with an economic model built on NFT
technology. Ethical issues for the development of machine artists are also
discussed.