Sharing deep generative representation for perceived image reconstruction from human brain activity
Decoding human brain activities via functional magnetic resonance imaging
(fMRI) has gained increasing attention in recent years. While encouraging
results have been reported in brain state classification tasks, reconstructing
the details of human visual experience remains difficult. Two main
challenges that hinder the development of effective models are the perplexing
fMRI measurement noise and the high dimensionality of limited data instances.
Existing methods generally suffer from one or both of these issues and yield
unsatisfactory results. In this paper, we tackle this problem by casting the
reconstruction of the visual stimulus as the Bayesian inference of a missing view in
a multiview latent variable model. Sharing a common latent representation, our
joint generative model of external stimulus and brain response is not only
"deep" in extracting nonlinear features from visual images, but also powerful
in capturing correlations among voxel activities of fMRI recordings. The
nonlinearity and deep structure endow our model with strong representation
ability, while the correlations of voxel activities are critical for
suppressing noise and improving prediction. We devise an efficient variational
Bayesian method to infer the latent variables and the model parameters. To
further improve the reconstruction accuracy, the latent representations of
testing instances are enforced to be close to those of their neighbours from the
training set via posterior regularization. Experiments on three fMRI recording
datasets demonstrate that our approach can more accurately reconstruct visual
stimuli.
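To make the shared-latent, missing-view idea concrete, here is a minimal PyTorch sketch; the single fMRI encoder, layer sizes, and names are illustrative assumptions rather than the paper's exact deep generative model:

```python
import torch
import torch.nn as nn

class SharedLatentMVModel(nn.Module):
    def __init__(self, img_dim=784, fmri_dim=2000, z_dim=32):
        super().__init__()
        # Encoder for the observed fMRI view -> parameters of the shared latent z.
        self.enc_fmri = nn.Sequential(nn.Linear(fmri_dim, 256), nn.ReLU(),
                                      nn.Linear(256, 2 * z_dim))
        # Decoder generating the (missing) image view from the shared z.
        self.dec_img = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, img_dim), nn.Sigmoid())

    def infer_z(self, fmri):
        mu, logvar = self.enc_fmri(fmri).chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize

    def reconstruct_stimulus(self, fmri):
        # "Inference of the missing view" in its simplest amortized form:
        # encode the observed fMRI view, decode the unobserved image view.
        return self.dec_img(self.infer_z(fmri))

model = SharedLatentMVModel()
print(model.reconstruct_stimulus(torch.randn(4, 2000)).shape)  # torch.Size([4, 784])
```

In the full model, both views would have encoders and decoders trained jointly, and inference would be variational Bayesian (with posterior regularization toward training-set neighbours) rather than a single amortized forward pass.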
Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional Data
There are three main challenges in emotion recognition. First, it is difficult
to recognize human emotional states from a single modality alone.
Second, it is expensive to manually annotate the emotional data. Third,
emotional data often suffers from missing modalities due to unforeseeable
sensor malfunction or configuration issues. In this paper, we address all these
problems under a novel multi-view deep generative framework. Specifically, we
propose to model the statistical relationships of multi-modality emotional data
using multiple modality-specific generative networks with a shared latent
space. By imposing a Gaussian mixture assumption on the posterior approximation
of the shared latent variables, our framework can learn the joint deep
representation from multiple modalities and evaluate the importance of each
modality simultaneously. To solve the labeled-data-scarcity problem, we extend
our multi-view model to the semi-supervised learning scenario by casting the
semi-supervised classification problem as a specialized missing data imputation
task. To address the missing-modality problem, we further extend our
semi-supervised multi-view model to deal with incomplete data, where a missing
view is treated as a latent variable and integrated out during inference. In
this way, the proposed framework can utilize all available (both labeled and
unlabeled, as well as both complete and incomplete) data to improve its
generalization ability. Experiments conducted on two real multi-modal
emotion datasets demonstrate the superiority of our framework.
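The "classification as missing-data imputation" step can be illustrated with a hedged sketch in the style of semi-supervised deep generative models: for an unlabeled instance, the label is treated as a missing variable and summed out under the inference network q(y|x). The shapes and the toy objective below are illustrative, not the paper's exact ELBO:

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(logits, per_class_nelbo, y=None):
    # logits: (B, C) scores from the inference network q(y|x).
    # per_class_nelbo: (B, C), negative ELBO evaluated with y fixed to each class.
    if y is not None:                      # labeled case: plug in the observed y
        sup = per_class_nelbo.gather(1, y.unsqueeze(1)).squeeze(1).mean()
        return sup + F.cross_entropy(logits, y)
    q_y = logits.softmax(dim=-1)           # unlabeled case: sum the label out
    entropy = -(q_y * q_y.clamp_min(1e-8).log()).sum(-1)
    return ((q_y * per_class_nelbo).sum(-1) - entropy).mean()
```

A missing modality can be handled analogously: its contribution to the ELBO is integrated out (in practice, sampled) instead of being plugged in as an observation.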
Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning
The human brain can easily focus on one speaker and suppress others in
scenarios such as a cocktail party. Recently, researchers found that auditory
attention can be decoded from the electroencephalogram (EEG) data. However,
most existing deep learning methods struggle to exploit the prior knowledge of
different views (namely, that attended speech and EEG are task-related views)
and extract unsatisfactory representations. Inspired by Broadbent's filter model,
we decode auditory attention in a multi-view paradigm and extract the most
relevant and important information utilizing the missing view. Specifically, we
propose an auditory attention decoding (AAD) method based on multi-view VAE
with task-related multi-view contrastive (TMC) learning. Employing TMC learning
in a multi-view VAE uses the missing view to accumulate the prior knowledge of
the different views into the fused representation and to extract an approximately
task-related representation. We examine our method on two popular AAD datasets,
and demonstrate the superiority of our method by comparing it to
state-of-the-art methods.
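A minimal sketch of a task-related contrastive objective between the two views is given below; the symmetric InfoNCE form and the temperature value are illustrative assumptions, not necessarily the exact TMC loss:

```python
import torch
import torch.nn.functional as F

def tmc_contrastive_loss(eeg_z, speech_z, temperature=0.1):
    # eeg_z, speech_z: (B, D) paired representations of the two views.
    eeg_z = F.normalize(eeg_z, dim=-1)
    speech_z = F.normalize(speech_z, dim=-1)
    logits = eeg_z @ speech_z.t() / temperature    # (B, B) cosine similarities
    targets = torch.arange(eeg_z.size(0))          # matched pairs on the diagonal
    # Symmetric cross-entropy: EEG->speech and speech->EEG retrieval directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Minimizing this loss pulls the EEG representation toward the attended-speech representation from the same trial while pushing apart mismatched pairs.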
Multi-view Multi-label Fine-grained Emotion Decoding from Human Brain Activity
Decoding emotional states from human brain activity plays an important role
in brain-computer interfaces. Existing emotion decoding methods still have two
main limitations: first, they decode only a single emotion category from a
brain activity pattern, and the decoded emotion categories are coarse-grained,
which is inconsistent with the complex emotional expression of humans; second,
they ignore the discrepancy in emotion expression between the left and right
hemispheres of the human brain. In this paper, we propose a novel multi-view
multi-label hybrid model for fine-grained emotion decoding (up to 80 emotion
categories) which can learn expressive neural representations and
predict multiple emotional states simultaneously. Specifically, the
generative component of our hybrid model is parametrized by a multi-view
variational auto-encoder, in which we regard the brain activity of left and
right hemispheres and their difference as three distinct views, and use the
product-of-experts mechanism in its inference network. The discriminative
component of our hybrid model is implemented by a multi-label classification
network with an asymmetric focal loss. For more accurate emotion decoding, we
first adopt a label-aware module for learning emotion-specific neural
representations, and then model the dependency among emotional states with a
masked self-attention mechanism. Extensive experiments on two visually evoked
emotional datasets show the superiority of our method.
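The product-of-experts fusion in the inference network has a simple closed form for Gaussian experts, sketched below with illustrative dimensions; the paper's model fuses three views (left hemisphere, right hemisphere, and their difference) in this spirit:

```python
import torch

def product_of_experts(mus, logvars):
    # mus, logvars: lists of (B, D) Gaussian parameters, one pair per view.
    precisions = [(-lv).exp() for lv in logvars]        # 1 / sigma^2 per expert
    total_prec = torch.stack(precisions).sum(0)
    mu = torch.stack([m * p for m, p in zip(mus, precisions)]).sum(0) / total_prec
    return mu, (1.0 / total_prec).log()                 # fused mean, log-variance

# Three views: left hemisphere, right hemisphere, and their difference.
mus = [torch.randn(4, 16) for _ in range(3)]
logvars = [torch.zeros(4, 16) for _ in range(3)]
mu, logvar = product_of_experts(mus, logvars)
```

A prior expert (e.g., a unit Gaussian) is often included in the product as well; it is omitted here for brevity.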
MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion
Reconstructing visual stimuli from brain recordings is a meaningful and
challenging task. In particular, precise and controllable image reconstruction
is of great significance for advancing the development and application of
brain-computer interfaces. Despite advancements in complex image
reconstruction techniques, the challenge persists in achieving a cohesive
alignment of both semantics (concepts and objects) and structure (position,
orientation, and size) with the image stimuli. To address the aforementioned
issue, we propose a two-stage image reconstruction model called MindDiffuser.
In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings
decoded from fMRI are put into Stable Diffusion, which yields a preliminary
image that contains semantic information. In Stage 2, we utilize the CLIP
visual features decoded from fMRI as supervisory information, and iteratively
adjust the two feature vectors decoded in Stage 1 through backpropagation to
align the structural information. The results of both qualitative and
quantitative analyses demonstrate that our model has surpassed the current
state-of-the-art models on the Natural Scenes Dataset (NSD). Subsequent
experimental findings corroborate the neurobiological plausibility of the
model, as evidenced by the interpretability of the multimodal features
employed, which align with the corresponding brain responses.
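The Stage-2 alignment can be sketched as a small gradient-based loop in which the decoded latents are treated as trainable tensors; `generate_image` and `clip_visual` below are hypothetical placeholders standing in for Stable Diffusion and the CLIP image encoder:

```python
import torch

def stage2_align(z_vqvae, c_text, target_clip, generate_image, clip_visual,
                 steps=50, lr=1e-2):
    # Treat the Stage-1 decoded latents as trainable tensors.
    z = z_vqvae.clone().requires_grad_(True)
    c = c_text.clone().requires_grad_(True)
    opt = torch.optim.Adam([z, c], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        image = generate_image(z, c)    # differentiable generator (placeholder)
        # Match CLIP visual features of the generated image
        # to those decoded from fMRI.
        loss = torch.nn.functional.mse_loss(clip_visual(image), target_clip)
        loss.backward()                 # gradients flow back into z and c
        opt.step()
    return z.detach(), c.detach()
```

The key design choice is that supervision comes from a feature space decoded from the brain, so the structural alignment never requires access to the ground-truth image.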
Online Bayesian Multiple Kernel Bipartite Ranking
Bipartite ranking aims to maximize the area under the ROC curve (AUC) of a decision function. To tackle this problem when the data arrive sequentially, existing online AUC maximization methods focus on seeking a point estimate of the decision function in a linear or predefined single kernel space, and cannot learn effective kernels automatically from the streaming data. In this paper, we first develop a Bayesian multiple kernel bipartite ranking model, which circumvents the kernel selection problem by estimating a posterior distribution over the model weights. To make our model applicable to streaming data, we then present a kernelized online Bayesian passive-aggressive learning framework that maintains a variational approximation to the posterior based on data augmentation. Furthermore, to deal efficiently with large-scale data, we design a fixed-budget strategy which effectively controls the online model complexity. Extensive experimental studies confirm the superiority of our Bayesian multi-kernel approach.
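A heavily simplified NumPy sketch of the budgeted online kernel update is shown below: a passive-aggressive style step on the pairwise hinge loss of a positive/negative pair, with a fixed support-vector budget. The Bayesian treatment of the kernel weights is replaced here by fixed uniform weights, so this illustrates only the budgeted online mechanics:

```python
import numpy as np

def rbf(x, y, gamma):
    return np.exp(-gamma * np.sum((x - y) ** 2))

class BudgetedRanker:
    def __init__(self, gammas=(0.1, 1.0), budget=50, C=1.0):
        self.gammas, self.budget, self.C = gammas, budget, C
        self.sv, self.alpha = [], []        # support vectors and coefficients

    def score(self, x):
        # Uniformly weighted combination of RBF kernels (the Bayesian kernel
        # weights of the full model are simplified away here).
        return sum(a * np.mean([rbf(v, x, g) for g in self.gammas])
                   for v, a in zip(self.sv, self.alpha))

    def update(self, x_pos, x_neg):
        # Pairwise hinge loss on one positive/negative pair (AUC surrogate).
        loss = max(0.0, 1.0 - (self.score(x_pos) - self.score(x_neg)))
        if loss > 0:                        # passive-aggressive style step
            tau = min(self.C, loss)
            self.sv += [x_pos, x_neg]
            self.alpha += [tau, -tau]
        while len(self.sv) > self.budget:   # fixed budget: drop smallest |alpha|
            i = int(np.argmin(np.abs(self.alpha)))
            self.sv.pop(i)
            self.alpha.pop(i)
```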
Conditional Generative Neural Decoding with Structured CNN Feature Prediction
Decoding visual content from human brain activity is a challenging task of great scientific value. Two main factors that hinder existing methods from producing satisfactory results are 1) the typically small amount of paired training data and 2) under-exploitation of the structural information underlying the data. In this paper, we present a novel conditional deep generative neural decoding approach with structured intermediate feature prediction. Specifically, our approach first decodes the brain activity to the multilayer intermediate features of a pretrained convolutional neural network (CNN) with a structured multi-output regression (SMR) model, and then inverts the decoded CNN features to the visual images with an introspective conditional generation (ICG) model. The proposed SMR model can simultaneously leverage the covariance structures underlying the brain activities, the CNN features, and the prediction tasks to improve decoding accuracy and interpretability. Further, our ICG model can 1) leverage abundant unpaired images to augment the training data; 2) self-evaluate the quality of its conditionally generated images; and 3) adversarially improve itself without an extra discriminator. Experimental results show that our approach yields state-of-the-art visual reconstructions from brain activity.
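The first decoding step can be illustrated, in reduced form, as plain multi-output ridge regression from voxels to CNN-layer features; the paper's SMR model additionally exploits covariance structure across voxels, features, and tasks, which is omitted in this sketch:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_fmri = rng.standard_normal((100, 500))   # 100 trials x 500 voxels (toy data)
Y_feat = rng.standard_normal((100, 256))   # target intermediate CNN features

decoder = Ridge(alpha=10.0).fit(X_fmri, Y_feat)   # multi-output ridge decoder
decoded = decoder.predict(X_fmri[:5])             # features fed to the ICG stage
print(decoded.shape)                              # (5, 256)
```

In the full pipeline, the decoded features condition the ICG model, which inverts them back to an image.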