Sharing deep generative representation for perceived image reconstruction from human brain activity
Decoding human brain activities via functional magnetic resonance imaging
(fMRI) has gained increasing attention in recent years. While encouraging
results have been reported in brain state classification tasks, reconstructing
the details of human visual experience remains difficult. Two main
challenges that hinder the development of effective models are the perplexing
fMRI measurement noise and the high dimensionality of limited data instances.
Existing methods generally suffer from one or both of these issues and yield
unsatisfactory results. In this paper, we tackle this problem by casting the
reconstruction of the visual stimulus as the Bayesian inference of a missing view in
a multiview latent variable model. Sharing a common latent representation, our
joint generative model of external stimulus and brain response is not only
"deep" in extracting nonlinear features from visual images, but also powerful
in capturing correlations among voxel activities of fMRI recordings. The
nonlinearity and deep structure endow our model with strong representation
ability, while the correlations of voxel activities are critical for
suppressing noise and improving prediction. We devise an efficient variational
Bayesian method to infer the latent variables and the model parameters. To
further improve the reconstruction accuracy, the latent representations of
testing instances are enforced to be close to those of their neighbours from the
training set via posterior regularization. Experiments on three fMRI recording
datasets demonstrate that our approach can more accurately reconstruct visual
stimuli.
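To make the shared-latent, missing-view idea concrete, here is a minimal PyTorch sketch; the single fMRI encoder, layer sizes, and names are illustrative assumptions rather than the paper's exact deep generative model:

```python
import torch
import torch.nn as nn

class SharedLatentMVModel(nn.Module):
    def __init__(self, img_dim=784, fmri_dim=2000, z_dim=32):
        super().__init__()
        # Encoder for the observed fMRI view -> parameters of the shared latent z.
        self.enc_fmri = nn.Sequential(nn.Linear(fmri_dim, 256), nn.ReLU(),
                                      nn.Linear(256, 2 * z_dim))
        # Decoder generating the (missing) image view from the shared z.
        self.dec_img = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, img_dim), nn.Sigmoid())

    def infer_z(self, fmri):
        mu, logvar = self.enc_fmri(fmri).chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize

    def reconstruct_stimulus(self, fmri):
        # "Inference of the missing view" in its simplest amortized form:
        # encode the observed fMRI view, decode the unobserved image view.
        return self.dec_img(self.infer_z(fmri))

model = SharedLatentMVModel()
print(model.reconstruct_stimulus(torch.randn(4, 2000)).shape)  # torch.Size([4, 784])
```

In the full model, both views would have encoders and decoders trained jointly, and inference would be variational Bayesian (with posterior regularization toward training-set neighbours) rather than a single amortized forward pass.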
Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional Data
There are three main challenges in emotion recognition. First, it is difficult
to recognize human emotional states from a single modality alone.
Second, it is expensive to manually annotate the emotional data. Third,
emotional data often suffers from missing modalities due to unforeseeable
sensor malfunction or configuration issues. In this paper, we address all these
problems under a novel multi-view deep generative framework. Specifically, we
propose to model the statistical relationships of multi-modality emotional data
using multiple modality-specific generative networks with a shared latent
space. By imposing a Gaussian mixture assumption on the posterior approximation
of the shared latent variables, our framework can learn the joint deep
representation from multiple modalities and evaluate the importance of each
modality simultaneously. To solve the labeled-data-scarcity problem, we extend
our multi-view model to the semi-supervised learning scenario by casting the
semi-supervised classification problem as a specialized missing data imputation
task. To address the missing-modality problem, we further extend our
semi-supervised multi-view model to deal with incomplete data, where a missing
view is treated as a latent variable and integrated out during inference. In
this way, the proposed framework can utilize all available (both labeled and
unlabeled, as well as both complete and incomplete) data to improve its
generalization ability. Experiments conducted on two real multi-modal
emotion datasets demonstrate the superiority of our framework.
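The "classification as missing-data imputation" step can be illustrated with a hedged sketch in the style of semi-supervised deep generative models: for an unlabeled instance, the label is treated as a missing variable and summed out under the inference network q(y|x). The shapes and the toy objective below are illustrative, not the paper's exact ELBO:

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(logits, per_class_nelbo, y=None):
    # logits: (B, C) scores from the inference network q(y|x).
    # per_class_nelbo: (B, C), negative ELBO evaluated with y fixed to each class.
    if y is not None:                      # labeled case: plug in the observed y
        sup = per_class_nelbo.gather(1, y.unsqueeze(1)).squeeze(1).mean()
        return sup + F.cross_entropy(logits, y)
    q_y = logits.softmax(dim=-1)           # unlabeled case: sum the label out
    entropy = -(q_y * q_y.clamp_min(1e-8).log()).sum(-1)
    return ((q_y * per_class_nelbo).sum(-1) - entropy).mean()
```

A missing modality can be handled analogously: its contribution to the ELBO is integrated out (in practice, sampled) instead of being plugged in as an observation.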
Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning
The human brain can easily focus on one speaker and suppress others in
scenarios such as a cocktail party. Recently, researchers found that auditory
attention can be decoded from the electroencephalogram (EEG) data. However,
most existing deep learning methods struggle to exploit the prior knowledge of
different views (namely, that attended speech and EEG are task-related views)
and extract unsatisfactory representations. Inspired by Broadbent's filter model,
we decode auditory attention in a multi-view paradigm and extract the most
relevant and important information utilizing the missing view. Specifically, we
propose an auditory attention decoding (AAD) method based on multi-view VAE
with task-related multi-view contrastive (TMC) learning. Employing TMC learning
in a multi-view VAE uses the missing view to accumulate the prior knowledge of
the different views into the fused representation and to extract an approximately
task-related representation. We examine our method on two popular AAD datasets,
and demonstrate the superiority of our method by comparing it to
state-of-the-art methods.
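A minimal sketch of a task-related contrastive objective between the two views is given below; the symmetric InfoNCE form and the temperature value are illustrative assumptions, not necessarily the exact TMC loss:

```python
import torch
import torch.nn.functional as F

def tmc_contrastive_loss(eeg_z, speech_z, temperature=0.1):
    # eeg_z, speech_z: (B, D) paired representations of the two views.
    eeg_z = F.normalize(eeg_z, dim=-1)
    speech_z = F.normalize(speech_z, dim=-1)
    logits = eeg_z @ speech_z.t() / temperature    # (B, B) cosine similarities
    targets = torch.arange(eeg_z.size(0))          # matched pairs on the diagonal
    # Symmetric cross-entropy: EEG->speech and speech->EEG retrieval directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Minimizing this loss pulls the EEG representation toward the attended-speech representation from the same trial while pushing apart mismatched pairs.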
Multi-view Multi-label Fine-grained Emotion Decoding from Human Brain Activity
Decoding emotional states from human brain activity plays an important role
in brain-computer interfaces. Existing emotion decoding methods still have two
main limitations: first, they decode only a single emotion category from a
brain activity pattern, and the decoded emotion categories are coarse-grained,
which is inconsistent with the complex emotional expression of humans; second,
they ignore the discrepancy in emotion expression between the left and right
hemispheres of the human brain. In this paper, we propose a novel multi-view
multi-label hybrid model for fine-grained emotion decoding (up to 80 emotion
categories) which can learn expressive neural representations and
predict multiple emotional states simultaneously. Specifically, the
generative component of our hybrid model is parametrized by a multi-view
variational auto-encoder, in which we regard the brain activity of left and
right hemispheres and their difference as three distinct views, and use the
product-of-experts mechanism in its inference network. The discriminative
component of our hybrid model is implemented by a multi-label classification
network with an asymmetric focal loss. For more accurate emotion decoding, we
first adopt a label-aware module for learning emotion-specific neural
representations, and then model the dependency among emotional states with a
masked self-attention mechanism. Extensive experiments on two visually evoked
emotional datasets show the superiority of our method.
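The product-of-experts fusion in the inference network has a simple closed form for Gaussian experts, sketched below with illustrative dimensions; the paper's model fuses three views (left hemisphere, right hemisphere, and their difference) in this spirit:

```python
import torch

def product_of_experts(mus, logvars):
    # mus, logvars: lists of (B, D) Gaussian parameters, one pair per view.
    precisions = [(-lv).exp() for lv in logvars]        # 1 / sigma^2 per expert
    total_prec = torch.stack(precisions).sum(0)
    mu = torch.stack([m * p for m, p in zip(mus, precisions)]).sum(0) / total_prec
    return mu, (1.0 / total_prec).log()                 # fused mean, log-variance

# Three views: left hemisphere, right hemisphere, and their difference.
mus = [torch.randn(4, 16) for _ in range(3)]
logvars = [torch.zeros(4, 16) for _ in range(3)]
mu, logvar = product_of_experts(mus, logvars)
```

A prior expert (e.g., a unit Gaussian) is often included in the product as well; it is omitted here for brevity.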
MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion
Reconstructing visual stimuli from brain recordings is a meaningful and
challenging task. In particular, precise and controllable image reconstruction
is of great significance for advancing the development and application of
brain-computer interfaces. Despite advancements in complex image
reconstruction techniques, the challenge persists in achieving a cohesive
alignment of both semantics (concepts and objects) and structure (position,
orientation, and size) with the image stimuli. To address the aforementioned
issue, we propose a two-stage image reconstruction model called MindDiffuser.
In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings
decoded from fMRI are put into Stable Diffusion, which yields a preliminary
image that contains semantic information. In Stage 2, we utilize the CLIP
visual features decoded from fMRI as supervisory information, and iteratively
adjust the two feature vectors decoded in Stage 1 through backpropagation to
align the structural information. The results of both qualitative and
quantitative analyses demonstrate that our model has surpassed the current
state-of-the-art models on the Natural Scenes Dataset (NSD). Subsequent
experimental findings corroborate the neurobiological plausibility of the
model, as evidenced by the interpretability of the multimodal features
employed, which align with the corresponding brain responses.
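The Stage-2 alignment can be sketched as a small gradient-based loop in which the decoded latents are treated as trainable tensors; `generate_image` and `clip_visual` below are hypothetical placeholders standing in for Stable Diffusion and the CLIP image encoder:

```python
import torch

def stage2_align(z_vqvae, c_text, target_clip, generate_image, clip_visual,
                 steps=50, lr=1e-2):
    # Treat the Stage-1 decoded latents as trainable tensors.
    z = z_vqvae.clone().requires_grad_(True)
    c = c_text.clone().requires_grad_(True)
    opt = torch.optim.Adam([z, c], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        image = generate_image(z, c)    # differentiable generator (placeholder)
        # Match CLIP visual features of the generated image
        # to those decoded from fMRI.
        loss = torch.nn.functional.mse_loss(clip_visual(image), target_clip)
        loss.backward()                 # gradients flow back into z and c
        opt.step()
    return z.detach(), c.detach()
```

The key design choice is that supervision comes from a feature space decoded from the brain, so the structural alignment never requires access to the ground-truth image.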
Online Bayesian Multiple Kernel Bipartite Ranking
Bipartite ranking aims to maximize the area under the ROC curve (AUC) of a decision function. To tackle this problem when the data arrive sequentially, existing online AUC maximization methods focus on seeking a point estimate of the decision function in a linear or predefined single kernel space, and cannot learn effective kernels automatically from the streaming data. In this paper, we first develop a Bayesian multiple kernel bipartite ranking model, which circumvents the kernel selection problem by estimating a posterior distribution over the model weights. To make our model applicable to streaming data, we then present a kernelized online Bayesian passive-aggressive learning framework that maintains a variational approximation to the posterior based on data augmentation. Furthermore, to deal efficiently with large-scale data, we design a fixed-budget strategy which effectively controls the online model complexity. Extensive experimental studies confirm the superiority of our Bayesian multi-kernel approach.
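A heavily simplified NumPy sketch of the budgeted online kernel update is shown below: a passive-aggressive style step on the pairwise hinge loss of a positive/negative pair, with a fixed support-vector budget. The Bayesian treatment of the kernel weights is replaced here by fixed uniform weights, so this illustrates only the budgeted online mechanics:

```python
import numpy as np

def rbf(x, y, gamma):
    return np.exp(-gamma * np.sum((x - y) ** 2))

class BudgetedRanker:
    def __init__(self, gammas=(0.1, 1.0), budget=50, C=1.0):
        self.gammas, self.budget, self.C = gammas, budget, C
        self.sv, self.alpha = [], []        # support vectors and coefficients

    def score(self, x):
        # Uniformly weighted combination of RBF kernels (the Bayesian kernel
        # weights of the full model are simplified away here).
        return sum(a * np.mean([rbf(v, x, g) for g in self.gammas])
                   for v, a in zip(self.sv, self.alpha))

    def update(self, x_pos, x_neg):
        # Pairwise hinge loss on one positive/negative pair (AUC surrogate).
        loss = max(0.0, 1.0 - (self.score(x_pos) - self.score(x_neg)))
        if loss > 0:                        # passive-aggressive style step
            tau = min(self.C, loss)
            self.sv += [x_pos, x_neg]
            self.alpha += [tau, -tau]
        while len(self.sv) > self.budget:   # fixed budget: drop smallest |alpha|
            i = int(np.argmin(np.abs(self.alpha)))
            self.sv.pop(i)
            self.alpha.pop(i)
```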
Conditional Generative Neural Decoding with Structured CNN Feature Prediction
Decoding visual content from human brain activity is a challenging task of great scientific value. Two main factors that hinder existing methods from producing satisfactory results are 1) the typically small amount of paired training data and 2) under-exploitation of the structural information underlying the data. In this paper, we present a novel conditional deep generative neural decoding approach with structured intermediate feature prediction. Specifically, our approach first decodes the brain activity to the multilayer intermediate features of a pretrained convolutional neural network (CNN) with a structured multi-output regression (SMR) model, and then inverts the decoded CNN features to the visual images with an introspective conditional generation (ICG) model. The proposed SMR model can simultaneously leverage the covariance structures underlying the brain activities, the CNN features, and the prediction tasks to improve decoding accuracy and interpretability. Further, our ICG model can 1) leverage abundant unpaired images to augment the training data; 2) self-evaluate the quality of its conditionally generated images; and 3) adversarially improve itself without an extra discriminator. Experimental results show that our approach yields state-of-the-art visual reconstructions from brain activity.
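The first decoding step can be illustrated, in reduced form, as plain multi-output ridge regression from voxels to CNN-layer features; the paper's SMR model additionally exploits covariance structure across voxels, features, and tasks, which is omitted in this sketch:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_fmri = rng.standard_normal((100, 500))   # 100 trials x 500 voxels (toy data)
Y_feat = rng.standard_normal((100, 256))   # target intermediate CNN features

decoder = Ridge(alpha=10.0).fit(X_fmri, Y_feat)   # multi-output ridge decoder
decoded = decoder.predict(X_fmri[:5])             # features fed to the ICG stage
print(decoded.shape)                              # (5, 256)
```

In the full pipeline, the decoded features condition the ICG model, which inverts them back to an image.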