Sharing deep generative representation for perceived image reconstruction from human brain activity
Decoding human brain activities via functional magnetic resonance imaging
(fMRI) has gained increasing attention in recent years. While encouraging
results have been reported in brain states classification tasks, reconstructing
the details of human visual experience remains difficult. Two main
challenges that hinder the development of effective models are the perplexing
fMRI measurement noise and the high dimensionality of limited data instances.
Existing methods generally suffer from one or both of these issues and yield
unsatisfactory results. In this paper, we tackle this problem by casting the
reconstruction of visual stimulus as the Bayesian inference of missing view in
a multiview latent variable model. Sharing a common latent representation, our
joint generative model of external stimulus and brain response is not only
"deep" in extracting nonlinear features from visual images, but also powerful
in capturing correlations among voxel activities of fMRI recordings. The
nonlinearity and deep structure endow our model with strong representation
ability, while the correlations of voxel activities are critical for
suppressing noise and improving prediction. We devise an efficient variational
Bayesian method to infer the latent variables and the model parameters. To
further improve the reconstruction accuracy, the latent representations of
testing instances are enforced to be close to those of their neighbours from the
training set via posterior regularization. Experiments on three fMRI recording
datasets demonstrate that our approach can more accurately reconstruct visual
stimuli.
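The posterior-regularization step described above (pulling the latent representation of each test instance toward its training-set neighbours) can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual formulation: the function name, the k-nearest-neighbour choice, and the interpolation weight `lam` are all assumptions.

```python
import numpy as np

def regularize_latent(z_test, z_train, k=3, lam=0.5):
    """Pull a test latent code toward the mean of its k nearest
    training latents -- a stand-in for posterior regularization."""
    dists = np.linalg.norm(z_train - z_test, axis=1)  # distance to each training latent
    neighbours = np.argsort(dists)[:k]                # indices of the k nearest
    z_neigh = z_train[neighbours].mean(axis=0)        # neighbourhood consensus
    return (1.0 - lam) * z_test + lam * z_neigh       # convex combination

# Toy example with 2-D latents (dimensions are illustrative only).
z_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
z_test = np.array([0.2, 0.2])
z_reg = regularize_latent(z_test, z_train, k=3, lam=0.5)
```

The effect is that a noisy test-time latent is shrunk toward the region of latent space already supported by training data, which is one way to suppress fMRI measurement noise at inference time.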
UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity
Image reconstruction and captioning from brain activity evoked by visual
stimuli allow researchers to further understand the connection between the
human brain and the visual perception system. While deep generative models have
recently been employed in this field, reconstructing realistic captions and
images with both low-level details and high semantic fidelity is still a
challenging problem. In this work, we propose UniBrain: Unify Image
Reconstruction and Captioning All in One Diffusion Model from Human Brain
Activity. For the first time, we unify image reconstruction and captioning from
visual-evoked functional magnetic resonance imaging (fMRI) through a latent
diffusion model termed Versatile Diffusion. Specifically, we transform fMRI
voxels into text and image latents to capture low-level information, and guide the
backward diffusion process through fMRI-based image and text conditions derived
from CLIP to generate realistic captions and images. UniBrain outperforms
current methods both qualitatively and quantitatively in terms of image
reconstruction and reports image captioning results for the first time on the
Natural Scenes Dataset (NSD). Moreover, ablation experiments and
functional region-of-interest (ROI) analyses further demonstrate the superiority of
UniBrain and provide comprehensive insight into visual-evoked brain decoding.
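A common way to obtain the fMRI-based conditioning vectors mentioned above is to regress voxel patterns onto a pretrained embedding space (here, a stand-in for CLIP). The sketch below uses closed-form ridge regression; the dimensionalities, the ridge penalty, and the synthetic data are assumptions, and the paper's actual fMRI-to-CLIP mapping may differ.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression mapping fMRI voxel patterns X
    (n_samples x n_voxels) to embedding targets Y (n_samples x dim)."""
    n_vox = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_vox), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))           # 200 trials x 50 voxels (toy sizes)
W_true = rng.standard_normal((50, 8))        # 8-D stand-in for a CLIP embedding
Y = X @ W_true + 0.01 * rng.standard_normal((200, 8))
W = fit_ridge(X, Y, alpha=0.1)
Y_pred = X @ W                               # predicted conditioning vectors
```

In a full pipeline, `Y_pred` would play the role of the image/text conditions fed to the backward diffusion process; the linear map keeps the number of learned parameters small relative to the limited number of fMRI trials.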
End-to-End Deep Image Reconstruction From Human Brain Activity
Deep neural networks (DNNs) have recently been applied successfully to brain decoding and image reconstruction from functional magnetic resonance imaging (fMRI) activity. However, direct training of a DNN with fMRI data is often avoided because the size of available data is thought to be insufficient for training a complex network with numerous parameters. Instead, a pre-trained DNN usually serves as a proxy for hierarchical visual representations, and fMRI data are used to decode individual DNN features of a stimulus image with a simple linear model; these features are then passed to a reconstruction module. Here, we directly trained a DNN model with fMRI data and the corresponding stimulus images to build an end-to-end reconstruction model. We accomplished this by training a generative adversarial network with an additional loss term defined in high-level feature space (feature loss), using up to 6,000 training data samples (natural images and fMRI responses). The model was tested on independent datasets and directly reconstructed images using fMRI patterns as input. Reconstructions obtained from our proposed method resembled the test stimuli (natural and artificial images), and reconstruction accuracy increased as a function of training-data size. Ablation analyses indicated that the feature loss played a critical role in achieving accurate reconstruction. Our results show that the end-to-end model can learn a direct mapping between brain activity and perception.
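The feature-loss idea above (comparing reconstruction and target in a high-level feature space rather than pixel space) can be sketched minimally. In this hedged illustration, a fixed random projection stands in for a pretrained DNN layer, and the weighting `lam` and all sizes are assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
W_feat = rng.standard_normal((64, 16)) / 8.0  # fixed random projection: a toy
                                              # stand-in for a pretrained DNN layer

def features(img):
    """Map a flattened 64-pixel image into a 16-D 'high-level' feature space."""
    return img @ W_feat

def reconstruction_loss(recon, target, adv_term=0.0, lam=1.0):
    """Feature loss (MSE in feature space) plus an adversarial term,
    mirroring the GAN-with-feature-loss objective described above."""
    feat_mse = np.mean((features(recon) - features(target)) ** 2)
    return adv_term + lam * feat_mse

target = rng.standard_normal(64)
recon = target + 0.1 * rng.standard_normal(64)  # imperfect reconstruction
loss = reconstruction_loss(recon, target)
```

Because the comparison happens after the feature map, reconstructions are rewarded for matching perceptually relevant structure rather than exact pixel values, which is the role the ablation analyses attribute to the feature loss.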
- …