1,190 research outputs found
Comparing brain-like representations learned by vanilla, residual, and recurrent CNN architectures
Though it has been hypothesized that state-of-the-art residual networks approximate the recurrent visual system, it remains to be seen whether the representations learned by these biologically inspired CNNs are actually closer to neural data. CNNs and DNNs that are most functionally similar to the brain are likely to contain mechanisms most like those used by the brain. In this thesis, we investigate how different CNN architectures approximate the representations learned along the ventral (object recognition and processing) stream of the brain. We specifically evaluate how recent approximations of biological neural recurrence (residual connections, dense residual connections, and a biologically inspired implementation of recurrence) affect the representations learned by each CNN. We first investigate the representations learned by layers throughout several state-of-the-art CNNs: VGG-19 (a vanilla CNN), ResNet-152 (a CNN with residual connections), and DenseNet-161 (a CNN with dense connections). To control for differences in model depth, we then extend this analysis to the CORnet family of biologically inspired CNN models with matching high-level architectures. The CORnet family has three models: a vanilla CNN (CORnet-Z), a CNN with biologically valid recurrent dynamics (CORnet-R), and a CNN with both recurrent and residual connections (CORnet-S). We compare the representations of these six models to functionally aligned (via hyperalignment) fMRI brain data acquired during a naturalistic visual task. We take two approaches to comparing these CNN and brain representations. First, we use forward encoding, a predictive approach that uses CNN features to predict neural responses across the whole brain. Second, we use representational similarity analysis (RSA) and centered kernel alignment (CKA) to measure the similarity of representations between CNN layers and specific brain ROIs.
We show that, compared to vanilla CNNs, CNNs with residual and recurrent connections exhibit representations that are more similar to those learned by the human ventral visual stream. We also achieve state-of-the-art forward encoding and RSA performance with the residual and recurrent CNN models.
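Of the two similarity measures named above, linear CKA has a particularly compact definition: the squared Frobenius norm of the cross-covariance, normalized by each view's own covariance norm. A minimal NumPy sketch (the random matrices standing in for CNN-layer activations and ROI responses are illustrative, not from the thesis):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two feature matrices.

    X: (n_samples, d1), e.g. CNN layer activations per stimulus
    Y: (n_samples, d2), e.g. fMRI voxel responses for the same stimuli
    Returns a similarity score in [0, 1].
    """
    # Center each feature dimension across samples.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
print(round(linear_cka(X, X), 3))  # identical representations give exactly 1.0
print(f"CKA with unrelated features: {linear_cka(X, rng.normal(size=(100, 30))):.2f}")
```

Unlike raw correlation of flattened activations, CKA is invariant to orthogonal transformations and isotropic scaling of either representation, which is why it is a common choice for CNN-to-brain comparisons.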
Neural Encoding and Decoding with Deep Learning for Natural Vision
The overarching objective of this work is to bridge neuroscience and artificial intelligence to ultimately build machines that learn, act, and think like humans. In the context of vision, the brain enables humans to readily make sense of the visual world, e.g. recognizing visual objects. Developing human-like machines requires understanding the working principles underlying human vision. In this dissertation, I ask how the brain encodes and represents dynamic visual information from the outside world, whether brain activity can be directly decoded to reconstruct and categorize what a person is seeing, and whether neuroscience theory can be applied to artificial models to advance computer vision. To address these questions, I used deep neural networks (DNNs) to establish encoding and decoding models that describe the relationships between the brain and visual stimuli. Using the DNN, the encoding models were able to predict the functional magnetic resonance imaging (fMRI) responses throughout the visual cortex given video stimuli; the decoding models were able to reconstruct and categorize the visual stimuli based on fMRI activity. To further advance the DNN model, I implemented a new bidirectional and recurrent neural network based on the predictive coding theory. As a theory in neuroscience, predictive coding explains the interaction among feedforward, feedback, and recurrent connections. The results showed that this brain-inspired model significantly outperforms feedforward-only DNNs in object recognition. These studies have a positive impact on understanding the neural computations underlying human vision and on improving computer vision with knowledge from neuroscience.
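The forward-encoding step described above (predicting fMRI responses from DNN features) is commonly implemented as a regularized linear regression fit per voxel. A minimal sketch with simulated data; all shapes, the ridge penalty, and the train/test split are illustrative assumptions, not the dissertation's actual pipeline:

```python
import numpy as np

# Simulated stand-ins: 200 stimuli, 50 DNN features, 100 voxels.
rng = np.random.default_rng(0)
F = rng.normal(size=(200, 50))                    # DNN features per stimulus
W_true = 0.3 * rng.normal(size=(50, 100))
V = F @ W_true + rng.normal(size=(200, 100))      # simulated voxel responses

def fit_ridge(F, V, alpha=10.0):
    """Closed-form ridge regression mapping features to voxel responses."""
    d = F.shape[1]
    return np.linalg.solve(F.T @ F + alpha * np.eye(d), F.T @ V)

W = fit_ridge(F[:150], V[:150])                   # fit on a training split
pred = F[150:] @ W                                # predict held-out responses
# Standard evaluation: per-voxel correlation of predicted vs. measured.
r = [np.corrcoef(pred[:, v], V[150:, v])[0, 1] for v in range(V.shape[1])]
print(f"mean held-out correlation: {np.mean(r):.2f}")
```

Held-out correlation per voxel is the usual accuracy metric for encoding models, since it is insensitive to the scale of the BOLD signal.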
Kervolutional Neural Networks
Convolutional neural networks (CNNs) have enabled state-of-the-art performance in many computer vision tasks. However, little effort has been devoted to establishing convolution in non-linear space. Existing works mainly rely on activation layers, which can only provide point-wise non-linearity. To solve this problem, a new operation, kervolution (kernel convolution), is introduced to approximate complex behaviors of human perception systems by leveraging the kernel trick. It generalizes convolution, enhances model capacity, and captures higher-order interactions of features via patch-wise kernel functions, without introducing additional parameters. Extensive experiments show that kervolutional neural networks (KNNs) achieve higher accuracy and faster convergence than baseline CNNs.
Comment: oral paper in CVPR 201
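The patch-wise kernel idea is easy to state concretely: replace the patch/filter inner product of ordinary convolution with a kernel function evaluated on the same pair. A minimal NumPy sketch for a single channel and a single filter; the loop implementation and the specific hyperparameter values are illustrative, not the authors' code:

```python
import numpy as np

def kervolve2d(image, weight, kind="polynomial", cp=1.0, dp=3):
    """2-D kervolution of a single-channel image with one filter.

    Swaps the inner product <patch, filter> of plain convolution for a
    kernel function k(patch, filter) -- no additional parameters needed.
    """
    kh, kw = weight.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    w = weight.ravel()
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            x = image[i:i + kh, j:j + kw].ravel()
            if kind == "linear":            # reduces to ordinary cross-correlation
                out[i, j] = x @ w
            elif kind == "polynomial":      # (x.w + c)^d adds higher-order feature interactions
                out[i, j] = (x @ w + cp) ** dp
            elif kind == "gaussian":        # RBF kernel on the patch/filter distance
                out[i, j] = np.exp(-cp * np.sum((x - w) ** 2))
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0
print(kervolve2d(img, k, kind="linear")[0, 0])  # mean of the top-left 3x3 patch
```

With the linear kernel the operation collapses to plain convolution, which is what "generalizes convolution" means here; the polynomial and Gaussian variants inject non-linearity inside the sliding window rather than after it.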
Investigating the Relationship between Human Visual Brain Activity and Emotions
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2019. 8. Gunhee Kim.
Encoding models predict brain activity elicited by stimuli and are used to investigate how information is processed in the brain. Decoding models, in contrast, predict information about the stimuli from brain activity and aim to identify whether such information is present. The two models are often used in conjunction. The brain's visual system has been shown to decode stimulus-related emotional information [15, 20]. However, brain activity in the visual system induced by the same visual stimuli, but scrambled, has also been able to decode the same emotional information [20]. Considering these results, we ask to what extent encoded visual information also encodes emotional information. We use encoding models to select brain regions related to low-, mid-, and high-level visual features and use these brain regions to decode related emotional information. We found that these features are encoded not only in the occipital lobe, but also in later regions extending to the orbito-frontal cortex. Said brain regions were not able to decode emotion information, whereas other brain regions and plain CNN features were.
These results show that brain regions encoding low-, mid-, and high-level visual features are not related to the previously found emotional decoding performance; thus, the decoding performance associated with the occipital lobe should be attributed to non-vision-related processing.
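The ROI-based decoding step above (predicting an emotion label from the voxel pattern of a selected region) can be sketched with the simplest possible decoder. Everything here (shapes, effect size, the nearest-centroid classifier) is an illustrative assumption, not the thesis pipeline:

```python
import numpy as np

# Hypothetical setup: decode a binary emotion label from one ROI's voxels.
rng = np.random.default_rng(1)
n, n_vox = 120, 80
labels = rng.integers(0, 2, size=n)                        # e.g. two emotion classes
roi = rng.normal(size=(n, n_vox)) + labels[:, None] * 0.8  # informative ROI pattern

def nearest_centroid_decode(train_X, train_y, test_X):
    """Decode labels by distance to each class's mean voxel pattern."""
    c0 = train_X[train_y == 0].mean(axis=0)
    c1 = train_X[train_y == 1].mean(axis=0)
    d0 = ((test_X - c0) ** 2).sum(axis=1)
    d1 = ((test_X - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

pred = nearest_centroid_decode(roi[:80], labels[:80], roi[80:])
acc = (pred == labels[80:]).mean()
print(f"decoding accuracy: {acc:.2f}")  # well above the 0.5 chance level here
```

In this framing, the thesis's negative result corresponds to ROIs whose held-out accuracy stays at chance even though the same voxels are well predicted by an encoding model.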
Sharing deep generative representation for perceived image reconstruction from human brain activity
Decoding human brain activities via functional magnetic resonance imaging
(fMRI) has gained increasing attention in recent years. While encouraging
results have been reported in brain states classification tasks, reconstructing
the details of human visual experience still remains difficult. Two main
challenges that hinder the development of effective models are the perplexing
fMRI measurement noise and the high dimensionality of limited data instances.
Existing methods generally suffer from one or both of these issues and yield unsatisfactory results. In this paper, we tackle this problem by casting the reconstruction of the visual stimulus as Bayesian inference of a missing view in a multiview latent variable model. Sharing a common latent representation, our joint generative model of external stimulus and brain response is not only "deep" in extracting nonlinear features from visual images, but also powerful in capturing correlations among voxel activities in fMRI recordings. The nonlinearity and deep structure endow our model with strong representation ability, while the correlations of voxel activities are critical for suppressing noise and improving prediction. We devise an efficient variational Bayesian method to infer the latent variables and the model parameters. To further improve the reconstruction accuracy, the latent representations of testing instances are enforced to be close to those of their neighbours from the training set via posterior regularization. Experiments on three fMRI recording datasets demonstrate that our approach can reconstruct visual stimuli more accurately.
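The shared-latent idea can be illustrated with a drastically simplified linear analogue: both views are generated from one low-dimensional code, and the missing view (the image) is recovered by first inferring the code from the observed view (the fMRI pattern). A linear generative model and ridge regression stand in for the paper's deep variational Bayesian machinery; all shapes and noise levels are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 10, 300
Z = rng.normal(size=(n, k))                       # shared latent codes
A = rng.normal(size=(k, 64))                      # latent -> image view
B = rng.normal(size=(k, 40))                      # latent -> voxel view
images = Z @ A
voxels = Z @ B + 0.1 * rng.normal(size=(n, 40))   # noisy fMRI view

# Learn to infer the latent code from the observed view alone ...
W = np.linalg.solve(voxels.T @ voxels + 1e-2 * np.eye(40), voxels.T @ Z)
# ... then decode the missing view (the image) through the generative map.
recon = (voxels @ W) @ A
err = np.abs(recon - images).mean() / np.abs(images).mean()
print(f"relative reconstruction error: {err:.2f}")
```

The point of the sketch is structural: because both views share one code, voxel correlations help pin down the code, which is exactly the noise-suppression argument the abstract makes.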
Constraint-free Natural Image Reconstruction from fMRI Signals Based on Convolutional Neural Network
In recent years, research on decoding brain activity based on functional
magnetic resonance imaging (fMRI) has made remarkable achievements. However,
constraint-free natural image reconstruction from brain activity is still a
challenge. The existing methods simplified the problem by using semantic prior information or by reconstructing only simple images such as letters and digits. Without semantic prior information, we present a novel method to reconstruct natural images from fMRI signals of the human visual cortex based on the computational model of the convolutional neural network (CNN). First, we extracted the unit outputs of viewed natural images in each layer of a pre-trained CNN as CNN features. Second, we transformed image reconstruction from fMRI signals into a CNN feature visualization problem by training a sparse linear regression to map fMRI patterns to CNN features. By iterative optimization to find the matched image, whose CNN unit features are most similar to those predicted from the brain activity, we achieved promising results for the challenging constraint-free natural image reconstruction. As no semantic prior information about the stimuli was used when training the decoding model, any category of images (not constrained by the training set) could in theory be reconstructed. We found that the reconstructed images resembled the natural stimuli, especially in position and shape. The experimental results suggest that hierarchical visual features can effectively express the visual perception process of the human brain.
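The iterative optimization step can be sketched with a toy differentiable feature extractor standing in for the CNN: starting from a blank image, gradient descent drives the image's features toward those predicted from brain activity. The fixed linear map, learning rate, and iteration count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64                                        # image dimensionality (8x8 pixels)
Phi = rng.normal(size=(32, D)) / np.sqrt(D)   # toy "CNN" feature extractor
x_true = rng.normal(size=D)                   # the viewed stimulus
f_pred = Phi @ x_true                         # features "predicted from fMRI"

x = np.zeros(D)                               # start from a blank image
lr = 0.5
for _ in range(500):
    grad = Phi.T @ (Phi @ x - f_pred)         # gradient of 0.5*||Phi x - f_pred||^2
    x -= lr * grad
print(f"final feature-matching loss: {np.sum((Phi @ x - f_pred) ** 2):.6f}")
```

With a real CNN the same loop requires backpropagating the feature loss through the network to the pixels, and the result is only determined up to the features' null space, which is why reconstructions recover position and shape better than fine texture.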