
    Comparing brain-like representations learned by vanilla, residual, and recurrent CNN architectures

    Though it has been hypothesized that state-of-the-art residual networks approximate the recurrent visual system, it remains to be seen whether the representations learned by these biologically inspired CNNs are actually closer to neural data. CNNs and DNNs that are most functionally similar to the brain are likely to contain mechanisms most like those the brain uses. In this thesis, we investigate how different CNN architectures approximate the representations learned along the ventral (object recognition and processing) stream of the brain. We specifically evaluate how recent approximations of biological neural recurrence, such as residual connections, dense residual connections, and a biologically inspired implementation of recurrence, affect the representations learned by each CNN. We first investigate the representations learned by layers throughout several state-of-the-art CNNs: VGG-19 (a vanilla CNN), ResNet-152 (a CNN with residual connections), and DenseNet-161 (a CNN with dense connections). To control for differences in model depth, we then extend this analysis to the CORnet family of biologically inspired CNN models with matching high-level architectures. The CORnet family has three models: a vanilla CNN (CORnet-Z), a CNN with biologically valid recurrent dynamics (CORnet-R), and a CNN with both recurrent and residual connections (CORnet-S). We compare the representations of these six models to functionally aligned (via hyperalignment) fMRI brain data acquired during a naturalistic visual task. We take two approaches to comparing these CNN and brain representations. We first use forward encoding, a predictive approach that uses CNN features to predict neural responses across the whole brain. We then use representational similarity analysis (RSA) and centered kernel alignment (CKA) to measure the similarity between representations in CNN layers and specific brain ROIs. We show that, compared to vanilla CNNs, CNNs with residual and recurrent connections exhibit representations more similar to those learned by the human ventral visual stream, and we achieve state-of-the-art forward encoding and RSA performance with the residual and recurrent CNN models.
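
    Both comparison metrics have compact closed forms. Below is a minimal sketch of RSA and linear CKA between a CNN layer's activations and an ROI's voxel responses; the array shapes, random data, and function names are illustrative stand-ins, not the thesis's actual pipeline.

        import numpy as np
        from scipy.spatial.distance import pdist
        from scipy.stats import spearmanr

        def rsa_score(cnn_feats, roi_data):
            # Spearman correlation between the two representational
            # dissimilarity matrices (RDMs), one per representation.
            rdm_cnn = pdist(cnn_feats, metric="correlation")
            rdm_roi = pdist(roi_data, metric="correlation")
            return spearmanr(rdm_cnn, rdm_roi).correlation

        def linear_cka(x, y):
            # Linear centered kernel alignment between two response matrices.
            x = x - x.mean(axis=0)
            y = y - y.mean(axis=0)
            hsic = np.linalg.norm(x.T @ y, "fro") ** 2
            return hsic / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))

        # Hypothetical example: 100 stimuli, 512 CNN units, 200 ROI voxels.
        rng = np.random.default_rng(0)
        cnn_feats = rng.standard_normal((100, 512))
        roi_data = rng.standard_normal((100, 200))
        print(rsa_score(cnn_feats, roi_data), linear_cka(cnn_feats, roi_data))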

    Neural Encoding and Decoding with Deep Learning for Natural Vision

    The overarching objective of this work is to bridge neuroscience and artificial intelligence to ultimately build machines that learn, act, and think like humans. In the context of vision, the brain enables humans to readily make sense of the visual world, e.g. recognizing visual objects. Developing human-like machines requires understanding the working principles underlying human vision. In this dissertation, I ask how the brain encodes and represents dynamic visual information from the outside world, whether brain activity can be directly decoded to reconstruct and categorize what a person is seeing, and whether neuroscience theory can be applied to artificial models to advance computer vision. To address these questions, I used deep neural networks (DNNs) to establish encoding and decoding models that describe the relationships between the brain and visual stimuli. Given video stimuli, the encoding models were able to predict the functional magnetic resonance imaging (fMRI) responses throughout the visual cortex; the decoding models were able to reconstruct and categorize the visual stimuli based on fMRI activity. To further advance the DNN model, I implemented a new bidirectional and recurrent neural network based on predictive coding theory. As a theory in neuroscience, predictive coding explains the interaction among feedforward, feedback, and recurrent connections. The results showed that this brain-inspired model significantly outperforms feedforward-only DNNs in object recognition. These studies have a positive impact on understanding the neural computations underlying human vision and on improving computer vision with knowledge from neuroscience.
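
    As a simplified illustration of such an encoding model, the sketch below fits a voxel-wise ridge regression from DNN features to fMRI responses and scores it by held-out prediction correlation. All arrays are random stand-ins, not the dissertation's data or code.

        import numpy as np
        from sklearn.linear_model import RidgeCV

        # Hypothetical data: 240 fMRI volumes, 512 DNN features, 1000 voxels.
        rng = np.random.default_rng(0)
        dnn_feats = rng.standard_normal((240, 512))
        bold = rng.standard_normal((240, 1000))

        # Fit one linear weight map per voxel; evaluate on held-out volumes.
        train, test = slice(0, 200), slice(200, 240)
        enc = RidgeCV(alphas=np.logspace(0, 4, 9)).fit(dnn_feats[train], bold[train])
        pred = enc.predict(dnn_feats[test])
        r = np.array([np.corrcoef(pred[:, v], bold[test, v])[0, 1]
                      for v in range(bold.shape[1])])
        print("median held-out encoding accuracy:", np.median(r))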

    Kervolutional Neural Networks

    Convolutional neural networks (CNNs) have enabled state-of-the-art performance in many computer vision tasks. However, little effort has been devoted to establishing convolution in non-linear space. Existing works mainly leverage activation layers, which can only provide point-wise non-linearity. To address this problem, a new operation, kervolution (kernel convolution), is introduced to approximate complex behaviors of the human perception system by leveraging the kernel trick. It generalizes convolution, enhances model capacity, and captures higher-order interactions of features via patch-wise kernel functions, without introducing additional parameters. Extensive experiments show that kervolutional neural networks (KNNs) achieve higher accuracy and faster convergence than baseline CNNs. (Comment: oral paper at CVPR 2019)
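
    The core idea is to swap the patch-filter inner product of convolution for a kernel evaluation. The PyTorch sketch below does this with a polynomial kernel, one of the kernels discussed in the paper; the function name, defaults, and layout are illustrative, not the authors' released code.

        import torch
        import torch.nn.functional as F

        def kerv2d(x, weight, cp=1.0, dp=2, stride=1, padding=0):
            # Replace convolution's inner product <x, w> with the polynomial
            # kernel (x . w + cp) ** dp; no extra learned parameters.
            n, c, h, w = x.shape
            out_c, _, kh, kw = weight.shape
            patches = F.unfold(x, (kh, kw), stride=stride, padding=padding)  # (n, c*kh*kw, L)
            inner = weight.view(out_c, -1) @ patches                         # (n, out_c, L)
            oh = (h + 2 * padding - kh) // stride + 1
            ow = (w + 2 * padding - kw) // stride + 1
            return ((inner + cp) ** dp).view(n, out_c, oh, ow)

        x = torch.randn(2, 3, 8, 8)
        weight = torch.randn(16, 3, 3, 3)
        print(kerv2d(x, weight, padding=1).shape)  # torch.Size([2, 16, 8, 8])

    With cp=0 and dp=1 the operation reduces to ordinary (bias-free) convolution, which is the sense in which kervolution generalizes it.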

    Investigating the Relationship between Human Visual Brain Activity and Emotions

    Master's thesis, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, August 2019 (advisor: Gunhee Kim).

    Encoding models predict brain activity elicited by stimuli and are used to investigate how information is processed in the brain, whereas decoding models predict information about the stimuli from brain activity and aim to identify whether such information is present. Both models are often used in conjunction. The brain's visual system has been shown to decode stimulus-related emotional information [15, 20]. However, visual-system activity induced by the same stimuli with their pixels scrambled has also been able to decode the same emotional information [20]. Considering these results, we ask to what extent encoded visual information also encodes emotional information. We use encoding models to select brain regions related to low-, mid-, and high-level visual features, and then use these brain regions to decode related emotional information. We find that these features are encoded not only in the occipital lobe but also in later regions extending to the orbitofrontal cortex. These regions were not able to decode emotion information, whereas other brain regions and plain CNN features were. These results show that the brain regions encoding low-, mid-, and high-level visual features are not related to the previously reported emotional decoding performance; the decoding performance associated with the occipital lobe should therefore be attributed to non-vision-related processing.
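
    The two-stage analysis can be sketched compactly: fit an encoding model, keep the voxels it predicts well, then test whether an emotion classifier works on those voxels alone. Everything below is a hypothetical stand-in for the thesis's pipeline, with random data and an arbitrary selection threshold.

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.svm import LinearSVC
        from sklearn.model_selection import cross_val_score

        # Hypothetical stand-ins: 300 trials, 100 visual features, 2000 voxels.
        rng = np.random.default_rng(0)
        feats = rng.standard_normal((300, 100))
        bold = rng.standard_normal((300, 2000))
        emotion = rng.integers(0, 2, 300)

        # Stage 1: encoding model; keep voxels predicted well on held-out trials.
        pred = Ridge(alpha=10.0).fit(feats[:200], bold[:200]).predict(feats[200:])
        r = np.array([np.corrcoef(pred[:, v], bold[200:, v])[0, 1]
                      for v in range(bold.shape[1])])
        roi = bold[:, r > 0.1]  # arbitrary selection threshold

        # Stage 2: decode emotion labels from the selected voxels only.
        print(cross_val_score(LinearSVC(), roi, emotion, cv=5).mean())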

    Sharing deep generative representation for perceived image reconstruction from human brain activity

    Decoding human brain activities via functional magnetic resonance imaging (fMRI) has gained increasing attention in recent years. While encouraging results have been reported in brain state classification tasks, reconstructing the details of human visual experience still remains difficult. Two main challenges that hinder the development of effective models are the perplexing fMRI measurement noise and the high dimensionality of limited data instances. Existing methods generally suffer from one or both of these issues and yield unsatisfactory results. In this paper, we tackle this problem by casting the reconstruction of visual stimuli as the Bayesian inference of a missing view in a multi-view latent variable model. Sharing a common latent representation, our joint generative model of external stimulus and brain response is not only "deep" in extracting nonlinear features from visual images, but also powerful in capturing correlations among voxel activities of fMRI recordings. The nonlinearity and deep structure endow our model with strong representation ability, while the correlations of voxel activities are critical for suppressing noise and improving prediction. We devise an efficient variational Bayesian method to infer the latent variables and the model parameters. To further improve the reconstruction accuracy, the latent representations of testing instances are enforced to be close to those of their neighbours from the training set via posterior regularization. Experiments on three fMRI recording datasets demonstrate that our approach can more accurately reconstruct visual stimuli.
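
    The missing-view idea can be sketched in its simplest form: with a shared latent z, a linear-Gaussian fMRI view, and a Gaussian prior, the posterior over z given fMRI alone has a closed form, and the image view is then decoded from z. The deep image arm and the paper's variational machinery are omitted here; all names and shapes below are hypothetical.

        import numpy as np

        def infer_latent(y_fmri, B, sigma2=1.0, prior_var=1.0):
            # Posterior mean of z under y = B z + noise, with noise
            # N(0, sigma2*I) and prior z ~ N(0, prior_var*I).
            k = B.shape[1]
            precision = B.T @ B / sigma2 + np.eye(k) / prior_var
            return np.linalg.solve(precision, B.T @ y_fmri / sigma2)

        # Hypothetical shapes: 2000 voxels, 64-dimensional shared latent space.
        rng = np.random.default_rng(0)
        B = rng.standard_normal((2000, 64))
        y = rng.standard_normal(2000)
        z_hat = infer_latent(y, B)
        # image_hat = image_decoder(z_hat)  # hypothetical deep generative image arm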

    Constraint-free Natural Image Reconstruction from fMRI Signals Based on Convolutional Neural Network

    In recent years, research on decoding brain activity based on functional magnetic resonance imaging (fMRI) has made remarkable achievements. However, constraint-free natural image reconstruction from brain activity is still a challenge. Existing methods simplified the problem by using semantic prior information or by reconstructing only simple images such as letters and digits. Without semantic prior information, we present a novel method to reconstruct natural images from fMRI signals of the human visual cortex based on the computational model of the convolutional neural network (CNN). First, we extracted the unit outputs of viewed natural images in each layer of a pre-trained CNN as CNN features. Second, we transformed image reconstruction from fMRI signals into a CNN feature visualization problem by training a sparse linear regression to map from the fMRI patterns to CNN features. By iterative optimization to find the matched image, whose CNN unit features are most similar to those predicted from the brain activity, we achieved promising results for the challenging constraint-free natural image reconstruction. As no semantic prior information about the stimuli was used when training the decoding model, images of any category (not constrained by the training set) could in theory be reconstructed. We found that the reconstructed images resembled the natural stimuli, especially in position and shape. The experimental results suggest that hierarchical visual features can effectively express the visual perception process of the human brain.
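
    The optimization step amounts to gradient descent on the pixels so that the image's CNN features match those predicted from fMRI. The sketch below shows that loop in PyTorch under stated assumptions: `cnn` is some pretrained feature extractor returning the matched layer's activations, and `feat_hat` comes from a sparse linear regression (e.g. a lasso) fitted from fMRI patterns to CNN features; none of this is the paper's released code.

        import torch

        def reconstruct(cnn, feat_hat, steps=500, lr=0.05):
            # Optimize pixels so the image's CNN features match the
            # features predicted from brain activity (feat_hat).
            img = torch.zeros(1, 3, 224, 224, requires_grad=True)
            opt = torch.optim.Adam([img], lr=lr)
            target = torch.as_tensor(feat_hat, dtype=torch.float32)
            for _ in range(steps):
                opt.zero_grad()
                loss = ((cnn(img) - target) ** 2).mean()
                loss.backward()
                opt.step()
            return img.detach()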