
    Neural Encoding and Decoding with Deep Learning for Natural Vision

    The overarching objective of this work is to bridge neuroscience and artificial intelligence to ultimately build machines that learn, act, and think like humans. In the context of vision, the brain enables humans to readily make sense of the visual world, e.g. by recognizing visual objects. Developing human-like machines requires understanding the working principles underlying human vision. In this dissertation, I ask how the brain encodes and represents dynamic visual information from the outside world, whether brain activity can be directly decoded to reconstruct and categorize what a person is seeing, and whether neuroscience theory can be applied to artificial models to advance computer vision. To address these questions, I used deep neural networks (DNNs) to establish encoding and decoding models that describe the relationships between the brain and visual stimuli. Using DNN features, the encoding models predicted functional magnetic resonance imaging (fMRI) responses throughout the visual cortex to video stimuli; the decoding models reconstructed and categorized the visual stimuli from fMRI activity. To further advance the DNN model, I implemented a new bidirectional and recurrent neural network based on predictive coding theory. As a theory in neuroscience, predictive coding explains the interaction among feedforward, feedback, and recurrent connections. The results showed that this brain-inspired model significantly outperforms feedforward-only DNNs in object recognition. These studies have a positive impact on understanding the neural computations underlying human vision and on improving computer vision with knowledge from neuroscience.
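
    A minimal sketch of the voxel-wise encoding-model approach described above, not the dissertation's actual code: DNN activations for the stimuli are mapped to voxel responses with ridge regression, and accuracy is scored per voxel on held-out data. The arrays dnn_features and bold are hypothetical placeholders for layer activations and preprocessed fMRI time series.

        # Sketch of a voxel-wise encoding model: DNN features -> fMRI responses.
        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        dnn_features = rng.standard_normal((240, 512))  # stand-in for DNN layer activations
        bold = rng.standard_normal((240, 1000))         # stand-in for voxel time series

        X_tr, X_te, Y_tr, Y_te = train_test_split(
            dnn_features, bold, test_size=0.25, random_state=0)

        model = Ridge(alpha=1.0).fit(X_tr, Y_tr)
        Y_hat = model.predict(X_te)

        # Encoding accuracy: Pearson correlation between predicted and measured
        # responses, computed independently for every voxel.
        Y_hat_z = (Y_hat - Y_hat.mean(0)) / Y_hat.std(0)
        Y_te_z = (Y_te - Y_te.mean(0)) / Y_te.std(0)
        voxel_r = (Y_hat_z * Y_te_z).mean(0)
        print(f"median voxel correlation: {np.median(voxel_r):.3f}")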

    Effects of Temporal and Spatial Context Within the Macaque Face-Processing System

    Temporal and spatial context play a key role in vision as a whole, and in face perception specifically. However, little is known about the neurophysiological mechanisms by which contextual cues exert their effects. Anatomically distinct face patches in the macaque brain analyze facial form, and studies of the activity within these patches have begun to clarify the neural machinery that underlies facial perception. This system provides a uniquely valuable opportunity to study how context affects the perception of form. We used functional magnetic resonance imaging (fMRI) to investigate the brain activity of macaque monkeys while they viewed faces placed in either temporal or spatial context. Facial motion transmits rich and ethologically vital information, but the way that the brain interprets such natural temporal context is poorly understood. Facial motion activates the face patches and surrounding areas, yet it is not known whether this motion is processed by its own specialized neural machinery, and if so, what that machinery’s organization might be. To address these questions, we monitored the brain activity of macaque monkeys while they viewed low- and high-level motion and form stimuli. We found that, beyond classical motion areas and the known face patch system, moving faces recruited a heretofore-unrecognized face patch. Although all face patches displayed distinctive selectivity for face motion over object motion, only two face patches preferred naturally moving faces, while three others preferred randomized, rapidly varying sequences of facial form. This functional divide was anatomically specific, segregating dorsal from ventral face patches, thereby revealing a new organizational principle of the macaque face-processing system. Like facial motion, bodies can provide valuable social context, revealing emotion and identity. Little is known about the joint processing of faces and bodies, even though there is reason to believe that their neural representations are intertwined. To identify interaction between the neural representations of face and body, we monitored the brain activity of the same monkeys while they viewed pictures of whole monkeys, isolated monkey heads, and isolated monkey bodies. We found that certain areas, including anterior face patches, responded more to whole monkeys than would be predicted by summing the separate responses to isolated heads and isolated bodies. The supralinear response was specific to viewing the conjunction of head and body; heads placed atop nonbody objects did not evoke this activity signature. However, a supralinear context response was elicited by pixelated, ambiguous faces presented on bodies. The size of this response suggests that the supralinear signal in this case did not result from the disambiguation of the ambiguous faces. These studies of contextually evoked activity within the macaque face-processing system deepen our understanding of the cortical organization of both visual context and face processing, and identify promising sites for future research into the mechanisms underlying these critical aspects of perception.
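
    The whole-versus-sum comparison at the heart of the face-body experiment can be made concrete with a simple index. The sketch below, using hypothetical per-condition response arrays rather than the study's data, flags units whose response to whole monkeys exceeds the sum of the responses to isolated heads and isolated bodies.

        # Sketch of a supralinearity test: is R(whole) > R(head) + R(body)?
        # The three arrays are hypothetical per-voxel mean responses (e.g. betas).
        import numpy as np

        rng = np.random.default_rng(1)
        r_whole = rng.normal(1.2, 0.3, size=500)  # responses to whole monkeys
        r_head = rng.normal(0.5, 0.2, size=500)   # responses to isolated heads
        r_body = rng.normal(0.4, 0.2, size=500)   # responses to isolated bodies

        # Positive values indicate a supralinear (greater-than-additive) response.
        supralinear_index = r_whole - (r_head + r_body)
        frac = (supralinear_index > 0).mean()
        print(f"fraction of voxels responding supralinearly: {frac:.2f}")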

    3D View Prediction Models of the Dorsal Visual Stream

    Deep neural network representations align well with brain activity in the ventral visual stream. However, the primate visual system has a distinct dorsal processing stream with different functional properties. To test whether a model trained to perceive 3D scene geometry aligns better with neural responses in dorsal visual areas, we trained a self-supervised geometry-aware recurrent neural network (GRNN) to predict novel camera views using a 3D feature memory. We compared GRNN to self-supervised baseline models that have been shown to align well with ventral regions, using the large-scale fMRI Natural Scenes Dataset (NSD). We found that while the baseline models accounted better for ventral brain regions, GRNN accounted for a greater proportion of variance in dorsal brain regions. Our findings demonstrate the potential for using task-relevant models to probe representational differences across visual streams. Comment: 2023 Conference on Cognitive Computational Neuroscience.
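
    A hedged sketch of the model-comparison logic, not the authors' pipeline: for each model, fit a linear encoding map from its stimulus features to voxel responses, then compare cross-validated explained variance between dorsal and ventral ROIs. All names (feats, roi_bold) are placeholders for model activations and NSD-style responses to shared stimuli.

        # Sketch: compare two models' ability to account for variance in two ROIs.
        import numpy as np
        from sklearn.linear_model import RidgeCV
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(2)
        n_stim = 300
        feats = {"GRNN": rng.standard_normal((n_stim, 256)),
                 "baseline": rng.standard_normal((n_stim, 256))}
        roi_bold = {"dorsal": rng.standard_normal((n_stim, 50)),
                    "ventral": rng.standard_normal((n_stim, 50))}

        for model_name, X in feats.items():
            for roi, Y in roi_bold.items():
                # Mean cross-validated R^2 over voxels in the ROI.
                r2 = cross_val_score(RidgeCV(), X, Y, cv=5, scoring="r2").mean()
                print(f"{model_name} -> {roi}: mean R^2 = {r2:.3f}")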

    Brain-mediated Transfer Learning of Convolutional Neural Networks

    The human brain can effectively learn a new task from a small number of samples, which indicates that the brain can transfer its prior knowledge to solve tasks in different domains. This function is analogous to transfer learning (TL) in the field of machine learning. TL uses a well-trained feature space from a specific task domain to improve performance on new tasks with insufficient training data. TL with rich feature representations, such as features of convolutional neural networks (CNNs), shows high generalization ability across different task domains. However, such TL is still insufficient for machine learning to attain generalization ability comparable to that of the human brain. To examine whether the internal representation of the brain could be used to achieve more efficient TL, we introduce a method for TL mediated by human brains. Our method transforms feature representations of audiovisual inputs in CNNs into activation patterns of individual brains via an association learned in advance from measured brain responses. The transformed representations are then used for TL to estimate labels reflecting the human cognition and behavior induced by the audiovisual inputs. We demonstrate that our brain-mediated TL (BTL) shows higher performance in label estimation than standard TL. In addition, we show that the estimates mediated by different brains vary from brain to brain, and that this variability reflects individual variability in perception. Thus, our BTL provides a framework to improve the generalization ability of machine-learning feature representations and to enable machine learning to estimate human-like cognition and behavior, including individual variability.
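
    A minimal sketch of the two-stage pipeline the abstract describes, under stated assumptions and with stand-in data: first learn a mapping from CNN features to measured brain responses, then train the downstream label estimator on the brain-like (predicted) representations rather than on the raw CNN features.

        # Sketch of brain-mediated transfer learning (BTL), with placeholder data.
        # Stage 1: learn a CNN-feature -> brain-response mapping from paired data.
        # Stage 2: train a label estimator on the predicted brain responses.
        import numpy as np
        from sklearn.linear_model import Ridge, LogisticRegression

        rng = np.random.default_rng(3)
        cnn_feats = rng.standard_normal((400, 512))   # CNN features of audiovisual inputs
        brain_resp = rng.standard_normal((400, 300))  # measured fMRI responses (paired)
        labels = rng.integers(0, 2, size=400)         # cognitive/behavioral labels

        # Stage 1: association between CNN features and individual brain activity.
        feat_to_brain = Ridge(alpha=10.0).fit(cnn_feats, brain_resp)

        # Stage 2: transform inputs into brain space, then estimate labels there.
        brain_like = feat_to_brain.predict(cnn_feats)
        label_model = LogisticRegression(max_iter=1000).fit(brain_like, labels)
        print("training accuracy:", label_model.score(brain_like, labels))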

    Constraint-free Natural Image Reconstruction from fMRI Signals Based on Convolutional Neural Network

    In recent years, research on decoding brain activity based on functional magnetic resonance imaging (fMRI) has made remarkable achievements. However, constraint-free natural image reconstruction from brain activity remains a challenge. Existing methods simplify the problem by using semantic prior information or by reconstructing only simple images such as letters and digits. Without semantic prior information, we present a novel method to reconstruct natural images from fMRI signals of the human visual cortex based on the computational model of the convolutional neural network (CNN). First, we extracted the unit outputs of each layer of a pre-trained CNN in response to the viewed natural images as CNN features. Second, we transformed image reconstruction from fMRI signals into the problem of CNN feature visualization by training a sparse linear regression to map from the fMRI patterns to CNN features. By iterative optimization to find the matched image, whose CNN unit features are most similar to those predicted from the brain activity, we achieved promising results for the challenging constraint-free natural image reconstruction. As no semantic prior information about the stimuli was used when training the decoding model, any category of images (not constrained by the training set) could in theory be reconstructed. We found that the reconstructed images resembled the natural stimuli, especially in position and shape. The experimental results suggest that hierarchical visual features can effectively express the visual perception process of the human brain.
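
    The reconstruction step reduces to feature inversion: given CNN features predicted from fMRI, iteratively adjust a candidate image until its own CNN features match them. A toy sketch of that loop under those assumptions follows; the extractor stands in for a pre-trained CNN layer, and target_feats stands in for the features decoded from fMRI by the sparse linear regression.

        # Toy sketch of reconstruction by CNN feature matching (PyTorch).
        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        extractor = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(8), nn.Flatten())
        target_feats = torch.randn(1, 16 * 8 * 8)  # placeholder for decoded features

        image = torch.zeros(1, 3, 64, 64, requires_grad=True)  # candidate image
        optimizer = torch.optim.Adam([image], lr=0.05)

        for step in range(200):
            optimizer.zero_grad()
            # Drive the candidate image's features toward the decoded target.
            loss = nn.functional.mse_loss(extractor(image), target_feats)
            loss.backward()
            optimizer.step()
            # Keep pixel values in a valid range.
            with torch.no_grad():
                image.clamp_(0.0, 1.0)

        print(f"final feature-matching loss: {loss.item():.4f}")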

    Naturalistic stimuli reveal a dominant role for agentic action in visual representation

    Naturalistic, dynamic movies evoke strong, consistent, and information-rich patterns of activity over a broad expanse of cortex and engage multiple perceptual and cognitive systems in parallel. The use of naturalistic stimuli enables functional brain imaging research to explore cognitive domains that are poorly sampled in highly controlled experiments. These domains include perception and understanding of agentic action, which plays a larger role in visual representation than was appreciated from experiments using static, controlled stimuli.