Memoir Dataset: Quantifying Image Memorability in Adolescents
Every day, humans observe and interact with hundreds of images and scenes, whether on a cellphone, on television, or in print. Yet the vast majority of these images are forgotten, some immediately and some after variable lengths of time. While memory itself is a process that occurs in the brain of an individual, memorability is an intrinsic, continuous property of a stimulus that can be measured, predicted, and manipulated. We selected images from the MemCat data set that are annotated with adult memorability scores. By running a visual memory game online, we quantified the memorability of these images in adolescents and compared the resulting scores to the adult scores. Our results support previous research suggesting that memorability is an intrinsic property of images that is consistent across viewers. Memorability rankings were consistent across adolescents and adults, indicating that viewer age is not a factor in determining the memorability of images. The ability to measure and manipulate memorability has profound applications in many fields; in adolescents specifically, the educational applications are promising. This work could aid in the development of educational material that is more likely to improve knowledge retention, particularly in adolescents with neurodevelopmental disorders.
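As a concrete illustration of the comparison described above, the sketch below computes a Spearman rank correlation between two sets of per-image memorability scores. All scores are simulated placeholders, and the hit-rate framing and variable names are assumptions for illustration, not values from the Memoir or MemCat data sets.

```python
# Hypothetical sketch: comparing memorability rankings across age groups.
# The scores here are simulated, not taken from any real data set.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hit rates from a repeat-detection memory game: the fraction of participants
# who correctly recognized each image on its second presentation.
adult_scores = rng.uniform(0.4, 0.95, size=100)                     # placeholder adult scores
adolescent_scores = np.clip(adult_scores
                            + rng.normal(0, 0.05, 100), 0, 1)       # placeholder adolescent scores

# Rank correlation: a high rho means the two groups agree on which images
# are memorable, even if their absolute hit rates differ.
rho, p = spearmanr(adult_scores, adolescent_scores)
print(f"Spearman rho = {rho:.2f} (p = {p:.3g})")
```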
A Computational Model to Account for Dynamics of Spatial Updating of Remembered Visual Targets across Slow and Rapid Eye Movements
Despite the ever-changing visual scene on the retina between eye movements, our perception of the visual world is constant and unified. It is generally believed that this space constancy is due to the brain's capacity for spatial updating. Although many efforts have been made to discover the mechanism underlying spatial updating across eye movements, many questions about the neuronal basis of this phenomenon remain unanswered.
We developed a state space model for updating gaze-centered spatial information. To explore spatial updating, we considered two kinds of eye movements, saccade and smooth pursuit. The inputs to our proposed model are: a corollary discharge signal, an eye position signal and 2D visual topographic maps of visual stimuli. The state space is represented by a radial basis function neural network and we can obtain a topographic map of the remembered visual target in its hidden layer. Finally, the decoded location of the remembered target is the output of the model. We trained the model on the double step saccade-saccade and pursuit-saccade tasks. Training this model revealed that the receptive fields of state-space units are remapped predictively during saccades and updated continuously during smooth pursuit. Moreover, during saccades, receptive fields also expanded (to our knowledge, this predicted expansion has not yet been reported in the published literature). We believe that incorporating this model can shed light on the underlying neural mechanism for Trans-saccadic perception
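To make the architecture concrete, here is a minimal sketch of a gaze-centered radial basis function map being shifted by a corollary discharge signal. The grid size, Gaussian width, and all numerical values are assumptions chosen for illustration; the published model's exact architecture and training procedure may differ.

```python
# A minimal sketch of a radial-basis-function state space. All shapes and
# values are illustrative assumptions, not the published model.
import numpy as np

def rbf_layer(inputs, centers, width):
    """Gaussian RBF activations: one topographically arranged unit per center."""
    d2 = ((inputs[None, :] - centers) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * width ** 2))

# Hidden units tile a 2D gaze-centered map (a stand-in for the model's
# topographic state space).
grid = np.linspace(-20, 20, 15)                      # degrees of visual angle
centers = np.array([(x, y) for x in grid for y in grid])

# A remembered target location in gaze-centered coordinates, shifted by a
# corollary-discharge copy of the saccade vector when the eyes move.
target = np.array([5.0, -3.0])
saccade_vector = np.array([8.0, 0.0])

hidden_before = rbf_layer(target, centers, width=3.0)
hidden_after = rbf_layer(target - saccade_vector, centers, width=3.0)

# The peak of the hidden-layer map moves opposite to the eye movement:
# the remembered location is updated in gaze-centered coordinates.
print(centers[hidden_before.argmax()], "->", centers[hidden_after.argmax()])
```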
Multi-Scale Identity-Preserving Image-to-Image Translation Network for Low-Resolution Face Recognition
State-of-the-art deep neural network models have reached near-perfect face recognition accuracy on controlled high-resolution face images. However, their performance degrades drastically when they are tested with very low-resolution face images. This is particularly critical in surveillance systems, where a low-resolution probe image is to be matched with high-resolution gallery images. Super-resolution techniques aim at producing high-resolution face images from low-resolution counterparts. While they are capable of reconstructing images that are visually appealing, the identity-related information is not preserved. Here, we propose an identity-preserving end-to-end image-to-image translation deep neural network which is capable of super-resolving very low-resolution faces to their high-resolution counterparts while preserving identity-related information. We achieved this by training a very deep convolutional encoder-decoder network with a symmetric contracting path between corresponding layers. The network was trained with a combination of a reconstruction loss and an identity-preserving loss, on multi-scale low-resolution conditions. Extensive quantitative evaluations demonstrated that our proposed model outperforms competing super-resolution and low-resolution face recognition methods on natural and artificial low-resolution face data sets, and even on unseen identities.
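The combined objective described above can be sketched as follows, assuming PyTorch and a frozen embedding network standing in for the identity branch; the loss weighting, stand-in network, and tensor shapes are illustrative assumptions, not the paper's exact configuration.

```python
# A hedged sketch of a reconstruction-plus-identity training objective.
import torch
import torch.nn as nn

def combined_loss(sr_face, hr_face, identity_net, alpha=0.1):
    # Reconstruction term: make the super-resolved face match the
    # high-resolution ground truth pixel by pixel.
    recon = nn.functional.l1_loss(sr_face, hr_face)

    # Identity term: make the two faces map to nearby points in the
    # embedding space of a frozen face-recognition network.
    with torch.no_grad():
        target_emb = identity_net(hr_face)
    identity = nn.functional.mse_loss(identity_net(sr_face), target_emb)

    return recon + alpha * identity

# Dummy usage with a stand-in embedding network (assumption: any frozen
# face-recognition backbone could play this role).
identity_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128)).eval()
sr = torch.rand(4, 3, 64, 64, requires_grad=True)   # super-resolved batch
hr = torch.rand(4, 3, 64, 64)                       # high-resolution targets
print(combined_loss(sr, hr, identity_net).item())
```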
Emergence of Visual Center-Periphery Spatial Organization in Deep Convolutional Neural Networks
Research at the intersection of computer vision and neuroscience has revealed a hierarchical correspondence between the layers of deep convolutional neural networks (DCNNs) and the cascade of regions along the human ventral visual cortex. Recently, studies have uncovered the emergence of human-interpretable concepts within the layers of DCNNs trained to identify visual objects and scenes. Here, we asked whether an artificial neural network (with convolutional structure) trained for visual categorization would demonstrate spatial correspondences with human brain regions showing central/peripheral biases. Using representational similarity analysis, we compared activations of the convolutional layers of a DCNN trained for object and scene categorization with neural representations in human visual brain regions. The results reveal a brain-like topographical organization in the layers of the DCNN, such that activations of layer units with a central bias were associated with brain regions with foveal tendencies (e.g., the fusiform gyrus), while activations of layer units selective for image backgrounds were associated with cortical regions showing a peripheral preference (e.g., the parahippocampal cortex). The emergence of this topographical correspondence between DCNNs and brain regions suggests that these models are a good approximation of the perceptual representation generated by biological neural networks.
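For readers unfamiliar with representational similarity analysis, the sketch below shows the core computation on random placeholder data: build a representational dissimilarity matrix (RDM) for a DCNN layer and for a set of voxel patterns, then rank-correlate their upper triangles. All dimensions and variable names are illustrative assumptions.

```python
# A minimal RSA sketch on simulated data; not the study's activations.
import numpy as np
from scipy.stats import spearmanr

def rdm(activations):
    """Pairwise 1 - Pearson correlation across stimuli (rows)."""
    return 1.0 - np.corrcoef(activations)

rng = np.random.default_rng(1)
n_images = 50
layer_acts = rng.normal(size=(n_images, 4096))   # placeholder DCNN layer units
voxel_acts = rng.normal(size=(n_images, 300))    # placeholder fMRI voxel patterns

# Compare only the unique image pairs (upper triangle, diagonal excluded).
iu = np.triu_indices(n_images, k=1)
rho, _ = spearmanr(rdm(layer_acts)[iu], rdm(voxel_acts)[iu])
print(f"layer-to-region RDM similarity: rho = {rho:.2f}")
```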
Ultra-rapid serial visual presentation reveals dynamics of feedforward and feedback processes in the ventral visual pathway
Human visual recognition activates a dense network of overlapping feedforward and recurrent neuronal processes, making it hard to disentangle feedforward from feedback processing. Here, we used ultra-rapid serial visual presentation to suppress the sustained activity that blurs the boundaries between processing steps, enabling us to resolve two distinct stages of processing with MEG multivariate pattern classification. The first stage was the rapid activation cascade of the bottom-up sweep, which terminated earlier as visual stimuli were presented at progressively faster rates. The second stage was the emergence of categorical information, whose peak latency shifted later in time with progressively faster stimulus presentation, indexing time-consuming recurrent processing. Using MEG-fMRI fusion with representational similarity analysis, we localized recurrent signals to early visual cortex. Together, our findings segregate an initial bottom-up sweep from subsequent feedback processing and reveal the neural signature of increased recurrent processing demands under challenging viewing conditions.
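A minimal sketch of time-resolved multivariate pattern classification of this kind, on simulated data, is given below; the sensor count, classifier choice, and the injected late "category" signal are assumptions for illustration only.

```python
# Time-resolved MEG decoding sketch on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_sensors, n_times = 200, 64, 50
X = rng.normal(size=(n_trials, n_sensors, n_times))   # placeholder sensor data
y = rng.integers(0, 2, size=n_trials)                 # two stimulus categories

# Inject a weak category signal late in the epoch to mimic the delayed
# emergence of categorical information under fast presentation rates.
X[y == 1, :, 30:] += 0.3

# Train and cross-validate one classifier per time point; the resulting
# accuracy curve shows when category information becomes decodable.
accuracy = [
    cross_val_score(LogisticRegression(max_iter=1000), X[:, :, t], y, cv=5).mean()
    for t in range(n_times)
]
print("peak decoding at time index", int(np.argmax(accuracy)))
```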
Reliability and generalizability of similarity-based fusion of MEG and fMRI data in human ventral and dorsal visual streams
To build a representation of what we see, the human brain recruits regions throughout the visual cortex in a cascading sequence. Recently, an approach was proposed to evaluate the dynamics of visual perception at high spatiotemporal resolution across the whole brain. This method combined functional magnetic resonance imaging (fMRI) data with magnetoencephalography (MEG) data using representational similarity analysis and revealed a hierarchical progression from primary visual cortex through the dorsal and ventral streams. To assess the replicability of this method, here we present the results of a visual recognition neuroimaging fusion experiment and compare them within and across experimental settings. We evaluated the reliability of the method by assessing the consistency of the results under similar test conditions, showing high agreement within participants. We then generalized these results to a separate group of individuals and a different visual input set by comparing them to the fMRI-MEG fusion data of Cichy et al. (2016), revealing a highly similar temporal progression recruiting both the dorsal and ventral streams. Together, these results are a testament to the reproducibility of the fMRI-MEG fusion approach and allow for the interpretation of these spatiotemporal dynamics in a broader context.
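The fusion computation itself can be sketched compactly: correlate the MEG RDM at each time point with the RDM of one fMRI region, yielding a time course of correspondence for that region. Everything below runs on random placeholder matrices; the shapes and names are assumptions.

```python
# MEG-fMRI fusion via representational similarity, on placeholder data.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_images, n_times = 40, 60
iu = np.triu_indices(n_images, k=1)   # unique image pairs

# Real RDMs are symmetric; random placeholders suffice here because only
# the upper triangle is ever compared.
meg_rdms = rng.random(size=(n_times, n_images, n_images))   # one RDM per time point
fmri_rdm = rng.random(size=(n_images, n_images))            # one RDM per brain region

# Time course of MEG-fMRI correspondence for this region.
fusion = np.array([spearmanr(meg_rdms[t][iu], fmri_rdm[iu])[0]
                   for t in range(n_times)])
print("fusion peaks at time index", int(fusion.argmax()))
```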
Population response magnitude variation in inferotemporal cortex predicts image memorability
Most accounts of image and object encoding in inferotemporal cortex (IT) focus on the distinct patterns of spikes that different images evoke across the IT population. By analyzing data collected from IT as monkeys performed a visual memory task, we demonstrate that variation in a complementary coding scheme, the magnitude of the population response, can largely account for how well images will be remembered. To investigate the origin of this memorability modulation in IT, we probed convolutional neural network models trained to categorize objects. We found that, like the brain, these networks produced responses of different magnitudes to different natural images, and that in higher layers, larger-magnitude responses were evoked by the images that humans and monkeys find most memorable. Together, these results suggest that variation in IT population response magnitude is a natural consequence of the optimizations required for visual processing, and that this variation has consequences for visual memory.
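A hedged sketch of the magnitude analysis, on simulated data: compute the L2 norm of a CNN layer's response vector for each image and correlate it with behavioral memorability scores. The activation distribution and the coupling between magnitude and memorability below are simulated purely so the example runs end to end.

```python
# Population response magnitude vs. memorability, on simulated data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
n_images, n_units = 100, 2048
acts = rng.gamma(2.0, 1.0, size=(n_images, n_units))   # placeholder layer activations

# Population response magnitude: one scalar per image, the length of the
# activation vector across all units.
magnitude = np.linalg.norm(acts, axis=1)

# Placeholder behavioral scores, loosely coupled to magnitude to mimic the
# reported relationship.
memorability = (0.5 * (magnitude - magnitude.mean()) / magnitude.std()
                + rng.normal(0, 1, n_images))

r, p = pearsonr(magnitude, memorability)
print(f"magnitude-memorability correlation: r = {r:.2f} (p = {p:.3g})")
```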