Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition
The primate visual system achieves remarkable visual object recognition
performance even in brief presentations and under changes to object exemplar,
geometric transformations, and background variation (a.k.a. core visual object
recognition). This remarkable performance is mediated by the representation
formed in inferior temporal (IT) cortex. In parallel, recent advances in
machine learning have led to ever higher performing models of object
recognition using artificial deep neural networks (DNNs). It remains unclear,
however, whether the representational performance of DNNs rivals that of the
brain. A major difficulty in making such a comparison accurately has been the
lack of a unifying metric that accounts for experimental limitations, such as
the amount of noise, the number of neural recording sites, and the number of
trials, as well as computational limitations, such as the complexity of the
decoding classifier and the number of classifier training examples. In this work
we perform a direct
comparison that corrects for these experimental limitations and computational
considerations. As part of our methodology, we propose an extension of "kernel
analysis" that measures the generalization accuracy as a function of
representational complexity. Our evaluations show that, unlike previous
bio-inspired models, the latest DNNs rival the representational performance of
IT cortex on this visual object recognition task. Furthermore, we show that
models that perform well on measures of representational performance also
perform well on measures of representational similarity to IT and on measures
of predicting individual IT multi-unit responses. Whether these DNNs rely on
computational mechanisms similar to the primate visual system is yet to be
determined, but, unlike all previous bio-inspired models, that possibility
cannot be ruled out merely on representational performance grounds.
Comment: 35 pages, 12 figures, extends and expands upon arXiv:1301.353
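The abstract compares models by their representational similarity to IT. The sketch below illustrates one common form of representational similarity analysis (RSA). It is an assumption-laden, generic variant, not necessarily the metric used in the paper:

- each image's response vector is compared to every other image's using 1 minus the Pearson correlation;
- these dissimilarities form a representational dissimilarity matrix (RDM);
- the model RDM and the IT RDM are then compared with a Spearman rank correlation.

All function names are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def rdm(responses):
    """Upper triangle of a representational dissimilarity matrix (RDM).

    responses[i] is the response vector (model units or IT sites) for image i;
    the dissimilarity of images i and j is 1 - Pearson correlation.
    """
    n = len(responses)
    return [1.0 - pearson(responses[i], responses[j])
            for i in range(n) for j in range(i + 1, n)]


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def rsa_score(model_responses, it_responses):
    """Spearman correlation between the model RDM and the IT RDM."""
    return pearson(ranks(rdm(model_responses)), ranks(rdm(it_responses)))
```

A score near 1 means the model orders image pairs by dissimilarity much as IT does. Because only rank order is compared, the score ignores any monotonic rescaling of the dissimilarities.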
Comparing primate’s ventral visual stream and the state-of-the-art deep convolutional neural networks for core object recognition
Our ability to recognize and categorize objects in our surroundings is a critical component of our cognitive processes. Despite enormous variation in each object's appearance (due to changes in object position, pose, scale, illumination, and the presence of visual clutter), primates are thought to be able to quickly and easily distinguish objects from among tens of thousands of possibilities. The primate's ventral visual stream is believed to support this view-invariant visual object recognition ability by untangling object identity manifolds. Convolutional Neural Networks (CNNs), inspired by the primate's visual system, have also shown remarkable performance in object recognition tasks. This review aims to explore and compare the mechanisms of object recognition in the primate's ventral visual stream and in state-of-the-art deep CNNs. The research questions address the extent to which CNNs have approached human-level object recognition and how their performance compares with that of the primate ventral visual stream. The objectives include providing an overview of the literature on the ventral visual stream and CNNs, comparing their mechanisms, and identifying strengths and limitations for core object recognition. The review presents the ventral visual stream's structure, its visual representations, and the process of untangling object manifolds, and it also covers the architecture of CNNs. Comparing the two visual systems shows that deep CNNs perform remarkably well in certain aspects of object recognition but still fall short of replicating the complexities of the primate visual system. Further research is needed to bridge the gap between computational models and the intricate neural mechanisms underlying human object recognition.
Performance-optimized deep neural networks are evolving into worse models of inferotemporal visual cortex
One of the most impactful findings in computational neuroscience over the
past decade is that the object recognition accuracy of deep neural networks
(DNNs) correlates with their ability to predict neural responses to natural
images in the inferotemporal (IT) cortex. This discovery supported the
long-held theory that object recognition is a core objective of the visual
cortex, and suggested that more accurate DNNs would serve as better models of
IT neuron responses to images. Since then, deep learning has undergone a
revolution of scale: billion-parameter DNNs trained on billions of images
are rivaling or outperforming humans at visual tasks including object
recognition. Have today's DNNs become more accurate at predicting IT neuron
responses to images as they have grown more accurate at object recognition?
Surprisingly, across three independent experiments, we find this is not the
case. DNNs have become progressively worse models of IT as their accuracy has
increased on ImageNet. To understand why DNNs experience this trade-off and
evaluate if they are still an appropriate paradigm for modeling the visual
system, we turn to recordings of IT that capture spatially resolved maps of
neuronal activity elicited by natural images. These neuronal activity maps
reveal that DNNs trained on ImageNet learn to rely on different visual features
than those encoded by IT and that this problem worsens as their accuracy
increases. We successfully resolved this issue with the neural harmonizer, a
plug-and-play training routine for DNNs that aligns their learned
representations with humans. Our results suggest that harmonized DNNs break the
trade-off between ImageNet accuracy and neural prediction accuracy that assails
current DNNs and offer a path to more accurate models of biological vision.
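The abstract turns on how well DNN features predict IT neuron responses to images. A common way to quantify this "neural predictivity" works in two steps. First, fit a regularized linear map from a model's features to one recording site's responses on training images. Then score it by the Pearson correlation between predicted and recorded responses on held-out images.

The sketch below assumes plain ridge regression as the linear map. The papers in this area may instead use other mappings, such as partial least squares, and other cross-validation schemes. All function names are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(features, responses, alpha=1.0):
    """Ridge weights mapping model features to one IT site's responses.

    A constant bias column is appended and left unpenalized.
    """
    X = [list(f) + [1.0] for f in features]
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X)
            + (alpha if i == j and i < d - 1 else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(X, responses)) for i in range(d)]
    return solve(xtx, xty)


def predict(weights, features):
    """Linear prediction, including the bias term."""
    return [sum(w * v for w, v in zip(weights, list(f) + [1.0])) for f in features]


def neural_predictivity(train_x, train_y, test_x, test_y, alpha=1.0):
    """Held-out Pearson r between predicted and recorded responses of one site."""
    w = fit_ridge(train_x, train_y, alpha)
    return pearson(predict(w, test_x), test_y)
```

In a model comparison like the one described, this score would be computed per recording site and aggregated. The aggregate would then be plotted against each model's ImageNet accuracy to see whether the two rise together.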
A Neural Algorithm of Artistic Style
In fine art, especially painting, humans have mastered the skill to create
unique visual experiences through composing a complex interplay between the
content and style of an image. Thus far the algorithmic basis of this process
is unknown and there exists no artificial system with similar capabilities.
However, in other key areas of visual perception such as object and face
recognition, near-human performance was recently demonstrated by a class of
biologically inspired vision models called Deep Neural Networks. Here we
introduce an artificial system based on a Deep Neural Network that creates
artistic images of high perceptual quality. The system uses neural
representations to separate and recombine content and style of arbitrary
images, providing a neural algorithm for the creation of artistic images.
Moreover, in light of the striking similarities between performance-optimised
artificial neural networks and biological vision, our work offers a path
forward to an algorithmic understanding of how humans create and perceive
artistic imagery.
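Separating content from style is usually explained with two loss terms:

- **Content** is compared directly through a layer's feature maps.
- **Style** is compared through Gram matrices of feature maps. A Gram matrix records which filters co-activate but discards where in the image they do so.

The sketch below shows only these loss terms. It assumes the feature maps have already been extracted from a pretrained convolutional network and flattened to N filters by M positions. It also omits the gradient-descent optimization of the image pixels. The equal layer weights and the `alpha`/`beta` defaults are placeholders, and all function names are hypothetical.

```python
def gram(features):
    """Gram matrix G[i][j] = sum_k F[i][k] * F[j][k] of one layer's feature maps.

    features[i][k] is the activation of filter i at spatial position k
    (N filters x M positions, each map flattened into a row).
    """
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j])) for j in range(n)]
            for i in range(n)]


def content_loss(gen, photo):
    """Half the squared error between generated and photograph feature maps."""
    return 0.5 * sum((g - p) ** 2
                     for g_row, p_row in zip(gen, photo)
                     for g, p in zip(g_row, p_row))


def style_layer_loss(gen, art):
    """Squared Gram-matrix difference for one layer, scaled by 1 / (4 N^2 M^2)."""
    n, m = len(gen), len(gen[0])
    g, a = gram(gen), gram(art)
    sq = sum((g[i][j] - a[i][j]) ** 2 for i in range(n) for j in range(n))
    return sq / (4.0 * n * n * m * m)


def total_loss(gen_content, photo_content, gen_style, art_style, alpha=1.0, beta=1000.0):
    """alpha * content loss + beta * style loss, with equal weights over style layers."""
    style = sum(style_layer_loss(g, a) for g, a in zip(gen_style, art_style)) / len(gen_style)
    return alpha * content_loss(gen_content, photo_content) + beta * style
```

Shuffling the spatial positions of a feature map (the same permutation for every filter) leaves its Gram matrix unchanged, so the style loss is zero while the content loss is not. That asymmetry is what lets the two terms pull an image toward one picture's layout and another's texture.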